the llm in my shell, and where it drew blood

A stylised robot head representing an AI model

I put a language model in my shell a few months ago, the kind that turns "find every file over a gigabyte modified this week" into an actual command, and most days it's a small quiet win. But this is not a love letter. It's a list of the times it bit me, because those are the part worth writing down, and the part the demos never show.

The setup is unremarkable. A little wrapper that takes a plain-English request, sends it off with some context about my system, and prints back a command for me to run. Bound to a keystroke. The pitch is obvious: I stop context-switching to look up the exact flags for find or tar or awk for the four-hundredth time, and I get on with the actual work.

The reality is more interesting, and a bit sharper-edged.

the good part, briefly

For commands I half-remember, it's excellent. The shape of a find invocation with a -mtime and a -size and an -exec, the precise incantation to extract one file from a tarball, the rsync flags I always have to look up. It gets these right often enough that the keystroke has become muscle memory. When it works, the friction of "I know what I want, I don't remember the syntax" just evaporates.

It's also good at the reverse: paste an opaque command from a Stack Overflow answer and ask what it does. That's lower-stakes and the model is genuinely strong at it, because reading is easier than writing and the consequences of a wrong explanation are just confusion, not data loss.

A close-up of a circuit board

the first time it bit me

I asked for a command to clean up some build artefacts in a project directory. It gave me back something with an rm -rf and a path that, on a tired evening, looked plausible. It wasn't. The path it had constructed would have walked one level too high, into a directory I very much wanted to keep. I caught it because I have a hard rule of reading every rm before I run it, and the rule held. But it was close, and the model was utterly confident. There was no hedge, no "you might want to check this". Just a clean, wrong, dangerous command presented exactly as cleanly as a correct one.

That's the thing that no amount of accuracy fixes. The model's confidence is uncorrelated with its correctness, and a destructive command delivered with total assurance is worse than no answer, because it lowers your guard.

the second time was subtler

The second bite drew less blood but taught me more. I asked it to construct a find command to delete old log files, and it produced something correct in form but wrong in scope: the glob it chose matched more than I'd described, including files I'd have wanted to keep. Nothing about the command looked wrong. It would have run cleanly, exited zero, and quietly deleted the wrong set of files. No error, no signal, just the slow realisation days later that something was missing.

That's the failure mode I now fear more than the loud one. The loud rm -rf you catch because it looks dangerous. The quietly-too-broad glob you don't, because it looks fine, runs fine, and is wrong only in a way you discover later.

A close-up of a circuit board

the rules i run it under now

I didn't remove it. I changed how I use it, which is probably the honest outcome for most of these tools.

Nothing it suggests runs unread. The model proposes, I dispose. It prints to the prompt, it never executes. That one design decision is the whole safety story.
Anything destructive gets a dry run first where the command supports one, and a careful read where it doesn't. rm, find -delete, mv over existing files, anything with a wildcard: read every character.
I treat its confidence as noise. The tone tells me nothing. A wrong command and a right one arrive identically.
For the genuinely irreversible, I just don't ask it. Some things I'd rather look up properly and own the mistake myself.

what it's actually taught me

The useful framing, after a few months, is that the model is a very fast junior who is never tired, never bored, and never says "I'm not sure". The first three are gifts. The fourth is the hazard. A real junior would hesitate before an rm -rf near your home directory. The model won't, because it has no skin and no fear, and the absence of hesitation is exactly the signal you've learned to rely on from a human.

So I keep it. It saves me real time on the boring middle of the difficulty curve, the commands I know but don't memorise. I just never let it close the loop. The keystroke gives me a suggestion, and the last, most important key in the sequence is still me, reading the thing before I press return. The day I stop doing that is the day it bites me for real.