letting an agent touch the code, with the safety on

A robot rendered from glowing circuitry

I've started letting an agent edit my codebase, and the only reason I'm comfortable with it is that I treat its output exactly like a pull request from a stranger who is fast, tireless, occasionally brilliant and entirely without judgement. Supervised is doing a lot of work in that sentence. An agent loose in your repo with no guardrails is not a productivity tool, it's a way to generate plausible-looking damage at speed.

The setup matters more than the model. My rule is that the agent never edits anything I can't trivially review and reverse. In practice that means it works on a branch, never on main, and every change it makes is a diff I read before it goes anywhere near a merge. This is not paranoia, it's the same standard I'd hold a junior to, except the agent produces volume a junior couldn't, so the review discipline has to be tighter, not looser.

A circuit board in close detail

Three things make supervision actually work rather than just being a word I say to feel responsible.

First, the test suite is the real reviewer. The agent is allowed to run the tests, and a change that breaks them never reaches me, because the loop rejects it and tries again. This means my test coverage suddenly matters in a way it didn't before. A weak suite gives the agent enough rope to produce code that passes and is still wrong; a strong suite turns the agent into something that genuinely converges on working solutions instead of confident nonsense.

Second, small scoped tasks beat big vague ones by a mile. "Refactor the auth module" gives me a sprawling diff I have to reconstruct the reasoning for from scratch, which is slower than doing it myself. "Extract this validation into a function and add a test for the empty-input case" gives me a tight diff I can verify in seconds. The skill is in the decomposition, and that part is still firmly mine.

Third, I keep the blast radius small with the obvious mechanical controls. It runs in a working tree I can throw away. It doesn't have credentials to anything that matters. It cannot push, deploy, or touch production, and the commands it's allowed to run are on a short list rather than open season on my shell. None of this is exotic. It's just the difference between "an assistant that drafts changes for me" and "a process with my permissions and a language model's confidence," and only one of those is something I want anywhere near a repo I care about.

What I've landed on after a few weeks is that the agent is excellent at the work I find tedious and the work is well-specified: the boilerplate, the test scaffolding, the mechanical refactor, the "do this same change in fourteen files" job that I'd otherwise put off. It's poor, still, at the work that needs taste and a model of why the system is shaped the way it is, which is most of the interesting work. So I supervise. The agent drafts, I decide, the tests adjudicate, and nothing it produces is trusted until it's earned it. That arrangement has made me faster. The version where I trust it blindly would, I'm fairly sure, eventually make me famous for the wrong reasons.