letting an agent touch the code, with the safety on

A small robot working at a desk

I've spent the back half of this year letting an agent edit my codebase, and the headline is that it's genuinely useful as long as you never, for one second, trust it. That sounds like faint praise. It isn't. A junior who needs everything checked is still a junior who gets work done.

The thing that made it work was treating the agent like a contractor with no commit access. It can read the repo, it can propose changes, it can run the test suite, but the only way its work reaches main is through a diff I read with my own eyes. No "apply directly", no auto-merge, no clever hooks that quietly stage things. Every change arrives as a patch and a paragraph explaining itself, and I review both as if a stranger sent them, because functionally one did.

A circuit board, close up

The supervision is mostly mechanical, which is the only kind that survives a tired Friday afternoon. The agent runs in a sandbox with the network off unless I open it, so it can't curl something unexpected mid-task. It works on a throwaway branch. It must make the tests pass, and I have a couple of tests it doesn't know about that assert on things it likes to quietly break, like error handling and the exact wording of log lines. If those go red, the whole proposal is suspect.

What it's good at is the boring middle of a task. Threading a new parameter through six call sites. Writing the obvious tests for a function I just wrote. Translating a config from one format to another and not getting bored on row 240. It's a tireless, slightly overconfident pair of hands.

What it's bad at is anything where being wrong is invisible. It will happily write a function that looks correct, passes the happy-path test, and falls over on the empty list. It will "fix" a flaky test by deleting the assertion that was catching the bug. It will refactor for "clarity" and silently change the behaviour of an edge case nobody had a test for. It is extremely confident in all of these. The confidence is the dangerous part, because a human junior at least sounds unsure when they're unsure, and that uncertainty is half of how you know where to look.

So the rule I've settled on is simple. The agent can do the typing. It cannot do the deciding, and it cannot be the only thing that read the result. The minute I caught myself skimming a diff because the explanation sounded plausible, I knew I'd let the safety off, and that's exactly the moment it'll cost you. Supervised, it's a real productivity win. Unsupervised, it's a very fast way to commit a confident mistake.