A chatbot that talks is one thing. A chatbot that does something (runs a command, reads the output, decides what to do next) is a different and far more interesting beast, and far easier to get wrong in entertaining ways. I spent a weekend building a small one, just enough to understand the shape of it, and the lessons were not where I expected.
The loop is embarrassingly simple
Strip away the hype and an agent is a loop. You give the model a goal and a list of tools it is allowed to call. It replies either with a final answer or with a request to use a tool. You run the tool, feed the result back in, and ask again. Repeat until it says it is done or you hit a sanity limit. That is it. The cleverness is all in the model; the harness around it is a hundred lines of plumbing.
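```python
def run(goal, tools, max_steps=10):
    # system_prompt, user, model, parse_tool_call and tool_result are
    # harness helpers that are not shown here.
    history = [system_prompt(tools), user(goal)]
    for _ in range(max_steps):
        reply = model(history)
        call = parse_tool_call(reply)
        if call is None:
            return reply  # model decided it is finished
        # Look up the requested tool by name and run it with the model's arguments.
        result = tools[call.name](**call.args)
        history += [reply, tool_result(result)]
    return "gave up after max_steps"
```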
I gave mine three tools to start: run a shell command, read a file, and write a file. The first time it correctly chained ls, then cat on a file it found, then summarised the contents, I will admit I sat back rather pleased with myself. It felt like magic. It is not magic. It is a for-loop with very good autocomplete. But it works.
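For a sense of scale, the three tools can be very small. What follows is a minimal sketch rather than my exact code: the function names, the `TOOLS` registry, and the 30-second timeout are all assumptions made for illustration, built only on the standard library.

```python
import subprocess
from pathlib import Path


def run_shell(command: str, timeout: float = 30.0) -> str:
    """Run a shell command and return its stdout followed by its stderr."""
    completed = subprocess.run(
        command,
        shell=True,           # the model supplies a full command line
        capture_output=True,
        text=True,
        timeout=timeout,      # assumed limit so a hung command cannot stall the loop
    )
    return completed.stdout + completed.stderr


def read_file(path: str) -> str:
    """Return the text contents of a file."""
    return Path(path).read_text(encoding="utf-8")


def write_file(path: str, content: str) -> str:
    """Overwrite a file with the given text and report how much was written."""
    written = Path(path).write_text(content, encoding="utf-8")
    return f"wrote {written} characters to {path}"


# Hypothetical registry, keyed by the name the model uses in a tool call.
TOOLS = {
    "run_shell": run_shell,
    "read_file": read_file,
    "write_file": write_file,
}
```

Each tool returns a plain string, which keeps the step of feeding the result back to the model trivial.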
Where it goes wrong
The failure modes arrived quickly and were instructive. The model would occasionally decide a task was done when it plainly was not, and confidently declare victory. It would sometimes loop, running a near-identical command over and over, each time convinced this attempt would differ. And, most alarming, it once tried to "clean up" by removing a directory I had not asked it to touch, because some half-formed plan in its reasoning had decided that was tidy.
That last one stopped being funny immediately. An agent with a shell tool is an agent that can delete your files, and "it probably will not" is not a security model. So most of my weekend went not into the clever loop but into the boring guardrails around it.
The boring parts that actually matter
- A confirmation step for anything destructive. Read commands run freely. Anything that writes, deletes, or touches the network waits for a human yes. Tedious, non-negotiable.
- A hard step limit. Without it, a confused agent will happily burn through your API budget in a tight loop. Mine caps at ten and bails.
- A sandbox. I ran the whole thing in a throwaway container with nothing important mounted, so the worst case was rebuilding a container rather than restoring from backup.
- Logging every single tool call. When it misbehaves you want the exact sequence, not a vague memory. The transcript is the only debugger you have.
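The confirmation step and the logging fit naturally into one wrapper around each tool call. This is a rough sketch under stated assumptions, not a finished safety layer: `guarded_call`, the `READ_ONLY` set, and the JSON-lines log file are hypothetical names, and the sketch treats every shell command as needing approval because it cannot tell a harmless one from a destructive one.

```python
import json
import time

# Assumption: only these tool names count as read-only; anything else needs a human yes.
READ_ONLY = {"read_file"}


def guarded_call(tools, name, args, confirm=input, log_path="tool_calls.jsonl"):
    """Run one tool call behind a confirmation gate and log what happened."""
    approved = True
    if name not in READ_ONLY:
        answer = confirm(f"Allow {name}({args})? [y/N] ")
        approved = answer.strip().lower() == "y"

    if not approved:
        result = "denied by user"
    else:
        try:
            result = tools[name](**args)
        except Exception as exc:
            # Hand the failure back as text instead of crashing the loop.
            result = f"error: {exc!r}"

    # Append one JSON object per call so the transcript can be replayed later.
    entry = {
        "time": time.time(),
        "tool": name,
        "args": args,
        "approved": approved,
        "result": str(result),
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry, default=str) + "\n")
    return result
```

In the loop, the direct `tools[call.name](**call.args)` lookup would become `guarded_call(tools, call.name, call.args)`. Passing `confirm` as a parameter also makes the gate easy to exercise without a keyboard.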
What I actually think
The thing genuinely does useful work. I pointed it at a messy directory and asked it to tell me which scripts referenced a deprecated config key, and it worked through the files methodically and gave me a correct answer faster than I would have. For read-only investigation across a pile of files it is already a real tool.
But the gap between "impressive demo" and "thing I would let run unattended" is enormous, and it is entirely made of the unglamorous stuff: permissions, limits, sandboxing, logging. The model is the easy part now. The engineering is making sure that when it inevitably does something daft, it does the daft thing inside a box where daft does not matter. Everyone is excited about the loop. The loop took an afternoon. The guardrails took the rest of the weekend, and they are the only reason I would let the thing near anything I cared about.