I wanted to see how little it takes to build an agent that does something real, rather than a chatbot that talks about doing something real. The answer, slightly annoyingly, is "not much code and quite a lot of judgement".
The core is embarrassingly simple. You give the model a system prompt, a list of tools it's allowed to call, and a loop. The model responds either with text for me or with a request to call a tool. If it's a tool call, you run the tool, feed the result back in, and go round again. That's it. That's the agent. The loop fits on a screen:
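    def run_agent(client, messages, tools):
        while True:
            response = client.chat(messages, tools=tools)
            if response.tool_calls:
                # Run each requested tool and feed its result back to the model.
                for call in response.tool_calls:
                    result = run_tool(call.name, call.arguments)
                    messages.append(tool_result(call.id, result))
            else:
                # No tool calls means the text is the final answer.
                return response.text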
The tools I gave it were deliberately mundane: read a file, list a directory, run a read-only shell command, search some notes. The interesting part isn't the model, it's the wiring around the model, and that's where all my actual time went.
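To make "mundane" concrete, here is a minimal sketch of two such tools and a `run_tool` dispatcher of the kind the loop calls. It is an illustration under assumptions, not the exact wiring: the function names, the `TOOLS` registry, the `max_chars` truncation, and the error-string format are all hypothetical, and the tool schema you hand to the model is left out because it depends on your client library.

```python
import os
from pathlib import Path


def read_file(path: str, max_chars: int = 4000) -> str:
    """Return the text of a file, truncated so one call can't flood the context."""
    return Path(path).read_text(encoding="utf-8", errors="replace")[:max_chars]


def list_directory(path: str = ".") -> str:
    """Return the sorted entry names in a directory, one per line."""
    return "\n".join(sorted(os.listdir(path)))


# Hypothetical registry: tool name -> plain Python callable.
TOOLS = {"read_file": read_file, "list_directory": list_directory}


def run_tool(name: str, arguments: dict) -> str:
    """Dispatch a tool call; report failures as text so the model can react."""
    tool = TOOLS.get(name)
    if tool is None:
        return f"error: unknown tool {name!r}"
    try:
        return tool(**arguments)
    except (OSError, TypeError) as exc:
        # TypeError covers arguments the model invented; OSError covers bad paths.
        return f"error: {type(exc).__name__}: {exc}"
```

Returning errors as strings rather than raising is a design choice in this sketch: a bad path or an invented argument becomes something the model can read and correct on the next turn instead of something that kills the loop.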
Three things mattered more than I expected. First, the tool descriptions are the real prompt. A vague description gets you a confused agent that calls the wrong thing or invents arguments. I rewrote them several times until they read like documentation for a careful junior, and the behaviour improved every time. Second, you need a hard ceiling on the loop. Without a maximum iteration count, a stuck agent will happily spin forever, calling the same tool and getting the same unhelpful result, burning tokens and your patience. Mine stops at fifteen and tells me it gave up, which is far better than not stopping.
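The ceiling is a small change to the loop: swap `while True` for a bounded `for`. The sketch below assumes the same hypothetical `client.chat` interface as the loop earlier, and adds a `StuckClient` stand-in so it runs without a real model. `MAX_STEPS`, `run_agent_capped`, the `ToolCall` and `Response` classes, the dict shape of the tool message, and the wording of the give-up message are all illustrative names, not any library's API.

```python
from dataclasses import dataclass, field

MAX_STEPS = 15  # the ceiling described in the text


@dataclass
class ToolCall:
    id: str
    name: str
    arguments: dict


@dataclass
class Response:
    text: str = ""
    tool_calls: list = field(default_factory=list)


class StuckClient:
    """Stand-in for a model that requests the same tool on every turn."""

    def chat(self, messages, tools=None):
        return Response(tool_calls=[ToolCall("1", "search_notes", {"query": "x"})])


def run_agent_capped(client, messages, run_tool, tools=None, max_steps=MAX_STEPS):
    """Run the tool loop, but give up after max_steps calls to the model."""
    for _ in range(max_steps):
        response = client.chat(messages, tools=tools)
        if not response.tool_calls:
            return response.text
        for call in response.tool_calls:
            result = run_tool(call.name, call.arguments)
            # Assumed message shape; real clients define their own.
            messages.append({"role": "tool", "id": call.id, "content": result})
    return f"Gave up after {max_steps} steps without a final answer."


messages = []
answer = run_agent_capped(StuckClient(), messages, lambda name, args: "no results")
print(answer)         # Gave up after 15 steps without a final answer.
print(len(messages))  # 15
```

Counting model calls rather than tool calls keeps the bound simple; a budget on tokens or wall-clock time would be a reasonable alternative.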
Third, and this is the one I'd underline: the moment a tool can change the world rather than just observe it, you want a human in the loop. The version that can read files is something I'll let run unattended. The version that can run arbitrary shell commands gets a confirmation prompt before anything that writes, deletes, or reaches the network. I built the let-it-rip mode, ran it once on a throwaway directory, watched it confidently rm something it had misunderstood, and quietly added the prompt back. The model is genuinely good at deciding what to do and exactly as bad as you'd fear at being trusted to do it without supervision.
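A confirmation step like that can be sketched as a gate in front of the shell tool. This is one possible shape under stated assumptions, not the exact prompt described above: the `READ_ONLY` allowlist, the `run_shell` name, the 30-second timeout, and the injectable `confirm` parameter are all hypothetical. It classifies commands only by their first word, which is crude (an allowlisted program can still have flags that write), so treat it as a speed bump rather than a sandbox.

```python
import shlex
import subprocess

# Assumed allowlist of programs treated as read-only; anything else needs a yes.
READ_ONLY = {"ls", "cat", "head", "wc", "grep"}


def run_shell(command: str, confirm=input) -> str:
    """Run a command without a shell, asking a human first unless it is allowlisted."""
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        return f"error: could not parse command: {exc}"
    if not argv:
        return "error: empty command"

    if argv[0] not in READ_ONLY:
        reply = confirm(f"Agent wants to run: {command!r}. Allow? [y/N] ")
        if reply.strip().lower() != "y":
            return "error: command declined by user"

    try:
        # Passing a list (no shell=True) means ';', '|' and '>' are never
        # interpreted by a shell, so they cannot chain or redirect.
        completed = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"error: {type(exc).__name__}: {exc}"
    return completed.stdout + completed.stderr
```

With the default `confirm=input` the agent blocks on the terminal until someone answers, and anything other than `y` counts as no. Passing a different callable makes the gate testable, or lets it route the question somewhere other than stdin.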
So: a working agent is a weekend, not a moonshot. The loop is trivial. What separates a toy from something you'd actually run is the unglamorous stuff: precise tool descriptions, a loop ceiling, and a confirmation step in front of anything irreversible. The intelligence was the easy part to bolt on. The restraint was the part worth building.