Ramblings of an aging IT geek
← Ramblings of an aging IT geek
ai

a small agent that does the boring bit for me

I wired an LLM up to a couple of real tools so it could triage my inbox of alerts, and learned where the line between useful and dangerous sits.

A small robot arm beside a circuit board

Most of the "AI agent" demos I had seen by the summer were a model talking to itself about a task it never actually performed. I wanted the opposite: something small that did one real, boring job end to end, with hands on actual tools, and that I could trust enough to leave running.

The job I picked was alert triage. Every morning I have a folder of overnight alerts, most of them noise, a few that matter. I wanted something to read them, group the duplicates, and tell me the two or three I should actually look at. Not fix anything. Just sort the inbox so I do not have to.

The shape of it

The loop is unglamorous and that is the point. The model gets a system prompt describing the tools it has, then I hand it the alerts and let it call functions until it is done. The function-calling support that landed in the OpenAI API back in June made this tidy, because the model returns a structured tool call rather than me parsing prose and praying.

The tools were deliberately tiny:

  • fetch_alerts(since) returns the raw alert objects.
  • lookup_runbook(service) returns the wiki page for a service, if one exists.
  • post_summary(text) writes a message to a Slack channel.

That is the whole toolbox. Three functions, each one a normal bit of code I had already written, with a JSON schema bolted on the front so the model knows how to call it.

A close-up of a circuit board

Where it earned its keep

The grouping is the part that genuinely surprised me. Classic dedup logic struggles when the same underlying fault shows up as five different alerts with different wording from different systems. The model is good at this precisely because it does not need an exact match. "These four are all the same database being unreachable, worded by four different exporters" is the kind of judgement that is annoying to write rules for and easy for a language model to make.

So every morning I now get a Slack message: here are the three things that matter, here is why, here are the runbook links. The forty lines of noise underneath them I never read.

Where I drew the line

The thing I did not do is give it the power to act. No restart_service, no silence_alert, nothing that changes state. The moment an agent can take an irreversible action on the strength of a probabilistic guess, you have built a very confident intern with root and no supervision. I am not ready for that, and I am not sure I ever will be for production.

The honest summary is that the value is not autonomy. The value is summarisation and judgement applied to a firehose, with the actual doing left to deterministic code and to me. Keep the tools small, keep the dangerous ones out, and a little agent that "does things" turns out to be quietly, genuinely useful. Hand it the keys and you have built a liability with good manners.