I needed a daemon. Not a service mesh, not a controller, not anything that wants a Helm chart. Just a long-running process that wakes up every thirty seconds, checks a handful of internal endpoints, and writes the result somewhere I can scrape it. The sort of thing that used to be a cron job and a shell script, until the shell script grew three flags and a state file and started lying to me about whether it was still running.
So I wrote it in Go, and the whole point of this post is how little I had to write. One binary, no dependencies beyond the standard library, and it does the job.
The shape of it
The bones of a daemon like this are always the same: a context that gets cancelled on signal, a ticker, and a worker that respects the context. Everything else is detail.
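package main

import (
	"context"
	"log"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	// Cancel ctx when SIGINT or SIGTERM arrives.
	ctx, stop := signal.NotifyContext(context.Background(),
		syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	t := time.NewTicker(30 * time.Second)
	defer t.Stop()

	// Check once at startup instead of waiting for the first tick.
	// checkOnce is defined elsewhere.
	if err := checkOnce(ctx); err != nil {
		log.Printf("initial check failed: %v", err)
	}

	for {
		select {
		case <-ctx.Done():
			log.Println("shutting down")
			return
		case <-t.C:
			if err := checkOnce(ctx); err != nil {
				log.Printf("check failed: %v", err)
			}
		}
	}
}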
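signal.NotifyContext is the bit that made this pleasant. Before it arrived in Go 1.16 you wired up a channel, caught the signal, cancelled a context yourself, and inevitably got the ordering subtly wrong on the second signal. Now Ctrl-C does the obvious thing. One caveat: a second Ctrl-C does not kill the process by itself, because the handler stays registered until stop() is called. Here main returns as soon as the context is done, so it makes no difference, but if shutdown does real work, call stop() first so that a second Ctrl-C restores the default behaviour and kills it outright, which is exactly what you want when the first one didn't take.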
Health, because systemd asks
I run this under systemd, so it gets a tiny HTTP server on a loopback port with /healthz returning the timestamp of the last successful check. Nothing clever. A handler, a mutex around the last-good time, and http.Server with a sane ReadHeaderTimeout so it isn't a slowloris target even on localhost.
srv := &http.Server{
	Addr:              "127.0.0.1:9101",
	ReadHeaderTimeout: 5 * time.Second,
}
The thing I keep relearning: give every outbound request a timeout via the context, and give the server its timeouts explicitly. The zero values are not your friends. A daemon that hangs forever on a single dead endpoint is worse than no daemon, because at least no daemon tells you the truth.
The build is CGO_ENABLED=0 go build, the result is about six megabytes, and it copies to the box with scp. No runtime to install, no virtualenv to rot. It has been up for a week now and I have thought about it precisely zero times since, which is the highest praise I have for any piece of software I wrote myself.