Ramblings of an aging IT geek
golang

the smallest useful daemon i've written this year

Building and shipping a tiny Go daemon to replace a flaky cron-and-bash health check, with notes on signals, graceful shutdown, and why a single static binary is such a relief to deploy.

I replaced a tangle of cron jobs and bash with one small Go daemon this week, and the relief of deploying a single static binary instead of a script that depends on six things being installed is hard to overstate. The job is trivial: poll a handful of internal endpoints every thirty seconds, and if any of them stays unhealthy for long enough, post a message to a chat webhook. That's it. It did not need to be a daemon. But the cron-and-bash version had been quietly rotting, and rewriting it properly was less work than fixing it again.

The old version was the usual accretion. A cron line every minute, a bash script that curled each endpoint, some grep over the output, a temp file to remember state between runs because cron has no memory, and a lock file to stop overlapping runs that didn't always get cleaned up. Every part of that was fine on its own. Together they'd become the kind of thing where a failure could mean the script broke, or cron didn't fire, or the temp file got corrupted, or curl wasn't where the script thought it was after an OS upgrade. Diagnosing which took longer than the check itself ran.

Why Go, specifically

Two reasons, and neither is performance: this thing does nothing a Raspberry Pi couldn't do in its sleep. The first is the single static binary. No runtime to install, no virtualenv, no "but it worked on my machine because I had jq". You build it, you scp it, it runs. The second is that the concurrency and signal handling I actually wanted are part of the standard library rather than something I'd bolt on nervously in shell.

The shape of it is a loop on a ticker, with proper shutdown wired in from the start, because a daemon that ignores SIGTERM is a daemon that gets SIGKILLed mid-write eventually.

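package main

import (
	"context"
	"log"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	// ctx is cancelled as soon as SIGINT or SIGTERM arrives.
	ctx, stop := signal.NotifyContext(context.Background(),
		syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	// The first tick fires 30 seconds after startup, not immediately.
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			log.Println("shutting down")
			return
		case <-ticker.C:
			// runChecks (defined elsewhere in the file) gets the same
			// ctx, so a signal also cancels checks that are in flight.
			runChecks(ctx)
		}
	}
}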

signal.NotifyContext is the bit I want to point at, because it only arrived in Go 1.16 and so is newer than a lot of the daemon-in-Go examples floating around. It hands you a context that is cancelled when one of the listed signals arrives, so the same ctx.Done() you use to stop the loop also threads down into your HTTP calls and cancels in-flight requests cleanly. No global flag, no channel you forget to close. When systemd sends SIGTERM, any checks in flight are cancelled and unwind, the loop exits, and the process is gone long before systemd's patience (TimeoutStopSec, 90 seconds by default) runs out. That's graceful shutdown for free, more or less.
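
To make the "cancels in-flight requests" part concrete, here is a minimal, self-contained sketch. It is not code from the daemon: fetch and cancelledMidRequest are hypothetical names, the slow server is a local httptest stand-in, and a timer plays the role of SIGTERM by cancelling the context 100 ms into the request.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetch issues a GET bound to ctx; cancelling ctx aborts the request.
func fetch(ctx context.Context, url string) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}

// cancelledMidRequest starts a request against a deliberately slow local
// server, cancels the context while the request is still in flight, and
// reports whether the request failed with context.Canceled.
func cancelledMidRequest() bool {
	slow := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-r.Context().Done(): // the client gave up
		case <-time.After(2 * time.Second): // fallback so the handler always returns
		}
	}))
	defer slow.Close()

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	// Stand-in for SIGTERM: cancel shortly after the request has started.
	time.AfterFunc(100*time.Millisecond, cancel)

	err := fetch(ctx, slow.URL)
	return errors.Is(err, context.Canceled)
}

func main() {
	fmt.Println("cancelled in flight:", cancelledMidRequest())
}
```

The error that comes back from Do wraps context.Canceled, so errors.Is can tell a shutdown apart from a genuinely failing endpoint, which matters if you would rather not fire an alert on the way out.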

The check itself, and the one bit of state worth keeping

Each check is a context-aware HTTP request with its own timeout, so a hung endpoint can't wedge the whole cycle:

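// check issues one GET against url and returns nil only on a 200.
// Imports needed: context, fmt, net/http, time.
func check(ctx context.Context, url string) error {
	// Per-check deadline, derived from the daemon's context so that a
	// shutdown signal cancels the request as well.
	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		// Malformed URL. Ignoring this would hand a nil request to Do.
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("status %d", resp.StatusCode)
	}
	return nil
}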
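
The post doesn't show runChecks, so what follows is only one plausible shape for it: fan the checks out concurrently and collect the results. The names runAll, checkFunc and demoResults, the extra parameters, and the example URLs are all assumptions made so the sketch runs without a network. In the daemon, check from above would be passed in where the stub is.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// checkFunc has the same signature as check in the post.
type checkFunc func(ctx context.Context, url string) error

// runAll runs fn against every URL concurrently and collects the results.
// A nil error in the returned map means the endpoint was healthy.
func runAll(ctx context.Context, urls []string, fn checkFunc) map[string]error {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string]error, len(urls))
	)
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			err := fn(ctx, u)
			mu.Lock() // the map is shared between goroutines
			results[u] = err
			mu.Unlock()
		}(u)
	}
	wg.Wait() // bounded by each check's own timeout
	return results
}

// demoResults exercises runAll with a stub check: only the "bad" URL fails.
// Both URLs are made up for illustration.
func demoResults() map[string]error {
	stub := func(ctx context.Context, url string) error {
		if url == "http://bad.internal/health" {
			return errors.New("status 503")
		}
		return nil
	}
	urls := []string{"http://ok.internal/health", "http://bad.internal/health"}
	return runAll(context.Background(), urls, stub)
}

func main() {
	for u, err := range demoResults() {
		fmt.Println(u, "->", err)
	}
}
```

Because every check carries its own five-second timeout, wg.Wait cannot block for longer than the slowest check, so a full cycle still fits comfortably inside the thirty-second tick.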

The only state I keep is "how long has this endpoint been unhealthy", held in a map in memory. I deliberately did not reach for a database or a file. If the daemon restarts, it forgets, and that's fine: the worst case is one slightly late alert, and a daemon that needs persistent state to do a thirty-second health check has lost the plot. The whole point of this rewrite was to remove moving parts, not add a database to a job that pings some URLs.
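
A rough sketch of what that in-memory state could look like, under assumptions of my own: the tracker type, its observe method, the 90-second threshold and the example URL are all hypothetical, and the second map (to avoid re-alerting on every cycle) is an addition the post doesn't mention. The clock is passed in so the behaviour can be checked without waiting.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// tracker remembers when each endpoint first started failing. It is not
// goroutine-safe; the assumption is that the main loop calls it after a
// check cycle has finished.
type tracker struct {
	threshold time.Duration        // how long unhealthy before alerting (assumed)
	downSince map[string]time.Time // absent key means currently healthy
	alerted   map[string]bool      // avoids re-alerting every cycle
}

func newTracker(threshold time.Duration) *tracker {
	return &tracker{
		threshold: threshold,
		downSince: make(map[string]time.Time),
		alerted:   make(map[string]bool),
	}
}

// observe records one check result and reports whether an alert is due now.
func (t *tracker) observe(url string, err error, now time.Time) bool {
	if err == nil {
		// Recovered: forget everything about this endpoint.
		delete(t.downSince, url)
		delete(t.alerted, url)
		return false
	}
	since, ok := t.downSince[url]
	if !ok {
		t.downSince[url] = now // first failure starts the clock
		return false
	}
	if now.Sub(since) >= t.threshold && !t.alerted[url] {
		t.alerted[url] = true
		return true
	}
	return false
}

// simulate feeds seven results, 30 seconds apart, through a 90-second
// threshold: five failures, one recovery, one new failure.
func simulate() []bool {
	t := newTracker(90 * time.Second)
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	down := errors.New("status 503")
	results := []error{down, down, down, down, down, nil, down}

	var alerts []bool
	for i, err := range results {
		now := start.Add(time.Duration(i) * 30 * time.Second)
		alerts = append(alerts, t.observe("http://example.internal/health", err, now))
	}
	return alerts
}

func main() {
	fmt.Println(simulate()) // [false false false true false false false]
}
```

Only the fourth observation alerts: that is the first one at least 90 seconds after the initial failure, and the recovery afterwards resets the clock. A restart wipes both maps, which is exactly the "slightly late alert" trade-off described above.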

Shipping it

On Linux, CGO_ENABLED=0 go build gives a statically linked binary with no libc dependency, which means it runs on whatever I throw it at, as long as the architecture matches, including a stripped-down container, without a second thought. No glibc version mismatch, no scrambling for the right base image, no dynamic linker surprises after an OS upgrade moves a library out from under you. A small systemd unit with Restart=on-failure and it's done. The unit file is longer than the interesting part of the program.

[Service]
ExecStart=/usr/local/bin/healthd
Restart=on-failure
RestartSec=5

The honest reckoning: this is maybe a hundred and fifty lines of Go, and it replaced perhaps forty lines of bash and a cron entry. On a pure line count I lost. But the bash version had three failure modes that weren't about the check at all. They were about the plumbing, and the Go version has none of them. It either runs or systemd restarts it, and when it runs it does exactly one thing. There's a real satisfaction in deleting a category of problem rather than fixing an instance of it.

There's also a maintainability angle I didn't appreciate until I came back to it a week later. The bash version's logic was spread across the script, the crontab, a lock file, and an unwritten assumption about which utilities were installed. To understand it you had to hold all four in your head at once. The Go version is one file, top to bottom, with the control flow on the page in front of you. The signal handling, the ticker, the per-check timeout, the alerting: it's all there in order, and a future me, or someone else entirely, can read it without first reconstructing the environment it grew up in. That legibility is worth more than the line count makes it look.

I'm not going to pretend every shell script wants to become a Go daemon. Most don't, and reaching for a compiler every time a script annoys you is its own kind of disease. But when a script has quietly grown state, locking, and a dependency on its environment that you can't quite enumerate, that's the signal. At that point the script is a daemon in denial, and you may as well write the real one. This one's been running for a few days without a murmur, which is the highest praise a health check can earn.