I shipped a daemon this week. It does one thing, it's about a hundred and twenty lines of Go, and since it replaced a cron job and a shell script my evenings have been measurably quieter. This is a post in praise of the small thing, because nobody writes those and they're most of what actually keeps systems alive.
The job was unglamorous. Every few minutes, check a directory for new files, do a bit of processing, push the result somewhere, and tidy up. It had lived for a couple of years as a cron entry calling a shell script, and like all such arrangements it worked right up until it didn't. The classic failure modes had all visited at least once: two runs overlapping because one took longer than the cron interval, a run dying halfway and leaving a half-processed file that the next run choked on, and my personal favourite, the script silently doing nothing for a week because a path changed and cron eats stdout by default so nobody noticed.
I'd patched around each of these as they happened. A lockfile for the overlap. A trap for the cleanup. A line of logging that went to a file I then forgot to check. The script had become a small museum of past incidents, each one commemorated by a defensive line of bash. It was time to stop adding exhibits.
why a daemon, and why Go
The honest reason for a long-running daemon over cron is control. Cron gives you a process that's born, does its thing, and dies, with no memory and no say over when it runs relative to itself. A daemon stays up, holds its own state, decides for itself when it's safe to start the next cycle, and can shut down cleanly when asked. The overlap problem just evaporates, because there's only ever one of it.
Go for a couple of reasons that have nothing to do with fashion. A single static binary I can scp onto a box with no runtime to install. Goroutines and channels that make "do work on a timer, but stop cleanly when told" almost trivial to express. And, as of a fortnight ago, Go 1.11 landed with modules, so I got to set this up with go mod from the start rather than fighting GOPATH, which alone made the afternoon more pleasant than it had any right to be.
the shape of it
The whole thing is a ticker, a signal handler, and a select:
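package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	log.SetFlags(log.LstdFlags | log.LUTC)
	// Cancelling ctx is how a shutdown request reaches runOnce.
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
	// Handle the signal in its own goroutine: during a cycle the main loop
	// is blocked inside runOnce and cannot receive from sigs itself.
	go func() {
		s := <-sigs
		log.Printf("got %s, shutting down", s)
		cancel()
	}()
	ticker := time.NewTicker(2 * time.Minute)
	defer ticker.Stop()
	log.Println("started")
	for {
		select {
		case <-ticker.C:
			if err := runOnce(ctx); err != nil {
				log.Printf("cycle failed: %v", err)
			}
		case <-ctx.Done():
			log.Println("stopped cleanly")
			return
		}
	}
}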
That's the spine. The ticker fires every couple of minutes, runOnce does the actual work, and because it's all sequential within the loop, two cycles can never overlap. If a cycle overruns, the ticker holds at most one pending tick and drops the rest, so the next cycle simply starts once the current one has finished. The lockfile I'd been carrying around in the shell version was solving a problem that this structure doesn't have.
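The no-overlap property is easy to check in isolation. Here is a minimal sketch, assuming a made-up 10ms interval and a 30ms stand-in for the work; runCycles, overlaps and demoCycles are names invented for the illustration, not part of the daemon:

```go
package main

import (
	"fmt"
	"time"
)

// span records when one cycle started and finished.
type span struct{ start, end time.Time }

// runCycles runs n cycles off a ticker in a single goroutine.
// interval and work are illustrative values, not the daemon's settings.
func runCycles(n int, interval, work time.Duration) []span {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	spans := make([]span, 0, n)
	for len(spans) < n {
		<-ticker.C // blocks until a tick is available
		s := span{start: time.Now()}
		time.Sleep(work) // stand-in for runOnce, deliberately slower than interval
		s.end = time.Now()
		spans = append(spans, s)
	}
	return spans
}

// overlaps reports whether any cycle started before the previous one ended.
func overlaps(spans []span) bool {
	for i := 1; i < len(spans); i++ {
		if spans[i].start.Before(spans[i-1].end) {
			return true
		}
	}
	return false
}

// demoCycles runs three slow cycles and reports the count and any overlap.
func demoCycles() (int, bool) {
	spans := runCycles(3, 10*time.Millisecond, 30*time.Millisecond)
	return len(spans), overlaps(spans)
}

func main() {
	n, overlapped := demoCycles()
	fmt.Println("cycles:", n, "overlap:", overlapped)
}
```

Even with the work taking three times the interval, each cycle starts only after the previous one has returned, because nothing else is reading from the ticker.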
the bit that actually matters: shutting down
The single most important thing in that loop is the signal handling, and it's the part the shell script never did properly. When something restarts this service, systemd sends a SIGTERM and then waits for the unit's stop timeout (TimeoutStopSec, 90 seconds by default) before it loses patience and sends SIGKILL. The whole point of catching SIGTERM is to use that window to stop at a clean boundary rather than being shot mid-write.
So runOnce takes the context, and the long or interruptible parts of it check ctx.Done(). If a shutdown comes in while a cycle is running, I let the current file finish, because half-processing a file is exactly the mess I was trying to escape, but I don't start the next one. The result is that a restart never leaves a half-done file behind. That property, "always safe to kill between files", is the entire reason the daemon is more reliable than the cron job, and it's maybe ten lines of work.
I'll admit I got the first version of this wrong. My initial runOnce ignored the context entirely, and the daemon exited the moment it caught the signal, which on a bad day would chop a file in half just as surely as the old script. The fix was to thread the context all the way down to the loop over files and check it between each one:
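// runOnce processes every pending file, stopping between files if ctx is cancelled.
func runOnce(ctx context.Context) error {
	files, err := pending()
	if err != nil {
		return err
	}
	for _, f := range files {
		// Non-blocking check: has a shutdown been requested since the last file?
		select {
		case <-ctx.Done():
			log.Println("shutdown requested, not starting the next file")
			return nil
		default:
		}
		if err := process(f); err != nil {
			// %v rather than %w: the wrapping verb only arrived in Go 1.13.
			return fmt.Errorf("processing %s: %v", f, err)
		}
	}
	return nil
}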
The default in that select is the trick: it makes the context check non-blocking, so it's a cheap "have we been told to stop?" between each file rather than something that waits.
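To see the "finish the current file, don't start the next" behaviour on its own, here is a small sketch. The file names are made up, and stopRequested, processAll and demoCancel are hypothetical helpers written for this example rather than code from the daemon:

```go
package main

import (
	"context"
	"fmt"
)

// stopRequested is a non-blocking check: the default case makes the select
// return immediately when ctx has not been cancelled.
func stopRequested(ctx context.Context) bool {
	select {
	case <-ctx.Done():
		return true
	default:
		return false
	}
}

// processAll calls process for each file and checks for shutdown only
// between files, so a file that has been started is always finished.
func processAll(ctx context.Context, files []string, process func(string)) []string {
	var done []string
	for _, f := range files {
		if stopRequested(ctx) {
			break
		}
		process(f)
		done = append(done, f)
	}
	return done
}

// demoCancel cancels the context while the second file is being processed.
func demoCancel() []string {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	files := []string{"a.dat", "b.dat", "c.dat"} // made-up names
	return processAll(ctx, files, func(f string) {
		if f == "b.dat" {
			cancel() // simulates SIGTERM arriving mid-file
		}
	})
}

func main() {
	fmt.Println(demoCancel()) // prints [a.dat b.dat]
}
```

The cancellation lands while b.dat is in flight, b.dat still completes, and c.dat is never started.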
the unglamorous wins
A few things I'd underrated until I had them.
Logging to the standard streams and letting systemd's journal catch it. (Go's log package writes to stderr by default, and the journal captures stderr as well as stdout.) No more logfile I forget to check. journalctl -u thedaemon and it's all there, with the UTC timestamps I set on the logger, which has saved me twice already when comparing against timestamps from another box.
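If the lines should land on stdout specifically, rather than on the log package's default of stderr, the logger can be pointed there explicitly. A minimal sketch; newLogger and demoLine are names invented here, and the empty prefix is just a choice for the example:

```go
package main

import (
	"bytes"
	"io"
	"log"
	"os"
)

// newLogger returns a logger with date, time and UTC flags writing to w.
// The empty prefix and the flag set are choices made for this sketch.
func newLogger(w io.Writer) *log.Logger {
	return log.New(w, "", log.LstdFlags|log.LUTC)
}

// demoLine formats one line into a buffer so the layout can be inspected.
func demoLine() string {
	var buf bytes.Buffer
	newLogger(&buf).Println("started")
	return buf.String()
}

func main() {
	// Writes to os.Stdout; under systemd the journal picks this up.
	newLogger(os.Stdout).Println("started")
}
```

Each line comes out as a UTC date and time in the form 2006/01/02 15:04:05, followed by the message.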
A real exit code and a real error path. The shell script's idea of error handling was to keep going and hope. The daemon returns errors up the stack, logs them with context, and carries on to the next cycle rather than dying, so one bad file ends that cycle rather than the whole service.
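The runOnce shown earlier returns at the first failure, so the files behind a bad one wait for the next cycle. If that ever becomes a problem, one possible variation is to log the failure and move on within the same cycle. This is a sketch of that alternative, not what the daemon does; processEach, demoErrors and the file names are all hypothetical:

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// processEach tries every file, logs each failure with the file name, and
// returns counts instead of stopping at the first error.
func processEach(files []string, process func(string) error) (ok, failed int) {
	for _, f := range files {
		if err := process(f); err != nil {
			log.Printf("processing %s: %v", f, err)
			failed++
			continue
		}
		ok++
	}
	return ok, failed
}

// demoErrors feeds three made-up files through, the middle one failing.
func demoErrors() (ok, failed int) {
	files := []string{"a.dat", "bad.dat", "c.dat"}
	return processEach(files, func(f string) error {
		if f == "bad.dat" {
			return errors.New("truncated input")
		}
		return nil
	})
}

func main() {
	ok, failed := demoErrors()
	fmt.Printf("ok=%d failed=%d\n", ok, failed) // prints ok=2 failed=1
}
```

The trade-off is that a cycle no longer has a single error to return, so the caller has to decide what a partly failed cycle means.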
And the systemd unit itself, which is almost embarrassingly short:
[Unit]
Description=The tiny daemon
After=network.target
[Service]
ExecStart=/usr/local/bin/thedaemon
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
Restart=on-failure means that if it does fall over for some reason I haven't anticipated, it comes back on its own, which is the one genuinely good thing the cron approach had and I wasn't going to give up.
the actual point
None of this is clever. There's no concurrency to speak of, no generics, no architecture worth the name. It's a loop, a ticker, and a signal handler, the most boring possible Go programme. That's exactly why I'm pleased with it. The old arrangement worked most of the time and failed in quiet, annoying ways that always seemed to surface at the worst moment. This one is dull, legible, and fails safely, and I can read the whole thing in one screen and know what it does.
I think we undervalue shipping the small, boring replacement for the fragile thing. It's not a project. It doesn't go on a roadmap. It's an afternoon and a hundred lines, and the reward is purely the absence of future irritation, which is the least visible kind of win there is. But it's a real one. My pager has been quiet since Tuesday, and I'd take a hundred boring lines of Go that let me sleep over a clever thing that doesn't, every single time.