Ramblings of an aging IT geek
homelab

A backup you have not restored is a rumour

How I went from a homelab full of untested backups to a monthly restore drill that has actually caught problems.


I used to have backups in the way most people have a fire extinguisher: present, comforting, never once tested. Restic was dutifully snapshotting everything to a NAS and an offsite bucket, the cron jobs were green, and the repo kept growing. I felt responsible. I was not.

The thing that changed me was a restore that did not work. Not a disaster, thankfully, just a config volume I wanted back. The snapshot was there, the data was there, and the application would not start because I had backed up the data but never the half-page of environment that told it what the data meant. Green ticks all the way down, and still useless.

Now there is a monthly job that picks a random service, restores its latest snapshot into a throwaway container on a different host, and starts it. If it comes up and answers, the test passes. If it does not, I get an email and a slightly bad evening, which is exactly when I want to find out, rather than at 2am during an actual fire.
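A minimal sketch of what such a drill could look like in Python, under some loud assumptions: every snapshot is tagged with its service name, the throwaway container is a Docker container, "answers" means an HTTP request to the published port succeeds, and the restic password comes from the environment (for example RESTIC_PASSWORD_FILE). The service inventory, image names, and paths are hypothetical placeholders, and the script is assumed to run on the test host itself rather than on the machine that took the backup.

```python
import http.client
import random
import subprocess
import tempfile
import time
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str         # also assumed to be the restic tag on its snapshots
    image: str        # container image used for the throwaway instance
    source_path: str  # absolute path that was backed up on the original host
    data_path: str    # where the container expects that data
    port: int         # port the service answers HTTP on


# Hypothetical inventory; replace with your own services.
SERVICES = [
    Service("wiki", "example/wiki:latest", "/srv/wiki/data", "/data", 8080),
    Service("feeds", "example/feeds:latest", "/srv/feeds", "/var/lib/feeds", 8081),
]


def restore_cmd(repo, svc, target):
    # "latest" plus --tag selects the newest snapshot carrying that tag.
    return ["restic", "-r", repo, "restore", "latest",
            "--tag", svc.name, "--target", target]


def run_cmd(svc, target):
    # restic recreates the original absolute path beneath the target directory,
    # so the restored data lives at target + source_path.
    return ["docker", "run", "-d", "--name", f"drill-{svc.name}",
            "-v", f"{target}{svc.source_path}:{svc.data_path}",
            "-p", f"127.0.0.1:{svc.port}:{svc.port}", svc.image]


def answers(url, attempts=12, delay=5.0):
    # Poll until the service responds; urlopen raises on 4xx/5xx responses.
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except (OSError, http.client.HTTPException):
            time.sleep(delay)
    return False


def drill(repo, services, run=subprocess.run, probe=answers, rng=random):
    # Pick one service at random, restore it, start it, and see if it answers.
    svc = rng.choice(services)
    with tempfile.TemporaryDirectory(prefix="drill-") as target:
        try:
            run(restore_cmd(repo, svc, target), check=True)
            run(run_cmd(svc, target), check=True)
            ok = probe(f"http://127.0.0.1:{svc.port}/")
        except (subprocess.CalledProcessError, OSError):
            ok = False
        finally:
            # Always remove the throwaway container, pass or fail.
            run(["docker", "rm", "-f", f"drill-{svc.name}"], check=False)
    return svc.name, ok
```

A small wrapper can call drill() with the repository location and SERVICES, print the result, and exit non-zero on failure. Run from cron with MAILTO set, that printed output is what turns a failed drill into the email. The run and probe parameters exist only so the logic can be exercised without restic or Docker installed.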

The drill has caught two more gaps since: a database that needed a consistent dump rather than a file copy, and a secret that lived only in my head. The backups were never the hard part. Proving they restore is the whole job, and it is the part I had been quietly skipping.