Ramblings of an aging IT geek
homelab

backups i actually test now

Moving the homelab to restic with a scheduled restore-and-verify job, because a backup you have never restored is just a hopeful directory of files.


For years my backup strategy was an rsync cron job and a quiet faith that it was working. The job ran, the destination directory grew, nothing errored. I called that "having backups". What I actually had was an untested copy of some files and a strong opinion about restores I'd never performed.

The thing nobody tells you is that an untested backup isn't a backup; it's a hypothesis. You don't know whether it restores until you restore it, and the day you find out is, by tradition, the worst possible day to learn that the answer is no.

So I moved everything to restic. The wins are well documented: deduplication, encryption at rest, snapshots, and a repository format that doesn't fall apart if a backup run is interrupted halfway through. The backup side is unremarkable, which is exactly what you want from a backup:

restic backup /srv /etc /home \
  --exclude-file=/etc/restic/excludes.txt \
  --tag nightly
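The command above assumes restic already knows where the repository lives and how to unlock it. restic reads both from environment variables (RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE), so one tidy option is a small root-only file that both a shell and systemd can load. A sketch, where the file location and both paths are hypothetical placeholders:

```bash
# /etc/restic/env (hypothetical path; owned by root, chmod 600)
# Where the repository lives: a local path here, though restic also
# accepts remote backends such as sftp: or s3: URLs.
RESTIC_REPOSITORY=/mnt/backup/restic
# File containing the repository password used for encryption at rest.
RESTIC_PASSWORD_FILE=/etc/restic/password
```

In an interactive shell, set -a; . /etc/restic/env; set +a exports both variables. After that, a one-off restic init creates the encrypted repository, and later commands need no --repo flag.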


The part that changed how I sleep is the second job. Once a week, a systemd timer restores a single known file into a scratch directory, checks that it matches a stored checksum, then runs restic check to verify the repository's own integrity.

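set -e   # stop at the first failing step so the job itself reports failure
restic restore latest --target /tmp/restore-test --include /etc/hostname
# canary.sha256 should name the restored copy, /tmp/restore-test/etc/hostname
sha256sum -c /etc/restic/canary.sha256
restic check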
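One detail in that check is easy to get wrong: sha256sum -c verifies whatever paths are written inside the checksum file. If canary.sha256 records /etc/hostname, the check passes against the live file and never looks at the restored copy. A minimal sketch of building the file against the restore location instead, assuming /etc/hostname as the canary and /tmp/restore-test as the target, as in the commands above; make_canary and verify_canary are hypothetical helper names:

```bash
#!/usr/bin/env bash
set -euo pipefail

# make_canary SOURCE RESTORE_ROOT OUTFILE
# Hashes SOURCE as it is now and records the path it will have once restic
# restores it under RESTORE_ROOT (restic recreates the absolute path beneath
# --target). Assumes SOURCE is an absolute path without unusual characters.
make_canary() {
  local src=$1 root=$2 out=$3 digest
  digest=$(sha256sum "$src" | cut -d' ' -f1)
  printf '%s  %s\n' "$digest" "${root%/}${src}" > "$out"
}

# verify_canary OUTFILE
# Returns non-zero if the restored copy is missing or its contents differ.
verify_canary() {
  sha256sum --quiet -c "$1"
}

# One-off setup, then the weekly check after `restic restore`:
#   make_canary /etc/hostname /tmp/restore-test /etc/restic/canary.sha256
#   verify_canary /etc/restic/canary.sha256
```

Regenerate the checksum file whenever the canary legitimately changes, otherwise the weekly job will fail for the wrong reason.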

It's deliberately small. I'm not restoring the whole estate every week; that would be daft and slow. I'm proving three things: the repository's structure checks out, restic can pull a file back out of it, and the file that comes out is byte-for-byte what went in. If any of those break, the job fails and I get an email whilst it's still a curiosity rather than a crisis.
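One way to wire up the weekly job as systemd units, offered as a sketch: the unit names, the /usr/bin paths, /etc/restic/env and the notify-email@ unit are all assumptions. With Type=oneshot, systemd runs multiple ExecStart= lines in order and stops at the first one that fails, which marks the unit failed and triggers OnFailure=. That gives the fail-loudly behaviour without a wrapper script.

```ini
# /etc/systemd/system/restic-verify.service (hypothetical unit name)
[Unit]
Description=Weekly restic restore-and-verify
# Hypothetical templated unit that emails the name of the failed unit.
OnFailure=notify-email@%n.service

[Service]
Type=oneshot
# Provides RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE (assumed location).
EnvironmentFile=/etc/restic/env
# Run in order; the first failure stops the sequence and fails the unit.
ExecStart=/usr/bin/restic restore latest --target /tmp/restore-test --include /etc/hostname
ExecStart=/usr/bin/sha256sum -c /etc/restic/canary.sha256
ExecStart=/usr/bin/restic check

# /etc/systemd/system/restic-verify.timer
[Unit]
Description=Run restic-verify weekly

[Timer]
OnCalendar=weekly
# If the machine was off at the scheduled time, run at the next boot.
Persistent=true

[Install]
WantedBy=timers.target
```

Enable it with systemctl enable --now restic-verify.timer; systemctl list-timers then shows when it will next fire.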

It has, once, caught a problem. A repository on a flaky USB disk threw a check error about a missing pack file. The nightly backups had been "succeeding" for a fortnight on top of a repo that was already quietly corrupt. Without the weekly check I'd have discovered that the next time I needed it, which is to say: too late.
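restic check as used above verifies the repository's structure and that every pack file it references is present, which is how a missing pack shows up. By default it does not download and re-hash the contents of the packs that are present; --read-data does that for everything, and --read-data-subset=n/t for slice n of t. If the extra I/O is affordable, a sketch that reads a different quarter each ISO week, so the whole repository gets read roughly once a month (the helper name is hypothetical):

```bash
#!/usr/bin/env bash
set -euo pipefail

# subset_for_week WEEK TOTAL
# Prints "n/TOTAL" with n cycling through 1..TOTAL as WEEK increases.
# 10# forces base 10: `date +%V` yields zero-padded weeks such as "08",
# which bash arithmetic would otherwise reject as an invalid octal number.
# In years with an ISO week 53, one slice repeats at New Year.
subset_for_week() {
  local week=$((10#$1)) total=$2
  printf '%d/%d\n' $(( week % total + 1 )) "$total"
}

# In the weekly job, in place of a plain `restic check`:
#   restic check --read-data-subset="$(subset_for_week "$(date +%V)" 4)"
```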

None of this is clever. restic does the hard parts; I just added a small loop that proves the backups are real before I have to bet anything on them. That's the whole change. A backup you've never restored is a directory of files you feel good about, and feelings aren't a recovery plan.