The mistake was mine, and it was the kind you make at half past eleven on a Friday when you should have stopped an hour earlier. A large package set update, a reboot to pick up a new kernel, and a machine that came back to a flashing cursor and nothing else. The initramfs was unhappy, a library had moved, and the dependency chain had eaten itself in a way I had no patience to untangle by hand.
Two years ago this would have been a reinstall, or a long evening with a rescue USB. Instead it was about five minutes, because the root filesystem is btrfs and I take a snapshot before anything touches the package manager.
the layout
The trick is not to snapshot your live root in place and hope. The subvolumes are laid out so that / is its own subvolume, mounted by name rather than by ID, and snapshots live alongside it rather than nested inside. Roughly:
@ -> the live root
@home -> /home, kept separate so rollbacks don't touch user data
@snapshots -> mounted at /.snapshots, where the read-only snapshots live
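One way to express that layout is in /etc/fstab, with each subvolume mounted by name. The sketch below prints such entries. The UUID is a placeholder, the /.snapshots mount point is inferred from the path the snapshot command writes to, the function name is made up for illustration, and real entries would usually carry extra mount options.

```shell
#!/bin/sh
# Sketch only: print fstab entries that mount each subvolume by name.
# fstab_lines is a hypothetical helper; the /.snapshots mount point is an
# inference from the snapshot path, not something the layout list states.

fstab_lines() {
    uuid=$1
    for pair in "@:/" "@home:/home" "@snapshots:/.snapshots"; do
        subvol=${pair%%:*}      # text before the first colon
        mountpoint=${pair#*:}   # text after the first colon
        printf 'UUID=%s  %s  btrfs  subvol=%s  0 0\n' "$uuid" "$mountpoint" "$subvol"
    done
}
```

Calling fstab_lines with the filesystem's UUID prints three lines to review before they go anywhere near the real file. Mounting by subvol= rather than subvolid= is what lets a rollback work by renaming: a mount by ID would keep following the broken subvolume under its new name. If the bootloader passes its own root flags, they need the same treatment.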
I have a small hook that runs before package transactions and drops a read-only snapshot:
btrfs subvolume snapshot -r / /.snapshots/pre-update-$(date +%Y%m%d-%H%M)
It costs almost nothing. Btrfs snapshots are copy-on-write, so a fresh one takes no real space until blocks start to diverge. You only pay for what changes afterwards.
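A minimal sketch of what such a hook might call. How it gets wired in depends on the package manager, which is not covered here, so the function names and the SNAP_DIR variable are assumptions for illustration. Only the btrfs command itself comes from the line above.

```shell
#!/bin/sh
# Pre-transaction snapshot sketch. The package manager integration is left
# out; snapshot_name, take_pre_update_snapshot and SNAP_DIR are hypothetical.

SNAP_DIR=${SNAP_DIR:-/.snapshots}

# Build a name like pre-update-20221216-2304. An explicit stamp can be
# passed in; otherwise the current time is used.
snapshot_name() {
    stamp=${1:-$(date +%Y%m%d-%H%M)}
    printf 'pre-update-%s\n' "$stamp"
}

# Take a read-only snapshot of /, unless that name is already taken.
take_pre_update_snapshot() {
    target="$SNAP_DIR/$(snapshot_name)"
    if [ -e "$target" ]; then
        echo "snapshot $target already exists, skipping" >&2
        return 0
    fi
    btrfs subvolume snapshot -r / "$target"
}
```

The existence check is there because the name only has minute resolution, so two transactions inside the same minute would otherwise collide.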
the recovery
From a live USB I mounted the top-level volume (subvolid 5), which gives you the whole tree including the subvolumes and snapshots rather than just the booted root. Then it was the boring, reliable dance: move the broken @ out of the way, take a writable snapshot of the last good pre-update snapshot, and name it @.
mount -o subvolid=5 /dev/sda2 /mnt
mv /mnt/@ /mnt/@broken
btrfs subvolume snapshot /mnt/@snapshots/pre-update-20221216-2304 /mnt/@
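The same three steps can be wrapped in a small script for the live USB. This is a sketch under stated assumptions: the run and rollback helpers and the DRY_RUN variable are invented for illustration, and by default it only prints what it would do.

```shell
#!/bin/sh
# Rollback sketch for use from a live USB. Prints the commands by default;
# set DRY_RUN=0 to execute them. run, rollback and DRY_RUN are hypothetical.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# rollback DEVICE SNAPSHOT_NAME [MOUNTPOINT]
rollback() {
    dev=$1
    snap=$2
    top=${3:-/mnt}
    run mount -o subvolid=5 "$dev" "$top" || return 1
    # The sanity checks only make sense once the volume is really mounted.
    if [ "${DRY_RUN:-1}" != 1 ]; then
        if [ ! -d "$top/@snapshots/$snap" ]; then
            echo "no such snapshot: $snap" >&2
            return 1
        fi
        if [ -e "$top/@broken" ]; then
            echo "$top/@broken already exists, move it away first" >&2
            return 1
        fi
    fi
    run mv "$top/@" "$top/@broken" || return 1
    run btrfs subvolume snapshot "$top/@snapshots/$snap" "$top/@"
}
```

Running rollback /dev/sda2 pre-update-20221216-2304 prints the three commands for review. The @broken check matters because mv onto an existing directory would move @ inside it rather than rename it.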
Reboot, and the machine was back exactly as it had been at 23:04 on Friday night, minutes before I broke it. Home untouched because it lives in its own subvolume. The whole thing took longer to type up here than it did to do.
A couple of things I would press on anyone setting this up. Keep /home on its own subvolume, or your rollbacks will quietly revert user data along with the system and you will be cross. Test the recovery once, deliberately, while nothing is on fire, so the steps are muscle memory rather than panic-googling. And do not let the snapshots accumulate forever; a tiny prune job that keeps the last dozen is plenty.
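A prune job along those lines can be very small. The sketch below assumes the pre-update-YYYYMMDD-HHMM naming used above, which sorts chronologically as plain text. The function names are hypothetical, and it needs to run as root from cron or a timer of your choosing.

```shell
#!/bin/sh
# Prune sketch: keep the newest snapshots, delete the rest.
# snapshots_to_prune and prune_snapshots are hypothetical helper names.

# Read snapshot names on stdin; print those outside the newest $1.
snapshots_to_prune() {
    keep=$1
    sort -r | awk -v keep="$keep" 'NR > keep'
}

# Delete surplus snapshots under $1 (default /.snapshots), keeping $2
# (default 12, the "last dozen").
prune_snapshots() {
    dir=${1:-/.snapshots}
    keep=${2:-12}
    for path in "$dir"/pre-update-*; do
        [ -d "$path" ] && basename "$path"
    done | snapshots_to_prune "$keep" | while IFS= read -r name; do
        btrfs subvolume delete "$dir/$name"
    done
}
```

Keeping the selection logic in its own function means it can be checked against a list of names without touching a real filesystem.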
I am not evangelical about filesystems. But a five-minute undo button on a machine I had genuinely bricked, on a Friday night, with zero data loss, has bought btrfs an enormous amount of goodwill from me. It saved my weekend. Without the snapshot I would have spent Saturday reinstalling. I spent it doing nothing of the sort, which is the whole point.