Ramblings of an aging IT geek

moving my root filesystem onto zfs without losing my nerve

Migrating a home server's root onto ZFS for snapshots and checksums, and the boot-loader and initramfs faff that nobody warns you about.

I have run ZFS on data pools for years and trusted it completely. Root has always stayed on plain ext4, partly out of caution and partly because putting root on ZFS used to be genuinely fiddly. The thing that finally pushed me was a botched package upgrade that left a box in a sulk for an evening. With ZFS I could have snapshotted before the upgrade and rolled back in seconds. That is the whole pitch: cheap snapshots before anything risky, and checksums catching bit rot before it spreads.

The actual work

The pool itself is the easy bit. The awkwardness is everything around boot. Your initramfs needs the ZFS modules and the import logic, and your boot loader needs to find a root that is no longer a normal partition.

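# ashift=12 assumes 4K-sector disks; lz4 compression is inherited by every dataset.
# If the pool is built from a running system, -R /mnt (altroot) keeps mountpoint=/ off the live root.
zpool create -o ashift=12 -O compression=lz4 \
  -O mountpoint=none rpool /dev/disk/by-id/...
# Root and home get their own datasets so each can be snapshotted separately.
zfs create -o mountpoint=/ rpool/ROOT
zfs create -o mountpoint=/home rpool/home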

Using by-id paths rather than /dev/sda matters more than it looks: device names shuffle, and a pool that imports cleanly today will refuse to at the worst possible moment if it is pinned to a name that moved.
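
If you only know a disk by its kernel name, the by-id name can be looked up by following the symlinks. This is a minimal sketch, not part of the original setup: the helper name by_id_names is made up for illustration, and its optional second argument exists only so it can be pointed at a different directory.

```shell
# Print every persistent by-id symlink that resolves to the given device.
# by_id_names is a hypothetical helper; $2 defaults to /dev/disk/by-id.
by_id_names() {
    dev=$(readlink -f "$1") || return 1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$dev" ]; then
            printf '%s\n' "$link"
        fi
    done
}
```

Calling by_id_names /dev/sda prints whichever names under /dev/disk/by-id currently point at that disk, and one of those is what belongs in the zpool create line.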

The bit that bites

The part that cost me an hour was the initramfs. On a Debian-flavoured box the ZFS hook has to be present (it normally comes from the zfs-initramfs package), the root has to be passed in a form the import understands (typically root=ZFS=rpool/ROOT on the kernel command line), and update-initramfs -u has to actually rebuild with the module included. The failure mode is a kernel that boots, can't find root, and dumps you in the initramfs shell with a blinking cursor and no obvious clue. The fix is almost always that the module wasn't baked in, or the pool wasn't set to be imported at boot. A bare zpool import from that shell lists the pools it can see without importing anything; if rpool shows up there, the pool is fine and it is purely a boot-time discovery problem.
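
Whether the module made it into the image can be checked before rebooting rather than after. The sketch below assumes initramfs-tools (which provides lsinitramfs) and the usual Debian image path. The function names has_zfs_module and check_initramfs are hypothetical.

```shell
# Succeed if the file listing on stdin contains a zfs kernel module,
# compressed or not (zfs.ko, zfs.ko.xz, zfs.ko.zst, ...).
has_zfs_module() {
    grep -Eq '(^|/)zfs\.ko(\.[a-z0-9]+)?$'
}

# Hypothetical wrapper: inspect an initramfs image, defaulting to the one
# for the running kernel, and say what to do if the module is missing.
check_initramfs() {
    image=${1:-/boot/initrd.img-$(uname -r)}
    if lsinitramfs "$image" | has_zfs_module; then
        echo "zfs module present in $image"
    else
        echo "zfs module missing from $image; rebuild with: update-initramfs -u -k all" >&2
        return 1
    fi
}
```

Running check_initramfs after every kernel or ZFS package upgrade is cheap insurance: a missing module shows up as a one-line warning instead of an initramfs prompt.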

I also set rpool/ROOT to not auto-mount and let the boot process handle it, because two things racing to mount your root is exactly the sort of intermittent horror you do not want to debug at 2am.
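
The usual ZFS property for "do not auto-mount" is canmount=noauto, so that is what this sketch assumes; the text above does not name the exact property used. The helper name set_root_noauto is hypothetical.

```shell
# Hypothetical helper: stop ZFS from auto-mounting the root dataset, then
# print the two properties that decide who mounts it and where.
# Assumes canmount=noauto is the intended setting.
set_root_noauto() {
    dataset=${1:-rpool/ROOT}
    zfs set canmount=noauto "$dataset" || return 1
    zfs get -H -o property,value canmount,mountpoint "$dataset"
}
```

With noauto set, zfs mount -a skips the dataset, and the initramfs becomes the only thing that mounts it at boot.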

Was it worth it

Yes, but be honest about the cost. The day-to-day win is real: zfs snapshot rpool/ROOT@before-upgrade takes almost no space when it is created (it only grows as the live data diverges from it) and no noticeable time, and rolling back takes seconds. The price is that boot is now slightly more magical than it was, and future-you needs to remember that ZFS lives between you and a working system. I keep a rescue USB with the modules on it, tested, because a clever filesystem you can't recover is just a slower way to lose data.
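
The snapshot-before-anything-risky habit is easier to keep if it is one command. This is a minimal sketch: the names snapshot_name, with_snapshot and the ROOT_DATASET variable are invented for the example, and the timestamp suffix is only there so repeated runs do not collide.

```shell
# Build a snapshot name of the form <dataset>@<label>-YYYYMMDD-HHMMSS.
snapshot_name() {
    printf '%s@%s-%s\n' "$1" "$2" "$(date +%Y%m%d-%H%M%S)"
}

# Hypothetical wrapper: snapshot the root dataset, then run the given command.
# ROOT_DATASET is an assumed override; it defaults to rpool/ROOT.
with_snapshot() {
    snap=$(snapshot_name "${ROOT_DATASET:-rpool/ROOT}" before-upgrade)
    zfs snapshot "$snap" || return 1
    echo "created $snap; undo with: zfs rollback $snap" >&2
    "$@"
}
```

Usage would be something like with_snapshot apt full-upgrade. A plain zfs rollback only goes back to the most recent snapshot. Going further back needs -r, which destroys every snapshot newer than the target, so it is worth listing them with zfs list -t snapshot first.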