Ramblings of an aging IT geek

moving root onto zfs without losing my nerve

Notes from migrating a Linux box to ZFS on root, why I bothered, and the boot-loader detail that nearly caught me out.


I've been running ZFS on data pools for a while and trusting it completely. What I hadn't done was put root itself on ZFS, partly out of nerves and partly because the tooling used to be fiddly. With ZFS on Linux now packaged sensibly on Ubuntu 16.04, I finally did it, and the headline is: snapshots of your entire operating system are exactly as good as they sound.

The reason to bother is boot environments. Before I touch anything risky now (an upgrade, a new kernel, a configuration change I'm not sure about), I take a snapshot of the root dataset. If it goes wrong I roll back and reboot, and the machine is precisely as it was, system files and all. That safety net changes how willing you are to experiment on a box you care about.
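For anyone who wants to see the shape of that routine, here is a minimal sketch. It assumes the root dataset is rpool/ROOT/ubuntu as in the layout further down; the snapshot label pre-upgrade and the helper names are made up for illustration. It only prints the commands unless you set DRY_RUN=0 and run it as root.

```shell
#!/bin/sh
# Sketch: snapshot the root dataset before a risky change, roll back if needed.
# ROOT_DS matches the layout in this post; the snapshot label is hypothetical.
ROOT_DS="rpool/ROOT/ubuntu"

# Print commands by default; set DRY_RUN=0 (as root) to actually run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Take a named snapshot of the root dataset.
snapshot_root() {
    run zfs snapshot "${ROOT_DS}@$1"
}

# Roll the root dataset back to that snapshot, then reboot by hand.
# If newer snapshots exist, zfs rollback refuses unless given -r,
# and -r destroys those newer snapshots.
rollback_root() {
    run zfs rollback "${ROOT_DS}@$1"
}

snapshot_root pre-upgrade
# ... do the risky thing; if it goes wrong:
rollback_root pre-upgrade
```

Listing what you have to roll back to is `zfs list -t snapshot`.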


The layout I settled on follows the common pattern: a small pool for root with sensible datasets split out, so the things you want to snapshot separately are separate.

rpool/ROOT/ubuntu      mounted at /
rpool/home             mounted at /home
rpool/var/log          mounted at /var/log

Splitting /var/log out matters more than it looks. You almost never want to roll your logs back when you roll the system back, so keeping them on their own dataset means a system rollback doesn't lose the very records that tell you why you rolled back.
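A rough sketch of how a layout like the one above could be created, assuming the pool rpool already exists. The canmount and mountpoint choices on the container datasets (rpool/ROOT and rpool/var) are my assumptions, not something the layout dictates, and during an install the pool would normally be imported under an alternate root such as /mnt so nothing mounts over the live system. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: create the dataset layout on an existing pool called rpool.

# Print commands by default; set DRY_RUN=0 (as root) to actually run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

create_layout() {
    # Container for root datasets; never mounted itself (assumption).
    run zfs create -o canmount=off -o mountpoint=none rpool/ROOT
    run zfs create -o mountpoint=/ rpool/ROOT/ubuntu
    run zfs create -o mountpoint=/home rpool/home
    # Container only, so /var itself stays on the root dataset (assumption).
    run zfs create -o canmount=off -o mountpoint=/var rpool/var
    # Inherits its mountpoint from the parent: /var/log.
    run zfs create rpool/var/log
}

create_layout
```

Because snapshots and rollbacks are per dataset unless you ask for recursion, rolling back rpool/ROOT/ubuntu leaves rpool/var/log alone.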

A couple of properties earn their keep immediately:

zfs set compression=lz4 rpool
zfs set atime=off rpool

lz4 compression is effectively free on modern CPUs and a root filesystem full of text, binaries and logs compresses well, so you get capacity and often a little throughput for nothing. Turning atime off stops every read rewriting metadata, which is pure overhead for a root pool.
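To check that the settings took and were inherited by the child datasets, something along these lines should do. It is a sketch that assumes the pool is called rpool; compressratio is the read-only property that reports what compression is actually buying you. The helper names are hypothetical and it prints the command rather than running it unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: show compression, atime and the achieved compression ratio
# for a pool and every dataset beneath it.

# Print commands by default; set DRY_RUN=0 to actually run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

show_props() {
    # -r recurses into child datasets; the pool name defaults to rpool.
    run zfs get -r compression,atime,compressratio "${1:-rpool}"
}

show_props rpool
```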

The bit that nearly caught me out is the boot loader, because GRUB has to understand the pool to find the kernel. You cannot enable every ZFS feature flag and still expect GRUB to read the pool. The safe move is to keep the boot pool's features to the set GRUB supports, or to carry a separate small /boot so the loader never has to parse the fancy pool at all. I went with a conservative feature set on root and it booted first time, which after a root filesystem migration is a genuinely lovely thing to watch.
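One way to sanity-check this before rebooting is to compare the pool's feature flags against whatever your GRUB version documents as readable. The sketch below is a filter over the output of `zpool get -H -o property,value all rpool`; the function name is hypothetical, and the allow-list in the usage comment is purely illustrative, so take the real list from your GRUB's documentation rather than from here.

```shell
#!/bin/sh
# Sketch: list pool features that are enabled or active but NOT in an allow-list.
# stdin: lines of "property<TAB>value", as printed by
#          zpool get -H -o property,value all rpool
# $1:    space-separated feature names the boot loader is known to read.
unsupported_features() {
    allowed=" $1 "
    while IFS="$(printf '\t')" read -r prop state; do
        # Only feature@... properties are of interest.
        case "$prop" in
            feature@*) ;;
            *) continue ;;
        esac
        # A disabled feature cannot affect the on-disk format.
        [ "$state" = "disabled" ] && continue
        name="${prop#feature@}"
        case "$allowed" in
            *" $name "*) ;;          # in the allow-list, fine
            *) echo "$name" ;;       # enabled or active, but not allowed
        esac
    done
}

# Usage (the allow-list here is an illustrative placeholder, not GRUB's real list):
#   zpool get -H -o property,value all rpool \
#       | unsupported_features "async_destroy empty_bpobj lz4_compress"
```

Empty output means nothing on the pool falls outside the list you supplied. It says nothing about whether the list itself is right.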

One more thing worth doing on day one rather than day thirty: set up a scrub schedule and scrub the pool monthly. ZFS will tell you about silent corruption, but only if you ask it to look.
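A monthly scrub can be as simple as one cron entry. The sketch below just prints a line in /etc/cron.d format; the 03:00-on-the-1st timing, the /sbin/zpool path and the file name in the comment are assumptions, so adjust them for your system, and check whether your distribution's ZFS package already ships a scrub job before adding another.

```shell
#!/bin/sh
# Sketch: print an /etc/cron.d-style entry that scrubs a pool
# at 03:00 on the 1st of every month, running as root.
scrub_cron_line() {
    pool="${1:-rpool}"
    echo "0 3 1 * * root /sbin/zpool scrub ${pool}"
}

scrub_cron_line rpool

# To install it (as root); the file name is hypothetical:
#   scrub_cron_line rpool > /etc/cron.d/zfs-scrub-rpool
# To start a scrub by hand and see the result of the last one:
#   zpool scrub rpool
#   zpool status rpool
```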

Is it worth it for a single laptop? Maybe not. For anything I'd be sad to rebuild by hand, the answer is now plainly yes. The first time you fat-finger an upgrade and fix it with zfs rollback and a reboot, you stop asking the question.