moving twelve terabytes without losing my nerve

The NAS was full. Not nearly full in the way that nags you for a month, but actually full, the kind where a backup job fails at 2am and you wake up to a wall of red in the dashboard. The pool was four 4TB drives in a single raidz2 vdev, bought years ago when 4TB felt enormous, and it was time to grow it.

The plan with ZFS, when you can't just add a vdev, is the disk shuffle: replace each drive in the vdev one at a time with a larger one, let the pool resilver onto the new disk, and once the last old drive is gone the pool expands to fill the new capacity. Four 12TB drives, swapped in sequence. Simple on paper.
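
The arithmetic behind "expands to fill the new capacity" is worth spelling out. As a rough model, a raidz vdev's usable space is the number of disks minus the parity disks, times the size of the smallest disk. That is why a half-upgraded vdev stays at the old size. The sketch below uses this simplification and ignores ZFS metadata, slop space, raidz padding, and the TB/TiB difference, so real numbers will come out somewhat lower:

```python
def raidz_usable_tb(disk_sizes_tb: list[float], parity: int = 2) -> float:
    """Approximate usable capacity of one raidz vdev, in TB.

    Rough model only: every member contributes as much as the smallest
    disk, and `parity` disks' worth of space goes to parity. Ignores ZFS
    metadata, slop space, raidz padding, and TB vs TiB.
    """
    if len(disk_sizes_tb) <= parity:
        raise ValueError("a raidz vdev needs more disks than parity devices")
    return (len(disk_sizes_tb) - parity) * min(disk_sizes_tb)


# Walk through the shuffle: swap one 4TB disk for a 12TB disk at a time.
disks = [4.0, 4.0, 4.0, 4.0]
print(f"start: {raidz_usable_tb(disks):.0f}TB")
for i in range(len(disks)):
    disks[i] = 12.0
    print(f"after replacing disk {i + 1}: {raidz_usable_tb(disks):.0f}TB")
```

The first three swaps leave the estimate at 8TB. Only the fourth raises the smallest disk, which is why the growth shows up all at once at the end.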

In practice it is an exercise in patience. Each resilver took the better part of a day, because with the old disk pulled, raidz2 has to rebuild every allocated block for the new disk from the data and parity on the three surviving drives. I refused to run two replacements at once. With raidz2 you can survive two simultaneous failures, but during a resilver you've effectively spent one of those tickets, and I was not about to gamble the other one to save myself a few hours. So: pull a drive, insert the new one, zpool replace, wait, verify, repeat. Four times.
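
One way to keep yourself honest about "wait, verify" is to write the whole sequence down as a plan before touching any hardware. The sketch below only builds the command lines and executes nothing. The pool name tank and the ata-OLD_.../ata-NEW_... device names are hypothetical placeholders. It also assumes OpenZFS 2.0 or newer, which is where zpool wait first appeared. The physical pull-and-insert still happens by hand between steps.

```python
def shuffle_plan(pool: str, swaps: list[tuple[str, str]]) -> list[list[str]]:
    """Build the command sequence for a one-disk-at-a-time replacement.

    Each swap is (old_device, new_device). Nothing is executed here: the
    result is a list of argv lists to review, then run one step at a time
    (for example with subprocess.run(cmd, check=True)) after physically
    swapping each disk.
    """
    plan: list[list[str]] = []
    for old, new in swaps:
        # Start resilvering onto the new disk in place of the old one.
        plan.append(["zpool", "replace", pool, old, new])
        # Block until the resilver finishes (zpool wait needs OpenZFS 2.0+).
        plan.append(["zpool", "wait", "-t", "resilver", pool])
        # Verify before the next swap: -x reports only pools with problems.
        plan.append(["zpool", "status", "-x", pool])
    return plan


# Hypothetical pool and device names; substitute your own.
swaps = [(f"ata-OLD_4TB_{n}", f"ata-NEW_12TB_{n}") for n in range(1, 5)]
for cmd in shuffle_plan("tank", swaps):
    print(" ".join(cmd))
```

Printing a plan instead of automating the run is a deliberate choice. The disk swap is manual anyway, and the point is to stop and look after every resilver.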

The part that genuinely matters and that people skip: I checked the SMART data on every old drive before I trusted the array to survive the operation, and I made sure the most recent backup had actually completed and was restorable before I pulled a single disk. A disk shuffle is the moment a marginal drive decides to die, and a degraded raidz2 mid-resilver is exactly when you find out whether your backups are real or theoretical.
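
For the SMART pass, a common rule of thumb is that a nonzero raw count of reallocated, pending, or offline-uncorrectable sectors is a reason to stop before trusting that disk through a resilver. The sketch below scans the ATA attribute table printed by smartctl -A (from smartmontools). The attribute list and the zero threshold are my assumptions, not a vendor standard. NVMe drives report health in a different format entirely. The sample table uses made-up values.

```python
import re

# Attributes whose raw value should normally be 0 on a healthy disk.
# This selection is a rule of thumb, not an official threshold.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def smart_red_flags(smartctl_output: str) -> dict[str, int]:
    """Return watched attributes that have a nonzero raw value.

    Expects the ATA attribute table from `smartctl -A /dev/sdX`, where each
    row is: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE.
    """
    flags: dict[str, int] = {}
    for line in smartctl_output.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        name = parts[1]
        if name not in WATCHED:
            continue
        # RAW_VALUE can carry trailing detail such as "0 (0 0)"; keep the leading integer.
        match = re.match(r"\d+", parts[9])
        if match and int(match.group()) > 0:
            flags[name] = int(match.group())
    return flags


# Illustrative table with made-up values, not output from a real drive.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""
print(smart_red_flags(sample))  # {'Current_Pending_Sector': 8}
```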

A couple of notes from the far side:

autoexpand was already on, so the moment the last old disk finished resilvering the pool grew on its own. If yours is off, the pool will sit at the old size looking smug until you turn it on or expand it by hand with zpool online -e on each new disk.
Resilver speed is bounded by the slowest disk and by whatever else the pool is doing at the same time. I paused the scrub schedule and the heavier scheduled jobs for the duration. No point making the array fight itself.
TrueNAS reported each step honestly in the UI, but I kept a terminal open with zpool status anyway. Watching the percentage tick up is oddly soothing, and the command-line output is the source of truth.
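
Both mechanical notes above, reading the resilver percentage and expanding by hand when autoexpand is off, are small enough to script. The sketch below makes several assumptions. The wording of the zpool status scan lines varies between OpenZFS versions, so it relies only on the "% done" fragment. The autoexpand value is assumed to come from zpool get -H -o value autoexpand on the pool. The pool name, device name, and sample status text are hypothetical.

```python
import re
from typing import Optional


def resilver_percent(status_text: str) -> Optional[float]:
    """Pull the "NN.NN% done" figure out of `zpool status` output.

    Returns None when no scan progress is shown, for example once the
    resilver has completed.
    """
    match = re.search(r"(\d+(?:\.\d+)?)% done", status_text)
    return float(match.group(1)) if match else None


def manual_expand_commands(pool: str, autoexpand: str, devices: list[str]) -> list[list[str]]:
    """If autoexpand is off, list the per-device `zpool online -e` commands.

    `autoexpand` is the value printed by `zpool get -H -o value autoexpand <pool>`.
    """
    if autoexpand.strip() == "on":
        return []  # ZFS grows the pool itself once the last small disk is gone.
    return [["zpool", "online", "-e", pool, dev] for dev in devices]


# Trimmed, illustrative scan lines (exact wording differs by version).
status = """\
  scan: resilver in progress
\t980G resilvered, 67.24% done, 01:25:11 to go
"""
print(resilver_percent(status))  # 67.24
print(manual_expand_commands("tank", "off", ["ata-NEW_12TB_1"]))
```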

Three days, mostly spent waiting, with the actual hands-on time measured in minutes. The pool went from 8TB usable to 24, the backups have somewhere to breathe again, and I learned, once more, that the storage layer rewards the boring approach. Do one thing, confirm it worked, then do the next. The drives that came out are now cold spares in an anti-static bag, which is to say they will sit in a drawer until I throw them away in 2031.