the great disk shuffle, or how i rebuilt a pool without losing my nerve

A server rack with drive bays

My main pool was full. Not "getting full," not "might want to think about it," but the kind of full where TrueNAS starts gently warning you and the snapshots stop behaving and you realise you have been deleting things to make room for other things for a fortnight. The disks were 4TB drives I had bought when 4TB felt enormous, in a six-wide RAIDZ2, and I had decided to swap them out for 12TB drives. Three times the capacity, same layout, and a procedure I find genuinely nerve-wracking every single time despite having done it before.

This is the write-up I wish I had reread before I started, because halfway through I forgot half of it and had to look it up at the worst possible moment.

the plan, such as it is

ZFS will let you grow a RAIDZ vdev by replacing every disk in it with a larger one, one at a time, resilvering between each, and only expanding the capacity once the last small disk is gone. The good part is that the pool stays online and the data stays protected throughout: a RAIDZ2 with one disk out still has one disk's worth of parity left, so it can ride out one more failure mid-resilver. The bad part is the waiting. Each resilver takes hours, and you do this once per disk, so a six-wide pool is the better part of a week of evenings spent watching a percentage tick upward.

The procedure per disk is the same dull dance:

# identify the disk you're about to pull, by serial, not by slot
zpool status tank
# map the gptid label to a device node, then read that device's serial
glabel status | grep <old-disk>
smartctl -i /dev/<old-device> | grep -i serial
# offline the old disk so you don't yank a live one
zpool offline tank gptid/<old-disk>
# physically swap, then replace in the pool
zpool replace tank gptid/<old-disk> /dev/<new-disk>
# then wait, and resist the urge to do anything else to the pool
zpool status -v tank

The single most important line there is "by serial, not by slot." I label every drive bay, but labels lie, drives get reseated, and the only thing that does not lie is the serial number printed on the drive itself, matched against the serial of the device ZFS is actually reporting. Pull the wrong disk during a resilver and you have turned a routine upgrade into a restore-from-backup evening. I keep a printed sheet mapping serial to bay, taped inside the rack door, and I update it the moment anything changes.
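The paper sheet can also live as a plain text file next to the pool, which makes the serial check scriptable. This is a minimal sketch under assumptions of my own: the sheet format (one "SERIAL BAY" pair per line), the function names and the sample values are all hypothetical, and the only real tool involved is `smartctl -i`, whose output includes a "Serial Number:" line.

```shell
#!/bin/sh
# Sketch only: the sheet format (one "SERIAL BAY" pair per line) is a
# hypothetical convention, not something TrueNAS or ZFS produces.

# Read `smartctl -i` output on stdin and print the drive's serial number.
# The label's capitalisation can differ between drive types, so it is
# matched case-insensitively.
serial_from_smartctl() {
  awk -F: 'tolower($1) == "serial number" {
    gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim padding around the value
    print $2
    exit
  }'
}

# Print the bay recorded for a serial in the sheet; prints nothing if unknown.
# Usage: bay_for_serial SERIAL SHEET_FILE
bay_for_serial() {
  awk -v want="$1" '$1 == want { print $2; exit }' "$2"
}
```

In use that would be something like `bay_for_serial "$(smartctl -i /dev/<old-device> | serial_from_smartctl)" <sheet-file>`. If it prints nothing, the sheet is out of date, and that is the moment to stop and fix the sheet rather than guess at a bay.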

Drives laid out during a homelab rebuild

the resilver waits, and what i did with them

Resilvering is where the nerve goes. The pool is degraded for the duration, running on what is left of its redundancy. A RAIDZ2 down one disk still has one disk of parity in hand, which is fine right up until a second disk picks that exact window to develop a sense of humour and leaves you with none. This is not paranoia. Resilvering is the single most stressful thing you can ask of an array, because it reads every allocated block on every remaining disk, which is precisely the workload most likely to find the latent bad sector that has been sitting quietly on an old drive for two years. With no parity left, ZFS can no longer rebuild whatever that sector held.

So the rule I hold to is: do not start a disk replacement on a pool you have not just backed up. I ran a fresh replication to the backup box before I pulled the first drive, verified it, and only then began. If the worst had happened and the pool had lost two more disks mid-resilver, I would have lost an evening to a restore, not a year of data. That distinction is the whole reason the shuffle is survivable.
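The post doesn't say how the replication was verified, so here is one minimal, hypothetical check rather than the method actually used: confirm that the newest snapshot on the source dataset also exists on the backup. It assumes both sides are listed with `zfs list -H -t snapshot -o name -s creation -d 1 <dataset>`, and it only covers that one dataset. It proves the snapshot name arrived, not that the data reads back correctly; a scrub on the backup box or a test restore is the stronger check.

```shell
#!/bin/sh
# Sketch: succeed only if the newest source snapshot is also on the backup.
# Each input file holds the output of
#   zfs list -H -t snapshot -o name -s creation -d 1 <dataset>
latest_snapshot_on_backup() {
  src_file=$1
  dst_file=$2
  # -s creation sorts oldest first, so the last line is the newest snapshot
  newest=$(tail -n 1 "$src_file" | cut -d@ -f2)
  [ -n "$newest" ] || return 1
  # compare snapshot names only, since dataset paths differ between boxes
  cut -d@ -f2 "$dst_file" | grep -qxF "$newest"
}
```

The backup listing would be collected over ssh, for example `ssh <backup-host> zfs list -H -t snapshot -o name -s creation -d 1 <backup-dataset>`, where the host and dataset names are placeholders for your own.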

The other thing the waits taught me is patience with the throughput. A resilver on a busy pool crawls, because it politely yields to your actual I/O. I learned to do the swaps on a quiet evening, leave the pool alone, and let it run flat out overnight rather than half-throttling it for a day whilst I used the array normally. The total wall-clock time is much shorter if you simply leave it be.

A few hard-won notes:

Check zpool status for read/write/checksum errors on the other disks before each swap. If one of them is throwing checksum errors, sort that out first. Do not stack a second problem on top of a degraded pool.
Scrub before you start the whole exercise. You want to discover any lurking corruption whilst you are at full redundancy, not whilst you are one disk down.
Old drives can look healthy when they are not. Two of my outgoing 4TB drives showed SMART pending-sector counts I had never noticed, which is exactly the sort of drive that fails under resilver load. Better to know before, not during.
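The first and last of those checks are easy to script. This is a minimal sketch with hypothetical function names. It assumes the usual layout of `zpool status` (a NAME STATE READ WRITE CKSUM table) and of the ATA attribute table printed by `smartctl -A`, where the raw value is the tenth column. SAS drives report defects differently and will print nothing from `pending_sectors`.

```shell
#!/bin/sh
# Print every device in `zpool status` output (stdin) whose READ, WRITE or
# CKSUM column is non-zero. Counts can be abbreviated (e.g. 1.2K), so they
# are compared as strings against "0" rather than as numbers.
devices_with_errors() {
  awk '
    NF < 5 { next }
    $2 !~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ { next }
    $3 != "0" || $4 != "0" || $5 != "0" { print $1 }
  '
}

# Print the raw Current_Pending_Sector value (SMART attribute 197) from
# `smartctl -A` output on stdin.
pending_sectors() {
  awk '$2 == "Current_Pending_Sector" { print $10; exit }'
}
```

Before each swap, `zpool status tank | devices_with_errors` should print nothing, and `smartctl -A /dev/<disk> | pending_sectors` should print 0 for every remaining disk. The scrub itself is just `zpool scrub tank`, left to finish before the first disk comes out.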

the expansion that does not happen until the end

This is the bit that trips people up, so it is worth saying plainly. The pool does not grow as you replace disks. You can swap five of your six drives and the pool is still its original size, because every disk in a RAIDZ vdev contributes only as much space as its smallest member. The capacity only appears when the last small disk is gone, and even then only if autoexpand is on, or if you expand each new disk by hand:

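# ideally set this before the final replace, so the pool grows on its own
zpool set autoexpand=on tank
# if it was off, nudge each new disk to claim its full size
zpool online -e tank gptid/<new-disk>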
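The arithmetic behind "nothing happens until the end" is simple enough to sketch. This is a rough model rather than what ZFS reports: usable space is taken as (width minus parity) times the smallest disk, in nominal TB, ignoring metadata, padding, the slop-space reservation and the TB versus TiB difference, so a real dashboard will show less. The function name is hypothetical.

```shell
#!/bin/sh
# Rough RAIDZ usable capacity in nominal TB: (width - parity) * smallest disk.
# Ignores ZFS overheads, so real figures come out lower.
# Usage: raidz_usable_tb PARITY SIZE_1 SIZE_2 ... SIZE_N
raidz_usable_tb() {
  parity=$1
  shift
  width=$#
  smallest=$1
  for size in "$@"; do
    if [ "$size" -lt "$smallest" ]; then smallest=$size; fi
  done
  echo $(( (width - parity) * smallest ))
}

# Walk through the upgrade: replace one 4TB disk with a 12TB disk at a time.
swapped=0
while [ "$swapped" -le 6 ]; do
  sizes=""
  slot=0
  while [ "$slot" -lt 6 ]; do
    if [ "$slot" -lt "$swapped" ]; then sizes="$sizes 12"; else sizes="$sizes 4"; fi
    slot=$((slot + 1))
  done
  # $sizes is deliberately unquoted so it splits into one argument per disk
  echo "$swapped of 6 swapped: ~$(raidz_usable_tb 2 $sizes) TB usable"
  swapped=$((swapped + 1))
done
```

Every line prints ~16 TB until the sixth swap, which jumps to ~48 TB: the same threefold jump that shows up on the dashboard once the last 4TB disk leaves.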

When the final resilver finished and the new space materialised, the relief was disproportionate. A week of evenings, six nervous swaps, and then the available figure quietly tripled as if nothing had happened. Which, if you have done it carefully, is exactly the point. The best storage migrations are boring. The data never noticed, the services never went down, and the only evidence anything happened is six 4TB drives in an anti-static bag and a number on a dashboard that is now comfortably large again.

The whole exercise is an argument for backups dressed up as an argument for ZFS. The reason I could do this with a manageable amount of nerve rather than a paralysing amount is that the backup existed and was verified. Everything else, the replaces and the resilvers and the autoexpand, is just procedure. The backup is the thing that lets you treat your data as something you can take risks with, which is a slightly mad sentence, but it is the truest one here. Mind the serials, run the scrub, take the backup, then go and do something else whilst it works.