Every couple of years I rebuild a box, get to the partitioning step, and freeze on the same question: how much swap? The internet has been arguing about this since before I had a beard, and the answers range from "twice your RAM" (a rule from when 256MB was a lot) to "zero, swap is evil, just buy more RAM". Both camps are loud and both are mostly wrong for the machine in front of me.
So I sat down over the holidays and actually settled it for my homelab, then wrote it here so I stop having the argument with myself.
what swap is actually for
The unhelpful framing is "swap is overflow RAM for when you run out". That's the case you least want to hit, because once you're actively swapping the working set, the machine grinds and you'd rather it just told you the truth and OOM-killed something.
The useful framing is that swap lets the kernel evict cold anonymous pages, the ones that are rarely or never touched again, freeing real RAM for page cache. A long-running daemon allocates a load of memory at startup, uses a fraction of it steadily, and leaves the rest sitting there cold. With a little swap, those cold pages go to disk and your cache gets bigger. With zero swap, they squat in RAM forever. You can watch this happen with vmstat: the so (swap-out) column is non-zero for a while as the cold pages leave, then it drops to zero and stays there.
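If you'd rather see numbers than take my word for it, here is a minimal sketch. The cumulative counters behind vmstat's si/so columns live in /proc/vmstat as pswpin and pswpout. The helper name swap_counters is made up for this post, and the optional file argument is only there so it can be pointed at a saved copy.

```shell
# Print cumulative swap-in / swap-out page counts since boot.
# swap_counters is a hypothetical helper name; the file defaults to /proc/vmstat.
swap_counters() {
    awk '$1 == "pswpin" || $1 == "pswpout" { print $1, $2 }' "${1:-/proc/vmstat}"
}

# Live view instead: watch the si/so columns, sampled every 5 seconds.
#   vmstat 5
```

Run it a few minutes apart. If pswpout climbed for a bit and then went flat while pswpin barely moved, that's the cold-page eviction described above: pages left and nobody asked for them back.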
what I landed on
For every machine in the homelab, regardless of role:
- A small swap file. 2GB on the small nodes, 4GB on the ones running databases. Not a partition, a file, because resizing a file is fallocate and resizing a partition is a bad afternoon.
- zram on top, sized to roughly half of RAM, at a higher priority than the disk swap.
- vm.swappiness left at a low-ish value, around 10, so the kernel reaches for swap reluctantly rather than eagerly.
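For the swappiness part, a sketch of how I'd make the setting stick across reboots, assuming a distro that reads drop-ins from /etc/sysctl.d. The function name write_swappiness and the file name 99-swappiness.conf are my own choices, not anything standard.

```shell
# Write a sysctl drop-in that pins vm.swappiness.
# write_swappiness and 99-swappiness.conf are hypothetical names;
# the second argument defaults to /etc/sysctl.d.
write_swappiness() {
    value="$1"
    dir="${2:-/etc/sysctl.d}"
    printf 'vm.swappiness = %s\n' "$value" > "$dir/99-swappiness.conf"
}

# As root:
#   write_swappiness 10
#   sysctl --system          # reload all sysctl config files
#   sysctl vm.swappiness     # confirm the running value
```

The drop-in is just one line, so writing it by hand works equally well. The function only exists so the same value lands on every node.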
zram is the bit that changed my mind. It's a compressed block device living in RAM, so "swapping" to it is really just compressing cold pages in place. lz4 gets me roughly 3:1 on typical daemon memory, it's faster than any SSD, and it means the disk swap file is purely a last-resort safety net rather than a hot path. The kernel prefers the higher-priority zram device and only spills to the file when zram itself is full.
# /etc/systemd/zram-generator.conf
[zram0]
zram-size = ram / 2
compression-algorithm = lz4
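# swap file, low priority, the genuine backstop
# (if swapon rejects a fallocate'd file on your filesystem, create it with dd instead)
fallocate -l 4G /swapfile
chmod 600 /swapfile
mkswap /swapfile
# options go before the file; -p sets the swap priority
swapon -p 5 /swapfile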
The disk file gets priority 5. zram-generator gives its devices priority 100 unless you set swap-priority yourself, so pressure goes to zram first.
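Two quick checks once both are active, as a rough sketch: one confirms the priority ordering, the other the compression ratio zram is actually getting. It assumes the device is zram0 and reads /proc/swaps and the device's mm_stat file. Both function names are invented for this post.

```shell
# List active swap devices, highest priority first, from /proc/swaps.
# Columns there are: Filename Type Size Used Priority.
swap_by_priority() {
    awk 'NR > 1 { print $5, $1 }' "${1:-/proc/swaps}" | sort -rn
}

# Compression ratio of a zram device from its mm_stat file:
# field 1 is the original data size, field 2 the compressed size (bytes).
# Prints nothing if nothing has been compressed yet.
zram_ratio() {
    awk '$2 > 0 { printf "%.2f\n", $1 / $2 }' "${1:-/sys/block/zram0/mm_stat}"
}
```

If the zram device isn't at the top of the first list, the file is taking pressure it shouldn't. The ratio only means something once a fair amount has been swapped out, and it will vary with what the daemons are holding, so treat my 3:1 as a number to compare against rather than a promise.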
the bit that matters
The honest answer to "how much swap" is "enough that the kernel can evict genuinely cold pages, and not a byte more, because if you're leaning on disk swap to run your workload you have a sizing problem, not a swap problem." A small file plus zram gives me that. The page cache stays warm, the cold startup allocations get compressed away, and on the rare day something genuinely runs the box out of memory, the OOM killer does its job instead of the machine flatlining for ten minutes first.
It is not a thrilling conclusion. But it's written down now, and next rebuild I get to skip the argument.