Ramblings of an aging IT geek
linux

the swap debate, settled for my homelab

Why I keep a small amount of swap on every homelab box even when there is plenty of RAM, and what the OOM killer taught me about it.

A Linux terminal showing memory and swap statistics

"Just turn swap off, you have loads of RAM." I have heard this for years and I used to nod along. Then a box with 32GB of RAM and no swap fell over under memory pressure in the least graceful way possible, and I changed my mind. Here is where I landed.

Keep a small amount of swap. Not because you intend to use it as overflow RAM (you don't), but because a Linux system with zero swap behaves worse under pressure, not better. Without swap the kernel has nowhere to push genuinely cold anonymous pages, so it cannot reclaim them and leans harder on dropping file-backed cache instead. When the cache runs out too, the OOM killer arrives, and it tends to arrive suddenly and pick the wrong process.
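A quick way to see which camp a given box is in is to read /proc/meminfo. The following is a minimal Python sketch, not a polished tool: the function names are my own, and it assumes the usual "Name:   value kB" layout of that file.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name: <number> [kB]" lines.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_summary(fields):
    """Return (total_kb, used_kb, percent_used) for swap."""
    total = fields.get("SwapTotal", 0)
    used = total - fields.get("SwapFree", 0)
    percent = 100.0 * used / total if total else 0.0
    return total, used, percent


def read_swap_summary(path="/proc/meminfo"):
    """Convenience wrapper that reads the live file on a Linux host."""
    with open(path) as f:
        return swap_summary(parse_meminfo(f.read()))
```

Calling read_swap_summary() on a Linux machine returns a total of 0 when no swap is configured at all, which is exactly the setup I am arguing against.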

A rack-mounted homelab server with its lid off

The detail that finally settled it for me is vm.swappiness. Swappiness is not "how much swap to use"; it is the kernel's bias between reclaiming anonymous pages and reclaiming file cache, with a default of 60. With a modest swap device and a low swappiness, the kernel can quietly evict the pages that genuinely never get touched (a daemon's startup junk that has sat idle for a week, say) while keeping the hot cache that actually makes the box feel fast.

So my standard now is boring and consistent across every machine:

# /etc/sysctl.d/99-swap.conf
vm.swappiness = 10
vm.vfs_cache_pressure = 50

A few gigabytes of swap, swappiness at 10 so it only reaches for swap under real pressure, and cache pressure dialled down a touch from its default of 100 so the kernel holds onto inode and dentry caches a bit longer. On the boxes that matter I add zram for compressed in-memory swap, which gets you most of the resilience without the disk latency, but the principle is the same.
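Dropping a file into /etc/sysctl.d does nothing until it is loaded (at boot, or with sysctl --system), so it is worth checking that the running kernel agrees with the file. Here is a rough Python sketch of that check. The helper names are made up for illustration, and the key-to-path mapping is a simplification that is fine for vm.* keys but would mangle keys whose components contain dots themselves.

```python
def parse_sysctl_conf(text):
    """Parse sysctl.d-style text into {key: value}, skipping comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # sysctl.d treats lines starting with '#' or ';' as comments.
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def proc_path(key):
    """Map a sysctl key such as vm.swappiness to its /proc/sys path."""
    return "/proc/sys/" + key.replace(".", "/")


def read_from_proc(key):
    """Read the live value of a sysctl key from /proc/sys."""
    with open(proc_path(key)) as f:
        return f.read().strip()


def find_mismatches(conf_text, read_live):
    """Return {key: (wanted, live)} for settings the kernel is not using."""
    mismatches = {}
    for key, wanted in parse_sysctl_conf(conf_text).items():
        live = read_live(key)
        if live != wanted:
            mismatches[key] = (wanted, live)
    return mismatches
```

On a real box you would pass the contents of /etc/sysctl.d/99-swap.conf and read_from_proc to find_mismatches. An empty dict back means the settings are live.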

The one place I do switch swap off entirely is anything with a strict latency budget, where a swap stall would be worse than an outright kill. There I would rather set proper cgroup memory limits and let the right thing die predictably. But that is a deliberate decision per workload, not a blanket rule applied because someone on a forum said RAM is cheap.
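For the per-workload case, here is a minimal sketch of what "proper cgroup memory limits" can look like, assuming cgroup v2. It writes a hard cap to memory.max and sets memory.swap.max to 0 so that only this workload is barred from swap. The function name is my own, the cgroup directory is assumed to exist already with the memory controller enabled in its parent, and writing to it normally needs root.

```python
import os


def apply_memory_limits(cgroup_dir, max_bytes, allow_swap=False):
    """Write cgroup v2 memory limits into an existing cgroup directory."""
    # Hard cap: once the group exceeds this and nothing can be reclaimed,
    # the OOM killer acts inside this cgroup.
    with open(os.path.join(cgroup_dir, "memory.max"), "w") as f:
        f.write(str(max_bytes))
    # A swap limit of 0 keeps this workload's pages out of swap entirely,
    # while the rest of the box keeps its small swap device.
    if not allow_swap:
        with open(os.path.join(cgroup_dir, "memory.swap.max"), "w") as f:
            f.write("0")
```

Something like apply_memory_limits("/sys/fs/cgroup/latency-critical", 2 * 1024**3) would cap a group at 2 GiB with no swap. That path is a hypothetical example, not one from my machines.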

The lesson, if there is one: swap is not overflow RAM, it is a pressure-relief valve. A system with a small valve degrades. A system with no valve detonates. I will take the gentle degradation every time.