how much swap, and why i stopped arguing about it


The swap question is one of those topics where everyone has an opinion, most of them are out of date, and none of them quite match the box in front of you. I've spent more time than I'd admit toggling swap on and off across the homelab and convincing myself the difference mattered. It mostly didn't, but the failure modes did, and the failure modes are the whole point. So here's where I've landed, with reasons, so I can stop relitigating it every time I provision a new machine.

the bad advice that won't die

The old rule was "swap should be twice your RAM". That made sense when machines had 256MB and disks were the only place to put pages you weren't using. It makes no sense now. Twice the RAM on a box with 64GB is 128GB of swap, and if you ever actually use a meaningful fraction of that, the machine will be so deep into thrashing that it's effectively dead anyway. You'll be reaching for the power button long before swap fills.

The other extreme, "RAM is cheap, disable swap entirely", is the fashionable advice, and it's also wrong, just in a quieter way. With no swap at all, the kernel has nowhere to put genuinely cold anonymous pages. Every anonymous page a process has touched but isn't using has to stay resident in RAM, taking up space that could be page cache. And when you do run out, the OOM killer is your only release valve, and it has the judgement of a fairground claw machine. It will reliably grab the wrong process.

what swap is actually for

Swap is not "extra RAM". If you're using swap as overflow capacity for your working set, you've already lost, because disk is orders of magnitude slower than memory and the box will crawl. Swap is for two useful things.

First, it lets the kernel evict cold anonymous pages, the bits of process memory that were touched once at startup and never again. Pushing those out to disk frees real RAM for page cache, which is doing useful work. A little swap usage here is healthy, not a warning sign. People panic when the free command shows a few hundred megabytes of swap in use on a box that's otherwise fine, and they shouldn't.
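If you want to check that the swapped-out memory really is cold, the VmSwap field in each /proc/<pid>/status shows how much of that process currently lives in swap. A rough sketch, assuming a Linux /proc; the function name top_swappers is just an illustrative name of mine, not a standard tool:

```shell
#!/bin/sh
# List the processes with the most memory currently in swap.
# Reads the VmSwap field (in kB) from each /proc/<pid>/status; kernel threads
# have no VmSwap line and are skipped. Processes that exit mid-scan are ignored.
top_swappers() {
  n=${1:-10}
  for status in /proc/[0-9]*/status; do
    awk '$1 == "Name:" {name = $2}
         $1 == "VmSwap:" && $2 > 0 {print $2 " kB", name}' "$status" 2>/dev/null
  done | sort -rn | head -n "$n"
}
```

Run top_swappers 5. If the list is daemons that have sat idle since boot, that's the healthy case: cold pages parked out of the way.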

Second, it's a shock absorber. If something briefly spikes, swap buys you a moment to react rather than going straight to the OOM killer. It turns a hard failure into a slow one, and a slow failure is one you can alert on and fix before it becomes an outage.


what I actually do now

For the homelab I've settled on a small, fixed swap size rather than anything proportional to RAM. A few gigabytes is plenty on every box regardless of whether it has 8GB or 64GB. Enough to absorb a spike and let the kernel relocate cold pages, not enough to let a runaway process limp along in swap for an hour pretending everything's fine.
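For reference, setting up a small fixed swap file on a typical Linux box looks roughly like this. It's a sketch, not a recipe: the path /swapfile and the 4096 MiB default are my assumptions standing in for "a few gigabytes", it needs root, and btrfs needs extra steps for swap files that this skips. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: create and enable a fixed-size swap file (run as root).
# Assumed defaults: /swapfile, 4096 MiB. DRY_RUN=1 only prints the commands.
make_swapfile() {
  path=${1:-/swapfile}
  size_mib=${2:-4096}
  run=""
  if [ "${DRY_RUN:-0}" = 1 ]; then run="echo"; fi
  # Write real zeros with dd: swapon refuses files with holes, and on some
  # filesystems a fallocate'd file can look like one.
  $run dd if=/dev/zero of="$path" bs=1M count="$size_mib"
  $run chmod 600 "$path"   # swap can hold secrets; keep it root-only
  $run mkswap "$path"
  $run swapon "$path"
}
```

To keep it across reboots, add a line like /swapfile none swap defaults 0 0 to /etc/fstab; swapon --show confirms it's active.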

Then I turn swappiness down, but not off:

sysctl -w vm.swappiness=10

The default of 60 is tuned for desktops, where you'd happily swap out a background app to keep the foreground responsive. On a server I'd rather keep the working set in RAM and only swap when there's genuine pressure. Ten is a sensible floor; it still swaps cold pages eventually, it just stops being eager about it. I leave it at 10 across the board and have never regretted it.
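One thing worth knowing: sysctl -w only changes the running kernel, so the setting is gone after a reboot. To make it stick, it belongs in a drop-in under /etc/sysctl.d/, which sysctl --system (and systemd-sysctl at boot, on systemd distros) reads. A minimal sketch; the file name 99-swappiness.conf is my own choice, and the directory argument exists only so you can try it somewhere harmless first:

```shell
#!/bin/sh
# Sketch: persist vm.swappiness=10 across reboots via a sysctl.d drop-in.
# The file name is an assumption; any *.conf name in the directory is read.
write_swappiness_dropin() {
  dir=${1:-/etc/sysctl.d}
  cat > "$dir/99-swappiness.conf" <<'EOF'
# Swap only under genuine pressure, but still evict cold pages eventually.
vm.swappiness = 10
EOF
}
```

As root, run write_swappiness_dropin, then sysctl --system to load it now; cat /proc/sys/vm/swappiness should print 10.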

For the boxes I care about most, I also think about what happens when memory does run out, rather than just hoping it never will. On the database host I set vm.overcommit_memory=2 and a vm.overcommit_ratio I've actually thought about, so an over-ambitious allocation fails up front, with malloc returning NULL, rather than being granted and then getting a process SIGKILLed later when the kernel can't back those pages. A failed allocation a process can handle. A SIGKILL out of nowhere it cannot.
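Why the ratio needs thought: under vm.overcommit_memory=2, the kernel's overcommit-accounting documentation says the total committed address space may not exceed CommitLimit, which (with vm.overcommit_kbytes left at 0 and ignoring huge pages) is swap plus overcommit_ratio percent of RAM. The ratio defaults to 50, so on a 64GB box with 4GB of swap the default gives 64 × 50 / 100 + 4 = 36GB, and a lot of RAM is simply uncommittable. A small sketch that does the arithmetic; the function names are mine:

```shell
#!/bin/sh
# Sketch: what vm.overcommit_memory=2 will let this box commit.
# Per the kernel's overcommit-accounting docs, with vm.overcommit_kbytes = 0:
#   CommitLimit = RAM * overcommit_ratio / 100 + swap
# This ignores huge pages (the kernel subtracts reserved hugetlb pages from RAM).

# All sizes in kB, matching /proc/meminfo.
commit_limit_kb() {
  mem_kb=$1
  swap_kb=$2
  ratio=$3
  echo $(( mem_kb * ratio / 100 + swap_kb ))
}

# Compare the computed limit with what this kernel reports right now.
show_commit_headroom() {
  mem_kb=$(awk '$1 == "MemTotal:" {print $2}' /proc/meminfo)
  swap_kb=$(awk '$1 == "SwapTotal:" {print $2}' /proc/meminfo)
  ratio=$(cat /proc/sys/vm/overcommit_ratio)
  echo "computed CommitLimit: $(commit_limit_kb "$mem_kb" "$swap_kb" "$ratio") kB"
  # Committed_AS is what's already promised; under mode 2, new allocations
  # that would push it past CommitLimit fail.
  grep -E '^(CommitLimit|Committed_AS):' /proc/meminfo
}
```

For example, commit_limit_kb 16000000 4000000 50 prints 12000000. Note that the swap size feeds straight into the limit, which is one more reason to pick it deliberately.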

the bit that actually matters

Whatever you choose, monitor swap activity, not swap usage. The number to watch is the rate of pages going in and out, si/so in vmstat 1, not the static "swap used" figure. A box sitting on 500MB of swap with zero swap traffic is perfectly happy; those are cold pages doing nobody any harm. A box with 50MB of swap and a constant stream of pages thrashing in and out is in trouble, even though the usage number looks tiny. The static figure tells you almost nothing. The rate tells you everything.
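A rough sketch of alerting on the rate instead of the level. It reads the cumulative pswpin and pswpout counters from /proc/vmstat twice and divides by the interval. These count pages, whereas vmstat's si/so columns default to KiB per second, so the numbers won't match directly. The 10-second interval and the 100 pages/s threshold are placeholders I picked, not recommendations; tune them against what normal looks like on your own boxes.

```shell
#!/bin/sh
# Sketch: alert on swap *traffic*, not swap usage.
# /proc/vmstat's pswpin/pswpout are cumulative page counts since boot.

# Print "<pswpin> <pswpout>" from a vmstat-format file (default /proc/vmstat).
read_swap_counters() {
  awk '$1 == "pswpin" {i = $2} $1 == "pswpout" {o = $2} END {print i + 0, o + 0}' \
    "${1:-/proc/vmstat}"
}

# Pages per second between two readings of a cumulative counter:
# swap_rate BEFORE AFTER SECONDS
swap_rate() {
  echo $(( ($2 - $1) / $3 ))
}

# Sample twice, INTERVAL seconds apart; return 1 if swap-in exceeds THRESHOLD.
# Both defaults are placeholders, not recommendations.
check_swap_traffic() {
  interval=${1:-10}
  threshold=${2:-100}
  before=$(read_swap_counters)
  sleep "$interval"
  after=$(read_swap_counters)
  si=$(swap_rate "${before% *}" "${after% *}" "$interval")
  so=$(swap_rate "${before#* }" "${after#* }" "$interval")
  echo "swap-in: $si pages/s, swap-out: $so pages/s"
  if [ "$si" -gt "$threshold" ]; then
    echo "WARNING: swap-in above $threshold pages/s" >&2
    return 1
  fi
}
```

Run check_swap_traffic 10 100 from cron or a timer and alert on a nonzero exit. A single blip isn't the problem, sustained traffic is, so requiring a few failures in a row before paging anyone is probably wise.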

So: a few gigs of swap, swappiness at 10, alert on swap-in rate, and think harder about OOM behaviour on the boxes that matter. That's it. It's not clever and it's not controversial once you stop treating swap as either extra RAM or a moral failing. It's a small safety margin, sized so it can't paper over a real problem for long. I've stopped arguing about it, which is the most useful outcome of the whole exercise.