when the write-back stalls: tuning dirty_ratio on a busy box


The symptom was a box that ran beautifully and then, every minute or two, froze for a second or two. Not crashed, not paged, just a brief stall where everything writing to disk seemed to hold its breath. Request latency would sit flat and lovely, then spike, then flatten again, with a rhythm regular enough that I knew it wasn't load. Load doesn't tick like a metronome. Something internal was.

The culprit was the page cache and the dirty-page write-back machinery, specifically vm.dirty_ratio and vm.dirty_background_ratio. When you write to a file, the data lands in the page cache first and gets flushed to disk later. That's normally a gift: it lets the kernel batch and reorder writes. But two thresholds govern when the flushing happens, and on a box with a lot of RAM the defaults let an enormous backlog of unwritten data pile up before either one trips.

what the two thresholds actually do

dirty_background_ratio is the polite threshold. When dirty pages exceed this percentage of available memory, the kernel kicks off background flusher threads to write them out, and your processes carry on unbothered. dirty_ratio is the rude one. When dirty pages hit that percentage, the kernel stops being polite: any process trying to write gets blocked, synchronously, until enough has been flushed to get back under the line.
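To make the two lines concrete, here's a toy model in shell arithmetic. It's a sketch under simplifying assumptions, not how the kernel computes anything: dirty_regime is a made-up helper, sizes are in MiB, the ratios are applied to whatever figure you pass in as dirtyable memory, and it ignores the fact that recent kernels start pacing heavy writers somewhat before the hard limit is reached.

```shell
# Toy model of the two dirty-page thresholds (ratio-based knobs, sizes in MiB).
# It only reports which side of each line the dirty pool is on.
dirty_regime() {
    avail_mib=$1     # memory the kernel counts as dirtyable
    dirty_mib=$2     # current size of the dirty pool
    bg_ratio=$3      # vm.dirty_background_ratio, in percent
    hard_ratio=$4    # vm.dirty_ratio, in percent

    bg_limit=$(( avail_mib * bg_ratio / 100 ))
    hard_limit=$(( avail_mib * hard_ratio / 100 ))

    if [ "$dirty_mib" -ge "$hard_limit" ]; then
        echo "blocked"            # writers stall until the pool drops back under the line
    elif [ "$dirty_mib" -ge "$bg_limit" ]; then
        echo "background-flush"   # flusher threads write pages out; writers carry on
    else
        echo "cached"             # below both lines; only periodic age-based writeback runs
    fi
}

dirty_regime 65536 4096 10 20     # → cached
dirty_regime 65536 8192 10 20     # → background-flush
dirty_regime 65536 13200 10 20    # → blocked
```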

On a server with 64GB of RAM and the stock percentage-based defaults, dirty_ratio at 20% meant the kernel was willing to accumulate something like 12GB of dirty pages before it slammed on the brakes. And once it did, every process that wanted to write sat blocked behind a write-back queue gigabytes deep until the pool dropped back under the line. On spinning media, or even a busy SSD, that's your one-to-two-second stall, arriving like clockwork as the cache filled and dumped, filled and dumped.
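The same arithmetic across a few machine sizes shows why a percentage is the wrong unit here. This sketch (ratio_ceiling_mib is a hypothetical helper) assumes all of RAM counts as dirtyable, which slightly overstates the real figure, and uses 10% and 20%, the usual kernel defaults for dirty_background_ratio and dirty_ratio.

```shell
# Dirty-pool ceiling implied by a percentage knob, assuming (optimistically)
# that all of RAM is dirtyable. Input in GiB, output in MiB.
ratio_ceiling_mib() {
    ram_gib=$1
    ratio=$2
    echo $(( ram_gib * 1024 * ratio / 100 ))
}

for ram in 8 64 256; do
    echo "${ram}GiB RAM: background at $(ratio_ceiling_mib "$ram" 10) MiB, writers blocked at $(ratio_ceiling_mib "$ram" 20) MiB"
done
# 64GiB RAM: background at 6553 MiB, writers blocked at 13107 MiB
```

Quadruple the RAM and the backlog the kernel will tolerate quadruples with it, while the disk underneath stays exactly as fast as it was.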


the fix: smaller, or better, in bytes

You can see the current values plainly:

sysctl vm.dirty_ratio vm.dirty_background_ratio
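If you'd rather see all four knobs at once, plus how big the dirty pool is right now, something like the following works on a Linux box. show_writeback_knobs is just a name I've made up; the values come straight from /proc/sys/vm, and the Dirty and Writeback lines in /proc/meminfo show pages waiting to be written and pages being written at that moment.

```shell
# Print the four write-back knobs side by side. Of each ratio/bytes
# pair only one is in effect; the other reads back as 0.
# The optional directory argument exists only for testing.
show_writeback_knobs() {
    vm_dir=${1:-/proc/sys/vm}
    for knob in dirty_background_ratio dirty_background_bytes dirty_ratio dirty_bytes; do
        printf '%-26s %s\n' "vm.$knob" "$(cat "$vm_dir/$knob")"
    done
}

show_writeback_knobs

# Dirty: pages waiting to be written. Writeback: pages being written right now.
grep -E '^(Dirty|Writeback):' /proc/meminfo
```

Watching the Dirty line climb and collapse in step with the latency spikes is a cheap way to confirm this is your problem before touching anything.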

The instinct is to just lower the ratios, and that helps. But the percentages scale with RAM, which is exactly the trap: the more memory you add, the worse the stall gets. The better tool is the pair of byte-valued siblings, which cap the dirty pool in absolute terms regardless of how much RAM the box has:

sysctl -w vm.dirty_background_bytes=$((256 * 1024 * 1024))
sysctl -w vm.dirty_bytes=$((1024 * 1024 * 1024))

Setting either of the _bytes values automatically zeroes its _ratio counterpart, so you're choosing one model or the other, not both. With background flushing starting at 256MB and the hard wall at 1GB, the kernel writes little and often instead of hoarding and dumping. The flushers stay busy in the background, the dirty pool never grows large enough to require a dramatic synchronous purge, and the metronome stops.
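A quick way to check which model is actually in force: a non-zero _bytes value takes precedence, otherwise the _ratio value applies. active_limit is a hypothetical helper, and its optional directory argument exists only so it can be pointed at test files instead of /proc/sys/vm.

```shell
# Report which knob governs a threshold pair: a non-zero *_bytes wins,
# otherwise the *_ratio percentage applies.
active_limit() {
    pair=$1                      # "dirty" (hard limit) or "dirty_background"
    vm_dir=${2:-/proc/sys/vm}    # overridable only for testing
    bytes=$(cat "$vm_dir/${pair}_bytes")
    if [ "$bytes" -ne 0 ]; then
        echo "${pair}_bytes=$bytes"
    else
        echo "${pair}_ratio=$(cat "$vm_dir/${pair}_ratio")%"
    fi
}

active_limit dirty_background
active_limit dirty
```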

The latency graph after the change is the kind of thing you screenshot and feel quietly smug about: flat, with the periodic spikes simply gone. Once you've confirmed it, make the change persistent in a file under /etc/sysctl.d/, since the one thing worse than a stalling box is a box that starts stalling again after a reboot because you only ran sysctl -w. The defaults are tuned for a desktop's worth of RAM. The moment your server has tens of gigabytes, they stop being defaults and start being a latency bug waiting for a quiet afternoon to reveal itself.
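For the persistence step, a minimal sketch that writes the same two caps into a sysctl.d fragment. The file name 90-dirty-writeback.conf and the write_dirty_conf function are my own choices, not anything standard; sysctl only cares that the file lives in /etc/sysctl.d/ and ends in .conf. The values have to be spelled out as literal byte counts, because nothing expands shell arithmetic inside these files.

```shell
# Write the byte caps used above (256 MiB background, 1 GiB hard limit)
# to a sysctl.d fragment so they survive a reboot.
# The default path is a hypothetical name; pass another path to override it.
write_dirty_conf() {
    conf=${1:-/etc/sysctl.d/90-dirty-writeback.conf}
    cat > "$conf" <<'EOF'
# Cap the dirty page pool in absolute terms instead of as a share of RAM.
vm.dirty_background_bytes = 268435456
vm.dirty_bytes = 1073741824
EOF
}
```

Run it as root, then load the file without rebooting using sysctl --system, which rereads every sysctl configuration location, /etc/sysctl.d/ included.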