A write-heavy box had a horrible tic: every so often, latency would spike, the disks would saturate for a few seconds, and then everything settled down again until the next time. Steady-state load was fine. It was the periodic stutter that was killing us.
The culprit was page cache writeback. When you write to a file, the data sits in the page cache as "dirty" pages and gets flushed to disk later. The default thresholds are percentages of memory, so with lots of RAM the kernel was happy to let a very large pile of dirty pages accumulate, and then dirty_ratio would trip and the kernel would block writers while it dumped the whole lot to disk at once. That dump was the spike. A long quiet period of buffering, then a brutal flush.
You can see the backlog directly:
grep -i dirty /proc/meminfo
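A single grep gives one reading, and the pattern only shows up over time. Here is a minimal sketch for sampling it, assuming a Linux box with /proc/meminfo. The function name dirty_snapshot and its optional file argument are made up for illustration, and the Writeback counter is included as an extra since it shows pages currently being flushed.

```shell
#!/bin/sh
# Print the Dirty and Writeback counters (in kB) from a meminfo-style file.
# The optional argument exists only so the function can be pointed at a
# saved copy; it defaults to the live /proc/meminfo.
dirty_snapshot() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { printf "%s %s kB\n", $1, $2 }' \
        "${1:-/proc/meminfo}"
}
```

Something like `while sleep 1; do dirty_snapshot; done` prints a fresh pair of readings every second until you hit Ctrl-C.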
Watch that number climb between flushes and you've found your sawtooth. The fix is to make the kernel flush sooner and in smaller bites, so writeback is a steady trickle instead of an occasional flood:
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
Lower dirty_background_ratio starts the background flusher earlier; lower dirty_ratio caps how much can pile up before writers get throttled. On a box with lots of RAM the byte-based knobs (dirty_background_bytes, dirty_bytes) are often saner than ratios, because 5% of a large amount of memory is still an enormous amount of writeback. Each byte knob replaces its ratio counterpart: writing one makes the other read back as 0, so pick one style per threshold. Put the settings in /etc/sysctl.conf, load them with sysctl -p, and watch the sawtooth flatten into something you can live with.
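To put numbers on "enormous", here is a rough sketch that turns a percentage threshold into MiB. The helper name ratio_to_mib is hypothetical. Using MemAvailable as the base is an approximation: the kernel works from free plus reclaimable pages, so treat the output as an estimate, not the exact trip point.

```shell
#!/bin/sh
# Convert a percentage threshold into MiB, given a memory size in kB.
# Integer arithmetic only, so results are rounded down.
ratio_to_mib() {
    # $1 = percentage (e.g. 10), $2 = memory in kB
    echo $(( $2 * $1 / 100 / 1024 ))
}

# Estimate the base from MemAvailable (an assumption, see above).
avail_kb=$(awk '$1 == "MemAvailable:" { print $2 }' /proc/meminfo 2>/dev/null)

if [ -n "$avail_kb" ]; then
    echo "background flush starts near $(ratio_to_mib 5 "$avail_kb") MiB dirty"
    echo "writers throttled near $(ratio_to_mib 10 "$avail_kb") MiB dirty"
fi
```

For a sense of scale, `ratio_to_mib 10 268435456` (10% of 256 GiB) prints 26214, which is roughly 25.6 GiB of dirty data allowed to build up before writers are throttled. That is the case for switching to the byte-based knobs on big-memory machines.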