Ramblings of an aging IT geek
debugging

the slow leak that was just a map nobody ever emptied

A short debugging note on a service whose memory crept up over days, which turned out to be a cache map that only ever had entries added and never removed.


A service had been quietly creeping up in memory for as long as anyone could remember, slow enough that the fix was always "restart it on the weekly deploy" and nobody questioned it. It would sit fine for a few days, then the graph would start its gentle climb, and a deploy would reset it before it became a problem. Classic boiled-frog ops: the workaround was cheap enough that the actual bug never got looked at. This week the deploys paused for a release freeze, the climb didn't reset, and suddenly it was my problem.

The shape was textbook slow leak. Heap up and to the right, correlated with uptime rather than load, no single spike. That correlation is the tell: it's not requests, it's time and accumulation. Something was being added and never removed. I grabbed a heap profile, expected an afternoon of spelunking, and found the answer almost immediately because one type dominated the whole heap.
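For illustration only, here is roughly what "grab a heap profile and see what dominates" can look like. The service's language and profiler aren't named above, so this is a minimal sketch assuming Python and its standard `tracemalloc` module; the `leaky` dict and the `top_growth` helper are hypothetical stand-ins, not the real code.

```python
import tracemalloc


def top_growth(before, after, limit=3):
    """Return the source lines whose allocations changed most between two snapshots."""
    # compare_to() sorts by absolute size difference, largest first.
    return after.compare_to(before, "lineno")[:limit]


tracemalloc.start()
before = tracemalloc.take_snapshot()

# Hypothetical stand-in for the leak: a dict that only ever gains entries.
leaky = {}
for i in range(50_000):
    leaky[f"req-{i}"] = object()

after = tracemalloc.take_snapshot()
tracemalloc.stop()

growth = top_growth(before, after)
for stat in growth:
    print(stat)  # each line shows file:line, size delta and allocation count delta
```

On a real service the two snapshots would be taken hours apart rather than around a loop, but the reading is the same: if one allocation site accounts for most of the growth, start there.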

It was a map. A cache, in spirit, keyed by a request identifier, that some long-ago version of me had added to memoise a lookup. Entries went in. Entries never came out. No eviction, no size cap, no TTL, nothing. Every unique request grew it by one and it kept the entry forever, on the apparently-reasonable assumption that the same id might come back. Most never did. The "cache" was a write-only structure that mistook hoarding for caching.

The fix was small and slightly humbling: bound the thing. I swapped it for an LRU with a sensible cap, so old entries get evicted once it's full, and the memory graph went flat as a lake. A cache without an eviction policy isn't a cache, it's a leak with good intentions, and I'd written exactly that and papered over it with weekly restarts for the best part of a year.
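A bounded version doesn't take much. This is a sketch of the idea assuming Python, not the code that shipped; `LRUCache`, `get_or_compute`, and the cap of 100 are made up for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Memoising cache that evicts the least recently used entry once full."""

    def __init__(self, max_entries):
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self._max = max_entries
        self._data = OrderedDict()

    def get_or_compute(self, key, compute):
        if key in self._data:
            # Hit: mark the entry as most recently used.
            self._data.move_to_end(key)
            return self._data[key]
        value = compute(key)
        self._data[key] = value
        if len(self._data) > self._max:
            # Over the cap: drop the oldest (least recently used) entry.
            self._data.popitem(last=False)
        return value

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)


cache = LRUCache(max_entries=100)
for i in range(1000):
    cache.get_or_compute(f"req-{i}", lambda k: f"result-for-{k}")

print(len(cache))  # 100
```

In Python specifically, `functools.lru_cache(maxsize=...)` from the standard library gives the same behaviour for plain function memoisation; the hand-rolled version is only here to make the eviction step visible.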

The lesson I keep relearning: any structure that only ever grows needs a reason it can't grow forever, written down where you create it. And a workaround that's cheap enough to ignore is the most dangerous kind, because it keeps the real bug out of sight until the day the workaround stops working.