Ramblings of an aging IT geek
← Ramblings of an aging IT geek
debugging

a map with no exit, and the eviction i should have written first

Another pass at the unbounded-map leak, this time on what a good cache abstraction would have prevented and how I now bound them by default.

A terminal showing a memory profile

I've already written about the leak itself, the map that only ever gained keys until the service got OOM-killed. I keep coming back to it because the bug was trivial and the habit behind it wasn't. The interesting question isn't "why did the map grow". Maps grow when you put things in them. The interesting question is why I let myself write an unbounded one in the first place, and why I'd do it again next month if I'm not careful.

The shape of the mistake

The code looked entirely reasonable. A map[string]result, a write on the way in, a read on the way out. Nothing in Go shouts at you for that. The compiler is delighted. The tests pass, because tests don't run for three days. The leak only exists in the dimension that unit tests almost never cover: time and volume.

var cache = map[string]result{}

func handle(id string, r result) {
    cache[id] = r   // grows forever
}

That's the whole bug. There is no line missing that the compiler could have demanded. The missing thing is a decision: when does an entry leave? And because nothing forced me to answer it, I didn't.

A heap usage graph climbing in a straight line

Making the right thing the easy thing

The proper fix wasn't the TTL I bolted on in a panic. It was deleting the raw map entirely and putting a real cache type behind it, one that won't let you construct it without a bound. An LRU with a max size, or a TTL map with a sweep, take your pick, but the constructor demands a limit and there's no public method that grows the thing without limit. You physically cannot make the mistake I made, because the type won't let you.

That's the move I trust more than discipline. I am not going to remember, every time, for the rest of my career, to add eviction. I've just proven I won't. But I will reliably reach for a type that already exists, and if that type bounds itself, then my forgetfulness is harmless. The bug becomes unrepresentable rather than merely discouraged.

So the rule now is small and boring: no bare map as a cache. If it's a cache, it goes behind something with a size limit baked into its constructor. The leak cost me a week of restarts and a slightly bruised ego. Cheap, in the end, for a habit I should have had years ago.