Ramblings of an aging IT geek
debugging

the map that ate all the memory

A long-running Go service that slowly ate the box, traced back to a per-connection map whose entries were only deleted on a clean close.


The pager went off because RSS had crept past the cgroup limit and the kernel had started OOM-killing the service. Not a sudden crash, not a spike, just a slow climb over about four days until it fell over. The classic leak shape: a steady ramp, a drop at each restart, then the same ramp again.

I'd been here before, so I reached for the usual. GODEBUG=gctrace=1 confirmed the GC was running fine and reclaiming nothing useful, which meant something was still reachable. A heap profile settled it in under a minute:

go tool pprof -inuse_space http://localhost:6060/debug/pprof/heap

The top node was a map[string]*session I'd added months earlier to cache state per connection. Adding to it was obvious and well tested. Removing from it happened in exactly one place, a defer that only ran on a clean shutdown of the connection. Anything that timed out or dropped never hit that path, so the entry just sat there, holding a pointer, forever.
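The shape of the bug is easy to reproduce. This is a sketch, not the original code: handleLeaky, errTimeout, liveSessions and the session fields are all invented for illustration, and the cleanup is a plain delete on the happy path rather than the defer the real code used.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// session stands in for the cached per-connection state (fields are hypothetical).
type session struct{ buf []byte }

var (
	mu       sync.Mutex
	sessions = map[string]*session{} // lives as long as the process does
)

var errTimeout = errors.New("timeout")

// handleLeaky adds an entry for the connection, but only removes it
// when serve returns cleanly.
func handleLeaky(id string, serve func(*session) error) error {
	s := &session{buf: make([]byte, 4096)}
	mu.Lock()
	sessions[id] = s
	mu.Unlock()

	if err := serve(s); err != nil {
		return err // timeout or drop: the entry stays in the map
	}

	mu.Lock()
	delete(sessions, id) // reached only on a clean close
	mu.Unlock()
	return nil
}

// liveSessions reports how many entries the map currently holds.
func liveSessions() int {
	mu.Lock()
	defer mu.Unlock()
	return len(sessions)
}

func main() {
	// Simulate 1000 connections where every tenth one times out.
	for i := 0; i < 1000; i++ {
		_ = handleLeaky(fmt.Sprintf("conn-%d", i), func(*session) error {
			if i%10 == 0 {
				return errTimeout
			}
			return nil
		})
	}
	fmt.Println("entries left behind:", liveSessions()) // prints: entries left behind: 100
}
```

Every one of those leftover entries is still reachable through the map, so the GC is doing exactly what it should by keeping them.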

The fix was three lines: delete on every exit, not just the tidy one. The lesson is older than that. Every m[k] = v in a long-lived map is a promise to call delete somewhere, and Go won't remind you. If a thing lives for the life of the process, treat it like manual memory, because that's what it is.
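One way to get "delete on every exit" is to pair the insert with a defer on the very next line, so the removal no longer depends on which return path is taken. A sketch under the same caveat as before: registry, handle and errTimeout are hypothetical names, not the service's real ones.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// session stands in for the cached per-connection state.
type session struct{ id string }

// registry wraps the long-lived map so add and remove sit side by side.
type registry struct {
	mu sync.Mutex
	m  map[string]*session
}

func newRegistry() *registry {
	return &registry{m: make(map[string]*session)}
}

func (r *registry) add(id string) *session {
	s := &session{id: id}
	r.mu.Lock()
	r.m[id] = s
	r.mu.Unlock()
	return s
}

func (r *registry) remove(id string) {
	r.mu.Lock()
	delete(r.m, id)
	r.mu.Unlock()
}

func (r *registry) size() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.m)
}

// handle registers the session and schedules its removal immediately,
// so a clean close, an error return, and a panic all run the delete.
func handle(r *registry, id string, serve func(*session) error) error {
	s := r.add(id)
	defer r.remove(id)
	return serve(s)
}

var errTimeout = errors.New("timeout")

func main() {
	r := newRegistry()
	// Same simulation as a leaky handler would face: every tenth connection times out.
	for i := 0; i < 1000; i++ {
		_ = handle(r, fmt.Sprintf("conn-%d", i), func(*session) error {
			if i%10 == 0 {
				return errTimeout
			}
			return nil
		})
	}
	fmt.Println("entries left behind:", r.size()) // prints: entries left behind: 0
}
```

Keeping the add and the deferred remove on adjacent lines also makes the promise visible in review: an insert without its matching delete stands out.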