Ramblings of an aging IT geek
← Ramblings of an aging IT geek
golang

pprof, a load test, and the allocation i couldn't see

A short note on profiling a Go service under real load with pprof, and why the bottleneck was an allocation in a hot path that no amount of staring at the code revealed.

A code editor showing Go source under a profiler

A Go service was spending more CPU than it had any right to under load, and I had a theory about which function was to blame. My theory was wrong, as theories about performance almost always are. So I stopped guessing and turned on the profiler.

The trick that matters is profiling under realistic load, not an idle service or a synthetic microbenchmark. I pointed a load generator at it to hold a steady stream of requests, then pulled a CPU profile while it was actually sweating:

go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

With net/http/pprof imported for its side effect, that endpoint is just there. Thirty seconds of profile, then top in the pprof shell to see where the CPU actually went, and list on the offending function to see it line by line.

It was not the function I suspected. The CPU was going to garbage collection, and the GC pressure was coming from a hot path that built up a []byte by repeated append and allocated a fresh buffer on every single request. Thousands of small allocations a second, all short-lived, all feeding the collector. Invisible by reading the code, glaringly obvious in the profile.

The fix was a sync.Pool to reuse the buffers, and the allocation graph flattened out immediately. The wider lesson is the boring one I keep relearning: do not optimise the function you think is slow. Profile under load, read the graph, and fix the thing the profile actually points at. The profiler is right and you are usually not.