Ramblings of an aging IT geek
← Ramblings of an aging IT geek
golang

where the time actually went under load

A Go service got slow under load and the profiler showed the cost was garbage, not logic, in a place I would never have guessed.

A code editor showing a Go service

A service that was fine in testing fell over under real traffic, and I did the thing everyone does first: I guessed. I was sure it was the database. It was not the database. It almost never is.

pprof is built in and there is no excuse not to use it. I exposed the handlers, threw load at the service, and grabbed a CPU profile:

import _ "net/http/pprof"

go func() { log.Println(http.ListenAndServe("localhost:6060", nil)) }()
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

The flame graph was blunt about it. A third of CPU time was in the garbage collector, and the allocations leading there came from a hot path that built a fresh byte buffer per request and threw it away. Not the logic, the litter.

The fix was a sync.Pool for the buffers and pre-sizing a slice I'd been letting grow one append at a time. P99 latency dropped by more than half, and the GC pauses I'd been blaming on "Go being Go" turned out to be my own allocations coming home. The lesson, again, is to measure before you have opinions. The profiler had the answer in thirty seconds; my intuition had been wrong for two days.