A service was spending more CPU than it had any right to, and the usual eyeballing got me nowhere. So I did what I should have done first: profiled it under real load rather than guessing.
The trick people skip is that an idle service profiles like an idle service. You have to catch it busy. I left net/http/pprof mounted, threw load at it with a quick vegeta run, and grabbed a thirty-second CPU profile mid-storm:
go tool pprof -http=:8080 'http://localhost:6060/debug/pprof/profile?seconds=30'
The flame graph put a third of the time inside JSON encoding, which I half expected. The surprise was the allocation profile. I'd convinced myself a particular hot path was allocation-free, and -alloc_objects cheerfully proved me a liar: a fmt.Sprintf in a logging helper, called on every request, building a string that the log-level filter usually threw away moments later.
Moving that formatting behind the level check, and swapping one Sprintf for plain concatenation, dropped allocations on the path to nearly nothing and took a visible bite out of GC time.
Nothing clever, just measurement beating intuition again. The profile under load is the bit that matters; everything before that is hoping.