A service of mine was using more CPU than it had any right to. Nothing was on fire, the latency was fine, but the box sat at a baseline that didn't match the work it was doing, and that itch eventually got loud enough that I sat down to profile it properly. I was completely wrong about where the time was going, which is the usual outcome, and the whole point of profiling rather than guessing.
My prediction, stated out loud so I could be embarrassed by it later, was that the cost was in request handling: parsing, a bit of JSON, the database round-trip. That's where the interesting code is, so that must be where the time is. Reader, it was not.
getting a profile out of it
The service is Go, so pprof is right there. I had the HTTP profiling endpoint already wired in:
import _ "net/http/pprof"
go func() {
log.Println(http.ListenAndServe("localhost:6060", nil))
}()
That _ import is doing all the work; it registers the profiling handlers on the default mux as a side effect. With that running, I pulled a thirty-second CPU profile while the service was under its normal load:
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
A flat top listing in pprof is useful but easy to misread, because it shows you where the program counter sat without showing you who called whom. So I generated a flamegraph instead. Brendan Gregg's flamegraph scripts turn a folded stack trace into an SVG where width is time and the stack reads bottom to top, and once you've used one a plain profiler listing feels like reading a phone book.
the wide bit that shouldn't have been there
The request-handling tower in the graph was there, sensible, about as wide as I'd expect. But sitting next to it was another tower, nearly as wide, and it had nothing to do with handling requests. It was logging. Specifically, it was time formatting inside the logging path.
The service logged every request, which is reasonable, and every log line carried a timestamp, which is also reasonable. The unreasonable part was buried one level down. The timestamp was being formatted with Go's reference-time layout on every single line:
log.Printf("%s %s %d", t.Format("2006-01-02 15:04:05.000"), method, status)
time.Format with a layout string isn't free. It parses that layout, allocates, and does a fair amount of work to assemble the string, and I was paying that cost for every request, plus a handful of internal log lines per request. At the request rate this box handled, formatting timestamps had quietly become one of the most expensive things the program did. Not the database. Not the JSON. Printing the date.
That's the part I want to underline. There was no clever algorithm in the wrong place, no accidental quadratic loop. It was a perfectly ordinary, perfectly correct line of code that happened to run an enormous number of times, and volume turned something cheap into something expensive.
the fix, which was boring
I didn't need millisecond-precision wall-clock timestamps on every internal log line. Most of them were diagnostic chatter that nobody reads unless something has already gone wrong. So I did two unglamorous things.
First, I dropped the log level on the internal chatter so it stopped emitting in normal operation. Fewer lines, less formatting, done. Second, for the request log that genuinely needed a timestamp, I stopped reformatting an identical layout millions of times and leaned on the cheaper paths the standard library already offers when you're not hand-rolling the layout yourself. The width of that second tower in the after-graph dropped to a sliver.
The baseline CPU came down to where I'd expected it to be all along. No heroics, no rewrite, just deleting work that didn't need doing.
what I took from it
Profile before you optimise, every time, no exceptions. I have read this advice a hundred times and I still walked into this session with a confident, wrong theory about the hot path. The flamegraph didn't argue with me, it just showed me a wide tower where I'd promised myself there wouldn't be one.
Logging is code, and it runs at request volume. It's easy to treat a log line as free because it's not "the logic", but a Printf on the hot path runs as often as the hot path does, and formatting is not free. If you log on every request, your logging is part of your performance profile whether you've thought about it or not.
And the meta-lesson, the one I keep relearning: the expensive thing is rarely the clever thing. It's the boring thing you do a million times. Go looking for volume, not complexity.