A service that had no business being slow was sitting at a steady, unbothered 80% CPU under perfectly ordinary load. Nothing in the obvious places explained it. The database was bored, the requests were small, the response times were merely "annoying" rather than alarming. So I did the thing I should always do first and reached for Brendan Gregg's flamegraph tooling instead of staring at the code and guessing.
The recipe hasn't changed in years and still feels like a magic trick:
    perf record -F 99 -p $(pgrep myservice) -g -- sleep 30
    perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > out.svg
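Ninety-nine hertz rather than a round hundred so the sampler doesn't lock-step with periodic timers, thirty seconds of real traffic, and out the other end comes an SVG you can actually click around in. The width of each bar is the share of samples in which that function was on the stack, which is as good as time on CPU. Wide at the bottom is normal, since everything runs under main. What you're hunting for is a plateau higher up that's wider than it has any right to be.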
And there it was. Not the request handler, not serialisation, not the database driver. A fat, embarrassing plateau sat squarely in our logging path. We were building a structured log line for every single request at debug level, formatting it, allocating the string, and then throwing it away because the configured level was info. The work to decide "we don't want this" cost more than the work we actually wanted.
The fix was a one-liner: guard the expensive formatting behind an if log.DebugEnabled() check so we don't render the message unless something is going to read it. CPU dropped from 80% to about 35% on the same traffic. No clever algorithm, no rewrite, just stopping the machine from carefully preparing a meal and binning it.
The lesson I keep relearning: I am a terrible judge of where my own code spends its time. The hot path is almost never where my intuition points, and thirty seconds of perf beats an afternoon of confident hypotheses. Profile first. I'll have forgotten this again by the next time, no doubt.