Ramblings of an aging IT geek
← Ramblings of an aging IT geek
performance

the flamegraph that pointed at the wrong thing, which was the right thing

A CPU profile of a slow service spent most of its time in a place I had never thought to look, and fixing it took ten minutes.

A flamegraph rendered on a server dashboard

I had a service that was burning more CPU than it had any right to. Nothing dramatic, just steadily expensive: a request handler that should have been trivial sitting at forty milliseconds when I expected five. The logs told me nothing. The timing breakdowns I had bolted on by hand told me even less, because I had instrumented the parts I already suspected and learned only that they were fine.

So I gave up guessing and reached for a flamegraph. If you have not used Brendan Gregg's scripts, the shape is simple: sample the stack many times a second, fold the samples, render them as nested rectangles. The width of a box is the proportion of time spent in that function and everything it called. You read it across, not down. Wide is bad.

perf record -F 99 -p $(pgrep myservice) -g -- sleep 30
perf script | stackcollapse-perf.pl | flamegraph.pl > out.svg

The wide box was not in my code at all. It was in JSON deserialisation, and underneath it, dominating everything, was reflection. We were decoding the same response shape on every request and the decoder was working it all out from scratch each time. I had never suspected it because in my head deserialising a small struct is free. It is not free. It is free per byte, and we were doing it ten thousand times a minute.

A flamegraph SVG open in a browser, one box far wider than the rest

The fix was to cache the decoder, or in the language I was using, to stop reaching for the convenient generic call and use the path that did the type analysis once. Ten minutes of work. The handler dropped from forty milliseconds to nine, and the CPU graph for the whole box stepped down by about a third.

What stays with me is not the win, it is how confidently I had looked in the wrong place first. I instrumented my own logic because that is the part I understand and feel responsible for. The hot path was in a library I trusted so completely I had stopped seeing it. A flamegraph does not care what you trust. It just shows you where the time went, and the time went somewhere I would never have typed into a stopwatch.

I now profile before I theorise. It feels like cheating, in the good way. Half an hour with perf beats a day of educated guessing, and it is a lot less smug about it afterwards.