Ramblings of an aging IT geek
← Ramblings of an aging IT geek
performance

the flamegraph found the thing I'd sworn was fine

A perf flamegraph showed nearly a fifth of CPU time going into a logging helper I'd never thought twice about.

A wide flamegraph rendered over a server graph

I had a service eating more CPU than it had any right to, and I was certain I knew where. The request parser, obviously. It always is. I'd spent half a day staring at that code, finding nothing, getting cross.

So I gave up guessing and did what I should have done first:

perf record -F 99 -g -p $(pidof myservice) -- sleep 30
perf script | stackcollapse-perf.pl | flamegraph.pl > out.svg

The parser was a sliver. The fat tower in the middle was a logging helper that formatted a context string on every request, whether or not the line was ever emitted. Roughly 18% of CPU, building messages we threw away because the level was set to warn in production.

The fix was embarrassing. Make the formatting lazy so it only happens when the line will actually be logged. One commit, CPU down by about a sixth, and a small lesson I keep relearning: I am consistently wrong about where the time goes. The profiler isn't. Stop arguing with it.