A batch job that should have finished in seconds was taking the better part of two minutes, and I had a theory. I always have a theory. The job parses a pile of records, does some lookups, and writes a report, so obviously the slow bit was the lookups, which hit a database. Obviously.
I was wrong, and the flamegraph told me so in about thirty seconds. If you have never used one, the pitch is simple: you sample the stack a few thousand times a second while the program runs, then draw each stack as a stack of boxes where the width is how often that frame was on the CPU. Wide boxes are where the time goes. You don't read it top to bottom, you scan it left to right looking for anything suspiciously fat.
I generated it the usual way, recording with perf and rendering with Brendan Gregg's scripts:
perf record -F 997 -g -- ./batch-job --input records.jsonl
perf script | stackcollapse-perf.pl | flamegraph.pl > out.svg
The database lookups were there, a modest little plateau, exactly as wide as you'd expect for something that's mostly waiting on I/O rather than burning CPU. The fat box, the one that ate nearly forty percent of the on-CPU time, was a timestamp formatter. A helper that turned an epoch into a human-readable string, called once per record, and apparently allocating and tearing down a fresh format object every single time it was called.
I'd ruled it out precisely because it looked too trivial to matter. That's the trap. A cheap operation done a few million times is not cheap, and your intuition about "trivial" is calibrated for the single call, not the loop. The fix was embarrassingly small: build the formatter once outside the loop and reuse it. The job dropped from a hundred-odd seconds to under fifteen, and the database lookups I'd been so sure about turned out to be perfectly fine.
The lesson isn't "timestamp formatting is slow", because in isolation it isn't. The lesson is the one I keep relearning: I am a poor judge of where my own code spends its time, and a flamegraph is cheaper than my guesses. The whole exercise, from "this is slow" to "this is fixed", took less time than I'd already wasted staring at the database query plan. Measure first. I'll remember next time. I won't, but I'll mean to.