the flame graph that pointed at a function i'd forgotten existed

A performance graph on a monitor

A service that used to be comfortable had started running hot. Not falling over, just sitting at a CPU level that did not match the work it was doing, and creeping up week on week as traffic grew. The kind of slow rot that is easy to ignore until the autoscaler is spinning up twice the boxes it should and the bill notices for you.

I had suspects. Everyone has suspects. Mine was the serialisation layer, because it is always the serialisation layer, and when it is not, it is the database. I was wrong on both counts, and a flame graph told me so in about five minutes.

reading the graph properly

If you have not used one, a flame graph is a picture of where the CPU actually spends its time, stacked by call hierarchy. The width of each box is how much time was spent in that function and everything it called. Wide boxes are where your time goes. You do not read it top to bottom looking for deep stacks, you scan left to right for the widest bars at any level and ask why they are so wide.

I collected a CPU profile while the service was under real production-shaped load, folded the stacks, and rendered the graph. The serialisation code was there, sure, a respectable but unremarkable sliver. The database calls were narrow, which made sense because they were mostly waiting on the network rather than burning CPU, and waiting does not show up on a CPU graph.

Source code on a screen

The widest bar, by a country mile, was a helper I had genuinely forgotten was there. A little function that normalised some user-supplied strings: lowercasing, stripping accents, collapsing whitespace, the usual hygiene before a comparison. Innocent looking. The problem was that it compiled a regular expression on every call, and it was called several times per request, and the request rate had grown into the thousands per second. We were compiling the same pattern over and over, hundreds of thousands of times a second, and throwing it away each time.

the fix was embarrassingly small

// before: compiled on every call
func normalise(s string) string {
    re := regexp.MustCompile(`\s+`)
    return re.ReplaceAllString(strings.ToLower(s), " ")
}

// after: compiled once, at startup
var wsRe = regexp.MustCompile(`\s+`)

func normalise(s string) string {
    return wsRe.ReplaceAllString(strings.ToLower(s), " ")
}

Hoist the compilation out to a package-level variable so it happens once, at startup, instead of on every call. That is the entire change. The hot bar on the flame graph shrank to almost nothing, and the steady-state CPU on the service dropped by a meaningful chunk, enough that the autoscaler relaxed and the box count came back down.

A close-up of code on a dark editor

what I took from it

The thing I keep relearning is that the expensive code is rarely the code you are thinking about. I was staring at the parts of the system I knew were complicated, because complicated feels like where the cost should be. The actual cost was hiding in a function so dull I had stopped seeing it, doing something cheap an enormous number of times. Cheap times enormous is expensive, and intuition is terrible at that multiplication.

The flame graph does not care what you think is slow. It just draws the truth, wide bars and all, and lets you stop guessing. Five minutes of profiling beat an afternoon of staring at the code I assumed was guilty. I should know this by now. I will forget it again before the next incident, and the flame graph will be there to remind me.