the flamegraph that pointed at the last function I'd have guessed

A flamegraph-style performance graph

A service was burning more CPU than its workload justified, and the whole team had a theory. The theory was the database. It's always the database, until you measure it, and then about half the time it isn't. This was one of the half-times.

Rather than argue from intuition I grabbed a CPU profile and turned it into a flamegraph, which is the single best tool I know for the question "where is the time actually going". The width of each box is how much CPU is spent in that function and everything it calls. You read it across, not down: the widest towers are where the work is. You don't need to understand the call stack, you just look for the fat boxes and ask why they're fat.

go tool pprof -seconds 30 http://localhost:6060/debug/pprof/profile
# then in the interactive prompt:
(pprof) web

The database calls were there, a respectable tower as expected. But the widest thing on the graph, wider than the query handling, was a stack I had to read three times to believe. It bottomed out in time formatting. The CPU was being spent turning timestamps into strings.

A close-up of source code

The culprit was a logging line on the hot path. Every request logged a debug message, and the message included a formatted timestamp built with the reflection-heavy formatting verbs. time.Format with a layout string is not free, and fmt.Sprintf with a %v over a struct is considerably less free, and we were doing both, per request, thousands of times a second, to produce a debug line that in production nobody ever read because the log level was set above debug. We were paying full price to format a message and then throwing it away.

The fix was almost embarrassing. The log call already sat behind a level, but the arguments were evaluated before the level check decided to discard them. Go evaluates function arguments eagerly, so log.Debug(fmt.Sprintf(...)) does all the Sprintf work whether or not Debug ends up writing anything. The formatting happened first, the discarding happened second. Moving to a logger that only formats once it's decided the line will actually be emitted, and passing structured fields rather than a pre-built string, took that whole tower off the graph.

// before: Sprintf runs every time, even when discarded
log.Debug(fmt.Sprintf("handling request for user %v at %s",
    user, time.Now().Format(time.RFC3339)))

// after: nothing formats unless debug is actually enabled
log.Debug("handling request",
    zap.Stringer("user", user),
    zap.Time("at", time.Now()))

CPU usage dropped by about a third, on a change that touched one logging call and zero lines of the database code everyone had been ready to blame.

The lesson isn't about logging specifically, it's about the gap between where you think the time goes and where it goes. The database was the obvious suspect because it's the obvious suspect for everything, and it was completely innocent. The guilty party was a debug line we couldn't even see in production. I'd never have found it by reading the code, because nothing about that line looks expensive. The flamegraph didn't have a theory. It just showed me the widest box, and the widest box was the last place I'd have looked.