Ramblings of an aging IT geek
← Ramblings of an aging IT geek
performance

perf top on a box that had no business being busy

An idle server sitting at noticeable CPU load, traced with perf top to a runaway logging loop burning cycles on string formatting.

A CPU utilisation graph climbing on an idle server

A box that should have been bored stiff was sitting at a steady twenty percent CPU. No traffic, no scheduled jobs running, nothing in the dashboards to explain it. top showed the usual suspects all behaving, the load just smeared thinly across everything, which is the least helpful kind of busy.

So, perf top. If you've never reached for it, it samples what the CPU is actually executing right now and ranks the hottest functions. Within a couple of seconds the answer was glowing at the top of the list: the program was spending its life in string formatting and memory allocation deep inside a logging library. Not doing work. Describing work it wasn't doing.

The cause was dull and entirely my fault. A debug log line had been left in a tight retry loop, and the log level on this host was set low enough that it was actually emitting. Every iteration formatted a chatty message, allocated, wrote it, and went round again, thousands of times a second. The message was being rendered even when nobody was reading it. Multiply that by an over-eager retry and you get a fifth of a CPU vanishing into a log nobody asked for.

The fix was one line moved behind a level check and the retry given a sensible backoff. Load dropped to near zero immediately. The wider lesson is that perf top is the fastest way to answer "what is this thing actually doing", and the answer is far more often "formatting a string" than anyone expects.