A box that should have been idle was sitting at a steady quarter of a core. Nothing scheduled, no traffic, no cron job due for hours, and yet top showed a persistent low hum of CPU that would not go away. Not enough to alert on. Just enough to nag at me every time I looked.
The first instinct is to go hunting in userspace. Which process? But top was unhelpful: the usage was spread thinly, mostly in the system column, nothing obviously to blame. When the work is in the kernel and not in any one process, you need to look at where the CPU actually is, instruction by instruction. That is what perf top is for.
sudo perf top -g
It samples the running system and shows you, live, the functions burning cycles right now, sorted by cost, with call graphs if you ask. Within about thirty seconds the answer was sitting at the top of the list.
The hot function was a high-resolution timer callback, firing far more often than anything on an idle box had any business doing. The call graph traced it back to a userspace daemon that had been configured with an absurdly tight poll interval, a fraction of a millisecond, almost certainly a typo where someone meant milliseconds and wrote microseconds. So the daemon was asking the kernel to wake it thousands of times a second to check whether anything had happened. It never had. But the asking was not free, and a tight timer means the CPU never gets to drop into a deep idle state, which is why the cost was so evenly smeared.
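To put a number on "thousands of times a second", here is a back-of-the-envelope sketch. It assumes the daemon wakes exactly once per poll interval and does negligible work each time. The two intervals are illustrative: one at a fraction of a millisecond, and one at the scale that was presumably intended.

```python
def wakeups_per_second(poll_interval_us: int) -> float:
    """Wakeups per second for a loop that sleeps poll_interval_us between checks."""
    if poll_interval_us <= 0:
        raise ValueError("poll interval must be positive")
    return 1_000_000 / poll_interval_us


# Illustrative values only: 100 microseconds versus 100 milliseconds.
for label, interval_us in [("100 us", 100), ("100 ms", 100_000)]:
    print(f"{label}: {wakeups_per_second(interval_us):,.0f} wakeups per second")
```

Three orders of magnitude in the interval is three orders of magnitude in wakeups, and every one of those wakeups is a timer the kernel has to arm, fire and service.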
The fix and the lesson
The fix was a one-character edit to a config value and a service restart. The quarter-core vanished and the box went properly idle, the kind of idle where the cores actually sleep.
# before: poll every 100 microseconds (almost certainly a typo)
poll_interval = 100us
# after: poll every 100 milliseconds, which is what was meant
poll_interval = 100ms
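A typo like this is cheap to catch at load time. What follows is a minimal sketch of the kind of sanity check that would have flagged it, not the daemon's actual parser. The `us`/`ms`/`s` suffix grammar, the function names and the 1 ms floor are all assumptions made for illustration.

```python
import re

# Hypothetical unit suffixes; the daemon's real config grammar may differ.
UNIT_TO_MICROSECONDS = {"us": 1, "ms": 1_000, "s": 1_000_000}
DURATION = re.compile(r"(\d+)(us|ms|s)")

# Assumed floor: anything tighter than 1 ms is treated as a likely mistake.
MIN_POLL_INTERVAL_US = 1_000


def parse_duration_us(text: str) -> int:
    """Convert a string such as '100ms' into a whole number of microseconds."""
    match = DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    value, unit = match.groups()
    return int(value) * UNIT_TO_MICROSECONDS[unit]


def check_poll_interval(text: str) -> int:
    """Return the interval in microseconds, rejecting suspiciously tight values."""
    interval_us = parse_duration_us(text)
    if interval_us < MIN_POLL_INTERVAL_US:
        raise ValueError(
            f"poll interval {text!r} is below {MIN_POLL_INTERVAL_US} us; "
            "did you mean milliseconds?"
        )
    return interval_us
```

With these assumptions, `check_poll_interval("100ms")` returns 100000 and `check_poll_interval("100us")` raises a `ValueError`. A daemon with a genuine need for sub-millisecond polling would want an explicit override rather than a hard floor.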
Two things stuck with me. First, perf top is the tool to reach for the moment the cost is in the kernel or smeared across the system rather than parked in one fat userspace process. top tells you a process is busy; perf top tells you which function, which is the question you actually need answered.
Second, a tight poll loop is invisible until you go looking. It does not crash anything, it does not trip a threshold, it just quietly burns power and keeps your CPU from ever resting. Across a fleet, that is real money and real heat. The cheapest performance win is often not making something faster; it is finding the thing that is busy for no reason and telling it to stop.
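If you want to go looking without perf, one rough approach on Linux is to watch voluntary context switches, since a process that sleeps and wakes thousands of times a second accumulates them quickly. The sketch below reads the `voluntary_ctxt_switches` field from `/proc/<pid>/status` twice and ranks processes by rate. The function names are my own, and the counter covers only each process's main thread, so a daemon that polls on a worker thread would need the same treatment under `/proc/<pid>/task/`.

```python
import os
import time


def parse_voluntary_switches(status_text: str) -> int:
    """Extract voluntary_ctxt_switches from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("voluntary_ctxt_switches:"):
            return int(line.split(":", 1)[1])
    raise ValueError("no voluntary_ctxt_switches field found")


def snapshot() -> dict:
    """Map each readable PID to (process name, voluntary switch count)."""
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/status") as handle:
                text = handle.read()
            # The first line of the status file is "Name:\t<command name>".
            name = text.splitlines()[0].split(":", 1)[1].strip()
            result[int(entry)] = (name, parse_voluntary_switches(text))
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-read, or the field is missing
    return result


def busiest_wakers(before: dict, after: dict, seconds: float, top: int = 5) -> list:
    """Rank PIDs present in both snapshots by voluntary switches per second."""
    rates = []
    for pid, (name, count) in after.items():
        if pid in before:
            rates.append(((count - before[pid][1]) / seconds, pid, name))
    return sorted(rates, reverse=True)[:top]


def report(window: float = 5.0) -> None:
    """Sample twice, `window` seconds apart, and print the most frequent wakers."""
    first = snapshot()
    time.sleep(window)
    for rate, pid, name in busiest_wakers(first, snapshot(), window):
        print(f"{rate:10.1f} switches/s  pid {pid}  {name}")
```

Calling `report()` on an otherwise idle box should put a runaway poller near the top of the list. It will not tell you which kernel function is hot, which is still a job for perf top, but it is light enough to run across a fleet.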