A box that does almost nothing was sitting at 30% CPU. No traffic to speak of, a couple of cron jobs, otherwise idle. top showed the load but pointed at nothing useful: no single process owned it, and most of the time was in %sy (system time, spent in the kernel) rather than %us (user time). That's the tell. When the kernel is busy and userspace isn't, top runs out of road.
So perf top, which samples the CPU and shows you the hottest functions live, kernel symbols included:
perf top -g
Straight away the top line was a kernel symbol, something deep in the network stack, churning. Not an application function at all. The machine wasn't busy doing work; it was busy doing the same bit of housekeeping over and over.
It was a misconfigured monitoring agent hammering a netlink socket in a tight loop, asking the kernel for interface stats hundreds of times a second. Userspace cost almost nothing, so it hid; the kernel side added up to a third of a core. I rate-limited the agent, the symbol dropped off the list, and CPU went back to flat.
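How the agent was throttled isn't shown here, so take this as a generic sketch of the idea rather than the actual fix: a polling loop that sleeps out the rest of each interval instead of spinning. The names poll and query are made up for illustration, and query stands in for whatever call asks the kernel for interface stats.

```python
import time


def poll(query, interval, iterations, sleep=time.sleep, clock=time.monotonic):
    """Call query() at most once per `interval` seconds, `iterations` times.

    `query` is a hypothetical stand-in for the expensive kernel request.
    `sleep` and `clock` are injectable so the pacing can be checked
    without actually waiting.
    """
    results = []
    for _ in range(iterations):
        started = clock()
        results.append(query())
        elapsed = clock() - started
        # Sleep out whatever is left of the interval. If the query itself
        # overran the interval, do not sleep at all (never a negative value).
        sleep(max(0.0, interval - elapsed))
    return results
```

With interval set to something like 10 seconds, the same stats request goes from hundreds of kernel calls a second to one every ten seconds, which is usually plenty for interface counters.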
The lesson I keep relearning: when CPU is high but no process looks guilty, check user versus system time first, then point perf top at it. It finds in a minute what staring at top won't find in an hour.
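The user-versus-system check doesn't need top at all. Here is a minimal sketch, assuming a Linux box with /proc/stat, that takes two readings of the aggregate cpu line and reports how the elapsed ticks split. The function names are my own for illustration. One caveat: top reports hardware and software interrupt time separately from %sy, and this sketch follows that, so it counts irq and softirq in the total but not in the system share.

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat, per proc(5).
# The trailing guest fields are skipped: guest time is already counted in user.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn a 'cpu  u n s i ...' line into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_split(before, after):
    """Percent of elapsed ticks spent in user, system and idle."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    # Older kernels expose fewer fields, so missing ones count as zero.
    delta = {k: b.get(k, 0) - a.get(k, 0) for k in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("no ticks elapsed between the two samples")
    return {k: 100.0 * delta[k] / total for k in ("user", "system", "idle")}


def sample(interval=1.0, path="/proc/stat"):
    """Read the cpu line twice, `interval` seconds apart, and return the split."""
    with open(path) as f:
        first = f.readline()
    time.sleep(interval)
    with open(path) as f:
        second = f.readline()
    return cpu_split(first, second)
```

Calling sample() on the box in question and seeing system well above user is the cue to stop staring at top and reach for perf top.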