The day I could finally watch the kernel work

I'd spent years debugging Linux performance from the outside: watching CPU graphs, prodding at strace, inferring what the kernel was up to from the shadows it cast on my metrics. eBPF is the first tool that's made me feel I could just open the door and look. This week I finally sat down properly with the bcc tools, and the experience was a bit like discovering a window in a room you thought was windowless.

The pitch, if you've not met it: eBPF lets you run small, sandboxed programs inside the kernel, attached to events like syscalls, function entry and exit, or tracepoints. The kernel's verifier checks that your program won't do anything daft (no unbounded loops, no wild pointers) before it's allowed to run, and then the program runs in place, which means you can collect data right where the event happens rather than reconstructing it after the fact from the outside. The point isn't the cleverness. The point is that you stop guessing.

The friendliest way in is the bcc collection, a pile of ready-made tools that hide the machinery. execsnoop shows you every process being exec'd, system-wide, as it happens. The first time I ran it I caught a cron job spawning a small army of short-lived helpers I had no idea existed, none of which lived long enough for ps or top to ever show me. They were invisible to every tool I'd been using because they were gone before I could look.
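For a sense of what sits underneath a tool like that, here's a minimal sketch using bcc's Python bindings. It's far cruder than the real execsnoop: it just attaches a tiny C function to the execve syscall and prints a fixed line each time it fires. The names PROGRAM, trace_exec and run are mine, purely for illustration, and it assumes bcc is installed and that you call run() as root.

```python
# The C program that gets compiled, verified and loaded into the kernel.
# trace_exec is a hypothetical name chosen for this sketch.
PROGRAM = r"""
int trace_exec(void *ctx) {
    bpf_trace_printk("execve called\n");
    return 0;
}
"""


def run():
    """Attach the probe and stream its output. Needs bcc and root."""
    # Imported here so the file can be read and loaded without bcc present.
    from bcc import BPF

    b = BPF(text=PROGRAM)
    # get_syscall_fnname resolves the kernel's platform-specific symbol
    # for the execve syscall.
    b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="trace_exec")
    # Blocks, printing one line per execve until interrupted with Ctrl-C.
    b.trace_print()
```

Nothing happens until you call run(). Once you do, every exec on the box produces a line, including the short-lived ones that ps never gets a chance to see.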

The one that genuinely changed how I think is biolatency. It measures block-device I/O latency, aggregates it into a histogram inside the kernel, and prints the result, all with very little overhead. Instead of an average that lies to you, you get the actual distribution:

     usecs               : count     distribution
       128 -> 255        : 1043     |********                        |
       256 -> 511        : 4192     |********************************|
       512 -> 1023       : 218      |*                               |
      8192 -> 16383      : 37       |                                |
     16384 -> 32767      : 12       |                                |

That long tail down at the bottom, the twelve requests taking sixteen milliseconds or more whilst the bulk take well under one, is exactly the kind of thing an average buries and a user notices. Those twelve are somebody's slow page load. Seeing the shape of the distribution rather than a single number is most of the value, and it's a thing I genuinely couldn't get cheaply before.
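To put rough numbers on that, here's a small Python sketch that takes the bucket counts from the histogram above and estimates the average. The histogram doesn't record where in each bucket a request fell, so I'm assuming every request sits at its bucket's midpoint; treat the result as a ballpark, not a measurement.

```python
# Bucket bounds (microseconds) and counts copied from the histogram above.
buckets = [
    (128, 255, 1043),
    (256, 511, 4192),
    (512, 1023, 218),
    (8192, 16383, 37),
    (16384, 32767, 12),
]

total = sum(count for _, _, count in buckets)

# Assumption: treat every request as sitting at the midpoint of its bucket.
approx_mean_us = sum((low + high) / 2 * count for low, high, count in buckets) / total

# Requests that landed in the 8 ms-and-up buckets, i.e. the tail.
slow = sum(count for low, _, count in buckets if low >= 8192)

print(f"requests:       {total}")
print(f"approx mean:    {approx_mean_us:.0f} us")
print(f"8 ms or slower: {slow} ({slow / total:.1%})")
```

By that estimate the average comes out at about half a millisecond, which looks perfectly healthy on a dashboard, while just under one request in a hundred is taking 8 ms or more.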

What I keep coming back to is the cost. The classic way to get this kind of detail was with tracing tools that perturbed the thing you were measuring so much that the measurement became suspect. A tool like biolatency runs in the kernel, aggregates in the kernel, and only hands a summary back to userspace, so the overhead is usually low enough to run on a busy production box without flinching. That's the part that turns it from a lab curiosity into something I'd actually reach for during an incident.
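The aggregation trick is simpler than it sounds: each event bumps one counter in a small table of power-of-two buckets, and only that table ever needs to leave the kernel. Here's a userspace Python sketch of the idea. It's a simulation in the spirit of what the tools do, not bcc's actual code; the function names and sample latencies are made up.

```python
from collections import Counter


def log2_slot(latency_us):
    """Power-of-two bucket index for a non-negative integer latency."""
    # bit_length() is 8 for 128..255, 9 for 256..511, and so on.
    return latency_us.bit_length()


def slot_range(slot):
    """Inclusive (low, high) bounds covered by a bucket index."""
    if slot == 0:
        return (0, 0)
    return (1 << (slot - 1), (1 << slot) - 1)


def aggregate(latencies_us):
    """Fold a stream of latencies into per-bucket counts."""
    counts = Counter()
    for value in latencies_us:
        counts[log2_slot(value)] += 1
    return counts


# Made-up latencies in microseconds, purely for illustration.
samples = [130, 300, 310, 490, 700, 9000, 20000]
hist = aggregate(samples)

for slot in sorted(hist):
    low, high = slot_range(slot)
    print(f"{low:>10} -> {high:<10} : {hist[slot]}")
```

However many events go in, the table stays a few dozen counters wide, which is why the summary is so cheap to hand back.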

I'm very much at the bottom of this hill. The bcc tools are training wheels, and the real power is in writing your own probes, with bpftrace as the gentler middle ground between canned tools and full C. But even just running other people's tools has changed my instinct. When something's slow now, my first thought isn't "what does the graph imply", it's "what is the kernel actually doing, and can I just go and watch it". The answer, more and more, is yes.