ebpf, and finally seeing what the kernel sees

We had a service that was slow in a way that didn't show up anywhere. CPU fine, memory fine, the application's own metrics insisting everything was healthy. The gap between "the app thinks it's fast" and "the box feels slow" is exactly where eBPF earns its keep, so I finally sat down with it properly.

The short version: eBPF lets you run small, verified programs inside the kernel, attached to tracepoints, kprobes and the like, and ship the results back to userspace cheaply. The "cheaply" is the bit that matters. You can do a lot of this with strace, but strace is built on ptrace: it stops the traced process at every syscall, which slows that process badly and changes the timing you're trying to measure. eBPF watches from the inside at a small fraction of that tax.

bpftrace for the quick questions

For one-off questions, bpftrace is the tool I reach for now. It's awk for the kernel. Want to know the distribution of sizes that read() calls are asking for, across the whole box?

bpftrace -e 'tracepoint:syscalls:sys_enter_read { @bytes = hist(args->count); }'

Ctrl-C it and you get an ASCII histogram. No recompiling, no kernel module, no rebooting. The first time I ran something like that on a live production box and got a real answer in ten seconds, I sat back in my chair.
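
If you haven't seen that output before: hist() groups values into power-of-two buckets and draws one bar per bucket. The Python below is only a rough sketch of that bucketing idea, not bpftrace's own code; the function names and the sample sizes are made up for illustration.

```python
from collections import Counter


def log2_bucket(value):
    """Return the half-open [low, high) power-of-two bucket holding value."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    if value == 0:
        return (0, 1)
    low = 1 << (value.bit_length() - 1)  # largest power of two <= value
    return (low, low * 2)


def render_hist(values, width=40):
    """Build ASCII histogram lines, scaling bars to the fullest bucket."""
    counts = Counter(log2_bucket(v) for v in values)
    if not counts:
        return []
    peak = max(counts.values())
    lines = []
    for bucket in sorted(counts):
        low, high = bucket
        n = counts[bucket]
        bar = "@" * max(1, n * width // peak)
        label = "[{}, {})".format(low, high)
        lines.append("{}{:>8} |{}|".format(label.ljust(18), n, bar.ljust(width)))
    return lines


if __name__ == "__main__":
    # Hypothetical read sizes in bytes, standing in for args->count.
    sample = [1, 512, 700, 4096, 4096, 4096, 8191, 65536]
    print("\n".join(render_hist(sample)))
```

The point of the log scale is that a handful of 64 KB reads and a flood of 4 KB reads both stay visible in the same twenty-line picture.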

bcc when the question grows up

When the one-liner isn't enough, bcc gives you the toolkit. biolatency, execsnoop, tcpconnect, funclatency, dozens of them, mostly Brendan Gregg-shaped and battle-tested. For our slow service the winner was runqlat, which showed scheduler run-queue latency climbing under load. The app wasn't slow. It was waiting to be scheduled. That's a completely different fix, and nothing in the application's own dashboards would ever have told us.
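
Run-queue latency is the time between a task becoming runnable and it actually getting a CPU. As far as I understand it, runqlat measures this inside the kernel from the scheduler's wakeup and context-switch tracepoints. Here is a toy sketch of the same bookkeeping in Python, over a made-up event list. The tuple format and the "wakeup" / "switch_in" labels are my own simplification, not anything bcc emits.

```python
def run_queue_latencies(events):
    """Pair each wakeup with the next switch-in for the same pid.

    events: iterable of (timestamp_us, kind, pid) tuples, where kind is
    "wakeup" (task became runnable) or "switch_in" (task got a CPU).
    Returns a list of (pid, latency_us) in the order tasks were scheduled.
    """
    runnable_since = {}  # pid -> timestamp of the earliest unserved wakeup
    latencies = []
    for ts, kind, pid in sorted(events, key=lambda e: e[0]):
        if kind == "wakeup":
            # Keep the first wakeup; repeats while still queued don't reset it.
            runnable_since.setdefault(pid, ts)
        elif kind == "switch_in":
            start = runnable_since.pop(pid, None)
            if start is not None:  # ignore switch-ins we saw no wakeup for
                latencies.append((pid, ts - start))
    return latencies


if __name__ == "__main__":
    # Hypothetical trace: pid 9 waits far longer than pid 7 for a CPU.
    trace = [
        (100, "wakeup", 7),
        (105, "wakeup", 9),
        (130, "switch_in", 7),
        (400, "switch_in", 9),
        (410, "switch_in", 11),
    ]
    print(run_queue_latencies(trace))
```

None of that waiting happens inside the application's own code, so a request timer in the app just records a slow request with no hint as to why.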

the honest caveats

It isn't free of friction. You want a reasonably recent kernel; the older the box, the more features simply aren't there, and some of the nicer bpftrace builtins only landed in the last year or so. The bcc tools pull in LLVM and compile on the fly, which is a chunky dependency to put on a lean production host. And the verifier will reject programs for reasons that aren't always obvious until you've been bitten a few times.

But the trade is worth it. The mental shift is the real prize here. For years "what is the kernel actually doing right now" was a question I answered with inference and educated guesses. Now I can mostly just ask, on the live system, and get told. That changes how you debug. You stop theorising and start measuring, because measuring finally got cheap.