Ramblings of an aging IT geek
performance

eBPF, or finally seeing what the kernel sees

Using bcc and bpftrace to answer "what is this box actually doing" without strace overhead, and the latency histogram that found a slow disk.

For years my answer to "what is this box actually doing" was some combination of strace, tcpdump, and squinting. strace works, but it's a sledgehammer: it stops the process on every syscall and the overhead is bad enough that the thing you're measuring changes shape under observation. The classic problem of asking a question loudly enough that it changes the answer.

eBPF is the first thing in a while that genuinely changed how I do this. You attach a small, verified program to a kernel hook (a kprobe, a tracepoint, whatever). It runs in the kernel, aggregates its results in a map, and hands you the summary. No copying every event to userspace. The overhead is low enough that I'll run it on a busy production box without flinching, which I would never do with strace.
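To make "aggregates in a map" a bit more concrete, here is a rough Python model of the idea. It is not eBPF and it does not touch the kernel; the function names and the synthetic latencies are made up for illustration. It only shows why the summary is so much smaller than the event stream: every event bumps one counter in a power-of-two bucket, and only the counters need to be read out at the end.

```python
from collections import Counter


def log2_bucket(value_us: int) -> int:
    """Power-of-two bucket index: 0..1 -> 0, 2..3 -> 1, 4..7 -> 2, and so on."""
    return max(value_us, 1).bit_length() - 1


def aggregate(latencies_us):
    """Stand-in for the in-kernel map: one counter per bucket, nothing else kept."""
    counts = Counter()
    for latency_us in latencies_us:
        counts[log2_bucket(latency_us)] += 1
    return dict(counts)


# Synthetic events (an assumption, not real data): 100,000 latencies of 100..149 us.
events = [100 + (i % 50) for i in range(100_000)]
summary = aggregate(events)

print(f"{len(events)} events -> {len(summary)} buckets: {summary}")
```

A hundred thousand events collapse into a couple of integers. In the real thing the loop body is the program attached to the hook, and the dictionary is the map that userspace reads when you ask for the result.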

the tools you actually use

There are two front ends worth knowing. bcc gives you a pile of ready-made tools with sensible names: execsnoop shows every process exec'd, opensnoop every file opened, biolatency block I/O latency as a histogram. The other is bpftrace, which is the awk of this world, a one-liner language for ad-hoc questions.

The first time it earned its keep, I had a box that felt sluggish under load with nothing obvious in the metrics. One command (Debian and Ubuntu package the bcc tools with a -bpfcc suffix, hence the name):

biolatency-bpfcc

It printed a latency histogram of block I/O, and there was a clear second hump out at tens of milliseconds. The dashboards showed average latency, and the average looked fine because it drowned the tail. The histogram showed a disk that was occasionally taking far too long to answer. It was a failing drive in a RAID set, degrading rather than dying, which is exactly the failure mode that never trips a simple threshold alert.
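The way an average hides a tail is easy to reproduce with made-up numbers. The sketch below uses hypothetical latencies, not the measurements from that box: 9,980 I/Os at 200 microseconds and 20 at 40 milliseconds. It prints the mean next to a power-of-two histogram. The layout is only loosely modelled on what biolatency prints, and unlike the real tool it skips empty buckets.

```python
from collections import Counter
import statistics


def log2_bucket(value_us: int) -> int:
    """Power-of-two bucket index: 0..1 -> 0, 2..3 -> 1, 4..7 -> 2, and so on."""
    return max(value_us, 1).bit_length() - 1


# Hypothetical data: mostly fast I/O, plus a few very slow outliers.
latencies_us = [200] * 9980 + [40_000] * 20

mean_us = statistics.fmean(latencies_us)
worst_us = max(latencies_us)
hist = Counter(log2_bucket(v) for v in latencies_us)

print(f"mean: {mean_us:.1f} us, worst: {worst_us} us")
for bucket in sorted(hist):
    low, high = 2 ** bucket, 2 ** (bucket + 1) - 1
    # Scale bars to 40 characters, but always draw at least one star
    # so that a tiny second hump stays visible.
    bar = "*" * max(1, hist[bucket] * 40 // len(latencies_us))
    print(f"{low:>6} -> {high:<6} : {hist[bucket]:<6} |{bar}")
```

The mean stays well under a millisecond, which is what a dashboard would plot, while the histogram has a second, lonely row in the tens of milliseconds.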

the part to be honest about

It is not free of sharp edges. The tooling assumes a recent-ish kernel, the bcc tools recompile against kernel headers at runtime, which is slow and occasionally fragile, and the verifier will reject programs for reasons that take a while to understand. Portability across kernel versions is still a sore point; the BTF and CO-RE work that fixes it is still early.
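One quick sanity check before fighting the tooling is to see whether the running kernel exposes BTF at all. Kernels built with CONFIG_DEBUG_INFO_BTF publish their type information at /sys/kernel/btf/vmlinux, so a minimal sketch of that check looks like the following. The helper name is my own, and finding the file only means BTF is present, not that any particular tool will work.

```python
import os
import platform

# Where kernels built with CONFIG_DEBUG_INFO_BTF expose their type information.
BTF_PATH = "/sys/kernel/btf/vmlinux"


def has_kernel_btf(path: str = BTF_PATH) -> bool:
    """Return True if the running kernel exposes BTF at the given path."""
    return os.path.exists(path)


status = "available" if has_kernel_btf() else "not found"
print(f"kernel {platform.release()}: BTF {status} at {BTF_PATH}")
```

If it reports "not found", CO-RE style tools are unlikely to run as-is on that box, and you are back to tools that compile against kernel headers at runtime.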

But the trade is worth it. The mental shift is that you stop sampling and start asking the kernel precise questions, then letting it aggregate the answer. biolatency, execsnoop, and a couple of bpftrace one-liners have replaced a lot of guesswork. I keep them in my back pocket now, and reach for strace far less often.