Ramblings of an aging IT geek
performance

bpftrace, and finally being able to ask the kernel a question

How eBPF and bpftrace let me trace a latency problem live on a production box without strace, without a debug build, and without taking anything down.

There's a particular kind of frustrating problem where the metrics all look fine but something is clearly wrong. A service was occasionally taking far too long to respond, no obvious cause, nothing in the logs, CPU and memory unremarkable. The old playbook would be strace and hope, except strace on a busy production process slows it down enough that the problem you're chasing vanishes. Observer effect, the worst kind.

This is where eBPF has quietly changed everything, and bpftrace is the friendly front door to it.

The idea, very briefly: the kernel will let you load small, verified programs that run at chosen hook points (kernel functions, tracepoints, syscalls) and collect data with very little overhead. The verifier rejects any program it can't prove will terminate and keep to safe memory accesses, so you can run this on production with a fairly clear conscience. No debug build, no restart, no recompile. You attach, you watch, you detach.
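
To make "attach, watch, detach" concrete, here is a minimal sketch that counts syscalls per process name for ten seconds and then exits. The map name @syscalls and the ten-second window are my own choices for illustration, not anything prescribed.

```bpftrace
// Fires on entry to every syscall, system-wide.
tracepoint:raw_syscalls:sys_enter
{
    // Count events, keyed by the name of the calling process.
    @syscalls[comm] = count();
}

// Detach after ten seconds; bpftrace prints the map on exit.
interval:s:10
{
    exit();
}
```

Saved to a file (say count.bt, a made-up name) and run as root with bpftrace count.bt, it attaches, collects and prints the per-process counts when it exits. On a very busy box this probe fires a lot, so the overhead is low rather than zero.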

What I love is how little you have to write. To see how much data each successful read() call is returning, system-wide, a one-liner gets you a histogram:

bpftrace -e 'tracepoint:syscalls:sys_exit_read /args->ret > 0/ { @bytes = hist(args->ret); }'
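
That one-liner buckets the sizes returned by read(), not how long the calls took. For duration per process, a sketch along the following lines should work, assuming the same syscall tracepoints are available. The map names @start and @read_us are illustrative. Note that read() covers sockets and pipes as well as files, so this is not strictly disk time.

```bpftrace
// Record a timestamp when a thread enters read().
tracepoint:syscalls:sys_enter_read
{
    @start[tid] = nsecs;
}

// On return, if we saw the matching entry, bucket the elapsed time.
tracepoint:syscalls:sys_exit_read
/@start[tid]/
{
    // Microseconds spent in read(), one histogram per process name.
    @read_us[comm] = hist((nsecs - @start[tid]) / 1000);
    delete(@start[tid]);
}

// Drop leftover timestamps so only the histograms are printed.
END
{
    clear(@start);
}
```

Run it as root, let it collect for a few seconds, then Ctrl-C to detach and print one power-of-two histogram per process.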

For the actual problem, I wanted to know whether the slow responses correlated with the process blocking somewhere specific. A short script hooking the scheduler showed me off-CPU time per process: how long each one spent not running, just waiting. And there it was: periodic multi-hundred-millisecond stalls waiting on a lock, in a code path I'd have sworn was never contended. The averages hid it completely because it happened to maybe one request in a few thousand. The histogram showed the long tail plainly.
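
For a flavour of what a scheduler hook like that looks like, here is a minimal sketch. It is not the exact script used here, and the map names @off and @offcpu_ms are made up for illustration. It timestamps each thread as it is switched out and, when the thread is switched back in, adds the gap to a per-process histogram.

```bpftrace
tracepoint:sched:sched_switch
{
    // The thread being switched out starts an off-CPU interval now.
    // prev_pid is 0 for the idle task, which we skip.
    if (args->prev_pid != 0) {
        @off[args->prev_pid] = nsecs;
    }

    // The thread being switched in ends its off-CPU interval, if we saw it start.
    $start = @off[args->next_pid];
    if ($start != 0) {
        // Milliseconds off-CPU, one histogram per process name.
        @offcpu_ms[args->next_comm] = hist((nsecs - $start) / 1000000);
        delete(@off[args->next_pid]);
    }
}

// Drop leftover timestamps so only the histograms are printed.
END
{
    clear(@off);
}
```

As written this counts every kind of off-CPU time, including threads idling while they wait for work and threads queued behind others for a CPU, so in practice you would narrow it to the process you care about. Capturing stacks as well would show which wait is responsible.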

The thing that still feels slightly magical is doing this live, on the box that's serving traffic, with no measurable hit. Brendan Gregg's been banging this drum for years, and his collection of tools, the bcc and bpftrace ones, is an education in itself. biolatency for block I/O latency, execsnoop to watch every process that spawns, tcpconnect for outbound connections. Each one is a question you used to need a debugger and a maintenance window to answer, now reduced to a command you run for ten seconds.
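
To show how small these tools can be, here is a sketch in the spirit of execsnoop. It is a simplified illustration rather than the shipped tool, and the column layout is my own.

```bpftrace
// Fires whenever any process calls execve() to start a new program.
tracepoint:syscalls:sys_enter_execve
{
    // PID and name of the caller (the name is from before the exec replaces it).
    printf("%-8d %-16s ", pid, comm);
    // Print the argument vector of the new program on the same line.
    join(args->argv);
}
```

Run as root and it prints one line per process launch until you hit Ctrl-C.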

It hasn't replaced everything. You still need to know roughly where to look, and reading kernel source to find the right function to hook is sometimes part of the job. But the gap between "something is slow" and "here is exactly where the time goes" has collapsed. For anyone doing ops or performance work and still reaching for strace first, it's well worth the afternoon it takes to get comfortable. I wish I'd learned it sooner.