Ramblings of an aging IT geek
performance

eBPF, and finally seeing what the kernel sees

A short note on using bpftrace to answer a question about disk latency that no other tool could answer cleanly.

A latency graph on a server console

I spent years answering "why is this slow" with strace, perf, and a lot of squinting. eBPF, and bpftrace in particular, has quietly changed how I do that. You write a tiny program, the kernel verifies it and runs it at the probe points you pick, and you get exactly the numbers you asked for, usually with very little overhead.

The question this week was simple: which files were taking the longest to read on a box that felt sluggish under load. One line of bpftrace gave me a live latency histogram for the read() syscall on the production-shaped machine, without attaching a debugger or restarting anything.

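bpftrace -e 'tracepoint:syscalls:sys_enter_read { @start[tid] = nsecs; } tracepoint:syscalls:sys_exit_read /@start[tid]/ { @usecs = hist((nsecs - @start[tid]) / 1000); delete(@start[tid]); }'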
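
To break read latency down by file rather than lumping every read together, one option is to time vfs_read and key the histogram by file name. What follows is a minimal sketch and not necessarily what I ran that day. It assumes root, a reasonably recent bpftrace, and a kernel that exposes BTF type information so that struct file resolves without extra headers. The map names (@start, @name, @usecs) are just illustrative choices.

```
#!/usr/bin/env bpftrace
// Sketch: per-file read latency, keyed by the file's base name.
// Assumption: the kernel has BTF; without it, kernel headers and
// an "#include <linux/fs.h>" line at the top are needed instead.

kprobe:vfs_read
{
    // arg0 is the struct file * handed to vfs_read()
    @start[tid] = nsecs;
    @name[tid] = str(((struct file *)arg0)->f_path.dentry->d_name.name);
}

kretprobe:vfs_read
/@start[tid]/
{
    // latency in microseconds, one power-of-two histogram per file name
    @usecs[@name[tid]] = hist((nsecs - @start[tid]) / 1000);
    delete(@start[tid]);
    delete(@name[tid]);
}

END
{
    // drop the helper maps so only the histograms print on exit
    clear(@start);
    clear(@name);
}
```

Saved under a hypothetical name such as readlat.bt, it runs with "sudo bpftrace readlat.bt" and prints one histogram per file name on Ctrl-C. Two caveats: vfs_read also sees reads served from the page cache, pipes, and sockets, so a slow bucket is not automatically a slow disk, and the key is only the base name, so identically named files in different directories share a histogram.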

The thing that sells me is the lack of guilt. No instrumentation to leave behind, no sampling bias to argue about, no recompiling. You ask the kernel a question and the kernel answers. After a decade of inferring behaviour from the outside, that still feels slightly magic.