watching syscalls without a debugger in sight

A box was doing far more disk I/O than it had any business doing, and iotop only told me the obvious bit: a Python process was writing a lot. It did not tell me what it was writing, or why, or how often, or which files. The usual answer is to attach strace and drown in output. I have done that more times than I would like to admit, and it always slows the process to a crawl and changes the behaviour you were trying to observe.

So I finally sat down with eBPF properly, rather than reading about it and nodding.

what it actually gives you

The pitch is simple once it clicks. You write a tiny program, the kernel's verifier checks it for safety before it loads, and the kernel then runs it at a hook point: a kprobe, a tracepoint, a function entry. No module to compile, no reboot, no realistic way to panic the box. You attach, you collect, you detach. The overhead is small enough that you can run it on something that matters.

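The two front ends I reached for were bcc and bpftrace. bcc ships a pile of ready-made tools, and most days that is all you need. Upstream packages put them in /usr/share/bcc/tools; on Debian and Ubuntu the bpfcc-tools package installs them on the PATH with a -bpfcc suffix, which is why the command below looks the way it does.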

finding the actual writer

opensnoop answered the question in about four seconds:

opensnoop-bpfcc -p 24117

It listed every open() and openat() call the process made, with the path. It turned out a logging library had been configured to rotate on every write, so it was opening a file, writing one line, closing it, and reopening a fresh file, thousands of times a minute. Nothing was broken in the sense of producing wrong output. It was just doing an enormous amount of pointless work to produce the right one.
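To make the cost of that pattern concrete, here is a minimal sketch in plain Python. It is not the logging library from the incident (which I have not named or reproduced here); the function names, the rotated-file naming scheme, and the line count are all made up for illustration. It just counts how many times open() is called when every line goes to a fresh file, compared with holding one handle open.

```python
import builtins
import os
import tempfile


def log_rotating_every_write(base_path, lines):
    """Hypothetical stand-in for the misconfigured logger:
    every line goes to a fresh file, so every line costs an open and a close."""
    for i, line in enumerate(lines):
        with open(f"{base_path}.{i}", "w") as f:
            f.write(line + "\n")


def log_keeping_handle(path, lines):
    """The sane version: open once, write every line, close once."""
    with open(path, "a") as f:
        for line in lines:
            f.write(line + "\n")


def count_opens(fn, *args):
    """Count calls to the built-in open() made while fn(*args) runs."""
    real_open = builtins.open
    calls = 0

    def counting_open(*a, **kw):
        nonlocal calls
        calls += 1
        return real_open(*a, **kw)

    builtins.open = counting_open
    try:
        fn(*args)
    finally:
        # Always restore the real open(), even if fn raises.
        builtins.open = real_open
    return calls


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        lines = [f"event {i}" for i in range(1000)]
        rotating = count_opens(log_rotating_every_write, os.path.join(d, "a.log"), lines)
        steady = count_opens(log_keeping_handle, os.path.join(d, "b.log"), lines)
        print(f"rotate on every write: {rotating} opens")
        print(f"single handle:         {steady} opens")
```

With a thousand lines the first function makes a thousand open() calls and the second makes one, which is the difference opensnoop made visible as a wall of scrolling paths.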

For the block I/O side, biolatency drew a histogram of block I/O latency, and biosnoop gave me a per-request trace, including the size of each request. None of it needed me to recompile anything or stop the service.
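The histograms these tools print use power-of-two buckets (0 -> 1, 2 -> 3, 4 -> 7 and so on), which is why a few rows can cover everything from microseconds to seconds. The sketch below shows that bucketing idea in plain Python. It is my own illustration, not bcc's code, and the function names are invented for the example.

```python
from collections import Counter


def log2_bucket(value):
    """Return the (low, high) power-of-two bucket holding a non-negative integer.

    0 and 1 share the first bucket; after that each bucket spans
    2**k .. 2**(k+1) - 1.
    """
    if value < 2:
        return (0, 1)
    k = value.bit_length() - 1  # position of the highest set bit
    return (1 << k, (1 << (k + 1)) - 1)


def histogram(values):
    """Count how many values fall into each power-of-two bucket."""
    return Counter(log2_bucket(v) for v in values)


def render(hist, width=40):
    """Format the histogram as text rows, with bars scaled to the largest count."""
    peak = max(hist.values())
    rows = []
    for (low, high), n in sorted(hist.items()):
        bar = "*" * max(1, n * width // peak)
        rows.append(f"{low:>8} -> {high:<8} : {n:<6} |{bar}")
    return "\n".join(rows)
```

Feed histogram() a list of latencies in microseconds and pass the result to render() to get something shaped like the tool output. The point of the log scale is that one slow outlier gets its own row instead of flattening everything else.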

the bpftrace one-liner I keep now

When I just want a count of syscalls by name for a PID, this lives in my notes:

bpftrace -e 'tracepoint:raw_syscalls:sys_enter /pid == 24117/ { @[args->id] = count(); }'

It is not pretty, and the syscall id rather than name is a faff, but it tells you instantly whether a process is I/O bound, doing endless futex calls, or spinning on something daft.
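To take some of the faff out of the ids, the map that one-liner prints can be joined against the syscall numbers in the kernel's userspace headers. This is a rough sketch under a few assumptions: the box is x86_64, the header is the usual list of `#define __NR_name number` lines (commonly /usr/include/asm/unistd_64.h or /usr/include/x86_64-linux-gnu/asm/unistd_64.h, but check your distro), and bpftrace prints each map entry as `@[id]: count`. The function names are mine.

```python
import re

# Matches lines such as "#define __NR_write 1" in the x86_64 unistd header.
_NR_DEFINE = re.compile(r"^#define\s+__NR_(\w+)\s+(\d+)$")

# Matches bpftrace map lines such as "@[1]: 5234".
_MAP_LINE = re.compile(r"^@\[(\d+)\]:\s+(\d+)$")


def parse_syscall_table(header_text):
    """Build {syscall_id: name} from the text of a unistd_64.h style header."""
    table = {}
    for line in header_text.splitlines():
        m = _NR_DEFINE.match(line.strip())
        if m:
            table[int(m.group(2))] = m.group(1)
    return table


def parse_bpftrace_map(output_text):
    """Build {syscall_id: count} from the map bpftrace prints on exit."""
    counts = {}
    for line in output_text.splitlines():
        m = _MAP_LINE.match(line.strip())
        if m:
            counts[int(m.group(1))] = int(m.group(2))
    return counts


def name_counts(counts, table):
    """Replace ids with names, keeping unknown ids visible rather than dropping them."""
    return {table.get(i, f"unknown_{i}"): n for i, n in counts.items()}
```

Read the header with open(path).read(), save the bpftrace output to a file, and pass both through these functions to get counts keyed by name. On other architectures the header layout differs, so the regex would need adjusting.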

The thing that surprised me is how quickly you go from "I wonder what the kernel is doing" to an actual answer. For years that question meant either guesswork or a debug build. Now it is a one-liner and a histogram. The fix here was a single config line, but I would never have found it by staring at top.