watching syscalls without a debugger holding the door open

The thing that finally sold me on eBPF was a process that was slow for no reason I could see. CPU was idle. The flame graph from our usual profiler was a flat wall of epoll_wait. The logs said nothing. It was just... slow, in that maddening way where every individual operation looks fine and the aggregate is a disaster.

The old reflex here is strace. And strace works, right up until it doesn't, because it stops the traced process on every syscall, once on entry and once on exit, and bounces through ptrace to a separate tracer process each time. Point it at something doing tens of thousands of syscalls a second and you've changed the experiment. You're no longer measuring the program, you're measuring the program plus a tax collector standing at every door.

eBPF moves the observation into the kernel. The probe runs in kernel space, on the syscall path, and only ships you the data you asked for. No context-switch storm, no halting. You can leave it running on a production box and mostly not notice it.

The tool I reach for first is from bcc or bpftrace, depending on how lazy I'm feeling. For "which syscalls is PID 4172 making most often", a one-liner does it (run as root; older bpftrace releases spell the field args->id):

bpftrace -e 'tracepoint:raw_syscalls:sys_enter /pid == 4172/ { @[args.id] = count(); }'

That gave me a count per syscall number, and once I'd mapped the numbers back to names, the answer was openat. Thousands and thousands of openat. The service was opening the same config file on every request because someone had "simplified" a caching layer out of existence three weeks earlier. Not a kernel problem at all, but the kernel was the only honest witness left.
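
The one-liner prints raw syscall numbers, so there is a small translation step before you can read it. Here is a minimal Python sketch of that step, under a few assumptions: the map output looks like "@[257]: 48213", the box is x86_64 (syscall numbers differ on other architectures), and the table is deliberately partial. The sample counts are made up for illustration and are not the numbers from my incident.

```python
import re
from collections import Counter

# Partial x86_64 syscall table (assumption: x86_64; other architectures
# number their syscalls differently). Extend as needed.
X86_64_SYSCALLS = {
    0: "read",
    1: "write",
    3: "close",
    44: "sendto",
    45: "recvfrom",
    202: "futex",
    232: "epoll_wait",
    257: "openat",
}

# Matches bpftrace map lines of the form "@[257]: 48213".
MAP_LINE = re.compile(r"^@\[(\d+)\]:\s+(\d+)$")


def parse_counts(text):
    """Return a Counter of {syscall_number: count} from bpftrace map output."""
    counts = Counter()
    for line in text.splitlines():
        match = MAP_LINE.match(line.strip())
        if match:
            counts[int(match.group(1))] += int(match.group(2))
    return counts


def top_syscalls(text, n=5):
    """Return the n most frequent syscalls as (name, count) pairs."""
    return [
        (X86_64_SYSCALLS.get(nr, f"syscall_{nr}"), count)
        for nr, count in parse_counts(text).most_common(n)
    ]


# Hypothetical sample output, invented for this example.
SAMPLE = """
Attaching 1 probe...
^C

@[202]: 312
@[232]: 9120
@[0]: 18044
@[257]: 48213
"""

if __name__ == "__main__":
    for name, count in top_syscalls(SAMPLE):
        print(f"{name:12} {count}")
```

Numbers missing from the table come back as syscall_<nr> rather than being dropped, so an unfamiliar hot syscall still shows up in the ranking.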

What I like about this, beyond the specific win, is the shift in posture. Most observability you bolt on ahead of time: you instrument, you add metrics, you hope you guessed the right ones. eBPF lets you ask a new question of a running system you didn't prepare, and get an answer in seconds, without a redeploy and without a maintenance window.

There are sharp edges. The verifier will reject programs it can't prove safe, and its error messages are not what you'd call welcoming. Kernel version matters more than you'd like, and some of the nicer tooling assumes a fairly recent one. But what you get for free is a lot. execsnoop, opensnoop, tcpconnect, biolatency: these ship with bcc (and have bpftrace ports), they answer real questions, and you can run them without writing a line of C.
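
These tools print plain text columns, so summarizing their output takes very little code. As a rough sketch, assuming the bcc opensnoop default columns (PID, COMM, FD, ERR, PATH) and command names without spaces, this counts how often each path is opened. The process name, paths, and sample lines are hypothetical.

```python
from collections import Counter


def count_opens(lines):
    """Count opens per path from opensnoop-style lines.

    Assumes five whitespace-separated columns: PID COMM FD ERR PATH.
    The header and any malformed lines are skipped.
    """
    counts = Counter()
    for line in lines:
        # maxsplit=4 keeps paths that contain spaces in one piece.
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        counts[fields[4].rstrip("\n")] += 1
    return counts


# Hypothetical sample, invented for this example.
SAMPLE = """\
PID    COMM               FD ERR PATH
4172   myservice          12   0 /etc/myservice/config.yaml
4172   myservice          12   0 /etc/myservice/config.yaml
4172   myservice          13   0 /var/lib/myservice/data.db
4172   myservice          12   0 /etc/myservice/config.yaml
"""

if __name__ == "__main__":
    for path, n in count_opens(SAMPLE.splitlines()).most_common(3):
        print(f"{n:6} {path}")
```

Swapping SAMPLE.splitlines() for sys.stdin would let you pipe live tool output into it. A file that dominates the count on a per-request service is the kind of pattern that gave away my missing cache.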

If you've been avoiding it because it sounds like kernel hacking, it mostly isn't, not for the everyday stuff. Install bpftrace or the bcc tools, run opensnoop against a misbehaving process, and watch the kernel tell you the truth your application was too polite to mention.