I spent years answering "why is this slow" with strace, perf, and a lot of squinting. eBPF, and bpftrace in particular, has quietly changed how I do that. You write a tiny program, it runs in the kernel, and you get exactly the numbers you asked for with almost no overhead.
The question this week was simple: which files were taking the longest to read on a box that felt sluggish under load. One line of bpftrace and I had a latency histogram per syscall, live, on the production-shaped machine, without attaching a debugger or restarting anything.
bpftrace -e 'tracepoint:syscalls:sys_exit_read { @ = hist(args->ret); }'
The thing that sells me is the lack of guilt. No instrumentation to leave behind, no sampling bias to argue about, no recompiling. You ask the kernel a question and the kernel answers. After a decade of inferring behaviour from the outside, that still feels slightly magic.