poking at io_uring for an afternoon


I finally sat down with io_uring this afternoon, having nodded along to talks about it for years without writing a line. The pitch is well known: a shared ring buffer between your process and the kernel, so you batch submissions and reap completions without a syscall per operation. For an I/O-bound loop that should be a real win.
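To make the batching idea concrete, here is a toy model in C. It is not io_uring: there is no kernel, no shared memory, and no memory barriers, and every name in it (`toy_ring`, `toy_push`, `toy_enter`, `toy_run`) is made up for illustration. It only shows the shape of the trick: the producer writes requests into ring slots and bumps a tail index, and a single "enter" call drains whatever has accumulated.

```c
#include <assert.h>
#include <stdint.h>

#define RING_ENTRIES 8u  /* power of two, so indices can be masked */
_Static_assert((RING_ENTRIES & (RING_ENTRIES - 1)) == 0,
               "RING_ENTRIES must be a power of two");

/* Toy stand-in for a submission queue. A real io_uring ring lives in
 * memory shared with the kernel; this model is single-threaded. */
struct toy_ring {
    uint32_t head;                /* next slot the consumer reads */
    uint32_t tail;                /* next slot the producer writes */
    int      slots[RING_ENTRIES];
    unsigned enter_calls;         /* counts simulated "syscalls" */
};

/* Producer side: queue one request without notifying the consumer. */
static int toy_push(struct toy_ring *r, int req)
{
    if (r->tail - r->head == RING_ENTRIES)
        return -1;                /* ring is full */
    r->slots[r->tail & (RING_ENTRIES - 1)] = req;
    r->tail++;
    return 0;
}

/* Simulated kernel entry: one call consumes everything queued so far. */
static unsigned toy_enter(struct toy_ring *r)
{
    unsigned done = 0;
    r->enter_calls++;
    while (r->head != r->tail) {
        (void)r->slots[r->head & (RING_ENTRIES - 1)];  /* "process" it */
        r->head++;
        done++;
    }
    return done;
}

/* Queue n requests, entering once per `batch` requests.
 * Returns how many simulated syscalls that took. */
static unsigned toy_run(unsigned n, unsigned batch)
{
    struct toy_ring r = {0};
    unsigned queued = 0;

    assert(batch >= 1 && batch <= RING_ENTRIES);
    for (unsigned i = 0; i < n; i++) {
        int rc = toy_push(&r, (int)i);
        assert(rc == 0);          /* cannot fill up: we drain every batch */
        (void)rc;
        if (++queued == batch) {
            toy_enter(&r);
            queued = 0;
        }
    }
    if (queued > 0)
        toy_enter(&r);            /* flush the partial final batch */
    return r.enter_calls;
}
```

In this model `toy_run(64, 1)` makes 64 enter calls and `toy_run(64, 8)` makes 8. The operation count is identical; only the number of boundary crossings changes.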

I rewrote a daft little file-copy benchmark (read a chunk, write a chunk, repeat) to submit its reads and writes through the ring instead of blocking on each one. The throughput improvement was there but modest on my hardware; the interesting part was the syscall count. The strace -c summary went from a dense wall of read/write calls to a handful of io_uring_enter calls. That is the whole point, and seeing it drop out of the trace made it click in a way the diagrams never did.

The ergonomics are better than the old async APIs, but it is still low-level plumbing: you manage the submission queue, the completion queue, and the lifetime of every buffer you hand the kernel, all by hand. Free a buffer while the kernel still owns it and you have a use-after-free with extra steps. I used liburing rather than driving the rings raw, and even then I'd want a wrapper crate or library before anything touched production.
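The buffer-lifetime rule is the kind of thing a wrapper would enforce for you. A minimal sketch in C of what that bookkeeping might look like, assuming one outstanding request per buffer: the `tracked_buf` type and the `tb_*` helpers are hypothetical, not part of liburing, and nothing here talks to the kernel. You would call `tb_mark_submitted` just before queueing a request that uses the buffer and `tb_mark_completed` once its completion has been reaped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper: a heap buffer plus a flag recording who owns it. */
struct tracked_buf {
    void  *data;
    size_t len;
    bool   in_flight;  /* true from submission until completion is reaped */
};

/* Allocate the buffer; it starts out owned by user space. */
static int tb_init(struct tracked_buf *b, size_t len)
{
    assert(b != NULL);
    b->data = malloc(len);
    b->len = b->data ? len : 0;
    b->in_flight = false;
    return b->data ? 0 : -1;
}

/* Call just before handing the buffer to the kernel. */
static int tb_mark_submitted(struct tracked_buf *b)
{
    assert(b != NULL);
    if (b->in_flight)
        return -1;     /* already attached to a pending request */
    b->in_flight = true;
    return 0;
}

/* Call once the matching completion has been reaped. */
static void tb_mark_completed(struct tracked_buf *b)
{
    assert(b != NULL);
    b->in_flight = false;
}

/* Refuses to free while a request may still touch the memory. */
static int tb_free(struct tracked_buf *b)
{
    assert(b != NULL);
    if (b->in_flight)
        return -1;
    free(b->data);
    b->data = NULL;
    b->len = 0;
    return 0;
}
```

This only catches mistakes that go through the helpers, so it is a guard rail rather than a guarantee. The use-after-free stays one stray `free(b.data)` away.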

Worth the afternoon. Not worth reaching for unless you actually have an I/O bottleneck and the syscall overhead is a measurable chunk of it. Most of the time it isn't, and a thread pool is fine.