io_uring, first impressions from the bleeding edge

io_uring is the most interesting thing to land in the Linux I/O stack in years, and I've spent the past week or so poking at it on a 5.1 kernel. These are first impressions, not a verdict. But the impressions are good.

The problem it solves is old. Asynchronous I/O on Linux has been a bit of a sad story. The native AIO interface (io_submit and friends) only ever really worked for direct I/O on files, and even then it was awkward. POSIX AIO, as glibc implements it, is a thread pool underneath. Everyone who wanted true async ended up either with thread pools pretending to be async, or epoll plus non-blocking sockets, which works for network I/O but does nothing for files. io_uring is Jens Axboe's answer, and it covers both.

two rings

The model is a pair of shared ring buffers between your process and the kernel: a submission queue and a completion queue. You write requests into the submission ring, tell the kernel they're there, and pick up results from the completion ring. The clever part is that, set up correctly, you can submit and reap batches of operations without a syscall per operation. At high request rates the per-operation syscall overhead becomes a dominant cost, so removing it matters.
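To make the ring idea concrete, here is a toy single-producer, single-consumer ring in plain C11. It is only a sketch of the general technique, not the kernel's actual layout: toy_ring, toy_entry, RING_ENTRIES and both functions are names I made up for illustration, and the real submission and completion entries carry far more than two fields. What it does show is the part that matters: the producer fills several slots and publishes them all with one store to tail, and the consumer drains whatever is visible with one store to head.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_ENTRIES 8u                 /* hypothetical size; must be a power of two */
#define RING_MASK    (RING_ENTRIES - 1u)

/* Toy slot, invented for this sketch. */
struct toy_entry {
    uint64_t user_data;                 /* caller's tag for matching results */
    int32_t  res;                       /* result code */
};

/* head and tail are free-running counters; slot index is counter & RING_MASK. */
struct toy_ring {
    _Atomic uint32_t head;              /* advanced only by the consumer */
    _Atomic uint32_t tail;              /* advanced only by the producer */
    struct toy_entry slots[RING_ENTRIES];
};

void toy_ring_init(struct toy_ring *r)
{
    assert((RING_ENTRIES & RING_MASK) == 0);   /* power-of-two check */
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer: copy in up to n entries, then publish the whole batch with a
 * single release store to tail. Returns how many entries were queued. */
unsigned toy_ring_push_batch(struct toy_ring *r,
                             const struct toy_entry *in, unsigned n)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    unsigned space = RING_ENTRIES - (tail - head);

    if (n > space)
        n = space;
    for (unsigned i = 0; i < n; i++)
        r->slots[(tail + i) & RING_MASK] = in[i];
    atomic_store_explicit(&r->tail, tail + n, memory_order_release);
    return n;
}

/* Consumer: copy out up to max published entries, then release the slots
 * with a single store to head. Returns how many entries were taken. */
unsigned toy_ring_pop_batch(struct toy_ring *r,
                            struct toy_entry *out, unsigned max)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    unsigned avail = tail - head;

    if (max > avail)
        max = avail;
    for (unsigned i = 0; i < max; i++)
        out[i] = r->slots[(head + i) & RING_MASK];
    atomic_store_explicit(&r->head, head + max, memory_order_release);
    return max;
}
```

The release store paired with the acquire load on the other side is what guarantees the slot contents are visible before the counter moves. Those are the memory barriers you would otherwise be writing by hand.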

You can drive it directly, but liburing wraps the setup so you're not hand-rolling memory barriers. The shape of a loop is roughly:

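/* 5.1 only has the vectored read opcode, so wrap buf in a one-element iovec */
struct iovec iov = { .iov_base = buf, .iov_len = len };

struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);  /* NULL if the ring is full */
io_uring_prep_readv(sqe, fd, &iov, 1, offset);
io_uring_submit(&ring);

struct io_uring_cqe *cqe;
io_uring_wait_cqe(&ring, &cqe);                      /* blocks until a completion arrives */
/* cqe->res holds the bytes read, or a negative errno on failure */
io_uring_cqe_seen(&ring, cqe);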

That's the whole shape. Submit, wait, mark seen. It reads like a normal API rather than a kernel interface you have to apologise for.

where it earned its keep

I threw a synthetic buffered-read workload at it, the kind that previously meant a pile of worker threads each blocking on pread. With io_uring it's a single thread filling the ring and draining completions, and the numbers were comfortably better, mostly from not paying for context switches between all those threads.

A couple of honest caveats. It's new, and it's moving fast: the feature set on 5.1 is the floor, and a lot of the more exciting operations are arriving in later kernels, so what you can do depends heavily on which kernel you're on. The security surface is large and still settling. And the programming model, whilst clean once it clicks, is genuinely different from anything you've written before, so expect the first attempt to deadlock on an empty completion queue at least once. Mine did.

But this is the right design. For the first time, async file and network I/O on Linux share one coherent, fast interface, and you can feel the thought that went into it. I'll be watching the next few kernel releases closely.