Ramblings of an aging IT geek
rust

I rewrote a service in async rust and learned what coloured functions cost

Porting a thread-per-connection Rust service to async/await on Tokio: what got better, what got harder, and where I would not bother again.

When async/await stabilised in Rust 1.39 back in November 2019, I read the announcement, nodded, and did nothing about it for ages. The service I had in mind was a small network daemon doing thread-per-connection with a bounded pool, and it was fine. "Fine" is the great enemy of rewrites, and rightly so. But the connection count crept up, the thread pool started to feel like the wrong shape, and last month I finally did the port to Tokio properly. This is what I actually found, now that the dust has settled and the futures crate churn of the early days is long behind us.

The short version: async Rust delivered exactly the resource win I expected, made the easy parts easy, and made about three specific things much harder than the synchronous version ever was. I would do it again for this service. I would not do it reflexively for the next one.

what the port actually looked like

The old code was the honest, boring shape. A loop accepting connections, and for each one a thread off the pool running a blocking read/process/write cycle. Roughly:

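// Blocking accept loop: each connection occupies one pool thread until it closes.
for stream in listener.incoming() {
    let stream = stream?;
    // The closure owns the stream and runs the whole read/process/write cycle.
    pool.execute(move || handle_connection(stream));
}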

Each connection cost a real OS thread for its lifetime, with a real stack, parked in a blocking read most of the time waiting for bytes that came in bursts. With a few hundred mostly-idle connections that is a lot of memory doing nothing but waiting.

The async version replaces the thread per connection with a task per connection, multiplexed over a small runtime thread pool:

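use tokio::net::TcpListener;

let listener = TcpListener::bind(addr).await?;
loop {
    // Suspends this task, not a thread, until a client connects.
    let (stream, _) = listener.accept().await?;
    // One lightweight task per connection, scheduled onto the runtime's workers.
    tokio::spawn(handle_connection(stream));
}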

A task started with tokio::spawn is cheap. It is a state machine the compiler built out of my async fn, parked as a future when it hits an .await that is not ready, and woken when the I/O is. Idle connections cost almost nothing because there is no thread sitting in a blocking syscall, just a suspended future holding its small slice of state. That is the whole pitch, and on this workload the pitch is real. Memory under a few thousand idle connections dropped to a fraction of the thread-per-connection figure, and tail latency under bursty load got noticeably steadier because I was no longer at the mercy of a fixed pool size.
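
To make the "state machine" claim concrete, here is a minimal std-only sketch that polls an async fn by hand. None of it is Tokio's actual machinery: NotReadyOnce, CountingWaker, handle and drive are hypothetical names made up for the illustration, and the self-wake stands in for a real I/O readiness notification.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for "I/O that is not ready yet": pending once, then ready.
struct NotReadyOnce {
    polled: bool,
}

impl Future for NotReadyOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.polled {
            Poll::Ready(())
        } else {
            self.polled = true;
            // A real I/O source would keep the waker and call it on readiness.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Counts wake-ups so the example can observe them.
struct CountingWaker(AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical handler: `doubled` is the state that survives the suspension.
async fn handle(input: u32) -> u32 {
    let doubled = input * 2;
    NotReadyOnce { polled: false }.await; // suspension point
    doubled + 1
}

/// Polls `handle(20)` twice by hand and reports what each poll produced.
fn drive() -> (bool, Option<u32>, usize) {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);

    let mut task = Box::pin(handle(20));
    let first_was_pending = task.as_mut().poll(&mut cx).is_pending();
    let second = match task.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    };
    (first_was_pending, second, counter.0.load(Ordering::SeqCst))
}

fn main() {
    let (first_was_pending, second, wakes) = drive();
    println!("first poll pending: {first_was_pending}, second poll: {second:?}, wakes: {wakes}");
}
```

The first poll comes back Pending with doubled already computed and stored inside the future. The second poll resumes after the .await and finishes with 41. A runtime does the same dance, only driven by real readiness events instead of a hand-written second poll.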

the function colour tax

Here is the part the cheerful blog posts skim over. async colours your functions. An async fn can only be .awaited from inside another async context, which means the asynchrony is not a local decision. It propagates up through every caller until it reaches the runtime. You do not port a function. You port a calling tree.
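
A small sketch of what "you port a calling tree" means in practice. It is std-only, and the names read_frame, parse_request, serve_one and sync_entry_point are hypothetical. The toy block_on here is an assumption standing in for the role a real runtime plays at the top of the tree, not how Tokio implements it.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the blocked caller by unparking its thread.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Toy executor: the single place where synchronous code can drive a future.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // Sleep until the waker unparks us; a spurious wake just re-polls.
            Poll::Pending => thread::park(),
        }
    }
}

// Hypothetical calling tree: the leaf went async, so every caller above it did too.
async fn read_frame() -> Vec<u8> {
    b"ping".to_vec()
}

async fn parse_request() -> String {
    let frame = read_frame().await;
    String::from_utf8_lossy(&frame).into_owned()
}

async fn serve_one() -> String {
    let request = parse_request().await;
    format!("handled {request}")
}

// A plain `fn` cannot write `.await`; it has to hand the future to an executor.
fn sync_entry_point() -> String {
    block_on(serve_one())
}

fn main() {
    println!("{}", sync_entry_point());
}
```

Three functions changed colour because one leaf did, and the only synchronous function left is the one that owns the executor. That is the shape of the port in miniature.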

Concretely, the moment I made the connection handler async, everything it touched wanted to be async too, or I had to make a conscious decision to keep it blocking and isolate it. A synchronous database call in the middle of an async task is not just ugly, it is a correctness problem, because a blocking call inside an async task hogs a runtime worker thread and quietly starves every other task scheduled on it. On a current-thread runtime, one badly-behaved std::fs::read stalls everything. On the multi-threaded runtime it takes out one worker at a time, and a handful of them together can stall the lot.

The fix is real but it is friction. CPU-bound or genuinely blocking work goes through spawn_blocking, which hands it to a separate pool so it cannot jam the async workers:

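// Runs the closure on Tokio's dedicated blocking pool, not on an async worker.
let result = tokio::task::spawn_blocking(move || {
    expensive_synchronous_thing(input)
}).await?; // awaiting the JoinHandle yields a Result; `?` surfaces a panicked or cancelled job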

That works, but now I am thinking about which pool a piece of work lives on, which is exactly the kind of bookkeeping the synchronous version never asked of me. The thread-per-connection model was wasteful, but it was wasteful in a way that never made me reason about starvation.
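
For intuition only, here is a std-only analogue of the idea behind spawn_blocking: the slow call runs on its own thread while the "worker" carries on with cheap jobs. This is a sketch under stated assumptions, not Tokio's implementation. offload and worker_loop are hypothetical helpers, the 50 ms sleep is an invented stand-in for a blocking call, and the try_recv polling replaces what would be an awaited JoinHandle in real Tokio code.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical blocking call, standing in for something like a sync database query.
fn expensive_synchronous_thing(input: u64) -> u64 {
    thread::sleep(Duration::from_millis(50));
    input * input
}

/// Runs `work` on its own thread and returns a receiver for the result.
fn offload<T, F>(work: F) -> mpsc::Receiver<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // A send error only means the caller stopped waiting, so ignore it.
        let _ = tx.send(work());
    });
    rx
}

/// One "worker" that keeps servicing cheap jobs while the slow one runs elsewhere.
/// Returns the blocking result and how many cheap jobs were handled meanwhile.
fn worker_loop() -> (u64, u32) {
    let pending = offload(|| expensive_synchronous_thing(12));
    let mut cheap_jobs_done = 0;
    loop {
        match pending.try_recv() {
            Ok(result) => return (result, cheap_jobs_done),
            Err(mpsc::TryRecvError::Empty) => {
                // Stand-in for running another ready task on this worker.
                cheap_jobs_done += 1;
                thread::sleep(Duration::from_millis(1));
            }
            Err(mpsc::TryRecvError::Disconnected) => panic!("blocking job panicked"),
        }
    }
}

fn main() {
    let (result, cheap_jobs_done) = worker_loop();
    println!("blocking result {result}, cheap jobs serviced meanwhile: {cheap_jobs_done}");
}
```

Had the worker called expensive_synchronous_thing directly, the cheap-job counter would have stayed at zero for the whole 50 ms. That is the starvation problem in one number.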

the error messages got worse before they got better

The other tax is on the compiler conversations. When a future is not Send and you try to spawn it across threads, the error points at the spawn, but the cause is some non-Send thing held across an .await deep inside, often an Rc or a MutexGuard from the wrong mutex. The classic trap is holding a std::sync::Mutex guard across an await point. The compiler is right to stop you, because that guard is not meant to survive a yield, but the message can send you a fair distance from the actual line. The answer is usually tokio::sync::Mutex when a lock genuinely needs to be held across an await, and being disciplined about dropping guards before you yield otherwise. I got fluent in this eventually. I was not fluent on day one, and the learning curve is steeper than synchronous Rust's already-notable one.
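
The guard-across-await trap is easy to reproduce without a runtime. In this std-only sketch, require_send is a hypothetical helper that demands the same Send bound tokio::spawn places on the future it is given. bump_holding_guard, bump_then_yield and poll_once are likewise invented for the illustration, and ready(()).await is just a convenient yield point.

```rust
use std::future::{ready, Future};
use std::pin::pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Compile-time check: only accepts futures that may move between threads.
fn require_send<F: Future + Send>(fut: F) -> F {
    fut
}

// Not Send: the std MutexGuard is still alive at the `.await`.
// This compiles on its own, but `require_send(bump_holding_guard(..))` is rejected.
#[allow(dead_code)]
async fn bump_holding_guard(counter: Arc<Mutex<u32>>) -> u32 {
    let mut guard = counter.lock().unwrap();
    *guard += 1;
    ready(()).await; // guard held across this yield point
    *guard
}

// Send: the guard's scope ends before the `.await`.
async fn bump_then_yield(counter: Arc<Mutex<u32>>) -> u32 {
    let value = {
        let mut guard = counter.lock().unwrap();
        *guard += 1;
        *guard
    }; // guard dropped here, before any yield
    ready(()).await;
    value
}

/// Waker that does nothing; enough for futures that finish on the first poll.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future exactly once and returns its output if it was ready.
fn poll_once<F: Future>(fut: F) -> Option<F::Output> {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(out) => Some(out),
        Poll::Pending => None,
    }
}

fn main() {
    let counter = Arc::new(Mutex::new(0));
    let fut = require_send(bump_then_yield(counter.clone()));
    println!("{:?}", poll_once(fut));
}
```

Both functions do the same work. The only difference is where the guard's scope ends, and that alone decides whether the future can be spawned across threads.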

what I would keep and what I would not

A few honest conclusions, having lived with it for a few weeks now.

  • For an I/O-bound service holding lots of mostly-idle connections, async earned its place. The resource profile is genuinely better and the code, once ported, reads cleanly.
  • For a CPU-bound batch tool or a simple request/response thing with modest concurrency, I would not bother. Threads are easy to reason about, the borrow checker is friendlier without the future state-machine layer, and you skip the colour tax entirely.
  • Pick a runtime and commit. I went with Tokio because its ecosystem is where the libraries are, and fighting the runtime question on top of the async question would have doubled the pain.
  • Keep blocking work explicit and isolated. The single biggest source of mysterious async misbehaviour is a sneaky blocking call on a runtime thread.

The marketing line is that async/await makes concurrent Rust feel like synchronous code. That is half true. The syntax does. The mental model does not, and the gap between those two is where the evenings go. I am glad I did the rewrite, the numbers justify it, and I will reach for async again with my eyes open rather than because it is the modern-sounding choice. Some of my services are I/O-bound and full of idle waiting. Plenty are not, and for those, a thread and a blocking call remain a perfectly respectable answer in 2022.