Ramblings of an aging IT geek

the great async rewrite, and what it actually cost me

I migrated a small ingest service from a hand-rolled thread pool to async Rust, and the win was real but smaller and stranger than I expected.

The service was a little ingest daemon. It pulled events off a queue, did some validation, and fanned them out to three downstream HTTP endpoints. The original version used a fixed thread pool and a bounded channel, and it was fine. It had been fine for a year. So naturally I rewrote it in async.

The honest reason was that one of the downstreams had started getting slow, and each slow response tied up a whole worker thread that did nothing but wait. With sixteen workers and a downstream that occasionally took two seconds, the pool would drain and back-pressure would ripple all the way back to the queue. I could have just thrown more threads at it, but blocking a real OS thread on a socket read is exactly the thing async is supposed to fix, so the temptation was strong and, for once, justified.
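
To put a rough number on that drain, here is a back-of-the-envelope sketch. It assumes every in-flight event pins one worker for the full duration of the slow call; the function name and that simplification are mine, not measurements from the service.

```rust
use std::time::Duration;

/// Upper bound on events per second for a blocking pool, assuming each
/// in-flight event holds exactly one worker for `hold` (a simplification:
/// it ignores validation time and the two faster downstreams).
fn pool_ceiling_per_sec(workers: u32, hold: Duration) -> f64 {
    f64::from(workers) / hold.as_secs_f64()
}
```

Under that assumption, sixteen workers stuck behind a two-second response top out at 8 events per second, however fast the queue is delivering. Doubling the pool only doubles the ceiling, which is why more threads felt like a patch rather than a fix.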

what the rewrite actually looked like

The shape of the code barely changed. The validation stayed synchronous because it was CPU-bound and trivial. The fan-out became three reqwest calls under a tokio::join!, and the worker loop became a stream of tasks spawned onto the runtime. The line count went down slightly, which surprised me, because I'd braced for the usual async tax of wrapper types and lifetime arguments.

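// Each send(...) is a future; nothing runs until join! polls them, and then
// all three make progress concurrently on the same task.
let (a, b, c) = tokio::join!(
    send(&client, &a_url, &payload),
    send(&client, &b_url, &payload),
    send(&client, &c_url, &payload),
);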

That join! is the whole point. The slow downstream no longer blocks the other two, and it no longer blocks a thread. While it's waiting on the wire, the runtime gets on with something else. Under the same load that used to drain the thread pool, the async version sat at low single-digit thread counts and never broke a sweat.

[Figure: tasks fanning out to three endpoints]

the parts nobody warns you about

The throughput win was real. The surprise was where the time went during the rewrite, and almost none of it was on the happy path.

The first cost was the runtime decision. You pick Tokio and then everything you depend on has to agree, or you end up bridging between async and blocking with spawn_blocking and feeling slightly grubby every time. My metrics library was synchronous, my config loader did blocking file I/O at startup, and that was fine, but it meant being deliberate about which world each bit of code lived in.

The second cost was cancellation. In the thread-pool version, a unit of work either finished or the process died. In async, cancelling a future just means dropping it, and that can happen at any await point, so "mid-flight" might mean after I'd sent the event downstream but before I'd acknowledged it on the queue. That's the kind of bug that doesn't show up in testing and does show up at 3am. I ended up being very explicit about ordering: acknowledge only after all three sends resolve, never before, and treat a cancelled task as a not-acknowledged event that'll be redelivered. Idempotency downstream did the rest.
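
One way to make that ordering hard to get wrong is a small guard type. This is a sketch rather than the code from the service: `Delivery`, `Outcome`, `finish`, and the shared log standing in for a queue connection are all hypothetical names. It leans on one fact, which is that a cancelled future is dropped, so `Drop` still runs for anything it owned.

```rust
use std::sync::{Arc, Mutex};

/// What the (hypothetical) queue was told about one delivery.
#[derive(Debug, Clone, PartialEq)]
enum Outcome {
    Acked(u64),
    Requeued(u64),
}

/// Owns one delivery. If it is dropped without `ack`, for example because
/// the future holding it was cancelled, the event is handed back.
struct Delivery {
    id: u64,
    acked: bool,
    log: Arc<Mutex<Vec<Outcome>>>, // stand-in for a real queue connection
}

impl Delivery {
    fn new(id: u64, log: Arc<Mutex<Vec<Outcome>>>) -> Self {
        Delivery { id, acked: false, log }
    }

    /// Call only after every downstream send has resolved successfully.
    fn ack(mut self) {
        self.acked = true;
        self.log.lock().unwrap().push(Outcome::Acked(self.id));
        // `self` drops here; Drop sees `acked == true` and does nothing.
    }
}

impl Drop for Delivery {
    fn drop(&mut self) {
        if !self.acked {
            // Avoid panicking inside Drop if the mutex is poisoned.
            if let Ok(mut log) = self.log.lock() {
                log.push(Outcome::Requeued(self.id));
            }
        }
    }
}

/// Ack only when all sends succeeded; otherwise let the guard drop.
fn finish(delivery: Delivery, sends: &[Result<(), String>]) {
    if sends.iter().all(|r| r.is_ok()) {
        delivery.ack();
    }
    // On any failure `delivery` goes out of scope un-acked and is requeued.
}
```

`Drop` is synchronous, so a real guard could not await a network nack there. With many brokers the simpler move is to do nothing on drop and let the un-acked delivery time out and come back on its own.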

The third cost was the error messages. A blown lifetime or a future that isn't Send produces a wall of text that takes real practice to read. I've written enough Rust now that I can usually spot "this captured a non-Send thing across an await point" from the shape of the complaint, but I remember when that paragraph may as well have been in Latin.
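
For anyone still at the Latin stage, here is a minimal reproduction of that complaint using only the standard library. The names (`tick`, `require_send`, `poll_once`) are made up for the example. `require_send` imposes the `Send` bound that `tokio::spawn` also places on its argument, and `poll_once` exists only so the example can run without a runtime.

```rust
use std::future::Future;
use std::pin::pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Stand-in for any await point (a network call in real code).
async fn tick() {}

/// Compiles only if the future is `Send`.
fn require_send<F: Future + Send>(fut: F) -> F {
    fut
}

// NOT Send: `rc` is still alive when the future suspends at `.await`, so
// `require_send(holds_rc_across_await())` is rejected by the compiler.
#[allow(dead_code)]
async fn holds_rc_across_await() -> usize {
    let rc = Rc::new(vec![1, 2, 3]);
    tick().await;
    rc.len()
}

// Send: the `Rc` lives in an inner scope that ends before the await.
async fn drops_rc_before_await() -> usize {
    let len = {
        let rc = Rc::new(vec![1, 2, 3]);
        rc.len()
    };
    tick().await;
    len
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future once with a do-nothing waker; enough for futures that
/// never actually have to wait.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    fut.as_mut().poll(&mut cx)
}
```

Swapping `drops_rc_before_await` for `holds_rc_across_await` in a `require_send(...)` call is enough to summon the wall of text. The fix is usually the same shape as here: finish with the non-Send value in its own scope before the await, or switch to a Send equivalent such as `Arc`.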

was it worth it

Yes, but for a narrower reason than the rewrite-everything instinct suggested. The win was specifically that I/O waiting stopped consuming threads, and that mattered because this service spent most of its life waiting on the network. If it had been CPU-bound, async would have bought me nothing but ceremony, and I'd have been better off with a thread pool sized to my cores and left well alone.
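
For contrast, this is roughly what "sized to my cores" means for CPU-bound work. It's a sketch with a hypothetical `parallel_sum` helper, and it uses scoped threads rather than a long-lived pool to keep the example short.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Sums `f(x)` over `items`, splitting the slice across roughly one
/// thread per available core.
fn parallel_sum(items: &[u64], f: fn(u64) -> u64) -> u64 {
    // available_parallelism can fail on some platforms; fall back to 1.
    let workers = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    // Ceiling division so every item lands in a chunk; chunks() panics on 0.
    let chunk = ((items.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        let handles: Vec<_> = items
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().map(|&x| f(x)).sum::<u64>()))
            .collect();
        // Scoped threads are joined before `scope` returns.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<u64>()
    })
}
```

No runtime, no futures, and nothing to cancel: every thread is busy computing for as long as it exists, so there is no idle waiting for async to reclaim.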

So the lesson isn't "async good, threads bad". It's that async pays off precisely when your bottleneck is waiting rather than computing, and you should be able to point at the wait before you reach for the runtime. I could point at mine. That's the only reason this rewrite was a good idea rather than a hobby.