the async rewrite i swore i wouldn't do

I had a service that worked. It scraped a handful of endpoints on a timer, fanned out, collected results, wrote them somewhere. A thread pool, a channel, a Mutex or two. It had been running for the better part of a year without me looking at it, which is the highest praise I can give any piece of software. So naturally I rewrote it in async.

The honest reason is that the threaded version started to creak the moment I wanted real concurrency. I was polling maybe two hundred endpoints, and spinning two hundred OS threads to sit blocked on a socket each is the sort of thing you can get away with until you can't. Memory crept. Context switching was doing more work than the actual HTTP. I could have reached for a bounded pool and a work queue, and that would have been the sensible answer. But async/await has been stable in Rust since 1.39 in late 2019, the ecosystem has settled, and I wanted to feel it under load rather than read about it.

what the rewrite actually looked like

The shape of the code barely moved. That surprised me. The threaded version had a function that took a URL and returned a Result, and the async version has a function that takes a URL and returns a Result, except it's async fn and the calls inside it are awaited. Most of the day was mechanical: reqwest::blocking became reqwest, the std::thread::spawn calls became tokio::spawn, and the channel went from std::sync::mpsc to tokio::sync::mpsc.

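use reqwest::Client;

// Fetch one endpoint and parse its body. Each ? converts into Error,
// so Error must implement From for both failure types.
async fn poll(client: &Client, url: &str) -> Result<Sample, Error> {
    let resp = client.get(url).send().await?;
    let body = resp.text().await?;
    Ok(parse(&body)?)
}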

The fan-out is where async earns its keep. Instead of a pool, I collect the futures and let the runtime drive them:

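use futures::stream::{self, StreamExt};

// Borrow the URLs so each future can hold its &str for as long as it runs.
let results = stream::iter(&urls)
    .map(|url| poll(&client, url))
    .buffer_unordered(32)
    .collect::<Vec<_>>()
    .await;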

buffer_unordered is the whole trick. It keeps up to thirty-two of those futures in flight at once, all polled from the one task that awaits the stream, and hands me results as they finish. No thread per endpoint. No tuning a pool size that's really a guess about how many sockets the kernel will tolerate. Two hundred endpoints now sit on the multi-threaded Tokio runtime, which defaults to one worker thread per core, and the machine stops noticing the load entirely.
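
To make "up to thirty-two at once" concrete, here is a toy, std-only model of bounded, unordered buffering. It is a sketch under loud assumptions: FakeRequest and drive_bounded are hypothetical names, and the driver busy-polls every in-flight future each round. The real buffer_unordered is wake-driven and only re-polls futures that signalled progress, so treat this as a picture of the bookkeeping, not of the implementation.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that ignores wake-ups; this toy driver simply polls in a loop.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical stand-in for a request: pending for `polls_left` polls,
/// then ready with `id`.
struct FakeRequest {
    id: u32,
    polls_left: u32,
}

impl Future for FakeRequest {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        if self.polls_left == 0 {
            Poll::Ready(self.id)
        } else {
            self.polls_left -= 1;
            Poll::Pending
        }
    }
}

/// Drive `work` with at most `limit` futures in flight, returning outputs
/// in completion order rather than submission order.
fn drive_bounded<F: Future>(
    work: impl IntoIterator<Item = F>,
    limit: usize,
) -> Vec<F::Output> {
    assert!(limit > 0, "limit must be at least 1");
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut queued = work.into_iter();
    let mut in_flight: Vec<Pin<Box<F>>> = Vec::new();
    let mut finished = Vec::new();

    loop {
        // Top up to the limit from the futures that have not started yet.
        while in_flight.len() < limit {
            match queued.next() {
                Some(fut) => in_flight.push(Box::pin(fut)),
                None => break,
            }
        }
        if in_flight.is_empty() {
            return finished;
        }
        // Poll each in-flight future once; a finished one frees its slot.
        let mut i = 0;
        while i < in_flight.len() {
            match in_flight[i].as_mut().poll(&mut cx) {
                Poll::Ready(out) => {
                    finished.push(out);
                    drop(in_flight.remove(i));
                }
                Poll::Pending => i += 1,
            }
        }
    }
}
```

Feeding it requests 1, 2 and 3 that need three, zero and one extra polls, with a limit of 2, gives back [2, 3, 1]: the third request only starts once the second has freed a slot, and the slow first one finishes last. With a limit of 1 the same input comes back in submission order.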

where it bit me

The borrow checker does not stop being the borrow checker when you add .await. It gets opinions about lifetimes that span an await point, because anything held across an await has to survive being parked and resumed, possibly on a different thread. The classic version of this is holding a std::sync::MutexGuard across an await: the guard is !Send, so the future holding it is !Send too, and tokio::spawn wants a future it can move between worker threads. The compiler is right to refuse. I hit it within the first hour, swore at it, then realised it had just caught a deadlock I'd have shipped. Hold the lock, do the cheap thing, drop the lock, then await. The fix is almost always to narrow the scope, which is also what you should have done anyway.
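
One way to narrow the scope is to push the locked section into a plain synchronous function, so the guard cannot outlive it. The sketch below is illustrative only: Counts, bump, publish and record are made-up names, not the service's real code, and publish is an empty stand-in for whatever slow async work comes after the lock.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical shared state: a per-URL counter behind a blocking mutex.
type Counts = Arc<Mutex<HashMap<String, u64>>>;

/// Synchronous critical section. The guard is created and dropped inside
/// this function, so it can never be alive across an `.await`.
fn bump(counts: &Mutex<HashMap<String, u64>>, url: &str) -> u64 {
    let mut guard = counts.lock().unwrap();
    let n = guard.entry(url.to_owned()).or_insert(0);
    *n += 1;
    *n
}

/// Hypothetical stand-in for slow async work done after the update.
async fn publish(_url: &str, _total: u64) {}

/// Hold the lock, do the cheap thing, drop the lock, then await.
async fn record(counts: Counts, url: String) -> u64 {
    let total = bump(&counts, &url); // lock taken and released in here
    publish(&url, total).await; // no guard is alive at this await
    total
}
```

Because no MutexGuard is live at the await, the future returned by record stays Send. Inlining the lock and keeping the guard in scope past publish(...).await is the version the compiler rejects once the future is handed to something that requires Send.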

The other tax is the error messages. When a future doesn't implement Send because something three layers down isn't Send, the compiler tells you, eventually, but you read the message bottom-up and squint. Rc sneaks in, a RefCell borrow sneaks in, some trait object is missing its + Send bound, and suddenly tokio::spawn won't take your future. The cause is real and the fix is usually small. Finding it is the slow part.
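
A small probe can shorten the hunt. The idea, sketched here with hypothetical names (assert_send, inner, outer, probe_layers), is a function that only compiles when its argument is Send, applied to each layer's future in turn until the failing one shows up. The ready(()) await is a stand-in for real IO.

```rust
use std::sync::{Arc, Mutex};

/// Compiles only if `T` is `Send`; hands the value back untouched.
fn assert_send<T: Send>(value: T) -> T {
    value
}

/// Hypothetical inner layer: shared state held across an await.
async fn inner(log: Arc<Mutex<Vec<String>>>, line: String) -> usize {
    std::future::ready(()).await; // stand-in for real IO
    let mut guard = log.lock().unwrap(); // taken after the last await
    guard.push(line);
    guard.len()
}

/// Hypothetical outer layer that just awaits the inner one.
async fn outer(log: Arc<Mutex<Vec<String>>>) -> usize {
    inner(log, "polled".to_owned()).await
}

/// Probe layer by layer. If `log` were an Rc<RefCell<_>> instead, both
/// lines would stop compiling, and the inner probe points at the culprit.
fn probe_layers() {
    let log = Arc::new(Mutex::new(Vec::new()));
    let _ = assert_send(inner(Arc::clone(&log), String::new()));
    let _ = assert_send(outer(log));
}
```

Nothing here runs the futures; the check is entirely at compile time, so the probes can be deleted once the offending type is found.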

And the colour problem is real, the one everyone warns you about. You can only .await inside async code, so the asyncness propagates up the call graph until it hits main. You don't get to dip into async for one function and come back out cleanly; the only way back to sync is to block a thread on the runtime, and that is a boundary, not something to sprinkle around. Either a thing is async or it isn't, and converting the leaf means converting the branch above it, and so on. For a small service this was a day. For something large it would be a project, and I'd think hard before starting.
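
The propagation is easier to see in miniature. In this sketch fetch_len, total_len and total_len_blocking are hypothetical, and block_on is a toy executor built on std alone, following the pattern in the std::task::Wake documentation. Tokio's Runtime::block_on plays the same role in a real service.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Leaf: hypothetical stand-in for an async request.
async fn fetch_len(url: &str) -> usize {
    url.len()
}

/// Branch: has to be async because it awaits the leaf.
async fn total_len(urls: &[&str]) -> usize {
    let mut total = 0;
    for url in urls {
        total += fetch_len(url).await;
    }
    total
}

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Toy executor: poll one future to completion on the current thread,
/// parking between polls until the waker fires.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// The boundary: a sync function cannot `.await`, so it blocks a thread
/// on an executor instead.
fn total_len_blocking(urls: &[&str]) -> usize {
    block_on(total_len(urls))
}
```

Everything above total_len_blocking is async all the way down; everything that calls it is ordinary blocking code. A call like total_len_blocking(&["ab", "cde"]) returns 5, at the cost of tying up the calling thread for the duration.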

was it worth it

For this service, yes, plainly. The memory footprint dropped by more than half, the latency under fan-out improved because I'm no longer thrashing threads, and the code reads about as well as it did before. The conceptual cost is front-loaded: once you've internalised that an .await is a yield point and that everything held across it has constraints, the model is consistent and you stop fighting it.

What I'd say to anyone weighing it up: don't rewrite working threaded code for the aesthetics. If your concurrency is a dozen threads doing blocking work, threads are fine, and they're easier to reason about and debug. Async pays off when you're juggling hundreds or thousands of mostly-idle tasks waiting on IO, which is exactly the workload it was built for. I had that workload. I just hadn't admitted it until the memory graph made the case for me.

The service has been running on the async version for a week now. I haven't looked at it since Tuesday. Back to the highest praise.