Ramblings of an aging IT geek

async/await landed and i rewrote everything

Porting a futures-and-combinators Rust service to async/await, and what the rewrite actually bought me.


I had a service that ingested webhook events, fanned them out to a handful of downstream APIs, and wrote the results to Postgres. It was written in the old style: futures 0.1, tokio-core, and a forest of .and_then().map_err().and_then() chains that I could no longer read without a coffee and a clear hour. It worked. It had worked for two years. And every time I went back to add a feature I lost an afternoon re-deriving how the error types flowed through the combinators.

So when I finally sat down with the modern toolchain and rewrote the whole thing in async/await, the headline is dull and true: it is just nicer. Not faster, not dramatically smaller, just dramatically more legible. That turns out to be the thing that mattered.

what the old code looked like

Here is a representative lump, lightly anonymised. This is the kind of thing the whole codebase was made of.

fn handle_event(&self, ev: Event) -> Box<Future<Item = (), Error = AppError>> {
    let db = self.db.clone();
    let client = self.client.clone();
    Box::new(
        validate(ev)
            .and_then(move |valid| {
                client
                    .enrich(valid)
                    .map_err(AppError::Upstream)
                    .and_then(move |enriched| {
                        db.insert(enriched)
                            .map_err(AppError::Db)
                            .map(|_| ())
                    })
            }),
    )
}

The Box<Future<...>> everywhere was the tell. Every function that touched the runtime allocated a trait object because we couldn't name the concrete future type. The move closures formed a staircase. And the error mapping had to be threaded by hand at each junction, which meant any new branch was an opportunity to map the wrong variant and only find out at runtime.

what it looks like now

The same logic, rewritten:

async fn handle_event(&self, ev: Event) -> Result<(), AppError> {
    let valid = validate(ev).await?;
    let enriched = self.client.enrich(valid).await.map_err(AppError::Upstream)?;
    self.db.insert(enriched).await.map_err(AppError::Db)?;
    Ok(())
}

That is the entire pitch. The ? operator does the error propagation, and the two map_err calls that remain each sit on the line they belong to. The await points read top to bottom. There is no Box, no clone dance to satisfy the borrow checker across closure boundaries, no staircase. A new branch is just another line. Six months from now I will be able to read this without the coffee.
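The remaining map_err calls can go too if AppError implements From for each underlying error, because ? applies that conversion itself. What follows is a minimal std-only sketch, not the service's real code: UpstreamError, DbError, the free functions enrich and insert, and the tiny block_on executor are all hypothetical stand-ins so the example runs without a runtime crate.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Hypothetical error types standing in for the real upstream and DB errors.
#[derive(Debug, PartialEq)]
struct UpstreamError(String);
#[derive(Debug, PartialEq)]
struct DbError(String);

#[derive(Debug, PartialEq)]
enum AppError {
    Upstream(UpstreamError),
    Db(DbError),
}

// With these impls in place, `?` converts into AppError on its own.
impl From<UpstreamError> for AppError {
    fn from(e: UpstreamError) -> Self {
        AppError::Upstream(e)
    }
}
impl From<DbError> for AppError {
    fn from(e: DbError) -> Self {
        AppError::Db(e)
    }
}

// Stand-ins for the downstream call and the database write.
async fn enrich(input: &str) -> Result<String, UpstreamError> {
    if input.is_empty() {
        return Err(UpstreamError("empty event".to_string()));
    }
    Ok(format!("{input}+enriched"))
}

async fn insert(row: String) -> Result<usize, DbError> {
    if row.len() > 64 {
        return Err(DbError("row too long".to_string()));
    }
    Ok(row.len())
}

// No map_err: each `?` picks the matching From impl.
async fn handle_event(input: &str) -> Result<usize, AppError> {
    let enriched = enrich(input).await?;
    let written = insert(enriched).await?;
    Ok(written)
}

// Tiny single-future executor so the sketch runs without a runtime crate.
struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}
```

The trade-off is visibility: map_err names the variant at the call site, while From decides it once per error type. From only fits when each source error maps to exactly one variant.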


the parts that bit

It was not all a victory lap. A few things cost me real time.

The biggest was Send bounds. The old combinator code spawned everything onto a multi-threaded runtime and nobody had thought hard about it. The moment I held a non-Send type across an .await, the compiler told me, at length, exactly which line and exactly which type was the problem. That is the good kind of pain: the error was real, it had probably been latent in the old code, and the compiler just refused to let me pretend otherwise. (Strictly this is the Send auto-trait check on the future's captured state, not the borrow checker.) I ended up scoping a couple of MutexGuards into their own blocks so they dropped before the await point.
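The guard-scoping fix looks roughly like this. It is a minimal std-only sketch under assumed names: bump_and_send, send_downstream, run_on_other_thread and the small block_on executor are hypothetical, and a plain thread stands in for a multi-threaded runtime's spawn, which imposes the same Send requirement.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Stand-in for a downstream call (hypothetical).
async fn send_downstream(count: u64) -> u64 {
    count * 2
}

// The guard lives only inside the inner block, so it is dropped before the
// .await and never becomes part of the future's stored state.
async fn bump_and_send(counter: Arc<Mutex<u64>>) -> u64 {
    let snapshot = {
        let mut guard = counter.lock().unwrap();
        *guard += 1;
        *guard
    }; // guard dropped here
    // Writing `let mut guard = counter.lock().unwrap();` at function scope
    // and awaiting while it is alive would make this future !Send, because
    // std::sync::MutexGuard is not Send.
    send_downstream(snapshot).await
}

// Minimal executor so the sketch runs without a runtime crate.
struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

// Moving the future to another thread requires Send, just as spawning onto
// a multi-threaded runtime does. This only compiles for Send futures.
fn run_on_other_thread<F>(fut: F) -> F::Output
where
    F: Future + Send + 'static,
    F::Output: Send + 'static,
{
    thread::spawn(move || block_on(fut)).join().unwrap()
}
```

Copying the value out of the block and awaiting afterwards also keeps the lock from being held for the duration of a network call.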

The second was that combinators and async blocks compose differently. A futures 0.1 Future is a different trait from std::future::Future, so my utility functions that returned an old-style impl Future could not be awaited directly from async code without a compatibility wrapper. I rewrote them as async fn and the awkwardness evaporated. There is a temptation to keep a "bridge" layer that translates between the two worlds. Resist it. The bridge is where the bugs live. I converted the whole thing in one go rather than leaving a seam down the middle.

The third was cancellation. A future that is dropped is simply cancelled: it stops at whichever .await it last reached, and the code after that point never runs. A couple of my flows had implicit cleanup that assumed they would always run to completion. They mostly did, in practice, because nothing was dropping them mid-flight. But "mostly" is how you end up writing an incident report. I moved the cleanup into explicit Drop impls and a scopeguard-style helper, so it runs regardless of how the future ends.
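A scopeguard-style helper can be as small as the sketch below. It is std-only and every name in it is hypothetical (Cleanup, process_with_lease, cancel_mid_flight). It polls a flow once, drops it mid-flight, and checks that the cleanup closure still ran.

```rust
use std::future::Future;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

// Runs its closure when dropped: on normal completion, on early return via
// `?`, or when the owning future is cancelled by being dropped.
struct Cleanup<F: FnMut()>(F);

impl<F: FnMut()> Drop for Cleanup<F> {
    fn drop(&mut self) {
        (self.0)();
    }
}

// Hypothetical flow: flag a lease as released however the flow ends.
async fn process_with_lease(released: Arc<AtomicBool>) {
    // Bind to a named variable; a bare `_` would drop the guard immediately.
    let _guard = Cleanup(move || released.store(true, Ordering::SeqCst));
    // Stand-in for a downstream call that never completes.
    std::future::pending::<()>().await;
}

// Waker that does nothing; enough for polling by hand.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Polls the flow once, drops it, and returns
// (was it still pending, released before the drop, released after the drop).
fn cancel_mid_flight() -> (bool, bool, bool) {
    let released = Arc::new(AtomicBool::new(false));
    let mut fut = Box::pin(process_with_lease(released.clone()));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let still_pending = fut.as_mut().poll(&mut cx).is_pending();
    let before = released.load(Ordering::SeqCst);
    drop(fut); // cancellation: the future is dropped before it completes
    let after = released.load(Ordering::SeqCst);
    (still_pending, before, after)
}
```

Two caveats apply to this sketch. The guard only exists once the future has been polled far enough to create it, so a future dropped before its first poll runs no cleanup. And Drop is synchronous, so it cannot await; cleanup that itself needs I/O has to be handed off some other way.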

was it worth it

Yes, but be honest about why. The rewrite did not make the service faster in any way a user would notice; the work it does is dominated by network and database latency, and async was never going to change that. It did not reduce the binary size meaningfully. It did not fix any open bug.

What it bought me was that the next change is cheap. The code now says what it does. When the on-call engineer (me) gets paged at 3am, the stack of awaits reads like a description of the flow rather than a puzzle to be reassembled. I added two new downstream integrations the week after the rewrite and each took an hour, where the old code would have taken an afternoon of combinator archaeology each.

If you have a futures 0.1 codebase still ticking along, you do not need to panic. It works. But the next time it fights you, that is your signal. Do the conversion in one decisive pass, lean on the compiler's Send complaints rather than fighting them, and make your cleanup explicit. The result is code you can hand to someone else, or to yourself in a year, without an apology.