Two years of Rust in production: the honest ledger

Two years ago we put a Rust service into production. Not a toy, not a side project, a thing that takes real traffic and that people are paged for when it falls over. It has not fallen over much, which is most of the point, but the road here had enough potholes that I want to write down the honest ledger before I forget the bad bits and remember only the smug bits.

The short version: I would do it again, for this service. I would not do it again for the next three things we built, and we didn't, which I think is the actual mark of having learned something rather than just acquired a hammer.

What it replaced, and why

The service is a request-path component that sits in front of a few backends, normalises some data, enforces some limits, and has to do all of that under a tight latency budget. The previous version was in a garbage-collected language and it was fine, genuinely fine, right up until the tail. The p99 was acceptable; the p999 was a horror show, and the horror was almost entirely the GC deciding to do its housekeeping at the worst possible moment. We spent more engineering hours tuning the collector than I'd like to admit, and every tune was a local minimum that the next traffic pattern would knock us out of.

Rust's pitch here is not "it's fast", although it is. The pitch is "the latency you measure on Tuesday is the latency you get on Friday". No collector deciding your fate. That predictability is the single thing that has paid for the whole exercise.

The wins, plainly

The tail latency is flat and boring. That's the headline and it's earned. Our p999 went from "embarrassing" to "indistinguishable from p99", and it has stayed there across two years of traffic growth without a single GC-tuning session, because there is no GC to tune.
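For anyone who hasn't stared at these numbers before, here is a rough sketch of why p99 can look fine while p999 is a horror show. It uses the nearest-rank method over a batch of latency samples; the function name and the per-mille parameter are mine for illustration, and this is not how our metrics pipeline actually computes them.

```rust
/// Nearest-rank percentile over a batch of latency samples.
/// `per_mille` is the percentile in thousandths: 990 for p99, 999 for p999.
/// Integer arithmetic avoids floating-point rounding at the rank boundary.
/// Hypothetical helper for illustration only.
fn percentile(samples: &[u64], per_mille: usize) -> Option<u64> {
    if samples.is_empty() || per_mille == 0 || per_mille > 1000 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // rank = ceil(per_mille * n / 1000), always in 1..=n given the checks above
    let rank = (per_mille * sorted.len() + 999) / 1000;
    Some(sorted[rank - 1])
}
```

Feed it 995 requests at 2 ms and 5 requests that hit a 250 ms pause, and p99 still reports 2 ms while p999 reports 250 ms. That is the shape of the problem we had: the slow requests were rare enough to hide behind p99 and common enough to hurt.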

Memory is predictable too. The process sits where it sits. We've had no creeping RSS, no mysterious overnight growth that turns out to be a cache nobody bounded. When something does allocate more than expected, it's because we wrote code that allocates, and the fix is in our code, not in a runtime flag.

And the one I undersold to myself going in: the compiler genuinely catches the class of bug that used to wake people up. The ownership rules are a pain in the neck until they aren't, and then one day you realise you have not shipped a data race in two years, not because you're careful, but because the code that would have a data race did not compile. That's not nothing. That's a whole category of 3am pages that simply stopped happening.
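To make that concrete, here is a minimal sketch of a shared counter using only the standard library. It is not code from our service, and the function name is made up. The point is in the comments: the racy version is not something you can ship by accident, because it does not build.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments one shared counter from several threads and returns the total.
/// Hypothetical example, not taken from the service described here.
fn count_in_parallel(threads: usize, per_thread: usize) -> usize {
    // Handing a plain `&mut usize` to more than one spawned thread is rejected
    // by the borrow checker. The version that compiles has to say how the
    // sharing works: Arc for shared ownership, Mutex for exclusive access.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // The lock guard is dropped at the end of this statement.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}
```

Calling count_in_parallel(4, 1000) gives 4000 every time. The equivalent unsynchronised code in a lot of languages gives you 4000 on your laptop and something else in production.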

The error handling deserves a mention. Result everywhere, ? to propagate, and a habit of being explicit about what can fail. It's verbose. It's also why, when something does go wrong in production, the error tells you what and where instead of a stack trace that bottoms out in a framework you've never read.
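A small sketch of what that habit looks like, assuming a made-up limit parser. LimitError, parse_limit and the field names are invented for illustration; our real error types are bigger and duller.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Everything that can go wrong when reading a numeric limit.
/// Hypothetical error type for illustration.
#[derive(Debug)]
enum LimitError {
    Missing(&'static str),
    NotANumber { field: &'static str, source: ParseIntError },
    OutOfRange { field: &'static str, value: u32, max: u32 },
}

impl fmt::Display for LimitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LimitError::Missing(field) => write!(f, "{field}: missing"),
            LimitError::NotANumber { field, source } => {
                write!(f, "{field}: not a number ({source})")
            }
            LimitError::OutOfRange { field, value, max } => {
                write!(f, "{field}: {value} exceeds maximum {max}")
            }
        }
    }
}

impl std::error::Error for LimitError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            LimitError::NotANumber { source, .. } => Some(source),
            _ => None,
        }
    }
}

/// Parses one limit. Each `?` returns early with an error that names the field.
fn parse_limit(field: &'static str, raw: Option<&str>, max: u32) -> Result<u32, LimitError> {
    let raw = raw.ok_or(LimitError::Missing(field))?;
    let value: u32 = raw
        .trim()
        .parse()
        .map_err(|source| LimitError::NotANumber { field, source })?;
    if value > max {
        return Err(LimitError::OutOfRange { field, value, max });
    }
    Ok(value)
}
```

It is more typing than throwing an exception. The payoff is that parse_limit("burst", Some("500"), 100) fails with "burst: 500 exceeds maximum 100", which is a message you can act on without opening a debugger.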

The regrets, also plainly

Compile times. I'm not going to pretend otherwise. A clean build of the whole workspace is long enough to make tea, and for a while that genuinely hurt the inner loop. We've clawed a lot of it back in three ways: splitting the workspace into crates so a change touches less, leaning on cargo check instead of full builds during development, and being disciplined about which dependencies we pull in, because every fat dependency is compile time you pay forever. But I won't tell you it's solved. It's managed.

The dependency-as-compile-cost thing surprised me. In a GC language a dependency is mostly a runtime concern. In Rust a heavy dependency tree is something you feel on every single build, and the temptation to add a crate for a small convenience is one you have to actively resist. We now treat a new dependency as a small decision with a real cost, which is healthier, but it took a while to internalise.

Hiring and ramp-up are real costs. Rust people exist, but the people who already know your domain mostly don't know Rust, and the people who know Rust mostly don't know your domain. Either way someone is learning, and the borrow checker is a steep first month. We got through it by pairing newcomers on existing code before letting them design new code, and by being kind about the fact that fighting the borrow checker is a rite of passage, not a character flaw. Budget for the ramp. It's weeks, not days.

The ecosystem is good and getting better, but it is younger than the alternatives, and you will occasionally hit a library that's at 0.x, lovely, and maintained by one heroic person who has just had a baby. Async in particular has matured a lot since we started, but it was a sharp edge early on, and the split between async runtimes is still a thing you have to make a decision about rather than something that's just handled.

Where I'd draw the line

Here's the bit that took me two years to be honest about. Rust earned its place in the one service where latency predictability and a tight resource budget were the actual problem. For the boring internal services, the CRUD over a database that nobody pages anyone for, reaching for Rust would have been showing off. The compile-time tax and the ramp-up cost buy you nothing when the bottleneck is the database and the latency budget is "a human is waiting, so under a second is fine".

So we didn't. The next few services went into the boring, productive, garbage-collected language the rest of the team already knows, and they were shipped in a fraction of the time, and they are fine, and nobody's tail latency is keeping me up at night because nobody cares about their tail latency.

That's the win I'm proudest of, oddly. Not the Rust service running flat for two years, though it has, but the discipline to use it exactly once and then put the hammer down. The technology was never the hard part. Knowing which problem it was actually for was.

If you're weighing this up: pick the one service where the runtime's behaviour is the problem, commit to it properly, budget for the compile times and the ramp, and resist using it for everything else just because the first one went well. Two years in, that's the whole lesson, and I think it's worth more than any benchmark.