Ramblings of an aging IT geek
rust

Dynamic dispatch isn't as slow as you've been told

Benchmarking trait objects against generic monomorphisation in Rust to see whether the vtable cost actually shows up in practice.

Every Rust discussion about dyn Trait versus generics eventually produces someone saying trait objects are slow because of the vtable indirection, and someone else saying it doesn't matter, and neither of them showing a number. I got tired of the argument and wrote a benchmark.

The claim under test: a method call on a Box<dyn Shape> goes through a vtable pointer, an extra indirection the compiler usually can't inline through, whereas a generic bounded by Shape (S: Shape, or impl Shape in argument position) gets monomorphised into concrete code the compiler can inline and optimise freely. All true. The question is whether that difference is measurable on real work, or whether it vanishes into the noise the moment your method does anything more interesting than return a constant.

I set up a Shape trait with an area method and a handful of implementors, then summed the areas over a large vector, once as a Vec<Box<dyn Shape>> and once as a vector of a single concrete type passed to a generic function. Criterion for the measuring, because eyeballing Instant::now() is how you get nonsense.

trait Shape {
    fn area(&self) -> f64;
}

fn sum_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn sum_generic<S: Shape>(shapes: &[S]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}
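To make the setup concrete, here is a minimal sketch of how the two inputs could be built. Circle, Square, boxed_squares and plain_squares are stand-ins invented for illustration; they are not necessarily the implementors or helpers used in the benchmark above.

```rust
use std::f64::consts::PI;

trait Shape {
    fn area(&self) -> f64;
}

// Hypothetical implementors, assumed for this sketch only.
struct Circle {
    radius: f64,
}

struct Square {
    side: f64,
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        PI * self.radius * self.radius
    }
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }
}

// Dynamic dispatch: every call to area() goes through the vtable.
fn sum_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Static dispatch: one copy of this function is generated per concrete S.
fn sum_generic<S: Shape>(shapes: &[S]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Squares with sides 1.0, 2.0, ..., n, each boxed as a trait object.
fn boxed_squares(n: usize) -> Vec<Box<dyn Shape>> {
    (1..=n)
        .map(|i| Box::new(Square { side: i as f64 }) as Box<dyn Shape>)
        .collect()
}

// The same squares stored inline, with no boxing and no vtable.
fn plain_squares(n: usize) -> Vec<Square> {
    (1..=n).map(|i| Square { side: i as f64 }).collect()
}
```

Note that the two inputs differ in more than dispatch: the boxed vector can mix Circle and Square freely but stores each shape behind its own heap pointer, while the generic version is limited to one concrete type per call and keeps the data contiguous. Both should produce the same sum for the same shapes, which is worth asserting before trusting any timing.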

The result, when area is a trivial multiply, is that the generic version is meaningfully faster, roughly two to three times faster, because the compiler inlines the whole thing and vectorises the loop, which it simply cannot do across a vtable boundary. So the "trait objects are slow" crowd are right, in this specific, contrived case.

Then I made area do real work (a few transcendental functions, the kind of thing a method actually does in a program that exists) and the gap collapsed to single-digit percent. The vtable lookup is one extra pointer load followed by an indirect call. When the method body is one multiply, that overhead, together with the inlining and vectorisation it blocks, dominates and the difference is dramatic. When the method body is fifty instructions, the dispatch is rounding error and you'd struggle to measure it without a stopwatch the size of a building.
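As a rough sketch of the two ends of that spectrum, the block below puts a one-multiply area next to one that calls several transcendental functions. Square and Wobbly, and the formula inside Wobbly, are made up for illustration; the post does not say which functions the heavier area actually used.

```rust
trait Shape {
    fn area(&self) -> f64;
}

// Cheap case: the body is a single multiply, so dispatch overhead is
// a large fraction of the total cost per call.
struct Square {
    side: f64,
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }
}

// Hypothetical "real work" case: the formula is arbitrary and exists only
// to make the method body expensive relative to the indirect call.
struct Wobbly {
    a: f64,
    b: f64,
}

impl Shape for Wobbly {
    fn area(&self) -> f64 {
        let wobble = (self.a.sin() * self.b.cos()).abs();
        let scale = (1.0 + self.a * self.a + self.b * self.b).ln();
        (wobble + scale).sqrt() * self.b.exp().tanh()
    }
}

fn sum_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn sum_generic<S: Shape>(shapes: &[S]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Build the same n Wobbly values for the generic path...
fn plain_wobblies(n: usize) -> Vec<Wobbly> {
    (1..=n)
        .map(|i| Wobbly { a: i as f64 * 0.1, b: i as f64 * 0.01 })
        .collect()
}

// ...and for the trait-object path.
fn boxed_wobblies(n: usize) -> Vec<Box<dyn Shape>> {
    plain_wobblies(n)
        .into_iter()
        .map(|w| Box::new(w) as Box<dyn Shape>)
        .collect()
}
```

Feeding both pairs of inputs to the same benchmark harness is what separates the two regimes: with Square the indirect call is most of the per-element cost, while with something like Wobbly the sin, ln and exp calls swamp it.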

So the honest answer is: it depends entirely on the ratio of dispatch cost to work done. If you are calling a trivial method in a tight loop millions of times, generics win and the win is real. If your method does anything substantial, reach for whichever gives you the cleaner code, because the performance difference has gone to live with the noise. I default to trait objects for flexibility now and only reach for generics when a profiler tells me a hot dispatch is costing me something. Which, so far, it has done exactly once.