trait objects vs generics, a real benchmark

Every Rust thread on dispatch eventually splits into two camps. One side says dyn Trait is fine and you're micro-optimising. The other side says generics are zero-cost and you should never reach for a vtable. Both are right sometimes, which is the least useful answer, so I wrote a benchmark to find out where the line actually sits.

The question is narrow on purpose. Given a trait with one method, how much does it cost to call that method through a trait object (dynamic dispatch, one vtable lookup per call; Box<dyn Shape> in the code below) versus a generic <T: Trait> (monomorphised, so the call is resolved at compile time and can be inlined)? I used Criterion, a recent stable rustc, --release, and a method just meaty enough that the compiler can't fold the whole thing into a constant.

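trait Shape {
    fn area(&self) -> f64;
}

// Dynamic dispatch: every area() call goes through the trait object's vtable.
fn sum_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Static dispatch: compiled once per concrete T, so area() can be inlined.
fn sum_generic<T: Shape>(shapes: &[T]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}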
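The harness itself isn't shown above, so as a rough stand-in, here is a standard-library-only sketch of the comparison. It is not the Criterion setup behind the numbers in this post: Circle, time_it, and compare are hypothetical names, the radii are arbitrary, and a hand-rolled Instant loop is far noisier than Criterion's statistics. Treat it as a way to see the shape of the test, not to reproduce the results.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

trait Shape {
    fn area(&self) -> f64;
}

// Hypothetical concrete type; the post does not say which shapes were measured.
struct Circle {
    radius: f64,
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.radius * self.radius
    }
}

fn sum_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn sum_generic<T: Shape>(shapes: &[T]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

/// Runs `f` `iters` times; returns the last result and the total elapsed time.
/// `black_box` keeps the optimiser from discarding the repeated work.
fn time_it<F: FnMut() -> f64>(iters: u32, mut f: F) -> (f64, Duration) {
    let mut last = 0.0;
    let start = Instant::now();
    for _ in 0..iters {
        last = black_box(f());
    }
    (last, start.elapsed())
}

/// Builds both inputs from the same radii and times each version.
/// Returns (dynamic, generic), each as (sum, elapsed).
fn compare(n: usize, iters: u32) -> ((f64, Duration), (f64, Duration)) {
    // Arbitrary radii, assumed only so the sums are not a compile-time constant.
    let radii: Vec<f64> = (0..n).map(|i| 1.0 + (i % 7) as f64).collect();
    let boxed: Vec<Box<dyn Shape>> = radii
        .iter()
        .map(|&radius| Box::new(Circle { radius }) as Box<dyn Shape>)
        .collect();
    let plain: Vec<Circle> = radii.iter().map(|&radius| Circle { radius }).collect();

    let dynamic = time_it(iters, || sum_dyn(black_box(boxed.as_slice())));
    let generic = time_it(iters, || sum_generic(black_box(plain.as_slice())));
    (dynamic, generic)
}
```

Calling compare(10_000, 1_000) from main in a --release build gives a (sum, elapsed) pair for each version. The two sums should match, so only the durations are of interest.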

What the numbers said

Over a slice of 10,000 elements, called in a tight loop, the generic version came out ahead, but by less than I expected and far less than the forum confidence implied. On this machine the gap was in the low single-digit percentages. The vtable lookup is one load and one indirect branch, the branch predictor learns the target immediately when the slice is homogeneous, and the cost more or less vanishes into the noise of actually doing the work.

The interesting result was what happens to that gap. It widens sharply when:

- the method body is trivial, because then the dispatch overhead is a large fraction of the total. If area() is one multiply, the vtable lookup matters. If it does real work, it doesn't.
- the compiler could have inlined across the call to do something cleverer, like vectorise the loop. Dynamic dispatch is an optimisation barrier, and losing autovectorisation costs far more than the lookup itself ever could.
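To make those two cases concrete, here is a minimal sketch with a deliberately trivial method. Everything in it is hypothetical and separate from the benchmark: the Scaled trait, the Cell type, and the function names are made up for illustration. It uses wrapping integer arithmetic rather than f64, on the assumption that an integer reduction is the easier case for the optimiser to reorder and vectorise. Whether it actually does depends on the compiler version and target, so check the generated assembly before drawing conclusions.

```rust
/// Hypothetical trait with a deliberately trivial method: one multiply.
trait Scaled {
    fn scaled(&self) -> u32;
}

/// Hypothetical element type; a thin wrapper around a u32.
struct Cell(u32);

impl Scaled for Cell {
    fn scaled(&self) -> u32 {
        self.0.wrapping_mul(3)
    }
}

/// Static dispatch: once scaled() is inlined this is a plain loop over
/// contiguous u32 values, which the optimiser is free to unroll or vectorise.
fn total_generic<T: Scaled>(items: &[T]) -> u32 {
    items
        .iter()
        .fold(0u32, |acc, item| acc.wrapping_add(item.scaled()))
}

/// Dynamic dispatch: each iteration makes an indirect call through the vtable,
/// which the optimiser generally cannot see through or fuse into the loop.
fn total_dyn(items: &[&dyn Scaled]) -> u32 {
    items
        .iter()
        .fold(0u32, |acc, item| acc.wrapping_add(item.scaled()))
}
```

Both functions return the same total for the same values. The difference is only in what the optimiser is allowed to do with the loop body.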

So the honest summary is that the vtable is rarely the thing that hurts. The lost inlining is. When dyn is slow it's usually because it stopped the optimiser seeing through the call, not because the indirect branch was expensive.

What I'll actually do

Reach for generics in hot inner loops over trivial methods, especially where vectorisation is on the table. Reach for dyn everywhere else, which is most code, because the binary is smaller, the build is faster, and the difference is unmeasurable in anything doing real work. The phrase "zero-cost abstraction" is doing some heavy lifting here: monomorphisation moves the cost to compile time and binary size; it doesn't delete it. Pick the trade-off you can afford, and if you genuinely can't tell which, you're optimising the wrong function.
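As a small illustration of where that cost goes, here is a sketch of the same function written both ways. Circle, Square, describe_generic, and describe_dyn are hypothetical names, not code from the benchmark. The generic version is compiled separately for each concrete type it is called with, while the dyn version is compiled once and shared.

```rust
trait Shape {
    fn area(&self) -> f64;
}

// Hypothetical implementors, used only to give the functions two types to work on.
struct Circle {
    radius: f64,
}

struct Square {
    side: f64,
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.radius * self.radius
    }
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }
}

/// Monomorphised: calling this with Circle and with Square produces two
/// separate compiled copies, each with area() statically resolved.
fn describe_generic<T: Shape>(shape: &T) -> String {
    format!("area = {:.2}", shape.area())
}

/// Compiled once: every implementor shares this body, and area() is
/// reached through the trait object's vtable at run time.
fn describe_dyn(shape: &dyn Shape) -> String {
    format!("area = {:.2}", shape.area())
}
```

With only two types the extra copy is negligible. The build-time and binary-size cost shows up when a large generic function is instantiated for many types across a crate.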