i benchmarked dyn vs generics so you can stop arguing about it

A few days ago I posted a quick benchmark showing that the speed gap between dyn Trait and generics depends on how much work the method does, and that for any non-trivial method body it mostly vanishes. That's true, but it's also the easy half of the question. Raw call overhead is the bit everyone argues about and the bit that matters least. The interesting costs are the ones nobody mentions: cache behaviour, binary size, and what your compile times look like after you've monomorphised the same generic forty different ways. So I went back and measured those too.

the easy axis, recapped

To recap the bit I've already covered: a dyn call goes through a vtable, which means an indirect call through a function pointer that the compiler usually can't inline through, whereas a generic gets monomorphised into concrete code that inlines and optimises freely. With a trivial area() method in a tight loop, generics ran two to three times faster because the compiler vectorised the loop. Make the method do real work and the gap fell to a single-digit percentage. That part is settled. The dispatch cost is real but only dominates when there's nothing else going on.

trait Shape {
    fn area(&self) -> f64;
}

// monomorphised: one specialised copy per concrete type
fn sum_generic<S: Shape>(shapes: &[S]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// dynamic: one function, vtable dispatch per element
fn sum_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}
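If you want to poke at the two functions yourself, here is a minimal sketch of a harness. It is not the benchmark behind the numbers in this post: Circle, time_it and compare are hypothetical names I've made up for illustration, and a single Instant-based run is far too noisy for real conclusions (a proper benchmark harness would repeat and warm up).

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

trait Shape {
    fn area(&self) -> f64;
}

// Hypothetical concrete type, only here so the functions have something to call.
struct Circle {
    r: f64,
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.r * self.r
    }
}

fn sum_generic<S: Shape>(shapes: &[S]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn sum_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Crude timer: runs `f` once and returns its result plus elapsed wall time.
// black_box discourages the optimiser from discarding the computation.
fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = black_box(f());
    (out, start.elapsed())
}

// Builds the same n circles twice (inline and boxed) and sums both ways.
fn compare(n: usize) -> (f64, f64) {
    let inline: Vec<Circle> = (0..n).map(|i| Circle { r: i as f64 }).collect();
    let boxed: Vec<Box<dyn Shape>> = (0..n)
        .map(|i| Box::new(Circle { r: i as f64 }) as Box<dyn Shape>)
        .collect();

    let (a, t_generic) = time_it(|| sum_generic(black_box(inline.as_slice())));
    let (b, t_dyn) = time_it(|| sum_dyn(black_box(boxed.as_slice())));
    println!("generic: {t_generic:?}, dyn: {t_dyn:?}");
    (a, b)
}
```

Calling compare(1_000_000) from main in a release build prints one timing for each version. Both sums should agree, which is a cheap sanity check that you're timing the same work.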

the axis people forget: cache and heterogeneity

Here's where the simple benchmark lies to you. The generic version is fast partly because every element is the same concrete type, sitting in a contiguous Vec<S>, perfectly laid out for the prefetcher. That's a best case the benchmark accidentally rigged.

The moment you have a genuinely heterogeneous collection (a circle, a rectangle and a triangle all in one list), you cannot have a Vec<S> at all. You need Vec<Box<dyn Shape>> or an enum. And Vec<Box<dyn Shape>> means every element is a pointer to a heap allocation that could be anywhere. Now you're paying for pointer chasing and cache misses on every iteration, and that cost dwarfs the vtable lookup that usually gets the blame.
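To see what "could be anywhere" means, here is a small sketch that inspects the layout. The helper names (mixed_shapes, slot_addresses, payload_addresses) and the Circle and Rect types are hypothetical, made up for this illustration. The Vec itself stores only fat pointers (data pointer plus vtable pointer) back to back. The objects those pointers refer to are separate allocations whose addresses are up to the allocator.

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Circle {
    r: f64,
}

struct Rect {
    w: f64,
    h: f64,
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.r * self.r
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.w * self.h
    }
}

// Alternates circles and rectangles, each in its own heap allocation.
fn mixed_shapes(n: usize) -> Vec<Box<dyn Shape>> {
    (0..n)
        .map(|i| -> Box<dyn Shape> {
            if i % 2 == 0 {
                Box::new(Circle { r: 1.0 })
            } else {
                Box::new(Rect { w: 1.0, h: 2.0 })
            }
        })
        .collect()
}

// Address of each slot inside the Vec's buffer: these are contiguous.
fn slot_addresses(shapes: &[Box<dyn Shape>]) -> Vec<usize> {
    shapes
        .iter()
        .map(|b| b as *const Box<dyn Shape> as usize)
        .collect()
}

// Address of the object each slot points at: wherever the allocator put it.
fn payload_addresses(shapes: &[Box<dyn Shape>]) -> Vec<usize> {
    shapes
        .iter()
        .map(|b| &**b as *const dyn Shape as *const () as usize)
        .collect()
}
```

Printing both lists for a freshly built Vec often shows payloads that are still fairly close together, because a fresh allocator hands out neighbouring blocks. The scattering gets worse in a long-running program where the boxes were allocated at different times, which is the case a tidy benchmark tends to miss.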

I measured this, and on my data set the heap indirection cost roughly four times as much as the dispatch. So if you want the trait-object pattern to be fast with heterogeneous data, the lever to pull is not "avoid dyn", it's "stop scattering the objects across the heap". An enum keeps everything inline and contiguous (no boxing, no pointer chase) and dispatches via a match, which typically compiles to a jump table or a short chain of branches that the branch predictor handles well. In my benchmark the enum approach beat the boxed-dyn version on heterogeneous data and came within a whisker of the monomorphised one, while actually being able to hold mixed types. The trade is that you have to know all your types up front, which is exactly what dyn exists to avoid.
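For reference, the enum shape looks roughly like this. It's a minimal sketch: ShapeKind, its variants and sum_enum are hypothetical names chosen for illustration, not the code from my benchmark.

```rust
// Closed set of shapes stored inline: no Box, no vtable.
#[derive(Clone, Copy, Debug)]
enum ShapeKind {
    Circle { r: f64 },
    Rect { w: f64, h: f64 },
    Triangle { base: f64, height: f64 },
}

impl ShapeKind {
    // Dispatch is a match on the discriminant rather than a vtable call.
    fn area(&self) -> f64 {
        match *self {
            ShapeKind::Circle { r } => std::f64::consts::PI * r * r,
            ShapeKind::Rect { w, h } => w * h,
            ShapeKind::Triangle { base, height } => 0.5 * base * height,
        }
    }
}

// The slice is one contiguous run of values, mixed variants and all.
fn sum_enum(shapes: &[ShapeKind]) -> f64 {
    shapes.iter().map(ShapeKind::area).sum()
}
```

One thing to keep in mind: every element is as large as the biggest variant plus its tag, so a single fat variant inflates the whole array. If one of your types is much larger than the rest, that can eat into the cache advantage.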

binary size and compile time

Monomorphisation is not free; you just pay somewhere other than runtime. Every distinct type you instantiate a generic with produces a fresh copy of the function in your binary. For a small generic over three types, who cares. For a generic that's instantiated dozens of ways across a large codebase, the binary bloats and, more painfully, your compile times climb, because the compiler is doing all that codegen and optimisation per instantiation.

I built a small synthetic crate with one generic function instantiated over an increasing number of types and watched both numbers. Binary size grew roughly linearly with the count of instantiations, as you'd expect. Compile time grew faster than linearly once the optimiser had enough copies to chew on. The dyn version, by contrast, compiles one copy of the function and stays flat no matter how many types implement the trait. On a large workspace this is not academic. Trait objects can be the difference between a fifteen-second incremental build and a forty-second one, and over a working day that adds up to real human time.
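If you want to build something similar, here is one way such a crate could be laid out. It's a sketch under my own assumptions, not the crate I measured: the make_shapes macro, the S0..S3 types and the sum_all_* functions are hypothetical. The macro stamps out one type per name, then calls sum_generic once per type (forcing one instantiation each) and sum_dyn once for all of them.

```rust
trait Shape {
    fn area(&self) -> f64;
}

fn sum_generic<S: Shape>(shapes: &[S]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn sum_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Generates one square-like type per name, plus two drivers over all of them.
macro_rules! make_shapes {
    ($($name:ident),* $(,)?) => {
        $(
            struct $name(f64);

            impl Shape for $name {
                fn area(&self) -> f64 {
                    self.0 * self.0
                }
            }
        )*

        // One call per type: each one instantiates sum_generic separately.
        fn sum_all_generic(side: f64) -> f64 {
            0.0 $(+ sum_generic(&[$name(side)]))*
        }

        // Same types, but everything goes through the single sum_dyn.
        fn sum_all_dyn(side: f64) -> f64 {
            let shapes: Vec<Box<dyn Shape>> = vec![$(Box::new($name(side))),*];
            sum_dyn(&shapes)
        }
    };
}

// Grow this list to scale the number of instantiations.
make_shapes!(S0, S1, S2, S3);
```

To watch the numbers move, lengthen the list passed to make_shapes!, build in release mode, and compare the size of the output binary and the build time (cargo build --timings gives a per-crate breakdown). With a body this small the optimiser may inline or merge copies, so a realistic experiment needs a heavier function than a one-line sum.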

so what do you actually use

After all of that, here's where I've landed, and it's less exciting than the benchmark suggests it should be.

Reach for generics when the hot path is genuinely hot, the method is small, the type is fixed at the call site, and you've got a profiler pointing at the call. That's the case where monomorphisation earns its keep, and it's narrower than people assume.

Reach for an enum when you have a known, closed set of types and you care about cache layout, because it gives you contiguous storage and predictable dispatch without boxing. This is my default for the "list of mixed but known things" shape, and it's the option the original argument never even mentions.

Reach for dyn Trait when the set of types is open or large, when compile time matters, or when the flexibility is simply worth more than the dispatch cost, which is most of the time in code that isn't a tight numeric kernel. The vtable cost is real and it is also, on any realistic method, the smallest number in this whole exercise.

The thing the original argument gets wrong is treating this as a performance question with one answer. It's three separate trade-offs (runtime dispatch, memory layout, and build cost), and they don't all point the same way. The people insisting dyn is slow are optimising the one axis that matters least and ignoring the heap indirection that's actually costing them. Measure your own workload, because the only general rule I'm confident in is that the obvious rule is wrong about half the time.