Ramblings of an aging IT geek
gamedev

reliable doesn't mean what you think in unreal

Why a Reliable Multicast RPC in Unreal flooded the network and dropped connections, and how the reliable channel actually behaves under load.

I spent an evening convinced Unreal's replication was broken. It wasn't. I just didn't understand what Reliable promised, and the word did a lot of quiet damage.

The setup was simple. A weapon fired, and I wanted every client to see the muzzle flash and play the sound. So I marked the RPC Reliable and NetMulticast, because of course I want it to be reliable, who wouldn't? Under light testing it was perfect. Under a stress test with a few players spraying full-auto, flashes started arriving late and in bursts, and occasionally the server logged that the reliable buffer had overflowed and that it was closing the connection. Closing the connection! For a muzzle flash.

Here is the bit that catches everyone. Unreal keeps a fixed-size reliable buffer on each connection, tracked per channel (in practice, per replicated actor) and sized by the compile-time constant RELIABLE_BUFFER, 256 entries by default. Reliable RPCs are queued in order and held until the other side acknowledges them. If you generate them faster than the acknowledgements come back, the buffer fills, and rather than silently drop a reliable message (which would violate the reliability contract), the engine treats the connection as broken and closes it. So "reliable" doesn't mean "best effort, just try harder". It means "guaranteed in order, or the connection dies trying".
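
To make the overflow concrete, here is a toy model in plain C++. It is not engine code: ReliableLink, Send, Ack and TicksUntilClosed are names I made up for illustration, and the real engine check differs in detail. It only captures the shape of the behaviour described above: a bounded count of unacknowledged messages, and a link that closes instead of dropping when that count hits the cap.

```cpp
#include <cstddef>

// Toy model only, not Unreal code. All names here are hypothetical.
struct ReliableLink {
    std::size_t capacity;      // assumed cap, e.g. 256 to mirror RELIABLE_BUFFER
    std::size_t inFlight = 0;  // messages sent but not yet acknowledged
    bool closed = false;

    explicit ReliableLink(std::size_t cap) : capacity(cap) {}

    // Queue one reliable message. On overflow the link closes rather than
    // dropping the message, because dropping would break the guarantee.
    bool Send() {
        if (closed) return false;
        if (inFlight >= capacity) {
            closed = true;
            return false;
        }
        ++inFlight;
        return true;
    }

    // Acknowledge up to `count` outstanding messages.
    void Ack(std::size_t count) {
        inFlight -= (count < inFlight ? count : inFlight);
    }
};

// Each tick sends `sendPerTick` messages, then acknowledges `ackPerTick`.
// Returns the tick on which the link closed, or -1 if it survived.
int TicksUntilClosed(std::size_t capacity, std::size_t sendPerTick,
                     std::size_t ackPerTick, int ticks) {
    ReliableLink link(capacity);
    for (int t = 0; t < ticks; ++t) {
        for (std::size_t i = 0; i < sendPerTick; ++i) {
            if (!link.Send()) return t;
        }
        link.Ack(ackPerTick);
    }
    return -1;
}
```

With a cap of 256, sending 10 and acknowledging 10 per tick runs forever, while sending 12 against 10 acknowledgements closes the link on tick 123. A small sustained surplus is all it takes.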

// What I had. Looks innocent.
UFUNCTION(NetMulticast, Reliable)
void Multicast_Fire();

// What it should have been.
UFUNCTION(NetMulticast, Unreliable)
void Multicast_Fire();

The flash is a cosmetic, fire-and-forget event. If one of forty rounds doesn't render its flash, nobody notices, and nobody should pay for it with a guaranteed, acknowledged, buffered delivery. Cosmetic, high-frequency events want Unreliable. Reliable is for things where missing one breaks game state: a death, a round-end, picking up an objective. Things that happen a handful of times, not sixty times a second.

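There's a second footgun layered on top. Reliable NetMulticast RPCs are particularly nasty because the cost scales with the number of connections. One client firing turns into N reliable messages, one for each connected client the actor is relevant to, each consuming a buffer slot on its connection. When everyone is firing, the number of shooters grows along with the number of receivers, so total traffic grows roughly with the square of the player count. That is why the failure is non-linear: fine at two players, ugly at eight, connection-killing at sixteen.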
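
The fan-out is easy to put numbers on. A back-of-the-envelope sketch, assuming every shooter fires at the same rate and every shot reaches every receiving connection; the function names and the 10 shots per second figure are mine, picked only for illustration.

```cpp
#include <cstdint>

// Reliable messages queued per second on ONE receiving connection:
// every shooter contributes its own fire rate.
std::uint64_t PerConnectionRate(std::uint64_t shooters,
                                std::uint64_t shotsPerSecond) {
    return shooters * shotsPerSecond;
}

// Reliable messages the server sends per second in total:
// the per-connection load, repeated for every receiving connection.
std::uint64_t ServerTotalRate(std::uint64_t shooters,
                              std::uint64_t receivers,
                              std::uint64_t shotsPerSecond) {
    return receivers * PerConnectionRate(shooters, shotsPerSecond);
}
```

At 10 shots per second with everyone firing at everyone, that works out to 40 reliable sends per second at two players, 640 at eight and 2560 at sixteen. Each individual connection at sixteen players is absorbing 160 per second, which is a lot to ask of a buffer with a few hundred slots.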

The general rule I settled on: reliability is a budget, not a default. Spend it on state that must converge, and let the eye-candy ride unreliable. If you genuinely need a cosmetic event to be slightly more robust, the better tool is replicating a property and reacting to its change with RepNotify, which gives you eventual consistency without queuing a message per occurrence. Late joiners get the current value for free, which a one-shot multicast never would have given them anyway.
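
The property-plus-notify idea can be sketched without the engine. This is a toy stand-in in plain C++, not Unreal code: ClientWeaponView and OnSnapshot are hypothetical names. In Unreal terms the counter would be a property marked ReplicatedUsing and OnSnapshot would be its OnRep function. Skipping the effect on the very first value is my own choice here, so that a late joiner adopts the current count without replaying a stale flash.

```cpp
#include <cstdint>

// Toy stand-in for a replicated shot counter with a change callback.
// Not Unreal code; all names are hypothetical.
struct ClientWeaponView {
    std::uint8_t lastSeenShotCounter = 0;
    bool hasBaseline = false;  // false until the first value arrives
    int flashesPlayed = 0;

    // Called whenever the latest server value reaches this client.
    // Intermediate values may never arrive, and that is fine.
    void OnSnapshot(std::uint8_t serverShotCounter) {
        if (!hasBaseline) {
            // First value (e.g. a late joiner): adopt it, play nothing.
            lastSeenShotCounter = serverShotCounter;
            hasBaseline = true;
            return;
        }
        if (serverShotCounter != lastSeenShotCounter) {
            lastSeenShotCounter = serverShotCounter;
            ++flashesPlayed;  // one flash per observed change, not per shot
        }
    }
};
```

The trade is visible in the last comment: if the counter jumps from 40 to 43 between updates, the client plays one flash, not three. For eye-candy that is usually the right trade, and nothing is ever queued waiting for an acknowledgement.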

I now treat Reliable on a multicast the way I treat sudo: occasionally exactly what I want, and worth a second of thought every single time.