Ramblings of an aging IT geek
gamedev

Reliable does not mean what you think in Unreal

Marking Unreal RPCs Reliable feels safe, but it is a footgun: flood the reliable buffer and everything stalls, then the connection drops.

I spent a chunk of this week being humbled by Unreal's networking, specifically by the word Reliable. It looks like a promise. It reads like one. "Make this RPC reliable and it will definitely arrive." And it will! That is exactly the problem.

Here is the setup. An RPC in Unreal is a function call that crosses the wire, declared with a specifier so the engine knows which way it travels:

UFUNCTION(Server, Reliable)
void ServerFireWeapon(FVector_NetQuantize Location, FVector_NetQuantizeNormal Direction);

Server means the client asks the server to run it. Reliable means the engine guarantees delivery and ordering, retransmitting until it gets through. Lovely. Use it for the things that must happen exactly once: spawning, scoring, a door opening. The footgun is what happens when you use it for things that happen often.

The reliable buffer is not infinite

Reliable RPCs go into an ordered buffer of sends awaiting acknowledgement, kept per actor channel on each connection. If a packet is lost, everything queued behind it on that channel waits, because ordering is part of the guarantee. Now imagine you mark a high-frequency RPC reliable, something firing many times a second per actor, and you have a few dozen actors. You are not sending messages any more. You are filling queues faster than the wire can drain them as soon as there is any packet loss at all.
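
The "everything behind it waits" part is easier to see in a toy model than in prose. The sketch below is plain C++, not engine code: OrderedReceiver and its methods are made-up names, and it assumes simple sequence numbers starting at zero. It only shows the shape of in-order delivery, where one missing packet holds back every later one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Toy receiver for an ordered, reliable stream. This is NOT engine code;
// OrderedReceiver and its methods are hypothetical names for illustration.
class OrderedReceiver {
public:
    // Accept one packet. Returns the sequence numbers that can now be
    // handed to game code, in order. Empty means "still waiting on a gap".
    std::vector<std::uint32_t> Receive(std::uint32_t Seq) {
        std::vector<std::uint32_t> Delivered;
        if (Seq < NextExpected) {
            return Delivered; // stale duplicate of something already delivered
        }
        Pending.insert(Seq);

        // Deliver the contiguous run that starts at NextExpected, if any.
        std::set<std::uint32_t>::iterator It = Pending.find(NextExpected);
        while (It != Pending.end()) {
            Delivered.push_back(NextExpected);
            Pending.erase(It);
            ++NextExpected;
            It = Pending.find(NextExpected);
        }

        // Whatever is still held must sit beyond the gap.
        assert(Pending.empty() || *Pending.begin() > NextExpected);
        return Delivered;
    }

    // Number of packets that arrived but cannot be delivered yet.
    std::size_t Held() const { return Pending.size(); }

private:
    std::uint32_t NextExpected = 0;
    std::set<std::uint32_t> Pending;
};
```

Feed it packets 0, 2, 3 and it delivers only 0 while holding two. Hand it the retransmitted 1 and all three come out at once. Every message held behind the gap is still occupying buffer space on the sending side too, which is where the trouble starts.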

When that buffer saturates, Unreal does not silently drop the overflow. It closes the connection. The player gets booted with a vague disconnect, and your logs, if you're watching, will show the reliable buffer overflowing. The first time it happened to me I assumed a server crash. It was the opposite: the server was working exactly as designed, refusing to break the reliability promise I had foolishly demanded.
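
Here is a rough simulation of that failure mode, again as standalone C++ rather than anything lifted from the engine. ReliableSender and TicksUntilClose are hypothetical names, the capacity is whatever you pass in, and "an ack arrives every N ticks" is a crude stand-in for latency and loss. Check your engine version's source for the real limit and the real bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Toy sender-side model of a bounded reliable buffer. NOT engine code;
// all names here are hypothetical.
class ReliableSender {
public:
    explicit ReliableSender(std::size_t InCapacity) : Capacity(InCapacity) {}

    // Queue one reliable message. Returns false if the connection is
    // closed, including when this very call caused the overflow.
    bool Send() {
        if (bClosed) {
            return false;
        }
        if (Unacked.size() >= Capacity) {
            bClosed = true;   // overflow: close rather than silently drop
            Unacked.clear();
            return false;
        }
        Unacked.push_back(NextSeq++);
        assert(Unacked.size() <= Capacity);
        return true;
    }

    // Cumulative ack: everything up to and including Seq has arrived.
    void Ack(std::uint32_t Seq) {
        while (!Unacked.empty() && Unacked.front() <= Seq) {
            Unacked.pop_front();
        }
    }

    bool IsClosed() const { return bClosed; }
    std::size_t InFlight() const { return Unacked.size(); }

private:
    std::size_t Capacity;
    std::deque<std::uint32_t> Unacked;
    std::uint32_t NextSeq = 0;
    bool bClosed = false;
};

// Each tick queues SendsPerTick messages; an ack covering everything sent
// so far arrives only every AckEveryTicks ticks. Returns the tick on which
// the connection closed, or -1 if it survived all Ticks.
int TicksUntilClose(std::size_t Capacity, int SendsPerTick,
                    int AckEveryTicks, int Ticks) {
    ReliableSender Sender(Capacity);
    std::uint32_t Sent = 0;
    for (int Tick = 1; Tick <= Ticks; ++Tick) {
        for (int i = 0; i < SendsPerTick; ++i) {
            if (!Sender.Send()) {
                return Tick;
            }
            ++Sent;
        }
        if (Tick % AckEveryTicks == 0 && Sent > 0) {
            Sender.Ack(Sent - 1);
        }
    }
    return -1;
}
```

With an assumed capacity of 256, thirty sends per tick and an ack every other tick never puts more than 60 messages in flight, so TicksUntilClose(256, 30, 2, 1000) returns -1. Stretch the ack gap to ten ticks and the same traffic closes the connection on tick 9. Nothing about the send rate changed; the wire just stopped draining for a moment.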

The rule I should have started with

The fix is conceptual, not a clever flag. Ask of every RPC: does this need to arrive, or does it need to arrive now?

  • State that must be correct eventually: reliable. Score changes, inventory, the authoritative outcome of an action.
  • State where only the latest value matters: unreliable, and ideally not an RPC at all. Movement, aim direction, anything you sample continuously.

For the continuous stuff, replicated properties with DOREPLIFETIME are usually the right tool, because property replication is built to send the newest value and drop stale ones rather than queue them all. An unreliable RPC that gets lost is gone, and for a position update fifty milliseconds out of date, gone is the correct outcome. You'll get a fresher one next tick.
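
The difference between "queue them all" and "send the newest" fits in a few lines. This is a minimal standalone C++ sketch of the latest-value-wins idea, not how Unreal's property replication is actually implemented: LatestValueSlot is a hypothetical name, and the real system does far more (per-connection state, change detection, relevancy). It only shows why a slot cannot overflow the way a queue can.

```cpp
#include <cassert>
#include <cstddef>

// Toy "latest value wins" slot. NOT engine code; hypothetical names.
// Writes overwrite each other, so the amount of pending data never grows.
template <typename T>
class LatestValueSlot {
public:
    // Game code samples a new value. Any unsent older value is discarded.
    void Set(const T& Value) {
        if (bDirty) {
            ++DroppedStale; // the previous value never went out, and that is fine
        }
        Current = Value;
        bDirty = true;
    }

    // Called when the wire has room. Writes the newest value to Out and
    // returns true only if something changed since the last send.
    bool TakeForSend(T& Out) {
        if (!bDirty) {
            return false;
        }
        Out = Current;
        bDirty = false;
        return true;
    }

    // How many intermediate values were overwritten before being sent.
    std::size_t StaleDropped() const { return DroppedStale; }

private:
    T Current{};
    bool bDirty = false;
    std::size_t DroppedStale = 0;
};
```

Set the slot three times between send opportunities and exactly one value goes out, the third. The two it skipped cost nothing. A reliable queue in the same situation would be holding all three, plus everything behind them.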

So my high-frequency calls became unreliable, the genuinely important events stayed reliable, and the continuous state moved to property replication where it belonged. The disconnects stopped. The lesson sits oddly in the brain: in netcode, "reliable" is not the safe default. It is a strong constraint you are asking the engine to honour at any cost, and the cost, if you ask too often, is the connection itself.