Ramblings of an aging IT geek
gamedev

rpcs in unreal, and the reliable/unreliable footgun

A walk through Unreal's RPC model in C++, why marking calls reliable everywhere will quietly wreck your bandwidth, and how I got bitten by ordering assumptions.


The bug looked like rubber-banding. A player would shoot, the projectile would spawn on their machine, and then half a second later it would vanish and respawn somewhere slightly different. Classic desync, and I spent an embarrassing afternoon staring at movement replication before realising the actual problem was the way I'd wired up my RPCs. So this is the post I wish I'd read first: how remote procedure calls work in Unreal's C++, and the specific way they'll hurt you if you don't think about reliability.

the three flavours

Unreal RPCs come in three shapes, declared with UFUNCTION specifiers:

UFUNCTION(Server, Reliable, WithValidation)
void Server_Fire(FVector_NetQuantize Origin, FVector_NetQuantizeNormal Direction);

UFUNCTION(Client, Reliable)
void Client_PlayHitConfirm();

UFUNCTION(NetMulticast, Unreliable)
void Multicast_SpawnImpactFX(FVector_NetQuantize Location);

A Server RPC runs on the server when it is called from the client that owns the actor. A Client RPC runs on the actor's owning client when it is called from the server. A NetMulticast RPC, called from the server, runs on the server and on every connected client for which the actor is relevant. That much is in every tutorial. What the tutorials skim over is the Reliable / Unreliable decision, and that's the bit that bites.
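The routing rules are easier to hold in your head as a lookup. Here's a toy model in plain C++ (not engine code; `RpcKind`, `Caller` and `WhereItRuns` are names made up for this sketch) for an actor owned by a client connection. It also covers the cases the tutorials rarely mention, such as calling from the wrong side, as I understand the engine's documented behaviour.

```cpp
#include <cassert>
#include <string>

// Toy model, not engine code: where does the body of an RPC execute?
enum class RpcKind { Server, Client, NetMulticast };
enum class Caller { Server, OwningClient, NonOwningClient };

// Describes where the call runs for an actor owned by a client connection.
std::string WhereItRuns(RpcKind kind, Caller caller) {
    switch (kind) {
    case RpcKind::Server:
        // Only the server itself or the owning client can reach the server.
        if (caller == Caller::NonOwningClient) return "dropped";
        return "server";
    case RpcKind::Client:
        if (caller == Caller::Server) return "owning client";
        return "calling client only";   // runs locally, never crosses the wire
    case RpcKind::NetMulticast:
        if (caller == Caller::Server) return "server and all clients";
        return "calling client only";   // a client cannot broadcast
    }
    return "unknown";
}
```

The row worth remembering is the "dropped" one: a Server RPC called on an actor you don't own goes nowhere, which looks a lot like packet loss when you're debugging.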

A reliable RPC is guaranteed to arrive, and to arrive in order relative to the other reliable RPCs on the same actor. Unreal will retransmit it until the receiver acknowledges it. That sounds like exactly what you want, so the obvious move (the move I made) is to mark everything reliable and stop worrying. Don't do that.
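To make "retransmit until acknowledged, deliver in order" concrete, here's a minimal sketch of a reliable channel in plain C++. It uses a go-back-N style scheme (cumulative acks, out-of-order packets discarded) because that's the shortest thing that shows the cost. It is not a description of Unreal's wire protocol, and every name in it is hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <vector>

// Toy reliable sender: keeps every message until it is acknowledged.
struct ReliableSender {
    int nextSeq = 0;
    std::deque<int> unacked;   // sequence numbers still awaiting an ack

    int Send() { unacked.push_back(nextSeq); return nextSeq++; }

    // Cumulative ack: everything up to and including seq has arrived.
    void Ack(int seq) {
        while (!unacked.empty() && unacked.front() <= seq) unacked.pop_front();
    }
};

// Toy receiver: delivers strictly in sequence order.
struct OrderedReceiver {
    int expected = 0;
    std::vector<int> delivered;

    // Accepts only the next expected number; anything else is ignored and
    // has to be sent again. Returns the highest in-order number seen so far.
    int Receive(int seq) {
        if (seq == expected) { delivered.push_back(seq); ++expected; }
        return expected - 1;
    }
};

// Sends `count` messages. Each sequence number in `loseOnce` has its first
// transmission dropped. Returns how many packets went on the wire in total.
int SimulateReliable(int count, std::set<int> loseOnce, OrderedReceiver& rx) {
    ReliableSender tx;
    for (int i = 0; i < count; ++i) tx.Send();
    int packets = 0;
    while (!tx.unacked.empty()) {
        const std::deque<int> round = tx.unacked;  // resend all still unacked
        for (int seq : round) {
            ++packets;
            if (loseOnce.erase(seq) > 0) continue;  // lost in transit
            tx.Ack(rx.Receive(seq));
        }
    }
    return packets;
}
```

In this model, sending five messages and losing the third once costs eight packets, not six: the two behind the lost one arrive out of order and have to be sent again. The numbers are specific to the sketch, but the shape of the cost is the point: one loss holds up everything queued behind it.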


why reliable-everywhere is a footgun

Reliable RPCs share a queue, and that queue is finite. If you flood it (say, a multicast FX call on every projectile impact in a busy firefight) you can overflow it. When the reliable buffer overflows, Unreal doesn't drop the message. It disconnects the client. You read that correctly: the "safe" option, used carelessly, kicks people off the server. The first time I saw a packet log full of Closing connection. Reason: RELIABLE BUFFER OVERFLOW I genuinely thought it was a netcode bug in the engine rather than a bug in me.
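A back-of-envelope model shows how quickly a flood gets you there. This is a sketch with assumed numbers: the capacity of 256 is a figure picked for illustration (check the engine source for your version), and `TicksUntilOverflow` and its parameters are hypothetical.

```cpp
#include <cassert>

// Toy model: on which tick does a bounded reliable buffer overflow?
// Assumptions (illustrative, not measured):
//  - at most `capacity` unacknowledged reliable messages may be in flight
//  - every message is acknowledged `ackDelayTicks` ticks after it is sent
// Returns the overflowing tick, or -1 if none occurs within `maxTicks`.
int TicksUntilOverflow(int capacity, int sendsPerTick,
                       int ackDelayTicks, int maxTicks) {
    int inFlight = 0;
    for (int tick = 0; tick < maxTicks; ++tick) {
        // Messages sent ackDelayTicks ago are acknowledged now.
        if (tick >= ackDelayTicks) inFlight -= sendsPerTick;
        inFlight += sendsPerTick;
        if (inFlight > capacity) return tick;
    }
    return -1;
}
```

With those assumptions, 40 reliable calls per tick and a 12-tick round trip overflows on tick 6, a tenth of a second into the firefight at 60 Hz. Halve the round trip to 6 ticks and the same send rate survives, which is why this tends to show up on other people's connections rather than on your LAN.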

The mental model that fixed it: reliable is for things that change state and must not be lost. Unreliable is for things that are either sent frequently or are purely cosmetic, where losing one is fine because another is along in a moment.

So:

  • Firing a weapon, spending ammo, applying damage: reliable. Losing it desyncs the simulation.
  • Impact sparks, muzzle flash, footstep sounds: unreliable. If one impact effect goes missing in a 32-player scrum, nobody notices, and you've kept it out of the reliable queue and spared yourself a possible retransmit.
  • Movement: leave it to the engine's own movement replication, which is unreliable by design and reconciled afterwards. Don't route it through your own RPCs.
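The rule of thumb above fits in a few lines. This is only a sketch of the decision, with made-up names (`MessageTraits`, `Choose`), not anything the engine provides.

```cpp
#include <cassert>

enum class Reliability { Reliable, Unreliable };

// Hypothetical description of a message; the fields are illustrative.
struct MessageTraits {
    bool changesGameState;  // losing it would corrupt the simulation
    bool supersededSoon;    // a newer copy replaces it within moments
};

inline Reliability Choose(const MessageTraits& m) {
    // Frequently refreshed data covers for its own losses.
    if (m.supersededSoon) return Reliability::Unreliable;
    // Otherwise pay for reliability only when state is at stake.
    return m.changesGameState ? Reliability::Reliable
                              : Reliability::Unreliable;
}
```

A weapon fire is `{true, false}` and comes out reliable. An impact spark is `{false, false}` and a per-tick position update is `{true, true}`, and both come out unreliable.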

ordering, and the assumption that got me

The respawning projectile turned out to be an ordering problem dressed up as a reliability one. I had the fire input doing two things: a reliable Server_Fire to spawn the authoritative projectile, and, separately, an unreliable multicast for the muzzle effect. Fine. But I'd also added a reliable Client_ confirm that, in a fit of tidying, ended up spawning a visual projectile on the firing client before the server's replicated one arrived. Two projectiles, one cosmetic and early, one authoritative and slightly later, in slightly different places because of latency. The eye reads that as a teleport.

The fix was conceptual, not clever. Decide who owns the truth. The server owns the projectile's existence and trajectory. The client may show a predicted tracer immediately for feel, but it must reconcile to the replicated actor when it appears, not spawn a competing one. Once the predicted visual was tied to the same actor that the server replicated down (rather than a parallel throwaway), the rubber-banding went away.
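Here's the shape of that fix as a standalone sketch. It assumes the client tags each shot with an ID that the server echoes back, which is one hypothetical way to pair a prediction with its authoritative counterpart; `ProjectileView` and the other names are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

struct Vec2 { float x, y; };

struct ProjectileVisual {
    Vec2 position;
    bool authoritative;  // false while it is only a local prediction
};

// Client-side bookkeeping: exactly one visual per shot, keyed by shot ID.
class ProjectileView {
public:
    // Called on the firing client at input time, purely for feel.
    void SpawnPredicted(std::uint32_t shotId, Vec2 guess) {
        visuals_[shotId] = ProjectileVisual{guess, false};
    }

    // Called when the server's projectile arrives. Adopts the predicted
    // visual if one exists instead of spawning a competing one. A real
    // game would blend towards `truth` rather than snap to it.
    void OnServerProjectile(std::uint32_t shotId, Vec2 truth) {
        ProjectileVisual& v = visuals_[shotId];  // created if not predicted
        v.position = truth;
        v.authoritative = true;
    }

    std::size_t Count() const { return visuals_.size(); }
    const ProjectileVisual& Get(std::uint32_t shotId) const {
        return visuals_.at(shotId);
    }

private:
    std::unordered_map<std::uint32_t, ProjectileVisual> visuals_;
};
```

The invariant is `Count()`: predict shot 7, receive shot 7 from the server, and there is still one projectile on screen, now marked authoritative.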

A couple of things worth keeping pinned to the wall:

Validation matters. WithValidation forces you to write a _Validate function that returns a bool. Return false and the server disconnects the client that sent the call, on the assumption that it is cheating. This is not optional politeness; it's where you check that the client isn't asking the server to fire a weapon it doesn't have, or from a location it can't be in. The server trusts nothing.
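The checks themselves are ordinary code. Below is a sketch of the logic a _Validate function might contain, written as plain C++ so it stands alone; `ServerPlayerState`, `ValidateFire` and the tolerance parameter are assumptions for illustration. Because a failed validation is so drastic, the location check allows some slack for latency rather than demanding an exact match.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical server-side view of the player; fields are illustrative.
struct ServerPlayerState {
    bool hasWeapon;
    Vec3 position;   // where the server believes the player is
};

inline float Distance(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Mirrors the shape of a _Validate function: false means "reject the call".
// maxOriginError is slack for latency, in the same units as the positions.
bool ValidateFire(const ServerPlayerState& state, const Vec3& claimedOrigin,
                  float maxOriginError) {
    if (!state.hasWeapon) return false;  // can't fire what you don't hold
    if (Distance(state.position, claimedOrigin) > maxOriginError) return false;
    return true;
}
```

The tolerance is the part to tune: too tight and honest players on poor connections get kicked, too loose and the check stops meaning anything.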

NetQuantize types are nearly free wins. FVector_NetQuantize and friends compress position data on the wire by rounding it to a fixed precision: whole units for plain FVector_NetQuantize, finer for the 10 and 100 variants. For anything spatial in an RPC where that precision is enough, use them rather than raw FVector, and you'll shave real bytes off every call. Multiply that by the call rate of a multicast in a firefight and it adds up.
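To see where the savings come from, here's a toy quantizer. It is not the engine's encoding, which has its own header bits and limits; `Quantize`, `Dequantize` and `SignedBits` are names invented for this sketch. The idea is simply to round to a fixed step, then send only as many bits as the resulting integer needs.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Rounds a coordinate to the nearest multiple of 1/scale and returns it as
// an integer. scale = 1 keeps whole units, 10 keeps one decimal place.
inline std::int32_t Quantize(float value, int scale) {
    return static_cast<std::int32_t>(std::lround(value * scale));
}

inline float Dequantize(std::int32_t q, int scale) {
    return static_cast<float>(q) / static_cast<float>(scale);
}

// Bits needed to store q as a sign bit plus magnitude.
inline int SignedBits(std::int32_t q) {
    std::uint32_t mag = q < 0 ? 0u - static_cast<std::uint32_t>(q)
                              : static_cast<std::uint32_t>(q);
    int bits = 1;  // the sign bit
    while (mag != 0) { ++bits; mag >>= 1; }
    return bits;
}
```

A coordinate of 1234.56 quantized at scale 1 becomes 1235 and fits in 12 bits under this scheme, against 32 for a raw float. The price is the 0.44 units you rounded away, which is why it's "nearly" free rather than free.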

And measure before you tune. The stat net console command and the network profiler tell you which RPCs are actually expensive. I'd assumed my movement code was the bandwidth hog. It wasn't. It was a chatty reliable RPC I'd left in from a debugging session, firing every tick.

The summary I'd give past-me: reliable is not "better", it's "more expensive and load-bearing". Spend it where losing a message corrupts the game's truth. Everywhere else, send it unreliable, send it often, and let the next one cover for the one that got dropped. The network is lossy. Build as though you expected that, because you should have.