Ramblings of an aging IT geek
gamedev

rpcs in unreal, and the reliability footgun

How a Reliable RPC fired on every frame quietly overflowed the reliable buffer and kept closing the connection in an Unreal Engine prototype, and what actually went wrong.


The bug looked like lag. Players in the prototype would feel a rubber-band hitch every few seconds, worse the more of them joined. It wasn't the network: the bandwidth graphs were calm. It was a single RPC, marked Reliable, firing far more often than I'd intended, and Unreal's reliable channel doing exactly what I'd told it to.

I'll lead with the lesson, because I wish someone had told me bluntly: in Unreal, Reliable does not mean "important". It means "the engine will resend this until it's acknowledged, in order, and everything queued behind it waits". A reliable RPC you call every tick is not a slightly slower unreliable one. It's a queue you can overflow, and when you overflow it the connection closes.

the three RPC types, briefly

Unreal gives you three flavours, declared on a UFUNCTION:

UFUNCTION(Server, Reliable, WithValidation)
void ServerFireWeapon(FVector_NetQuantize Origin, FVector_NetQuantize Direction);

UFUNCTION(NetMulticast, Unreliable)
void MulticastSpawnEffect(FVector Location);

UFUNCTION(Client, Reliable)
void ClientNotifyScore(int32 NewScore);

Server runs on the server when called from an owning client. Client runs on the owning client when called from the server. NetMulticast, called from the server, runs on the server and on every connected client. Reliable RPCs are guaranteed to arrive, and arrive in order relative to the other reliable RPCs on the same actor. Unreliable ones are fire-and-forget, dropped freely under pressure, and that is a feature.

The footgun is that Reliable is so easy to type, and it always works in testing. On the loopback connection you use whilst developing, nothing is ever dropped and nothing ever backs up. The reliable buffer only bites when there's real latency and real volume, which is to say in front of actual players.
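A toy model makes the loopback point concrete. The sketch below is not Unreal's implementation: `ToyReliableQueue`, `SendBurst`, the capacity and the "ack delay measured in sends" are all hypothetical stand-ins. It only shows that a bounded queue of unacknowledged messages never fills when acks come back immediately, and fills quickly once acks lag behind the send rate.

```cpp
#include <cstddef>
#include <deque>

// Toy stand-in for an ordered, acknowledged send queue.
// NOT Unreal's implementation; the names and the capacity are hypothetical.
class ToyReliableQueue {
public:
    explicit ToyReliableQueue(std::size_t capacity) : capacity_(capacity) {}

    // Queue one message. Returns false when the queue is already full,
    // the point at which a real engine would give up on the connection.
    bool Send() {
        if (unacked_.size() >= capacity_) return false;
        unacked_.push_back(nextSeq_++);
        return true;
    }

    // The remote side acknowledges the oldest outstanding message.
    void AckOldest() {
        if (!unacked_.empty()) unacked_.pop_front();
    }

    std::size_t InFlight() const { return unacked_.size(); }

private:
    std::size_t capacity_;
    std::deque<int> unacked_;
    int nextSeq_ = 0;
};

// Send `count` messages, acknowledging each one only `ackDelay` sends later.
// ackDelay == 0 models loopback: every message is acked before the next is sent.
// Returns how many messages were accepted before the queue overflowed.
int SendBurst(ToyReliableQueue& queue, int count, int ackDelay) {
    for (int i = 0; i < count; ++i) {
        if (!queue.Send()) return i;
        if (i >= ackDelay) queue.AckOldest();
    }
    return count;
}
```

With a capacity of 8, a burst of 1000 sends goes through untouched when `ackDelay` is 0 or 4, and stops after 8 messages when `ackDelay` is 20. Nothing about the sender changed between those runs, only how long the acks took.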


what actually happened

My weapon code called a reliable server RPC on every input tick whilst the trigger was held. Automatic fire, 600 rounds per minute on paper, which is ten shots a second, but the input poll ran at frame rate. So at 120fps with the trigger down I was queuing 120 reliable RPCs a second, twelve times what the weapon could actually fire, each one ordered behind the last, each one waiting for an ack across a connection with real round-trip time.

Unreal caps the number of outstanding reliable messages with RELIABLE_BUFFER, a compile-time constant in the engine source rather than a project setting, and when you exceed it the engine doesn't silently drop the message. It closes the connection with a log line along these lines that, if you're lucky, you spot amongst everything else:

LogNet: Warning: Closing connection. Reliable buffer overflow.

That was the rubber-banding. Not lag at all. The connection was being torn down and re-established, and the visible hitch was the client recovering.
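Some back-of-envelope arithmetic shows how much headroom the per-frame RPC gave away. This is a rough sketch, not a model of the engine: both function names are made up, the cap of 256 used in the figures below is a placeholder (check the value in your engine version), and it ignores retransmission and any other reliable traffic competing for the same buffer.

```cpp
// Rough headroom estimate; every input here is an assumption for illustration.

// How long acknowledgements can stall before `bufferCap` unacknowledged
// messages have piled up, if one message is queued `rpcsPerSecond` times a second.
double SecondsOfAckStallToOverflow(int bufferCap, double rpcsPerSecond) {
    return bufferCap / rpcsPerSecond;
}

// Unacknowledged messages in steady state: the send rate multiplied by
// the time each message waits for its ack.
double SteadyStateInFlight(double rpcsPerSecond, double roundTripSeconds) {
    return rpcsPerSecond * roundTripSeconds;
}
```

Under those assumptions, 120 RPCs a second reaches a 256-entry cap after roughly 2.1 seconds without acks, whereas the intended ten shots a second would take 25.6 seconds. A burst of packet loss or a long hitch is far more likely to last two seconds than twenty-five.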

the fix, and the principle

The fix was boring, which is how you know it's right. Hold state, not events. Instead of telling the server "fire" on every tick, the client sends "I am holding the trigger" once when it changes, as a reliable RPC, and the server runs the firing loop on its own authoritative timer. State transitions are reliable. The high-frequency stuff, muzzle flashes and tracer effects, goes out as unreliable multicasts, because if one cosmetic effect drops nobody will ever notice.

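// client: send once, when the trigger state changes
UFUNCTION(Server, Reliable)
void ServerSetFiring(bool bWantsToFire);

// server: own the cadence
// (bFiring is assumed to be set in ServerSetFiring's implementation)
void ARifle::ServerTick(float Dt)
{
    // accumulate elapsed time, otherwise the interval check never passes
    TimeSinceLastShot += Dt;

    if (bFiring && TimeSinceLastShot >= FireInterval)
    {
        FireOneRound();
        MulticastSpawnEffect(MuzzleLocation()); // unreliable, cosmetic only
        TimeSinceLastShot = 0.f;
    }
}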
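The client half of "send once, on change" is just edge detection on the input. A minimal sketch in plain C++, assuming a hypothetical `TriggerStateSender` helper (not an Unreal class) whose callback stands in for the `ServerSetFiring` call:

```cpp
#include <functional>
#include <utility>

// Hypothetical helper, not an Unreal class: forwards the trigger state
// only when it differs from the last value that was sent.
class TriggerStateSender {
public:
    explicit TriggerStateSender(std::function<void(bool)> send)
        : send_(std::move(send)) {}

    // Call once per input poll with the current trigger state.
    void Update(bool bTriggerHeld) {
        if (bTriggerHeld == lastSent_) return; // no change, nothing to send
        lastSent_ = bTriggerHeld;
        send_(bTriggerHeld); // in the game this would be the reliable RPC
    }

private:
    std::function<void(bool)> send_;
    bool lastSent_ = false; // assumes the server starts out "not firing"
};
```

Polling `Update(true)` for 120 frames and then `Update(false)` for another 120 invokes the callback twice in total, once per transition, however fast the frame rate is.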

The rule I now apply without thinking: reliable RPCs are for things that change rarely and must arrive. Doors opening, score changes, a round starting. Anything you'd send more than a handful of times a second should be unreliable, or better, replicated state with the server as the source of truth. If you find yourself reaching for a reliable RPC inside Tick, stop, because that is the exact shape of the bug.


There's a wider point hiding in here. Reliability in networking is never free; it's deferred cost. TCP hides it well enough that we forget, but Unreal's reliable channel makes the bill visible: ordering, retransmission, and a finite buffer that you can absolutely overrun. Treating "reliable" as a synonym for "good" is how you get a connection that drops under exactly the load you built the game for. The unreliable path isn't the cheap option you settle for. Most of the time it's the correct one, and the discipline is knowing which of your messages genuinely cannot be lost.

The prototype plays smoothly now, and the bandwidth graphs are even calmer than before, because most of what I was sending didn't need a guarantee at all.