The bug looked like physics. A pawn would occasionally teleport a metre to the left for one frame and snap back. It was not physics. It was an RPC arriving in an order I had assumed was impossible, and the assumption was entirely my fault. So this is a note to my past self about how Unreal's replicated RPCs actually behave, as opposed to how the word "reliable" made me feel.
the three knobs
A UFUNCTION that crosses the network gets marked with a direction and a reliability. The direction is Server, Client, or NetMulticast. The reliability is Reliable or Unreliable, spelled out in the specifier list. That is the whole vocabulary, and most of my trouble came from reading more into it than was there.
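// Direction is a unit vector, so it uses the Normal variant:
// plain FVector_NetQuantize rounds each component to a whole number.
UFUNCTION(Server, Reliable, WithValidation)
void ServerFireWeapon(FVector_NetQuantize Origin, FVector_NetQuantizeNormal Direction);
UFUNCTION(NetMulticast, Unreliable)
void MulticastSpawnTracer(FVector_NetQuantize Origin, FVector_NetQuantizeNormal Direction);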
Server means the call originates on a client and runs on the server. Client means it is called on the server and runs on the owning client. NetMulticast is called on the server and runs there and on every client the actor is currently relevant to. WithValidation forces you to write a companion _Validate function next to the usual _Implementation; returning false from it disconnects the client that sent the call, which is what you want on anything a Server RPC trusts.
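To make "actually validate" concrete, here is a minimal sketch of the kind of checks a _Validate body might make for a fire request. It is plain C++ so it compiles outside the engine: Vec3, ValidateFireRequest, and MaxMuzzleOffset are names I made up for illustration, not engine types or the real _Validate signature.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for FVector so the sketch compiles outside the engine.
struct Vec3 { float X, Y, Z; };

bool IsFinite(const Vec3& V) {
    return std::isfinite(V.X) && std::isfinite(V.Y) && std::isfinite(V.Z);
}

float Length(const Vec3& V) {
    return std::sqrt(V.X * V.X + V.Y * V.Y + V.Z * V.Z);
}

// The kind of question a _Validate body might ask about a fire request.
// Returning false here corresponds to rejecting the client outright.
bool ValidateFireRequest(const Vec3& Origin, const Vec3& Direction,
                         const Vec3& PawnLocation, float MaxMuzzleOffset) {
    if (!IsFinite(Origin) || !IsFinite(Direction)) return false;   // NaN or infinity
    if (std::fabs(Length(Direction) - 1.0f) > 0.01f) return false; // not a unit vector
    const Vec3 Offset{Origin.X - PawnLocation.X,
                      Origin.Y - PawnLocation.Y,
                      Origin.Z - PawnLocation.Z};
    return Length(Offset) <= MaxMuzzleOffset; // the shot must start near the pawn
}

// Usage: one plausible request and two that should be rejected.
void ValidateFireRequestExamples() {
    const Vec3 Pawn{0.0f, 0.0f, 0.0f};
    const Vec3 Muzzle{0.0f, 0.0f, 50.0f};
    const Vec3 Forward{1.0f, 0.0f, 0.0f};
    const Vec3 Zero{0.0f, 0.0f, 0.0f};
    const Vec3 FarAway{1000.0f, 0.0f, 0.0f};
    assert(ValidateFireRequest(Muzzle, Forward, Pawn, 100.0f));   // plausible shot
    assert(!ValidateFireRequest(Muzzle, Zero, Pawn, 100.0f));     // zero-length direction
    assert(!ValidateFireRequest(FarAway, Forward, Pawn, 100.0f)); // origin far from the pawn
}
```

Because a failed validation kicks the client, the thresholds should be generous. An honest player on a bad connection can be a little out of step with the server's idea of where their pawn is.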
It is worth being precise about who is allowed to call what, because this trips people up before they even reach the reliability question. A Server RPC may only be invoked on an actor the calling client owns, otherwise it is dropped on the floor. A Client RPC runs only on the owning connection. A NetMulticast is called on the server and fans out. Get the ownership wrong and the call simply evaporates, which is the recurring theme of this whole post: the failure mode is silence, not an error.
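The who-may-call-what rules fit in one small function. This is a plain C++ model of the routing as I understand it from Epic's RPC documentation, not engine code: Caller, RpcKind, Owner, and WhereItRuns are all names invented for the sketch.

```cpp
#include <cassert>
#include <string>

enum class Caller { Server, Client };
enum class RpcKind { Server, Client, NetMulticast };
// Who owns the actor, seen from the machine making the call.
// For calls made on the server, ThisClient and OtherClient both just mean
// "some client owns it".
enum class Owner { ThisClient, OtherClient, NoClient };

// Returns a short description of where the RPC body ends up executing.
std::string WhereItRuns(Caller From, RpcKind Kind, Owner ActorOwner) {
    if (From == Caller::Server) {
        if (Kind == RpcKind::Server) return "server";
        if (Kind == RpcKind::NetMulticast) return "server and relevant clients";
        // Client RPC: without an owning connection there is nowhere to send it.
        return ActorOwner == Owner::NoClient ? "server only" : "owning client";
    }
    // From here on the call was made on a client.
    if (Kind == RpcKind::Server) {
        // Only the owning client may talk to the server through this actor.
        return ActorOwner == Owner::ThisClient ? "server" : "dropped";
    }
    // Client and NetMulticast RPCs called on a client never leave that machine.
    return "calling client only";
}

// Usage: the silent cases are the ones worth remembering.
void WhereItRunsExamples() {
    assert(WhereItRuns(Caller::Client, RpcKind::Server, Owner::ThisClient) == "server");
    assert(WhereItRuns(Caller::Client, RpcKind::Server, Owner::OtherClient) == "dropped");
    assert(WhereItRuns(Caller::Server, RpcKind::Client, Owner::NoClient) == "server only");
}
```

None of the branches reports an error. A call that lands in "dropped" or "server only" looks exactly like a call that worked, right up until you notice nothing happened on the other side.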
So far, so reasonable. The footgun is in what Reliable promises, and what it doesn't.
reliable is about arrival, not ordering across channels
Here is the assumption that cost me a Sunday. I believed that if I marked two RPCs Reliable, they would arrive in the order I called them. They will, if both are reliable and both are on the same actor, because reliable RPCs on a given actor are delivered in order. The moment you mix reliable and unreliable, or you involve a different actor, that guarantee evaporates.
My teleport was an unreliable movement update racing a reliable correction. The movement RPC was unreliable on purpose, because it fires constantly and a dropped one is fine, the next one fixes it. The correction was reliable. Under packet loss the reliable one got retransmitted and landed a frame late, after a newer unreliable update had already moved the pawn. The pawn jumped back to the old corrected position for exactly one frame, then the next movement update fixed it. One frame of teleport, intermittent, dependent on the network. The worst kind of bug.
The fix was not to make everything reliable. That is the instinct and it is wrong, because reliable RPCs are expensive: they sit in a queue, they retransmit, and if that queue overflows Unreal closes the connection. Spamming reliable multicasts at tick rate is how you take your own server down. The fix was to stop expressing position as an RPC at all and let the replicated CharacterMovementComponent do its job, which it is genuinely good at, and reserve RPCs for events: I fired, I jumped, I picked this up.
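To see why "make everything reliable" backfires, here is a toy model of a bounded reliable queue. It is not how the engine implements its buffer: ReliableChannel, kCapacity (deliberately tiny), and TicksUntilClosed are made up for the sketch. It only captures the shape of the failure, which is that unacknowledged messages pile up until the connection is closed.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy model, not engine code. kCapacity is an illustrative number.
struct ReliableChannel {
    static constexpr std::size_t kCapacity = 8;
    std::deque<int> Unacked;  // sent but not yet acknowledged
    bool bClosed = false;

    // Queue one reliable message. Returns false once the connection is closed.
    bool Send(int MessageId) {
        if (bClosed) return false;
        if (Unacked.size() >= kCapacity) {  // overflow: give up on the connection
            bClosed = true;
            Unacked.clear();
            return false;
        }
        Unacked.push_back(MessageId);
        return true;
    }

    // The peer acknowledged the oldest Count messages.
    void Ack(std::size_t Count) {
        for (std::size_t i = 0; i < Count && !Unacked.empty(); ++i) {
            Unacked.pop_front();
        }
    }
};

// Sends PerTick reliable messages each tick and receives AcksPerTick acks.
// Returns the tick on which the connection closed, or -1 if it survived.
int TicksUntilClosed(int PerTick, int AcksPerTick, int Ticks) {
    ReliableChannel Channel;
    int NextId = 0;
    for (int Tick = 0; Tick < Ticks; ++Tick) {
        for (int i = 0; i < PerTick; ++i) {
            if (!Channel.Send(NextId++)) return Tick;
        }
        Channel.Ack(static_cast<std::size_t>(AcksPerTick));
    }
    return -1;
}

// Usage: keeping pace with acks is fine, outrunning them is fatal.
void ReliableQueueExamples() {
    assert(TicksUntilClosed(1, 1, 100) == -1);  // one in, one out: survives
    assert(TicksUntilClosed(3, 1, 100) == 3);   // backlog grows by two per tick
}
```

With a real buffer the numbers are bigger, but the arithmetic is the same: any sustained rate of reliable sends above the rate of acknowledgements eventually hits the ceiling, and packet loss lowers the ack rate exactly when you can least afford it.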
Once I'd written it down the cause was embarrassingly obvious. Two messages, two different delivery guarantees, no shared ordering, and a piece of state that depended on both. Reliable and unreliable calls behave like separate pipes, even when they target the same actor. Reliable says "this will arrive, even if I have to send it again." Unreliable says "I'll send it once and move on." When a retransmitted reliable correction lands after a fresher unreliable update, the older value wins for a frame, and you get a teleport that only appears under loss. It was never going to show up on a LAN with no dropped packets, which is exactly why it survived testing.
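The race is small enough to replay by hand. Here is a sketch in plain C++, with invented frame numbers and positions, that applies messages in the order they arrive with last-writer-wins, which is what my original code effectively did. Message, ApplyInArrivalOrder, and TeleportScenario are names made up for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Message {
    int SentFrame;     // frame on which the sender produced it
    int ArrivalFrame;  // frame on which the receiver got it (retransmits land late)
    float X;           // position the message carries
};

// Applies messages in arrival order, last writer wins, and returns the
// position the pawn holds after each message.
std::vector<float> ApplyInArrivalOrder(std::vector<Message> Messages) {
    std::stable_sort(Messages.begin(), Messages.end(),
                     [](const Message& A, const Message& B) {
                         return A.ArrivalFrame < B.ArrivalFrame;
                     });
    std::vector<float> Timeline;
    for (const Message& M : Messages) {
        Timeline.push_back(M.X);
    }
    return Timeline;
}

// Illustrative numbers only: a correction sent on frame 10 loses its first
// packet and is retransmitted, so it arrives after a newer movement update.
std::vector<Message> TeleportScenario() {
    return {
        {10, 13, 5.0f},  // reliable correction, retransmitted
        {11, 12, 6.0f},  // unreliable move, sent later but arrives first
        {12, 14, 7.0f},  // next unreliable move
    };
}

// Usage: the pawn goes 6, back to 5, then 7.
void TeleportExamples() {
    const std::vector<float> Timeline = ApplyInArrivalOrder(TeleportScenario());
    assert(Timeline.size() == 3);
    assert(Timeline[0] == 6.0f);
    assert(Timeline[1] == 5.0f);  // the one-frame teleport
    assert(Timeline[2] == 7.0f);
}
```

Set the correction's arrival frame to 11, as it would be with no loss, and the timeline becomes 5, 6, 7 with no snap-back.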
reliable rpcs can still be dropped (by you)
The second thing the word "reliable" hides: reliability is per-connection, and only as long as the connection lives. There is also a quieter trap. If the actor is not relevant to a client, the RPC is silently discarded for that client. Relevancy and reliability are orthogonal, and I had conflated them. A reliable multicast does not queue up for a client the server has stopped replicating the actor to, say because it is too far away; for that client it simply never runs. So "reliable" means "we will keep trying to deliver this to a connection that should receive it," not "this will run everywhere no matter what."
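Here is the orthogonality as a toy filter. It is a plain C++ sketch with invented names (Connection, MulticastRecipients), not the engine's relevancy code. The point is that the reliability flag never enters the decision about who receives the call.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Connection {
    std::string Name;
    bool bOpen;           // the connection is still alive
    bool bActorRelevant;  // the server currently considers the actor relevant to it
};

// Names of the connections on which a multicast body would run in this model.
std::vector<std::string> MulticastRecipients(const std::vector<Connection>& Connections,
                                             bool bReliable) {
    // Deliberately unused: reliability governs retransmission to a recipient,
    // not whether a connection qualifies as a recipient.
    (void)bReliable;
    std::vector<std::string> Recipients;
    for (const Connection& C : Connections) {
        if (C.bOpen && C.bActorRelevant) {
            Recipients.push_back(C.Name);
        }
    }
    return Recipients;
}

// Usage: marking the call reliable does not bring back the distant client.
void MulticastRecipientsExamples() {
    const std::vector<Connection> Connections = {
        {"near", true, true},
        {"far", true, false},    // too far away to be relevant
        {"gone", false, true},   // disconnected
    };
    const std::vector<std::string> Reliable = MulticastRecipients(Connections, true);
    const std::vector<std::string> Unreliable = MulticastRecipients(Connections, false);
    assert(Reliable.size() == 1);
    assert(Reliable[0] == "near");
    assert(Reliable == Unreliable);
}
```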
The other silent dropper is ownership. A Client RPC only runs on the actor's owning connection. Call it on a server-spawned actor that has no owner and no client ever sees it (as far as I can tell, the body just runs on the server instead). There is no warning, and no log unless you go looking with Log LogNet Verbose. I lost an afternoon to a UI prompt that "wasn't firing" before I realised the actor it lived on had never been assigned an owner.
the rules I now actually follow
A short list, because I keep relearning these:
- Movement and continuous state go through replicated properties and the movement component, not RPCs. Let Unreal interpolate.
- RPCs are for discrete events. Fire, jump, interact, took damage.
- Default to Unreliable for anything cosmetic that fires often. Tracers, hit flashes, footsteps. A dropped tracer is invisible. A laggy server is not.
- Use Reliable only for events that must happen exactly once and matter to gameplay. Then accept the cost.
- Never assume ordering across a reliable and an unreliable call, or across two actors. If order matters, put a sequence number in the payload and sort it out on arrival.
- Put WithValidation on every Server RPC and actually validate, because a client can send you anything.
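The sequence-number rule is the one I'd most want to see written down as code. Here is a minimal sketch in plain C++, assuming the sender stamps every message that touches the same piece of state, reliable or not, from one shared counter. PositionUpdate and PositionReceiver are invented names, and counter wraparound is ignored to keep it short.

```cpp
#include <cassert>
#include <cstdint>

struct PositionUpdate {
    std::uint32_t Sequence;  // stamped by the sender from one shared counter
    float X;
};

struct PositionReceiver {
    bool bHasValue = false;
    std::uint32_t LastSequence = 0;
    float X = 0.0f;

    // Applies the update only if it is newer than what we already hold.
    // Returns true if applied, false if it was stale or a duplicate.
    // Assumption: Sequence never wraps; a real version would have to handle that.
    bool Receive(const PositionUpdate& Update) {
        if (bHasValue && Update.Sequence <= LastSequence) {
            return false;
        }
        bHasValue = true;
        LastSequence = Update.Sequence;
        X = Update.X;
        return true;
    }
};

// Usage: a late correction (sequence 10) arrives after a newer move (11).
void SequenceGuardExamples() {
    PositionReceiver Receiver;
    const PositionUpdate Move11{11u, 6.0f};
    const PositionUpdate Correction10{10u, 5.0f};
    const PositionUpdate Move12{12u, 7.0f};
    assert(Receiver.Receive(Move11));
    assert(!Receiver.Receive(Correction10));  // stale, ignored
    assert(Receiver.X == 6.0f);               // no snap-back
    assert(Receiver.Receive(Move12));
    assert(Receiver.X == 7.0f);
}
```

Whether it is acceptable to throw away a stale reliable message depends on what it carries. For a position, the newest value is all that matters. For an event that must happen exactly once, you would keep it and reorder instead of discarding.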
None of this is hidden. It is all in the documentation, phrased plainly, and I had read it. The trouble is that "reliable" is an ordinary English word with a comforting meaning, and I let that meaning stand in for the engineering one. Reliable means it will arrive if it can. It does not mean it will arrive in order, it does not mean it will arrive everywhere, and it certainly does not mean it will arrive cheaply. The teleporting pawn was the network telling me, politely, to read the words I'd already read.