a vpn that throttled itself, and yes it was the mtu

[Image: a terminal showing a packet capture mid-debug]

The tunnel came up clean. WireGuard between two sites, handshake in milliseconds, ssh worked, ping worked, a quick curl of a small endpoint worked. Then someone tried to rsync a few hundred megabytes across it and the whole thing wedged. Not slow. Wedged. The transfer would start, move a few kilobytes, and then hang until it gave up.

You can probably see where this is going from the title. But I want to walk through it, because the symptom is textbook "the MTU is wrong" and I still managed to spend an hour blaming WireGuard first.

The tell is the shape of the failure. Small things work, large things don't, and the handshake is fine. That combination almost never means a broken application or a dead route. It means a packet that is too big to fit down some link, and a path that has lost the ability to say so. Encapsulation is the obvious culprit on a VPN: every packet now carries an outer header, so the usable payload shrinks. WireGuard's overhead is 60 bytes over IPv4: 20 for the outer IP header, 8 for UDP and 32 for WireGuard's own framing. (Over IPv6 it is 80.) Push a 1500-byte packet through a tunnel whose underlying link also tops out at 1500 and the encapsulated result is 1560 bytes, which doesn't fit, so it has to be fragmented or dropped.
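The arithmetic is small enough to do in your head, but here it is as a minimal shell sketch. The function and variable names are mine, and the 60-byte figure assumes an IPv4 outer header.

```shell
#!/bin/sh
# Assumed overhead for WireGuard over IPv4:
# 20 (outer IPv4 header) + 8 (UDP header) + 32 (WireGuard framing) = 60 bytes.
WG_OVERHEAD_V4=60

# encapsulated_size INNER: size on the wire once an inner packet is wrapped.
encapsulated_size() {
    echo $(( $1 + WG_OVERHEAD_V4 ))
}

# max_inner_mtu LINK_MTU: largest inner packet that still fits the underlying link.
max_inner_mtu() {
    echo $(( $1 - WG_OVERHEAD_V4 ))
}

encapsulated_size 1500   # prints 1560
max_inner_mtu 1500       # prints 1440
```

So on a plain 1500-byte underlay, anything above 1440 bytes inside the tunnel is already too big before the firewall gets involved.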

[Image: a close-up of a packet capture showing ICMP unreachable messages]

Normally path MTU discovery handles this. The router that can't forward the oversized packet sends back an ICMP "fragmentation needed" message, the sender shrinks its packets, everyone gets on with their day. Except a firewall in the middle was dropping ICMP wholesale, because someone years ago decided ICMP was "a security risk" and blocked the lot. So the sender never heard the bad news. It just kept hammering out packets that vanished into the tunnel, and the transfer stalled forever in a black hole of its own making.

The confirmation took thirty seconds once I stopped theorising:

ping -M do -s 1400 10.20.0.1   # ok
ping -M do -s 1420 10.20.0.1   # no reply

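-M do sets the don't-fragment bit, so the kernel won't quietly chop the packet up. -s is the ICMP payload size, not the packet size; the IP and ICMP headers add another 28 bytes on top. Somewhere between 1400 and 1420 bytes of payload, which is 1428 to 1448 bytes of actual packet, replies stopped. That's the wall.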
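If you want the exact byte rather than a 20-byte window, the same probe can be bisected. This is a rough sketch, not something I ran at the time: probe and largest_payload are names I made up, the flags assume Linux iputils ping, and a single lost reply will read as "too big", so treat the answer as a lower bound.

```shell
#!/bin/sh
# Far end of the tunnel; defaults to the address used in the pings above.
TARGET=${TARGET:-10.20.0.1}

# probe SIZE: succeeds if one don't-fragment ping with SIZE bytes of
# ICMP payload gets a reply within a second (iputils ping flags assumed).
probe() {
    ping -M do -c 1 -W 1 -s "$1" "$TARGET" >/dev/null 2>&1
}

# largest_payload LOW HIGH: bisect for the biggest payload that still
# gets through. LOW must be a size that works, HIGH one that does not.
largest_payload() {
    lo=$1
    hi=$2
    while [ $(( hi - lo )) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if probe "$mid"; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    echo "$lo"
}

# Example, needs a reachable peer:
#   largest_payload 1400 1420
```

Add 28 to whatever it prints (20 bytes of IPv4 header plus 8 of ICMP) to get the largest packet the path will carry.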

The fix was to set the tunnel interface MTU to something that accounts for the overhead, and to let TCP MSS clamping handle the rest so connections negotiate a sane segment size up front rather than discovering it the hard way.

[Interface]
MTU = 1420

# and on the router, for good measure (a shell command, not part of the WireGuard config):
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
  -j TCPMSS --clamp-mss-to-pmtu

The MSS clamp is the belt to the MTU's braces. It rewrites the MSS option in each forwarded SYN so both ends agree on segments that will actually fit, regardless of what PMTU discovery is or isn't allowed to do. It only helps TCP, though, which is why the interface MTU still has to be right. With those two in place the rsync ran at line rate and ssh carried on not caring, because ssh's packets were small enough to fit all along, which is exactly why it lied to me about everything being fine.
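For a sense of the numbers the clamp ends up advertising, here is the sum as a small sketch. mss_for_mtu is a name I made up, and it assumes no IP options and the basic 20-byte TCP header.

```shell
#!/bin/sh
# mss_for_mtu MTU [IP_HEADER]: TCP payload that fits in one packet of size MTU.
# IP_HEADER defaults to 20 (IPv4); pass 40 for IPv6.
# Assumes no IP options and a 20-byte TCP header.
mss_for_mtu() {
    ip_header=${2:-20}
    tcp_header=20
    echo $(( $1 - ip_header - tcp_header ))
}

mss_for_mtu 1420      # prints 1380 (IPv4)
mss_for_mtu 1420 40   # prints 1360 (IPv6)
```

With the tunnel interface at 1420, an IPv4 connection routed through it should come out clamped to an MSS of 1380, which is easy to check in a capture of the SYN.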

I did go and have a quiet word about that ICMP-blocking firewall rule. Dropping all ICMP doesn't make you secure; it breaks path MTU discovery and turns every encapsulated link into a guessing game. But the real lesson is the old one, worn smooth from repetition: when small works and large hangs, don't open the application logs. Open a terminal and ping with the don't-fragment bit set. It was the MTU. It's always the MTU.