Connectivity was "fine". ping worked. SSH connected, prompted, accepted my password, and then hung the instant I ran anything that printed more than a line. curl against the internal registry got as far as the TLS handshake and stopped dead. Small things worked, big things didn't, and that shape of symptom has a name. It was the MTU. It's always the MTU.
I want to write this one down properly because every time it happens I rediscover it from first principles, swear, and forget again by the next time. Maybe writing it makes it stick.
the setup
New site-to-site tunnel, WireGuard over the open internet between two offices. Routing was correct, the handshake completed, ping 10.20.0.1 came back in 12ms. By every check I'd normally run, the link was up. But anything that moved real data stalled, and it stalled in a particular way: it didn't fail, it didn't reset, it just stopped, then eventually timed out.
That "stops rather than fails" behaviour is the tell. A firewall reject or a closed port gives you a clean error: connection refused, or a reset. A black hole gives you silence, because something is dropping packets and nobody is being told.
the smoking gun
The fastest test for an MTU problem is to fire packets of a known size with fragmentation forbidden and watch where the wall is.
# Don't Fragment set, payload size we choose. 1472 + 28 = 1500.
ping -M do -s 1472 10.20.0.1 # times out
ping -M do -s 1372 10.20.0.1 # replies
There it was. A 1400-byte packet got through; a 1500-byte one vanished. The tunnel's usable MTU was lower than the 1500 the interfaces assumed, because WireGuard wraps each packet in its own encapsulation and that overhead has to come from somewhere. A full 1500-byte packet plus the tunnel header exceeds 1500 on the underlay, so it needs to fragment. Somewhere along the path a router honoured Don't Fragment and dropped the packet, and the ICMP "fragmentation needed" message it should have sent back was being filtered. That ICMP message is how Path MTU Discovery is supposed to work. Block it and you get exactly this: a silent black hole for large packets only.
Why did small things work? A bare ping is tiny. SSH's initial negotiation is small packets. The TLS handshake fits in small packets too. It's only when the connection starts carrying real payload, a screenful of ls, a registry blob, that the segments grow to full size, hit the invisible wall, and disappear. The connection was never broken. It was just incapable of carrying anything that mattered.
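To put rough numbers on the overhead that ate those bytes, here is a small sketch. The figures are the usual per-packet cost of WireGuard as I understand it (outer IP header, UDP header, and 32 bytes of WireGuard framing), not something measured on this link, and the function name wg_inner_mtu is made up for illustration.

```shell
#!/bin/sh
# Sketch: inner (tunnel) MTU left over after WireGuard encapsulation.
# Assumed per-packet overhead: outer IP header (20 bytes IPv4, 40 bytes IPv6)
# + 8 bytes UDP + 32 bytes WireGuard (16-byte data header + 16-byte auth tag).
wg_inner_mtu() (
    underlay_mtu=$1   # MTU of the path carrying the tunnel, e.g. 1500
    outer_family=$2   # 4 or 6: IP version of the outer (underlay) packets
    case "$outer_family" in
        4) ip_header=20 ;;
        6) ip_header=40 ;;
        *) echo "outer_family must be 4 or 6" >&2; exit 1 ;;
    esac
    echo $(( underlay_mtu - ip_header - 8 - 32 ))
)

wg_inner_mtu 1500 4   # prints 1440
wg_inner_mtu 1500 6   # prints 1420
```

On those assumptions a 1500-byte inner packet becomes 1560 bytes on an IPv4 underlay, which is over the limit, while a 1400-byte one becomes 1460 and fits. That matches what the two pings showed.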
the fix, and the better fix
The immediate fix is to stop pretending the MTU is 1500 when it isn't:
ip link set dev wg0 mtu 1380
That makes the tunnel interface honest about its size, so the kernel produces appropriately small packets to begin with and never needs to fragment. For WireGuard, 1420 is the common conservative default; I went lower because the underlay path itself had some overhead I didn't fully trust.
But the interface MTU only directly helps traffic the gateway itself originates. Hosts on the office LANs behind it still believe they have 1500 bytes, and they depend on PMTUD to learn otherwise. The more robust belt-and-braces move for TCP is to clamp the MSS, so every TCP connection across the tunnel negotiates a maximum segment size that already fits:
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
-j TCPMSS --clamp-mss-to-pmtu
--clamp-mss-to-pmtu rewrites the MSS in the SYN down to the path MTU the kernel knows for that route, which here means the tunnel interface MTU, minus 40 bytes of IPv4 and TCP headers. The two ends then agree never to send a segment bigger than the path can carry, and the whole fragmentation question never arises. It's a hack in the sense that it rewrites packets in flight, but it's the standard hack, and it's reliable in a way that depending on PMTUD across the public internet simply is not. Too many networks drop the ICMP it needs.
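As a sanity check on what the clamp should produce, the arithmetic is simple enough to sketch. This assumes minimal headers with no IP options, and clamped_mss is a hypothetical helper name, not part of iptables.

```shell
#!/bin/sh
# Sketch: the MSS a clamp would be expected to advertise for a given MTU.
# Assumes a 20-byte TCP header plus a 20-byte IPv4 or 40-byte IPv6 header,
# with no IP options.
clamped_mss() (
    mtu=$1        # MTU the segment has to fit in, e.g. that of wg0
    family=$2     # 4 or 6: IP version of the TCP connection
    case "$family" in
        4) ip_header=20 ;;
        6) ip_header=40 ;;
        *) echo "family must be 4 or 6" >&2; exit 1 ;;
    esac
    echo $(( mtu - ip_header - 20 ))
)

clamped_mss 1380 4   # prints 1340
clamped_mss 1380 6   # prints 1320
clamped_mss 1500 4   # prints 1460, the usual Ethernet value
```

With wg0 at 1380, I'd expect SYNs crossing the tunnel to carry an MSS of 1340 over IPv4. Watching the SYNs with something like tcpdump -ni wg0 'tcp[tcpflags] & tcp-syn != 0' should show the mss option and confirm it. Note that the iptables rule above covers IPv4 only; IPv6 traffic would need the equivalent ip6tables rule.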
why path mtu discovery couldn't save me
It's reasonable to ask why any of this is necessary. The internet has a mechanism for exactly this problem, and it's been standard since RFC 1191 in 1990. Path MTU Discovery works like this: a host sends a full-size packet with the Don't Fragment bit set, a router along the way finds it can't forward something that big without fragmenting, and because DF is set it can't fragment, so it drops the packet and sends back an ICMP "fragmentation needed, DF set" message that includes the MTU it can handle. The sender reads that, lowers its idea of the path MTU, and resends smaller. The connection self-heals in a round trip or two.
That's the theory, and when it works it's invisible and lovely. The problem is the load-bearing word "ICMP". A great many networks, out of a lazy and cargo-culted notion of security, block ICMP wholesale at the firewall. Block the "fragmentation needed" message and you've quietly broken PMTUD, and you've broken it in the most pernicious way: the sender never learns it needs to go smaller, so it keeps sending full packets that keep getting silently dropped, forever. This is the PMTUD black hole, and it's why "ping works but real traffic hangs" is such a reliable fingerprint. The small packets that make a connection were always going to get through. The large ones were always going to die in silence.
So the practical posture is to assume PMTUD will fail and engineer around it, rather than depend on a mechanism that some firewall three hops away has helpfully disabled on your behalf. MSS clamping is precisely that: it stops relying on the path to report its own MTU, and instead pins the segment size at the point you control. You're not fixing PMTUD. You're making sure you never need it.
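The difference between working PMTUD and a black hole is easy to see in a toy model. The sketch below sends no packets at all; it just walks a hypothetical list of link MTUs and applies the rules described above. The function name and its arguments are invented for illustration.

```shell
#!/bin/sh
# Toy model of Path MTU Discovery, not a network tool: no packets are sent.
# pmtud_simulate ICMP_ALLOWED PACKET_SIZE HOP_MTU...
#   ICMP_ALLOWED: yes if "fragmentation needed" replies reach the sender
#   PACKET_SIZE:  initial size the sender tries, with DF set
#   HOP_MTU...:   MTU of each link along the (hypothetical) path, in order
pmtud_simulate() (
    icmp_allowed=$1
    size=$2
    shift 2
    tries=0
    while :; do
        tries=$(( tries + 1 ))
        bottleneck=""
        for hop_mtu in "$@"; do
            if [ "$size" -gt "$hop_mtu" ]; then
                bottleneck=$hop_mtu   # first link too small for the packet
                break
            fi
        done
        if [ -z "$bottleneck" ]; then
            echo "delivered size=$size tries=$tries"
            exit 0
        fi
        if [ "$icmp_allowed" != "yes" ]; then
            # The drop is silent: the sender has nothing to adapt to.
            echo "blackhole size=$size"
            exit 0
        fi
        size=$bottleneck   # the ICMP reply carried the MTU of that link
    done
)

pmtud_simulate yes 1500 1500 1440 1500   # prints: delivered size=1440 tries=2
pmtud_simulate no  1500 1500 1440 1500   # prints: blackhole size=1500
```

Same path, same packet, and the only thing that changed is whether one ICMP message makes it home. A sender that starts at or below the bottleneck never needs the message in the first place, which is the whole argument for clamping.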
a checklist for next time
For my own future benefit, the order of operations when connectivity is "fine" but useless:
- Confirm it's a size problem with ping -M do -s <n>, bisecting the size until you find the wall. If small works and large doesn't, you're done diagnosing.
- Work out the real overhead of whatever's in the path: a tunnel, a VLAN, PPPoE, encapsulation of any kind. Each layer steals bytes from the 1500 you assumed you had.
- Set the tunnel or interface MTU to something honest, below the wall you found.
- Clamp MSS on forwarded TCP so connections negotiate a safe segment size regardless of PMTUD.
- Only then go and read the application logs, which by now will have stopped complaining.
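The bisecting in the first step is tedious by hand, so here is a sketch of it as a script. It assumes Linux iputils ping (for -M do), and find_wall, ping_probe and the TARGET variable are names I've made up for the example. The probe is passed in as a command so the search can be tried against something other than a live network.

```shell
#!/bin/sh
# Sketch: bisect for the largest ICMP payload that crosses a path with DF set.
# find_wall PROBE LOW HIGH
#   PROBE: command called as "PROBE SIZE"; must succeed if SIZE gets through
#   LOW:   payload size known to pass; HIGH: payload size known to fail
find_wall() (
    probe=$1
    low=$2
    high=$3
    while [ $(( high - low )) -gt 1 ]; do
        mid=$(( (low + high) / 2 ))
        if "$probe" "$mid"; then
            low=$mid     # mid passed: the wall is above it
        else
            high=$mid    # mid failed: the wall is below it
        fi
    done
    echo "$low"
)

# Probe using Linux iputils ping; TARGET is a placeholder to set yourself.
ping_probe() {
    ping -M do -c 1 -W 1 -s "$1" "${TARGET:?set TARGET}" >/dev/null 2>&1
}

# Usage (not run here): TARGET=10.20.0.1 find_wall ping_probe 1372 1472
# The result is an ICMP payload size; add 28 for the full IPv4 packet size.
```

One caveat: each size is probed with a single ping, so on a lossy link an unlucky drop looks like the wall. If the answer seems suspiciously low, run it again before trusting it.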
With the interface MTU set and MSS clamping in place, SSH printed a full directory listing, curl pulled the blob, and the registry mirror finally synced.
The lesson I keep failing to retain: when small packets work and large packets vanish into silence, do not start at the application. It is not a TLS problem, it is not a routing problem, it is not the registry. It is the MTU. Run the ping -M do test first, find the wall, and you'll save yourself an afternoon of reading the wrong logs. I have now written this down, so naturally I'll be fully cured and never spend that afternoon again.