MTU problems are the worst kind of network bug, because everything mostly works. Small things go through fine. Pings are perfect. You can SSH in, you can run commands, you can browse a few pages. And then you try to do something real, like copy a big file or load a page with a fat response, and it hangs forever with no error. No reset, no refusal, just a transfer that gets a little way in and then stalls. I lost the best part of a Saturday to exactly this on my homelab, and I am writing it down so the next person who searches the symptom finds the answer faster than I did.
the setup
I had built a small site-to-site VPN between my flat and a box at a mate's place, using WireGuard over the public internet. The handshake came up immediately, the link looked healthy, and I happily started using it. SSH worked. DNS worked. Pinging across the tunnel worked. I declared victory and went to make tea.
Then I tried to pull a backup across the link, and it died. Not slowly, and not with an error: it just sat at a few hundred kilobytes and stopped. rsync hung. scp hung. A plain curl of a large file hung at the same point every time. But curl of a tiny file was instant. That pattern, small fine, large dead, is the signature. Once you have seen it, you never forget it.
why it happens
Every link has a maximum transmission unit, the largest packet it will carry. On a normal Ethernet network that is 1500 bytes. The moment you wrap traffic in a tunnel, you add headers, and those headers eat into the budget. WireGuard's encapsulation costs 60 bytes when the outer packets are IPv4 and 80 when they are IPv6, so the usable MTU inside the tunnel is smaller than 1500. If something tries to send a full 1500-byte packet down a path that can only carry, say, 1420, one of two things happens. Either the packet gets fragmented, or, if the "don't fragment" bit is set, as it normally is for TCP, the router that cannot forward it is supposed to send back an ICMP message saying "too big, try smaller", and the sender drops its packet size accordingly. That mechanism is path MTU discovery, and it is supposed to handle exactly this.
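To make the header budget concrete, here is a small sketch of the arithmetic. The function name tunnel_mtu is my own, and the 1500-byte starting point assumes a plain Ethernet underlay with nothing else wrapped around it.

```shell
# tunnel_mtu LINK_MTU OUTER_IP_HEADER: what is left inside the tunnel after the
# outer IP header, the 8-byte UDP header, and WireGuard's 32 bytes of framing
# (16-byte data header plus 16-byte authentication tag).
tunnel_mtu() {
    echo $(( $1 - $2 - 8 - 32 ))
}

echo "IPv4 outer: $(tunnel_mtu 1500 20)"   # 20-byte IPv4 header
echo "IPv6 outer: $(tunnel_mtu 1500 40)"   # 40-byte IPv6 header
```

The smaller of the two results, 1420, fits whichever IP version the tunnel ends up running over. If your underlying link carries less than 1500 bytes, change the first argument to match.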
The trouble is that ICMP gets blocked all over the modern internet. Overzealous firewalls drop "fragmentation needed" messages because someone decided ICMP is scary. So the sender never hears that its packets are too big. It keeps cheerfully sending 1500-byte packets into a tunnel that silently drops them, and it never finds out. Small packets fit, so they sail through, which is why everything interactive feels fine. Large transfers are made of big packets, so they vanish into the void. This is the PMTU black hole, and it is depressingly common.
finding it
The way to confirm it is to send a ping with a known payload size and the don't-fragment bit set, and walk the size up until it stops working. On Linux:
# this should work
ping -M do -s 1372 10.10.0.1
# somewhere above here it will start failing; 1472 + 28 header bytes is a full 1500-byte packet
ping -M do -s 1472 10.10.0.1
The -M do option sets don't-fragment, and -s is the payload size. Remember the payload is smaller than the full packet, because the headers add 28 bytes on top: 20 for IPv4 and 8 for ICMP. When the small one succeeds and the large one fails with "message too long" or just silence, you have found your ceiling. Mine fell over somewhere just above 1400 bytes of payload, which lined up neatly with the WireGuard overhead.
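Walking the size up by hand gets tedious, so here is a rough sketch of the same search done as a binary search. The function names probe and max_payload are my own, the 1200 and 1472 bounds are assumptions you may need to adjust, and it uses the Linux iputils ping flags. A single lost packet will read as a failure, so treat the result as a good estimate and re-run it if the link is lossy.

```shell
# probe SIZE HOST: succeed if one don't-fragment ping with SIZE payload bytes
# gets a reply within a second.
probe() {
    ping -M do -c 1 -W 1 -s "$1" "$2" >/dev/null 2>&1
}

# max_payload HOST [LOW] [HIGH]: binary search for the largest payload that
# passes. Assumes LOW passes and that everything above the ceiling fails.
max_payload() {
    host=$1 low=${2:-1200} high=${3:-1472}
    while [ "$low" -lt "$high" ]; do
        mid=$(( (low + high + 1) / 2 ))
        if probe "$mid" "$host"; then
            low=$mid            # mid fits, so the ceiling is at least mid
        else
            high=$((mid - 1))   # mid is too big, so the ceiling is below it
        fi
    done
    echo "$low"
}
```

Calling max_payload 10.10.0.1 prints the largest payload that made it across; add 28 to get the MTU of the path.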
the fix
There are two fixes and you want both. The first is to set a sensible MTU on the tunnel interface itself so anything originating on the box knows the real limit. For WireGuard I set it explicitly:
[Interface]
MTU = 1420
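If you have measured your own ceiling with ping, the number for the config is that payload plus the 28 bytes of headers. A minimal sketch, where mtu_from_payload is my own helper, 1392 is a made-up measurement, and wg0 is an assumed interface name. It only prints the command for changing a live interface and does not run it.

```shell
# mtu_from_payload PAYLOAD: add the 20-byte IPv4 header and 8-byte ICMP header
# back onto the largest ping payload that got through.
mtu_from_payload() {
    echo $(( $1 + 28 ))
}

mtu=$(mtu_from_payload 1392)          # 1392 is a hypothetical measured ceiling
echo "ip link set dev wg0 mtu $mtu"   # wg0 is an assumed interface name
```

Setting it with ip link takes effect immediately but does not survive a restart of the tunnel, so the config file is still where the value belongs.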
The second, and this is the one that actually saved me, is MSS clamping. The maximum segment size is the TCP-level cousin of MTU, and you can have the router rewrite the MSS option in the SYN packets of the TCP handshake so that both ends agree to use smaller segments from the start. This sidesteps the broken ICMP path entirely: nobody needs to discover anything, because the size is negotiated up front. It only helps TCP, which is one reason to set the MTU as well. With iptables:
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
-j TCPMSS --clamp-mss-to-pmtu
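That --clamp-mss-to-pmtu option clamps the MSS to the path MTU the kernel already knows about, minus 40 bytes of IPv4 and TCP headers, which is exactly what you want when you cannot trust ICMP to get through. It is also why the explicit MTU on the tunnel interface matters: that is the number the clamp works from. The moment I added that rule, the backup that had been hanging for an hour completed in minutes. No more black hole.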
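If you would rather pin the number yourself than let the kernel derive it, the TCPMSS target also accepts --set-mss with a fixed value. This is a sketch of how I would work that value out, not something I needed on my own setup. The helper mss_for_mtu is my own and wg0 is an assumed interface name. It prints the rule and does not apply it.

```shell
# mss_for_mtu MTU [IP_HEADER]: MSS is the MTU minus the IP header and the
# 20-byte TCP header. IP_HEADER defaults to 20 (IPv4); pass 40 for IPv6.
mss_for_mtu() {
    echo $(( $1 - ${2:-20} - 20 ))
}

mss=$(mss_for_mtu 1420)   # 1420 is the tunnel MTU from the WireGuard config
# Prints the rule rather than applying it; wg0 is an assumed interface name.
echo "iptables -t mangle -A FORWARD -o wg0 -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss $mss"
```

Either way, iptables -t mangle -L FORWARD -v -n shows the packet counters on the rule, which is a quick check that new connections are actually hitting it.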
the lesson
MTU is a silent killer because it does not announce itself. There is no error in the logs, the link reports healthy, and the failure mode is selective in a way that fools you into blaming the application. Any time a tunnel or a VPN works for interactive traffic but dies on bulk transfers, MTU is the first suspect, and MSS clamping is usually the cure. I now set the MTU explicitly and clamp the MSS on every tunnel I build, by reflex, before I even test it. It costs two lines of config and it saves a Saturday.