Ramblings of an aging IT geek
networking

the homelab outage that was hiding in 100 bytes of overhead

A full account of an MTU mismatch across a homelab VPN: how it presented, the layers it broke, how Path MTU Discovery was supposed to save me, why it didn't, and the clamping fix that finally made large packets flow.


I have written before about MTU being the silent killer, in the short, smug way you write about something when you have just diagnosed it in five minutes. This is the longer, less smug version, because a few days later it came back wearing a different hat and cost me an entire evening before I admitted it was the same villain.

The setup: I had brought up a VPN between my homelab and a small cloud box, so that services on one could reach services on the other as if they were on the same network. Standard stuff. Ping worked. SSH worked. I could browse most internal pages. And then, intermittently and infuriatingly, some things would just hang. A git clone of a large repository would stall partway. A container pull would get a few layers in and freeze. A particular dashboard loaded its HTML and then sat forever waiting for a big JSON payload that never finished arriving.

The maddening part is how reasonable everything looks while this is happening. There is no error. Nothing is refused. The connection is established, data starts flowing, and then it simply stops, as if the other end fell asleep mid-sentence.

The tell: it tracks packet size, not service

The thing that finally pointed me at MTU was noticing the pattern across services rather than within one. It was not that "the dashboard is broken" or "git is broken". It was that anything which needed to move large packets broke, and anything that chattered in small ones was fine. SSH negotiates and then sends modest packets: fine. Ping sends tiny ones: fine. A bulk transfer fills packets right up to the MTU: dead.

Once you see it framed that way, MTU goes straight to the top of the suspect list. The classic test is to send a packet of a known size with fragmentation forbidden, and watch where the ceiling is:

ping -M do -s 1472 10.10.0.1

The -M do sets the don't-fragment bit. The -s 1472 is the ICMP payload size; add 8 bytes of ICMP header and 20 bytes of IPv4 header and the packet comes to exactly 1500, the standard Ethernet MTU. Across my LAN that sailed through. Across the VPN it failed. Bisecting downward, the largest payload that survived the tunnel was around 1372, which put the effective MTU near 1400 rather than 1500. Roughly 100 bytes of every packet were going to overhead, the VPN's own encapsulation included, and nobody had told the rest of the stack.
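The bisecting can be scripted. What follows is a minimal sketch rather than a record of what was actually run: the function names (ping_probe, largest_payload, payload_to_mtu) and the TARGET variable are made up for illustration, and it assumes the Linux iputils ping, whose -M do, -c, -W and -s flags are used here.

```shell
# Real probe: one ping with don't-fragment set and a 1 second timeout.
# TARGET is a hypothetical variable holding the address on the far side.
ping_probe() {
    ping -M do -c 1 -W 1 -s "$1" "$TARGET" >/dev/null 2>&1
}

# largest_payload PROBE LOW HIGH
# Binary search for the largest size in [LOW, HIGH] that PROBE accepts.
# Assumes LOW itself passes and that passing is monotonic in size.
largest_payload() {
    local probe lo hi mid
    probe=$1
    lo=$2
    hi=$3
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi + 1) / 2 ))   # round up so the loop always makes progress
        if "$probe" "$mid"; then
            lo=$mid                    # mid fits: the answer is mid or larger
        else
            hi=$(( mid - 1 ))          # mid is too big: the answer is smaller
        fi
    done
    echo "$lo"
}

# ICMP payload + 8 bytes of ICMP header + 20 bytes of IPv4 header.
payload_to_mtu() {
    echo $(( $1 + 28 ))
}

# Usage:
#   TARGET=10.10.0.1
#   p=$(largest_payload ping_probe 1000 1472)
#   payload_to_mtu "$p"
```

A single lost ping reads as "too big" here, so on a flaky link it is worth running it twice and trusting the larger answer.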

Why Path MTU Discovery was supposed to fix this

Here is the part that genuinely annoyed me, because TCP/IP has machinery for exactly this and it was being defeated. Path MTU Discovery (PMTUD) is meant to handle the situation where two hosts think they can send 1500-byte packets but a link in between can only carry, say, 1400.

The mechanism is elegant. The sender marks its packets don't-fragment. When one of those packets hits the tunnel and is too big to pass, the router at that hop is supposed to drop it and send back an ICMP "Fragmentation Needed" message that says, in effect, "too big, the most I can carry is 1400". The sender receives that, lowers its idea of the path MTU for that destination, and resends smaller. The transfer recovers automatically and you never notice.

That is the theory, and when it works it is invisible. The reason it so often does not work is the reason it bit me.

A diagram of a packet too large for the next link, and the ICMP that should come back

Why it didn't fix it

PMTUD depends entirely on that ICMP "Fragmentation Needed" message getting back to the sender. And ICMP, over the years, has acquired a reputation as scary attack-surface noise, so a great many firewalls and security groups simply block it. Mine did. Somewhere in the path, ICMP type 3 code 4 was being dropped on the floor along with all the other ICMP that people block reflexively without realising they have just disabled a load-bearing part of IP.

So the sequence was: large packet sent, too big for the tunnel, dropped at the tunnel ingress, ICMP "too big" generated, ICMP blocked before reaching the sender, sender hears nothing. From the sender's point of view the packet just vanished into a black hole. It keeps retransmitting the same oversized packet, which keeps getting dropped, which is why the transfer does not error out and does not recover. It hangs. This failure mode even has a name, the PMTUD black hole, and once you have met it you recognise the symptom forever: small packets fine, large packets silently lost, no errors anywhere.
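The difference between PMTUD working and PMTUD in a black hole fits in a toy model. This is a sketch and not real networking: the hops are just numbers, send_with_pmtud is a made-up name, and the limit of five attempts is an arbitrary stand-in for a sender that in reality keeps retrying for much longer.

```shell
# send_with_pmtud ICMP_ALLOWED HOP_MTU...
# ICMP_ALLOWED is "yes" or "no". The first HOP_MTU doubles as the sender's
# own starting packet size. Prints the size that finally gets through, or
# "black hole" if the sender never learns its packets are too big.
send_with_pmtud() {
    local icmp_allowed size hop delivered tries
    icmp_allowed=$1
    shift
    size=$1
    tries=0
    while [ "$tries" -lt 5 ]; do
        delivered=yes
        for hop in "$@"; do
            if [ "$size" -gt "$hop" ]; then
                # The router at this hop drops the packet and emits
                # "Fragmentation Needed, next-hop MTU is $hop".
                delivered=no
                if [ "$icmp_allowed" = yes ]; then
                    size=$hop          # the message arrives: sender lowers its path MTU
                fi                     # otherwise the sender learns nothing
                break
            fi
        done
        if [ "$delivered" = yes ]; then
            echo "$size"
            return 0
        fi
        tries=$(( tries + 1 ))         # retransmit, same size unless ICMP got through
    done
    echo "black hole"
    return 1
}

send_with_pmtud yes 1500 1400 1500    # prints 1400
send_with_pmtud no  1500 1400 1500    # prints black hole
```

The only thing that changes between the two calls is whether the ICMP message is allowed home. Everything else about the path is identical.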

I confirmed it by watching with tcpdump on the sending host. I could see the big packets going out and the retransmissions of those exact same big packets going out again. No ICMP coming back. The sender was deaf to the one message that would have rescued it.
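For anyone wanting to repeat that check, here is a sketch of the two capture filters involved. The function names and the eth0 interface are assumptions for illustration; the filter expressions themselves are ordinary pcap-filter syntax (icmp[icmptype], icmp[icmpcode], the icmp-unreach constant, and ip[2:2] for the IPv4 total-length field).

```shell
# pcap filter: ICMP Destination Unreachable (type 3), code 4 "Fragmentation Needed".
frag_needed_filter() {
    printf '%s\n' 'icmp[icmptype] == icmp-unreach and icmp[icmpcode] == 4'
}

# oversize_filter DST MTU
# pcap filter: IPv4 packets to DST whose total length exceeds MTU.
oversize_filter() {
    local dst mtu
    dst=$1
    mtu=$2
    printf 'dst host %s and ip[2:2] > %s\n' "$dst" "$mtu"
}

# Usage, in two terminals on the sending host (needs root; eth0 is an assumption):
#   tcpdump -ni eth0 "$(oversize_filter 10.10.0.1 1400)"
#   tcpdump -ni eth0 "$(frag_needed_filter)"
```

In a black hole the first capture keeps scrolling and the second stays empty. If segmentation offload is enabled on the NIC, the first capture can show segments larger than anything that actually reaches the wire, so treat the sizes it reports with some care.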

The fix: stop relying on ICMP, clamp the MSS

There are two honest fixes and I used both.

The first is to set the interface MTU to match reality, so the stack never tries to send oversized packets in the first place. On the tunnel interface I dropped the MTU to 1400:

ip link set dev wg0 mtu 1400

That solves it for traffic the tunnel host itself originates, because its TCP stack derives its segment size from the interface MTU. Forwarded traffic is another matter: machines behind the router still believe in 1500, so their full-size packets arrive at the tunnel too big and depend on the same ICMP that was being dropped. So the more robust homelab fix is to clamp the TCP maximum segment size. MSS clamping rewrites the MSS option in the SYN packets of the TCP handshake so that both ends agree, up front, to use segments that fit the path, without relying on PMTUD or ICMP at all:

iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
    -j TCPMSS --clamp-mss-to-pmtu

--clamp-mss-to-pmtu clamps the MSS to the path MTU the kernel has for that route, which in practice is the outgoing interface's MTU, minus 40 bytes of IPv4 and TCP headers. With the tunnel at 1400 that caps the MSS at 1360, so each TCP connection negotiates a segment size that actually fits the tunnel from the very first packet. No oversized packets, no dependence on ICMP, no black hole. With clamping in place the git clone ran to completion, the container pulls stopped freezing, and the dashboard's big JSON arrived like it had somewhere to be.
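The arithmetic behind the clamp is worth having to hand. A small sketch, with mss_for_mtu being a made-up helper name: the MSS is the MTU minus the IP header and the TCP header, assuming neither carries options.

```shell
# mss_for_mtu MTU [FAMILY]
# FAMILY is 4 (default) or 6.
# IPv4: 20 bytes IP + 20 bytes TCP = 40. IPv6: 40 bytes IP + 20 bytes TCP = 60.
mss_for_mtu() {
    local mtu family overhead
    mtu=$1
    family=${2:-4}
    case "$family" in
        4) overhead=40 ;;
        6) overhead=60 ;;
        *) echo "family must be 4 or 6" >&2; return 1 ;;
    esac
    echo $(( mtu - overhead ))
}

mss_for_mtu 1500      # prints 1460
mss_for_mtu 1400      # prints 1360
mss_for_mtu 1400 6    # prints 1340

# If you would rather pin the value than let the kernel derive it, the same
# TCPMSS target accepts a fixed number instead:
#   iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
#       -j TCPMSS --set-mss 1360
```

A fixed --set-mss is blunter than --clamp-mss-to-pmtu, but it does not depend on the interface MTU having been set correctly first.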

What I took away from it, again

The first time I met this, I patched the symptom and moved on feeling clever. The second time taught me the bit I had skipped: MTU problems are not really about the number, they are about the silence. The protocol has a built-in recovery mechanism, and the most common reason it fails is a well-meaning firewall rule blocking the ICMP that the recovery depends on. Block ICMP indiscriminately and you have not hardened your network, you have lobotomised Path MTU Discovery and signed yourself up for exactly this evening.
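If the firewall in question is iptables with a default-drop stance on ICMP, letting this one message through could look something like the sketch below. The wrapper function allow_frag_needed is a made-up name so that nothing is applied until you call it as root; the chains and rule placement are assumptions about a typical setup, not a description of the firewall in this story.

```shell
# Sketch only: defines the rules but applies nothing until called as root.
allow_frag_needed() {
    # -I inserts at the top of the chain, ahead of any later rule that drops ICMP.
    iptables -I INPUT   -p icmp --icmp-type fragmentation-needed -j ACCEPT
    iptables -I FORWARD -p icmp --icmp-type fragmentation-needed -j ACCEPT
    # IPv6 routers never fragment in transit; the equivalent message there
    # is ICMPv6 "Packet Too Big".
    ip6tables -I INPUT   -p icmpv6 --icmpv6-type packet-too-big -j ACCEPT
    ip6tables -I FORWARD -p icmpv6 --icmpv6-type packet-too-big -j ACCEPT
}
```

A cloud security group sits outside the host and needs its own equivalent rule, since nothing done in iptables can bring back a message that was dropped before it arrived.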

So the lesson I am actually keeping this time: when a tunnel is involved and large transfers hang while small ones thrive, do not just lower the MTU and walk away. Clamp the MSS so you never depend on ICMP getting through, and think twice before blocking ICMP "Fragmentation Needed" anywhere, because that message is not noise. It is the network trying to tell you the truth, and MTU only stays a silent killer for as long as you keep silencing the one thing that would speak up.