Ramblings of an aging IT geek

The path MTU problem that only broke big requests

A homelab story about a tunnel with the wrong MTU, where small packets sailed through and anything large hung forever, plus how to find and fix the mismatch.

The symptom was maddening. ping worked. SSH connected, and I could log in, run ls, read a short file. But the moment I tried anything that moved real data (a git clone, an apt upgrade, a large HTTP response), the connection hung stone dead and eventually timed out. Small things fine, big things broken. That precise shape of failure is almost always MTU.

This was over a tunnel in my homelab, a little site-to-site link I'd stood up to reach a box behind a NAT. The tunnel adds its own headers to every packet, which means the usable payload inside it is smaller than the 1500 bytes a normal Ethernet interface assumes. If nothing tells the sender to use smaller packets, it cheerfully fires off full-size packets that won't fit. When those packets carry the don't-fragment bit, as most TCP traffic does, the hop that can't forward them drops them and is supposed to send back an ICMP "fragmentation needed" message. Path MTU Discovery depends on that message reaching the sender, so any overzealous firewall blackholing ICMP turns this into a silent hang rather than a clean error.
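
The arithmetic behind that shrinking payload is simple enough to sketch. The function name tunnel_mtu is made up for illustration, and the 80-byte overhead is an assumption picked to match the 1420 measured later in this post; the real figure depends on the tunnel protocol in use.

```shell
# tunnel_mtu OUTER_MTU OVERHEAD
# Print the largest packet that still fits once the tunnel has added
# OVERHEAD bytes of its own headers to a link with OUTER_MTU.
tunnel_mtu() {
    echo $(( $1 - $2 ))
}

# 80 bytes of overhead is an assumed value, not something read from the tunnel.
tunnel_mtu 1500 80   # prints 1420
```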

Finding it

The diagnostic is ping with the don't-fragment bit set and an explicit size. You binary-search the payload to find the largest one that still gets through. Remember that -s sets the ICMP payload only: the IPv4 header (20 bytes) and the ICMP header (8 bytes) add 28 bytes on top, so payload plus 28 is the MTU you're probing.

$ ping -M do -s 1472 10.20.0.2
PING 10.20.0.2 (10.20.0.2) 1472(1500) bytes of data.
ping: local error: message too long, mtu=1420

$ ping -M do -s 1392 10.20.0.2
1400 bytes from 10.20.0.2: icmp_seq=1 ttl=64 time=2.11 ms

So 1500 was too big, and 1420 was the real limit on that path: the 1392-byte payload plus 28 bytes of headers comes to exactly 1420, and it went through. Everything large was being dropped because the interfaces at both ends still thought they could send 1500.
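
If you'd rather not do the binary search by hand, here is a rough sketch of it as a shell function. The names probe, find_path_mtu and PROBE_HOST are my own inventions, and the ping flags assume the Linux iputils version. It also treats any failed probe as "too big", so ordinary packet loss can fool it; treat the answer as a starting point, not gospel.

```shell
# probe SIZE
# Succeeds if one ICMP echo with SIZE bytes of payload and the
# don't-fragment bit set gets a reply from $PROBE_HOST within 2 seconds.
# PROBE_HOST is an assumed variable you set before calling find_path_mtu.
probe() {
    ping -M do -c 1 -W 2 -s "$1" "$PROBE_HOST" >/dev/null 2>&1
}

# find_path_mtu LOW HIGH
# Binary-search the largest payload in [LOW, HIGH] that passes probe(),
# then print payload + 28 (20-byte IPv4 header + 8-byte ICMP header).
find_path_mtu() {
    lo=$1
    hi=$2
    best=-1
    while [ "$lo" -le "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        if probe "$mid"; then
            best=$mid            # this size fits, try something larger
            lo=$(( mid + 1 ))
        else
            hi=$(( mid - 1 ))    # too big (or lost), try something smaller
        fi
    done
    if [ "$best" -lt 0 ]; then
        echo "no payload size in range got through" >&2
        return 1
    fi
    echo $(( best + 28 ))
}
```

To use it, set PROBE_HOST=10.20.0.2 and run find_path_mtu 1200 1472. On the path above it should land on the same 1420 the manual probing found.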

Fixing it

Two fixes, and you usually want both. First, set the tunnel interface MTU correctly so the kernel knows the true ceiling:

ip link set dev tun0 mtu 1420

Second, and this is the pragmatic one for TCP, clamp the MSS so every TCP handshake negotiates a segment size that fits inside the tunnel regardless of what PMTU discovery manages to learn:

iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
    -j TCPMSS --clamp-mss-to-pmtu

The MSS clamp is the belt to the MTU's braces. PMTU discovery is the correct mechanism, but it leans on ICMP getting through, and out in the real world ICMP gets eaten by firewalls run by people who think blocking it improves security. Clamping the MSS sidesteps the whole ICMP dependency for TCP, which is the traffic that was actually hurting.
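
For a sense of the numbers the clamp ends up with, here is a small sketch of the MSS arithmetic. The helper name mss_for_mtu is hypothetical, and it assumes headers without options: 20 bytes for IPv4, 40 for IPv6, 20 for TCP.

```shell
# mss_for_mtu MTU [IP_HEADER_BYTES]
# Print the largest TCP segment that fits in one packet of size MTU.
# Assumes no IP or TCP options; IP_HEADER_BYTES defaults to 20 (IPv4).
mss_for_mtu() {
    mtu=$1
    ip_hdr=${2:-20}   # pass 40 for IPv6
    tcp_hdr=20
    echo $(( mtu - ip_hdr - tcp_hdr ))
}

mss_for_mtu 1420      # IPv4 over the tunnel: prints 1380
mss_for_mtu 1420 40   # IPv6 over the tunnel: prints 1360
```

So on this tunnel an IPv4 handshake should end up advertising an MSS of 1380 instead of the 1460 an untouched 1500-byte interface would offer.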

The lesson I keep relearning: when small works and large hangs, stop staring at the application and go measure the path. MTU mismatches don't announce themselves. They just quietly drop the packets that matter and let the small ones through to convince you the link is fine.