Ramblings of an aging IT geek
debugging

small packets fine, big packets gone, and a tunnel in the middle

A connection that completed its handshake and then hung forever on the first large response, traced through a maddening "ping works but transfers stall" symptom to a VPN that quietly shaved the MTU.

A terminal showing a stalled transfer over a tunnel

The symptom was the kind that makes you doubt your understanding of networking from first principles. The connection opened. The TLS handshake completed. The server accepted the request. And then, on the first response large enough to matter, everything stopped dead. No reset, no error, no timeout for a long time, just a transfer that started and then hung as if someone had stood on the cable. Small requests were instant. Large ones vanished into a hole. If you have done this long enough you already know the punchline, and it is in the title, but it took me an hour to admit it because the early evidence was so misleading.

the evidence that lied to me

The first thing anyone does is ping. Ping worked. Beautifully. Sub-millisecond, zero loss, the connection looked perfect.

$ ping -c 4 10.20.0.5
64 bytes from 10.20.0.5: icmp_seq=1 ttl=63 time=0.42 ms
...
4 packets transmitted, 4 received, 0% packet loss

A working ping is the most seductive false negative in networking. It tells you the path exists and the host is alive. It tells you nothing about whether the path can carry a full-sized packet, because a default ping is tiny. A 64-byte packet sails through almost anything. The problem only appears when a packet gets large enough to bump against the link with the smallest MTU on the path, and ping, helpfully, never sends one of those unless you ask it to.
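For the record, here is the size arithmetic as a small Python sketch, assuming IPv4 without IP options. ping's `-s` sets the ICMP data size, the ICMP header adds 8 bytes and the IPv4 header adds another 20. `echo_sizes` is a made-up helper name, not part of any tool.

```python
# Sizes for an ICMP echo request over IPv4, assuming no IP options.
ICMP_HEADER = 8    # type, code, checksum, identifier, sequence number
IPV4_HEADER = 20   # minimum IPv4 header, no options

def echo_sizes(data_bytes):
    """Return (ICMP message size, full IP packet size) for `ping -s data_bytes`."""
    icmp = data_bytes + ICMP_HEADER
    return icmp, icmp + IPV4_HEADER

# Linux ping's default payload is 56 bytes: a 64-byte ICMP message in an 84-byte IP packet.
print(echo_sizes(56))    # (64, 84)
# The largest payload that still fits a 1500-byte Ethernet MTU.
print(echo_sizes(1472))  # (1480, 1500)
```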

The handshake worked for the same reason. The TCP handshake is a few tiny segments, and this TLS exchange evidently fit in small packets too. The connection establishes and both sides are delighted. Then the server tries to send a real response, the packets get big, and the big ones never arrive. Small packets fine, big packets gone. Once you can say that sentence out loud, the diagnosis is basically done. I just had not said it out loud yet.
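The whole failure fits in a toy model. This sketch assumes every packet carries the don't-fragment bit. It also invents a 50-byte tunnel overhead to match what this path turned out to have, and the hop names and helper functions are hypothetical. The only rule it encodes is that the smallest link on the path decides what gets through.

```python
# Toy model of "small packets fine, big packets gone".
# Assumptions: DF is set on every packet; the 50-byte tunnel overhead is hypothetical.
ETHERNET_MTU = 1500
TUNNEL_OVERHEAD = 50

path = [
    ("client LAN", ETHERNET_MTU),
    ("VPN tunnel", ETHERNET_MTU - TUNNEL_OVERHEAD),  # encapsulation eats into the payload space
    ("server LAN", ETHERNET_MTU),
]

def path_mtu(hops):
    """The usable MTU is set by the smallest link, wherever it sits on the path."""
    return min(mtu for _, mtu in hops)

def delivered(packet_size, hops):
    """With DF set, a packet either fits every link or is dropped at the one it does not fit."""
    return packet_size <= path_mtu(hops)

# A typical SYN with options, a default ping, and a full-size data packet.
for label, size in [("TCP SYN", 60), ("default ping", 84), ("full-size data", 1500)]:
    status = "delivered" if delivered(size, path) else "dropped"
    print(f"{label:>15}: {size:5} bytes -> {status}")
```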

A diagram of MTU shrinking inside a VPN tunnel

proving it with a ping you have to ask for nicely

The test for this is to send a large packet with the don't-fragment bit set, and watch where it gets refused. On Linux:

$ ping -M do -s 1472 -c 2 10.20.0.5
PING 10.20.0.5 (10.20.0.5) 1472(1500) bytes of data.
... no reply, hangs ...

1472 bytes of payload plus 28 bytes of ICMP and IP headers is 1500, the standard Ethernet MTU. So the packet fit the local link and left the host, but nothing came back. So I walked it down:

$ ping -M do -s 1422 -c 2 10.20.0.5
1430 bytes from 10.20.0.5: icmp_seq=1 ttl=63 time=0.6 ms

$ ping -M do -s 1423 -c 2 10.20.0.5
... no reply, hangs ...

There it is. Anything up to 1422 bytes of payload got through. One byte more and it vanished. 1422 plus 28 bytes of headers is 1450, so the real MTU on this path was 1450, not 1500, and there was a VPN tunnel sitting in the middle of the route. That is exactly the culprit you would expect. A tunnel wraps each packet in its own headers, and those headers eat into the space available for your actual data. Once the tunnel adds its encapsulation, a 1500-byte packet of yours no longer fits inside a 1500-byte link. It has to be fragmented, or, if the don't-fragment bit is set (as it usually is for TCP), dropped.
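Walking the size down by hand works, but it is really a binary search over the payload size. Here is a minimal sketch. It assumes Linux iputils `ping`, which exits 0 when a reply comes back. It also assumes the path behaves monotonically: if a size gets through, so does every smaller one. `find_max_payload` and `ping_probe` are hypothetical helper names. The demo swaps the real probe for a simulated path with a 1450-byte MTU so it runs anywhere.

```python
import subprocess

HEADERS = 28  # 8-byte ICMP header + 20-byte IPv4 header

def ping_probe(host):
    """Return a probe that sends one don't-fragment ping with the given payload size."""
    def probe(size):
        result = subprocess.run(
            ["ping", "-M", "do", "-s", str(size), "-c", "1", "-W", "1", host],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0  # 0 means a reply came back
    return probe

def find_max_payload(probe, lo=0, hi=1472):
    """Largest payload in [lo, hi] that gets a reply, assuming probe(lo) succeeds."""
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            lo = mid       # mid fits: the answer is mid or larger
        else:
            hi = mid - 1   # mid vanished: the answer is smaller
    return lo

def simulated(size):
    """Stand-in for the real path: anything over 1450 bytes on the wire never comes back."""
    return size + HEADERS <= 1450

payload = find_max_payload(simulated)
print(payload, payload + HEADERS, 1500 - (payload + HEADERS))  # 1422 1450 50
```

Against the real network you would call `find_max_payload(ping_probe("10.20.0.5"))` instead. Each probe waits up to a second (`-W 1`), so the search costs about eleven pings rather than a few hundred.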

why it hung instead of failing cleanly

Normally this fixes itself. The mechanism is Path MTU Discovery. A router that cannot forward an oversized don't-fragment packet drops it and sends back an ICMP "fragmentation needed" message carrying the next-hop MTU, and the sender obligingly shrinks its packets to fit. The system is designed to handle exactly this. It was not handling it here for the usual reason: somewhere along the path, a firewall was dropping ICMP.
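To make the mechanism concrete, here is a sketch of the message itself, following the RFC 1191 layout. It has type 3, code 4, a checksum, 16 unused bits and the 16-bit next-hop MTU, followed by the start of the packet that was dropped. The helper names are hypothetical, and 28 zero bytes stand in for the real quoted header. The code builds and parses bytes in memory rather than touching a socket.

```python
import struct

def inet_checksum(data):
    """RFC 1071 Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_frag_needed(next_hop_mtu, original):
    """ICMP type 3 code 4: type, code, checksum, unused, next-hop MTU, quoted packet."""
    unsummed = struct.pack("!BBHHH", 3, 4, 0, 0, next_hop_mtu)
    csum = inet_checksum(unsummed + original)
    return struct.pack("!BBHHH", 3, 4, csum, 0, next_hop_mtu) + original

def parse_frag_needed(msg):
    """Return the next-hop MTU if msg is a valid fragmentation-needed message, else None."""
    if len(msg) < 8 or inet_checksum(msg) != 0:  # a correct checksum verifies to zero
        return None
    icmp_type, code, _csum, _unused, mtu = struct.unpack("!BBHHH", msg[:8])
    return mtu if (icmp_type, code) == (3, 4) else None

# Hypothetical stand-in for the quoted IP header plus first 8 bytes of the dropped packet.
msg = build_frag_needed(1450, bytes(28))
sender_pmtu = 1500
mtu = parse_frag_needed(msg)
if mtu is not None:
    sender_pmtu = min(sender_pmtu, mtu)  # the sender shrinks its packets for this destination
print(sender_pmtu)  # 1450
```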

A close-up of an iptables rule blocking ICMP fragmentation-needed messages

Some well-meaning soul, years ago, had blocked ICMP wholesale because "ICMP is a security risk," lumping the genuinely load-bearing "fragmentation needed" message in with ping. So the oversized packets were dropped at the tunnel entrance. The "please send smaller packets" reply that would have rescued the connection was itself dropped at the firewall. The sender was left waiting for an acknowledgement that would never come, retransmitting the same too-big packet forever. That is the hang. Not a slow link, not a busy server, just a connection patiently retransmitting a packet that physically cannot fit, into a silence where its only correction had been firewalled off.
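And here is the hang, as a deliberately crude model. It ignores retransmission timers, backoff and everything else TCP actually does, and `send` is a made-up name. It only shows the difference the ICMP reply makes.

```python
# Toy model of a PMTU black hole. Assumptions: DF is always set,
# and retransmission timers and backoff are ignored entirely.
def send(packet_size, path_mtu, icmp_allowed, max_attempts=6):
    """Return (outcome, final packet size, attempts used)."""
    size = packet_size
    for attempt in range(1, max_attempts + 1):
        if size <= path_mtu:
            return "delivered", size, attempt
        # Too big: the tunnel entrance drops it and emits "fragmentation needed".
        if icmp_allowed:
            size = path_mtu  # the ICMP reply carries the next-hop MTU
        # Otherwise the reply is firewalled off and the sender retries the same size.
    return "stalled", size, max_attempts

print(send(1500, 1450, icmp_allowed=True))   # ('delivered', 1450, 2)
print(send(1500, 1450, icmp_allowed=False))  # ('stalled', 1500, 6)
```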

the fix

There were two honest fixes and I applied both. The proper one was to stop blocking ICMP type 3 code 4, the fragmentation-needed message, so Path MTU Discovery could do the job it was designed for:

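# allow PMTU discovery; -I puts this ahead of any existing ICMP DROP rule
iptables -I INPUT -p icmp --icmp-type fragmentation-needed -j ACCEPT
# on a firewall that routes the traffic, the rule belongs in FORWARD
iptables -I FORWARD -p icmp --icmp-type fragmentation-needed -j ACCEPT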

The belt-and-braces one, for the hosts behind the tunnel, was MSS clamping, where the router rewrites the TCP maximum segment size during the handshake so both ends agree to use packets that already fit the tunnel, no ICMP required:

# on SYNs, lower the advertised MSS to fit the outgoing interface's MTU (the tunnel)
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
    -j TCPMSS --clamp-mss-to-pmtu
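The clamping arithmetic is simple enough to sketch. This assumes a TCP header without options, which is what the iptables TCPMSS documentation's rule of path MTU minus 40 (IPv4) or minus 60 (IPv6) implies. The helper names are hypothetical. The `min` reflects that clamping lowers an advertised MSS rather than raising it.

```python
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20  # assumed: no TCP options

def mss_for_mtu(mtu, ip_header=IPV4_HEADER):
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - ip_header - TCP_HEADER

def clamp(advertised_mss, path_mtu, ip_header=IPV4_HEADER):
    """Roughly what a clamping router does to the MSS option in a SYN."""
    return min(advertised_mss, mss_for_mtu(path_mtu, ip_header))

print(mss_for_mtu(1500))  # 1460: what each end advertises on plain Ethernet
print(clamp(1460, 1450))  # 1410: what both ends end up using through the tunnel
print(clamp(1360, 1450))  # 1360: an already-smaller MSS is left alone
```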

With either of those in place the large transfers flowed instantly, and the connection that had hung forever completed in milliseconds.

The lesson is old, and I will be relearning it, probably next year. When a connection establishes and then stalls on the first big payload, it is the MTU, and it is almost always a tunnel. The reason it hung rather than failed cleanly is almost always somebody who blocked ICMP because it sounded dangerous. Ping proves nothing about big packets. Ask it for a big one with -M do, walk the size down until it gets through, and the path will tell you exactly how far it shrank.