small packets fine, big packets gone


The bug report was the kind that makes no sense on first read. "The API works, but downloading anything large just hangs." Logins worked. Small requests worked. curl against the health endpoint was instant. Then you ask it for a real response body and the connection sits there, transfers nothing, and eventually times out. A service that works for small things and dies on large things has a very particular smell, and the smell is MTU.

The setup mattered: traffic was crossing a tunnel, in this case a VPN, between the client and the service. Tunnels add their own headers to every packet, which means the effective maximum packet size inside the tunnel is smaller than the 1500 bytes everyone assumes on a normal Ethernet link. If something along the path doesn't account for that, large packets get dropped, and a silent drop is the worst kind.
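The arithmetic is worth seeing once. The sketch below is illustrative only: the function name inner_mtu is made up, and the overhead figures are examples for two common encapsulations over IPv4, not measurements from the VPN in this story.

```shell
# inner_mtu LINK_MTU OVERHEAD
# Prints the largest packet that still fits inside the tunnel once the
# encapsulation headers have been added. Both arguments are in bytes.
inner_mtu() {
    local link_mtu=$1 overhead=$2
    echo $(( link_mtu - overhead ))
}

# Example overheads (assumptions for illustration, not from this incident):
inner_mtu 1500 24   # plain GRE over IPv4: 20-byte outer IP header + 4-byte GRE header
inner_mtu 1500 60   # WireGuard over IPv4: 20 IP + 8 UDP + 32 WireGuard
```

Those two calls print 1476 and 1440. Whatever the real overhead is on a given tunnel, the point is the same: anything built for a 1500-byte path is too big by exactly that many bytes.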

why small worked and large didn't

A TLS handshake and a tiny request fit comfortably inside even a reduced MTU. They never produce a full-size packet, so they sail through. The trouble starts when the service tries to send a big response. TCP fills segments up to the path MTU it believes it has. The tunnel can't carry a packet that large, so either the packet gets fragmented or the sender has to be told to send smaller ones. Most modern stacks set the don't-fragment bit on TCP traffic, which leaves only the second option.

That "needs to be told" is Path MTU Discovery. The router that can't forward the packet drops it and sends back an ICMP "fragmentation needed" message (type 3, code 4) naming the MTU it can handle. The classic failure is a firewall somewhere blocking ICMP because someone decided years ago that ICMP is dangerous. Now the sender never hears that its packets are too big. It keeps sending full-size packets, they keep vanishing into the tunnel, and the connection just stalls. No error, no reset, nothing. A black hole.
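One way to check whether those ICMP messages are reaching the sender at all is to watch for them while a large transfer is stalling. This is a sketch under assumptions: it presumes tcpdump is installed and run with enough privilege, the function name watch_frag_needed is invented here, and the interface is whatever your sender actually uses.

```shell
# pcap filter for ICMP type 3 (destination unreachable), code 4
# (fragmentation needed and don't-fragment was set).
FRAG_NEEDED_FILTER='icmp[icmptype] == icmp-unreach and icmp[icmpcode] == 4'

# watch_frag_needed IFACE
# Prints any "fragmentation needed" messages arriving on IFACE.
# IFACE is a placeholder argument, for example eth0.
watch_frag_needed() {
    tcpdump -ni "$1" "$FRAG_NEEDED_FILTER"
}
```

If this stays silent on the sending host while big packets keep getting retransmitted, the messages are being filtered somewhere on the way back, or never generated, which is the black hole case.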

[Figure: a packet capture showing large packets sent and never acknowledged]

proving it

You can confirm an MTU problem with ping and the don't-fragment flag, walking the payload size up until packets stop getting through:

ping -M do -s 1472 service.internal   # 1472 + 28 = 1500, works on plain Ethernet
ping -M do -s 1400 service.internal   # works across the tunnel
ping -M do -s 1452 service.internal   # silence

-M do sets don't-fragment (this is the Linux iputils spelling), and -s is the payload size; add 28 bytes (20 for the IPv4 header, 8 for the ICMP header) to get the real packet size. A 1400-byte payload (a 1428-byte packet) got through and a 1452-byte payload (a 1480-byte packet) did not, which put the usable MTU inside the tunnel somewhere from 1428 to 1479 bytes rather than the 1500 the interface was advertising.
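Walking the size up by hand gets old, and the same search is easy to script as a binary search between a payload known to work and one known to fail. Treat this as a minimal sketch: the function names probe_df and find_max_payload are made up for illustration, the ping flags are the Linux iputils ones used above, and it assumes every lost reply is caused by size and not by ordinary packet loss.

```shell
# probe_df HOST SIZE
# Succeeds if one don't-fragment ping with SIZE bytes of payload is answered
# within one second.
probe_df() {
    ping -M do -c 1 -W 1 -s "$2" "$1" >/dev/null 2>&1
}

# find_max_payload HOST LO HI [PROBE]
# Binary-searches [LO, HI] for the largest payload that PROBE accepts and
# prints it. PROBE defaults to probe_df; passing another command name makes
# the search testable without touching the network.
find_max_payload() {
    local host=$1 lo=$2 hi=$3 probe=${4:-probe_df} mid
    # If even the lower bound fails, the problem is not (only) packet size.
    "$probe" "$host" "$lo" || return 1
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi + 1) / 2 ))   # round up so the loop always makes progress
        if "$probe" "$host" "$mid"; then
            lo=$mid                    # mid fits, so the answer is mid or larger
        else
            hi=$(( mid - 1 ))          # mid is too big, so the answer is smaller
        fi
    done
    echo "$lo"
}
```

Calling find_max_payload service.internal 1400 1472 prints the largest payload that survived; add 28 to get the path MTU. A single dropped reply will skew the result, so on a lossy link it is worth running more than once.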

A packet capture told the same story from the other end: full-size segments sent, no acknowledgements, retransmits of the same too-big packet over and over, all dutifully disappearing.

the fix

The clean fix is to set the interface MTU correctly for what the path can actually carry, so the stack never builds a packet too big in the first place. Where you don't control the tunnel endpoint, the pragmatic fix is MSS clamping: have the gateway rewrite the TCP maximum segment size during the handshake so both ends agree to use smaller segments. With iptables that's a one-liner that fixes it for every connection crossing the box:

iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
  -j TCPMSS --clamp-mss-to-pmtu

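That clamps the MSS to the path MTU the gateway itself knows about, minus 40 bytes for the IPv4 and TCP headers, so no TCP connection through the box ever tries to send a segment the tunnel can't carry, and the ICMP black hole stops mattering for TCP. It does nothing for UDP or any other protocol, which is one more reason to fix the interface MTU where you can.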
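If the gateway has no accurate idea of the path MTU, the TCPMSS target also accepts a fixed value through --set-mss, and you can work that value out from the MTU you measured with ping. The helper names below (mss_for_mtu, clamp_rule) are hypothetical, and the 1420 in the usage note is an example figure, not the number from this incident. The sketch only prints the rule; it does not install it.

```shell
# mss_for_mtu MTU [ipv4|ipv6]
# Prints the largest TCP payload per packet for a given MTU, assuming
# headers without IP or TCP options.
mss_for_mtu() {
    local mtu=$1 family=${2:-ipv4}
    case "$family" in
        ipv4) echo $(( mtu - 40 )) ;;   # 20-byte IPv4 header + 20-byte TCP header
        ipv6) echo $(( mtu - 60 )) ;;   # 40-byte IPv6 header + 20-byte TCP header
        *)    echo "unknown address family: $family" >&2; return 1 ;;
    esac
}

# clamp_rule MTU
# Prints an iptables rule that pins the MSS on forwarded IPv4 SYNs to a
# fixed value derived from MTU.
clamp_rule() {
    local mss
    mss=$(mss_for_mtu "$1" ipv4) || return 1
    echo "iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss $mss"
}
```

For a measured MTU of 1420, clamp_rule 1420 prints the rule with --set-mss 1380. A fixed value is blunter than --clamp-mss-to-pmtu, but it keeps working when the gateway's own view of the path MTU is wrong.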

I have lost more hours to MTU than to almost any other single class of bug, and the tell is always the same: small works, large hangs, and there's a tunnel or a VPN in the path that someone forgot subtracts from your packet budget. When you see that shape, don't reach for the application logs. Reach for ping -M do first.