Ramblings of an aging IT geek

when everything works except the things that don't

A homelab debugging story where small packets flew fine and large transfers hung, and the culprit was an MTU mismatch across a tunnel.


The symptom was maddeningly specific. SSH connected instantly. ping was perfect. Small web pages loaded. But pull a large file, or load a page with a chunky payload, and the connection would just hang, forever, with no error. The kind of bug where everything you check looks healthy and you start to doubt your own eyes.

This is almost always MTU. The small stuff fits in a single packet and sails through. The big stuff goes out in full-size packets with the don't-fragment bit set, and somewhere on the path there is a link that can't carry packets that large. The router in front of it drops them and is supposed to send back an ICMP "fragmentation needed" message telling the sender to back off, but that message never makes it home, usually because something along the way is silently filtering ICMP. So the sender keeps retransmitting packets that never arrive, and the connection wedges. This is path MTU discovery being broken, and it is everywhere once you start looking.
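To make the failure mode concrete, here is a toy Python model of it (a sketch, not real TCP): a sender pushes don't-fragment packets along a list of hops, shrinks its packet size when a "fragmentation needed" reply comes back with the next hop's MTU, and gives up after a few tries when nothing comes back at all. The `Hop`, `send_segment` and `transfer` names, the 40-byte header figure (IPv4 plus TCP with no options) and the retry limit are assumptions for illustration; real stacks also negotiate an MSS at connection setup and run retransmission timers, which this ignores.

```python
from dataclasses import dataclass

# Toy model: 20-byte IPv4 header + 20-byte TCP header, no options.
HEADERS = 40


@dataclass
class Hop:
    mtu: int          # largest IP packet this hop can forward
    sends_icmp: bool  # does its "fragmentation needed" reply reach the sender?


def send_segment(path, packet_size):
    """Walk one DF-marked packet along the path.

    Returns ("delivered", None), ("icmp", next_hop_mtu) or ("lost", None).
    """
    for hop in path:
        if packet_size > hop.mtu:
            # DF is set, so the hop must drop the packet instead of fragmenting.
            if hop.sends_icmp:
                return ("icmp", hop.mtu)
            return ("lost", None)  # black hole: dropped with no feedback
    return ("delivered", None)


def transfer(path, payload_bytes, start_mtu=1500, max_tries=5):
    """Send payload_bytes in segments, shrinking the MTU on ICMP feedback.

    Returns ("ok", bytes_sent), or ("hung", bytes_sent) if a segment
    still hasn't arrived after max_tries attempts.
    """
    mtu = start_mtu
    sent = 0
    while sent < payload_bytes:
        for _ in range(max_tries):
            chunk = min(mtu - HEADERS, payload_bytes - sent)
            result, next_hop_mtu = send_segment(path, chunk + HEADERS)
            if result == "delivered":
                sent += chunk
                break
            if result == "icmp":
                mtu = next_hop_mtu  # path MTU discovery doing its job
        else:
            return ("hung", sent)  # retransmitted into the void every time
    return ("ok", sent)
```

With the ICMP reply intact, a 10,000-byte transfer through a 1420-byte hop costs one extra try and then completes. Set `sends_icmp=False` on that hop and the same transfer never gets a byte through, while a 1,000-byte transfer still succeeds, which is exactly the "small works, big hangs" symptom.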

In my case the offender was a WireGuard tunnel between two sites. WireGuard wraps every packet in its own header and authentication tag plus an outer UDP and IP header, 60 bytes in all over IPv4 and 80 over IPv6, so the usable MTU inside the tunnel is lower than the 1500 you get on the LAN. Anything that assumed 1500 on a connection that traversed the tunnel hit the wall.
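The arithmetic behind that lower ceiling, as a small sketch: each WireGuard data packet carries a 16-byte header (message type, three reserved bytes, receiver index and a 64-bit counter) and a 16-byte Poly1305 tag, inside an 8-byte UDP header and an outer IP header of 20 bytes (IPv4) or 40 bytes (IPv6). The function names here are hypothetical.

```python
# Per-packet overhead of a WireGuard data message, in bytes.
WG_HEADER = 16    # message type (1) + reserved (3) + receiver index (4) + counter (8)
WG_AUTH_TAG = 16  # Poly1305 authentication tag appended to the ciphertext
UDP_HEADER = 8
OUTER_IP_HEADER = {"ipv4": 20, "ipv6": 40}


def wireguard_overhead(outer_family):
    """Bytes WireGuard adds around each inner packet for the given outer IP family."""
    return OUTER_IP_HEADER[outer_family] + UDP_HEADER + WG_HEADER + WG_AUTH_TAG


def max_tunnel_mtu(underlay_mtu, outer_family):
    """Largest inner MTU whose encapsulated packets still fit the underlay."""
    return underlay_mtu - wireguard_overhead(outer_family)


for family in ("ipv4", "ipv6"):
    print(family, wireguard_overhead(family), max_tunnel_mtu(1500, family))
# ipv4 60 1440
# ipv6 80 1420
```

Over a plain 1500-byte underlay, budgeting for an IPv6 outer header gives 1420, which is where the usual WireGuard figure comes from; an IPv4-only underlay could afford 1440. Anything smaller underneath, such as a 1492-byte PPPoE link, pushes the tunnel MTU down with it.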

The diagnostic is Linux ping with the don't-fragment flag set and a payload size you choose:

ping -M do -s 1472 10.0.50.1

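1472 bytes of payload plus 28 bytes of headers (20 for IPv4, 8 for ICMP) is 1500. If that fails but a smaller size succeeds, there is a ceiling on the path: narrow it down to the largest size that gets through, add 28, and that is your path MTU. Drop the interface MTU to fit (1420 is a sensible WireGuard starting point) and the hangs vanish.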
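Narrowing the ceiling down by hand gets tedious, so here is a sketch that bisects it. `find_path_mtu` and `ping_df` are hypothetical helpers: `ping_df` shells out to Linux iputils `ping` with `-M do`, a single probe (`-c 1`) and a one-second wait (`-W 1`), and treats a zero exit status as "this size gets through". The search assumes the smallest size succeeds and that any lost probe means "too big", so a flaky link can mislead it.

```python
import subprocess

ICMP_IPV4_HEADERS = 28  # 20-byte IPv4 header + 8-byte ICMP echo header


def ping_df(host, payload):
    """One don't-fragment probe via Linux iputils ping; True if a reply came back."""
    result = subprocess.run(
        ["ping", "-M", "do", "-c", "1", "-W", "1", "-s", str(payload), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_path_mtu(probe, low=0, high=1472):
    """Binary search for the largest payload size that probe() accepts.

    Invariant: probe(low) succeeds and every size above high fails.
    Returns (largest_payload, path_mtu).
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always shrinks
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low, low + ICMP_IPV4_HEADERS
```

Against the host from the example above you would call `find_path_mtu(lambda n: ping_df("10.0.50.1", n))`; on a path capped at 1420 it should come back with a 1392-byte payload and a 1420-byte path MTU, in about eleven probes rather than a long afternoon of guessing.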

MTU is the silent killer because nothing logs an error. The packets just disappear. Whenever a connection works for small things and dies for large ones, stop looking at the application and go straight to the path. It will save you an evening.