The symptom was the kind that makes you doubt your own eyes. A service could connect to a backend, the handshake completed, small requests worked perfectly, and then any response over a certain size just hung. Not refused. Not reset. Hung, until something eventually timed out and gave up. Tiny payloads: instant. Large payloads: silence.
When small works and large does not, you are almost always looking at MTU. I know this. I have known it for years. And I still spent the first half hour suspecting the application, because the application is the thing I can read and the network is the thing I have to take on faith.
The handshake working is the trap. TCP's three-way handshake is all tiny packets, so it sails through any path that mishandles large frames. The connection looks established and healthy. The problem only appears when something tries to send a full-sized segment, and that segment is too big to cross some link in the path that has a smaller MTU than the endpoints think.
The cause was a tunnel. There was a VPN in the path that I had completely forgotten was there, because it had been working fine for everything else. A tunnel adds its own headers, which eats into the space available for payload, which means the effective MTU inside the tunnel is lower than the standard 1500 the endpoints assume. Normally Path MTU Discovery sorts this out: a router sends back an ICMP "fragmentation needed" message saying "too big, try smaller" and the sender adjusts. But ICMP was being dropped by an overzealous firewall somewhere in the middle. So the big packet went out, got silently discarded for being too large, and no one ever told the sender to back off. The sender just kept hopefully retransmitting a packet that could never fit. That is your hang.
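The arithmetic behind the hang can be sketched in a few lines of shell. The numbers are assumptions for illustration: IPv4 with no IP or TCP options, and a hypothetical tunnel that adds 100 bytes of headers. Real overhead depends on the tunnel type and its settings.

```shell
# Illustrative values only; TUNNEL_OVERHEAD is hypothetical.
LINK_MTU=1500        # what the endpoints assume
TUNNEL_OVERHEAD=100  # bytes of extra headers added by the tunnel (assumed)
IP_TCP_HEADERS=40    # 20 bytes IPv4 + 20 bytes TCP, no options

# fits_in_tunnel SEGMENT_SIZE: report whether a TCP segment of that size
# still fits once the tunnel has taken its share of the link MTU.
fits_in_tunnel() {
    packet=$(( $1 + IP_TCP_HEADERS ))
    inner_mtu=$(( LINK_MTU - TUNNEL_OVERHEAD ))
    if [ "$packet" -le "$inner_mtu" ]; then
        echo "segment $1: packet $packet <= $inner_mtu, passes"
    else
        echo "segment $1: packet $packet > $inner_mtu, dropped"
    fi
}

fits_in_tunnel 64     # handshake-sized traffic
fits_in_tunnel 1460   # a full segment for a 1500-byte MTU
```

Under these assumptions the small segment passes and the full-sized one does not, which is exactly the shape of the symptom.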
The confirmation is a one-liner. Ping across the path with the don't-fragment bit set and a payload just under the suspected limit, then just over it.
# works
ping -M do -s 1372 backend.internal
# silent
ping -M do -s 1400 backend.internal
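The cliff edge lies somewhere between those two. Narrow the gap until you have the largest payload that still gets a reply, then add 28 bytes (20 for the IPv4 header, 8 for ICMP) and you have the real path MTU. The MSS that fits is that MTU minus 40, for the IPv4 and TCP headers.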
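Finding the exact cliff edge by hand takes a few tries, so here is a minimal bisection sketch. It assumes Linux iputils ping (the same -M do flag as above) and IPv4. The function names are made up for this example, and PROBE is a hypothetical hook that lets you swap in a different check. A single lost packet will mislead it, so treat the result as a strong hint and not a measurement.

```shell
# probe SIZE: succeeds if one don't-fragment ping with SIZE payload bytes
# gets a reply within a second. Flags are Linux iputils.
probe() {
    ping -M do -c 1 -W 1 -s "$1" backend.internal >/dev/null 2>&1
}

# PROBE names the function used to test a size (hypothetical hook).
PROBE=${PROBE:-probe}

# largest_payload LOW HIGH: LOW is a size known to pass, HIGH one known
# to fail. Bisect until they are adjacent and print the passing size.
largest_payload() {
    lo=$1
    hi=$2
    while [ $(( hi - lo )) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if "$PROBE" "$mid"; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    echo "$lo"
}

# path_mtu LOW HIGH: largest passing payload plus 20 bytes of IPv4 header
# and 8 bytes of ICMP header.
path_mtu() {
    echo $(( $(largest_payload "$1" "$2") + 28 ))
}
```

With the two sizes from the pings above, usage would be `path_mtu 1372 1400`.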
The fix, since I could not fix the firewall eating ICMP, was to clamp the MSS so that TCP advertised a maximum segment size that fit inside the tunnel from the start. On Linux that is a single iptables rule on the box terminating the tunnel.
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
-j TCPMSS --clamp-mss-to-pmtu
That rewrites the MSS option in SYN packets as they pass through the box, so both ends settle on a segment size that actually fits, rather than discovering the hard limit by losing packets into a void. Large transfers worked immediately.
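One caveat: --clamp-mss-to-pmtu works from the path MTU the kernel itself knows about, so it only helps if the tunnel interface on that box has the right MTU. If it does not, the TCPMSS target also accepts an explicit --set-mss value. As a sketch, assuming IPv4 with no options (MSS = path MTU minus 40), the helper below prints such a rule for review and does not run it. The name print_clamp_rule is made up for this example.

```shell
# print_clamp_rule PATH_MTU: print (do not execute) a clamp rule with an
# explicit MSS. Assumes IPv4, no options: 20 bytes IP + 20 bytes TCP.
print_clamp_rule() {
    mss=$(( $1 - 40 ))
    echo "iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss $mss"
}

print_clamp_rule 1400
# prints: iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1360
```

Printing first is deliberate: a firewall rule on the tunnel box is worth reading once before running it as root.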
It is always the MTU. Or rather, it is never quite the MTU itself; it is some well-meaning firewall dropping the ICMP that would have told everyone about the MTU. The MTU is just where you end up. The thing to keep in your head is the shape of the symptom: handshake fine, small fine, large gone. That shape has one usual suspect, and the next time I see it I would like to believe I will check the path before I read the code. I would like to believe that.