The symptom was the kind that makes you doubt your sanity. A service worked perfectly for small requests and hung forever on large ones. Curl a tiny endpoint, instant response. Curl one that returned a few kilobytes of JSON, and the connection would establish, the headers would come back, and then nothing. It would sit there until it timed out. No error, no reset, just a stall partway through the body.
The maddening part of a partial-transfer hang is that everything cheap works. DNS resolves. The TCP handshake completes; you can see the SYN, SYN-ACK, and ACK in the capture. TLS negotiates. The server clearly accepts the request and starts to answer. So all the things you would normally suspect are demonstrably fine, which sends you off looking at the application, the load balancer, the timeout settings, anything but the layer that is actually broken.
The tell, once you know to look for it, is the relationship between request size and failure. Small payloads work, large payloads hang, and there is a threshold somewhere in between. That is not an application bug. Applications do not care how big a response is; they just write bytes to a socket. Something at the network layer is dropping the big packets and passing the small ones, and the canonical cause of that is an MTU problem.
Here is the mechanism. Ethernet typically carries a 1500-byte maximum transmission unit. When a link somewhere along the path has a smaller MTU (a VPN tunnel, a PPPoE link, any encapsulation that eats into the budget), a full-size packet is too big to pass. Because TCP stacks doing path MTU discovery set the don't-fragment bit, the router that cannot forward such a packet drops it and sends back an ICMP "fragmentation needed" message carrying the next hop's MTU. The sender learns the path MTU and shrinks its packets. This is path MTU discovery, and it is supposed to be invisible.
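To make that feedback loop concrete, here is a toy model in bash. It is a simulation, not networking: each argument is a hypothetical hop MTU, a packet larger than a hop is dropped, and that hop's MTU is "reported" back the way an ICMP fragmentation-needed message would carry it. The function name `discover_pmtu` and the 1500-byte starting size are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Toy model of path MTU discovery (no real packets are sent).
# discover_pmtu MTU1 MTU2 ... : prints the packet size the sender settles on.
# Assumes the sender starts at a 1500-byte Ethernet MTU with DF set.
discover_pmtu() {
  local size=1500 hop reported
  while true; do
    reported=""
    for hop in "$@"; do
      if (( size > hop )); then
        # Hop cannot forward a DF packet this big: it drops it and
        # "sends back" its MTU, standing in for ICMP type 3 code 4
        reported=$hop
        break
      fi
    done
    if [[ -z $reported ]]; then
      echo "$size"   # packet fit every hop end to end
      return 0
    fi
    size=$reported   # sender shrinks to the reported MTU and tries again
  done
}

# Usage: a PPPoE-sized link in the middle of an Ethernet path
#   discover_pmtu 1500 1492 1500   # prints 1492
```

Each ICMP message moves the sender down to the next bottleneck, so a path with several narrow hops converges in a few round trips rather than one.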
It works right up until someone blocks ICMP. A firewall configured by someone who decided ICMP was a security risk and dropped all of it will silently swallow those "fragmentation needed" messages. Now the sender never learns the path is narrower. It keeps sending 1500-byte packets with the don't-fragment bit set, the narrow link keeps dropping them, and the feedback that would fix it never arrives. Small responses fit in packets under the limit and sail through. Large responses need full-size packets, which vanish into the void. The connection hangs because TCP is dutifully retransmitting packets that can never get through.
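The size threshold falls straight out of that. A rough sketch, again a toy model with a hypothetical function name: the sender never hears ICMP, so it keeps the usual 1460-byte MSS (a 1500-byte MTU minus 20 bytes of IPv4 header and 20 of TCP header, assuming no options), and a transfer finishes only if every packet fits the narrow link.

```bash
#!/usr/bin/env bash
# Toy model of a PMTUD black hole (no real packets are sent).
# transfer_completes BYTES PATH_MTU: exit 0 if a response of BYTES gets
# through, exit 1 if the connection would stall.
# Assumes IPv4 with no TCP options: MSS 1460, 40 header bytes per packet.
transfer_completes() {
  local remaining=$1 pmtu=$2 mss=1460 seg
  while (( remaining > 0 )); do
    seg=$(( remaining < mss ? remaining : mss ))   # payload of the next segment
    if (( seg + 40 > pmtu )); then
      # DF packet too big for the narrow hop: dropped, no ICMP comes back,
      # and in this model every retransmission is the same size
      return 1
    fi
    remaining=$(( remaining - seg ))
  done
  return 0
}

# Usage, assuming a 1400-byte tunnel MTU:
#   transfer_completes 300 1400 && echo ok       # prints ok
#   transfer_completes 5000 1400 || echo stall   # prints stall
```

With a 1400-byte tunnel, any response that fits in 1360 bytes of payload gets through, and anything larger stalls on its first full-size segment. That is the threshold from the symptom.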
In this case the narrow link was a VPN tunnel between two sites, and the firewall in front of it was eating ICMP type 3 code 4. You can confirm it in seconds with a ping that sets the don't-fragment bit and varies the payload size. On a clean 1500-byte path the boundary looks like this:
ping -M do -s 1472 host   # 1472 payload + 28 bytes of IP and ICMP headers = 1500, works
ping -M do -s 1473 host   # one byte over, "message too long" or a silent drop
Across a narrower tunnel the drops start lower. Find the largest payload that still gets a reply, add the 28 header bytes, and you have found your path MTU.
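Stepping the size by hand works, but a binary search gets there in about a dozen probes. A minimal sketch, assuming Linux iputils `ping` (`-M do` sets DF, `-c` is the count, `-W` the reply timeout in seconds) and a host that answers echo requests; `probe` and `find_pmtu` are hypothetical helper names. It also assumes every payload below the path MTU succeeds and every one above fails, which is what makes bisection valid.

```bash
#!/usr/bin/env bash
# probe HOST SIZE: succeed if one DF-marked echo with SIZE payload bytes
# gets a reply within a second. Kept separate so it can be swapped out.
probe() {
  ping -M do -c 1 -W 1 -s "$2" "$1" >/dev/null 2>&1
}

# find_pmtu HOST: print the path MTU as the largest working payload plus
# 28 bytes of IPv4 and ICMP headers. Searches payloads 0..1472 (MTU <= 1500).
find_pmtu() {
  local host=$1 lo=0 hi=1472 mid
  if ! probe "$host" "$lo"; then
    echo "no reply from $host even at minimum size" >&2
    return 1
  fi
  # Invariant: payload lo is known to work; payloads above hi fail
  # (or are assumed to, for the initial 1472 ceiling)
  while (( lo < hi )); do
    mid=$(( (lo + hi + 1) / 2 ))   # round up so the loop always shrinks
    if probe "$host" "$mid"; then
      lo=$mid
    else
      hi=$(( mid - 1 ))
    fi
  done
  echo $(( lo + 28 ))
}

# Usage: find_pmtu host.example   # hostname is a placeholder
```

Each failed probe costs up to the one-second timeout, so a full search takes on the order of ten seconds on a black-holed path.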
The fix has two halves. The proper one is to stop dropping ICMP fragmentation-needed messages, because path MTU discovery is not optional. It is how the internet copes with mixed link sizes, and blanket-blocking ICMP breaks it. The pragmatic one, when you do not control the offending firewall, is MSS clamping: have the router rewrite the maximum segment size option in the TCP handshake packets, down to the path MTU minus 40 bytes of IPv4 and TCP headers, so both ends agree to use smaller segments from the start and never generate a packet too big to fit.
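A sketch of the clamping side, assuming a Linux router that uses iptables. The `TCPMSS` target and its `--set-mss` and `--clamp-mss-to-pmtu` options are real iptables features; the helper names and the 1400-byte tunnel MTU are hypothetical. The rule matches only packets with SYN set and RST clear, since the handshake is where the MSS option is exchanged.

```bash
#!/usr/bin/env bash
# mss_for_mtu MTU: largest IPv4 TCP segment that fits, assuming 20-byte IP
# and 20-byte TCP headers with no options.
mss_for_mtu() {
  echo $(( $1 - 40 ))
}

# clamp_rule MTU: print (not apply) an iptables rule that rewrites the MSS
# option in forwarded SYNs. Review it, then run it as root on the router.
clamp_rule() {
  echo "iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN" \
       "-j TCPMSS --set-mss $(mss_for_mtu "$1")"
}

# clamp_rule 1400
#   prints: iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1360
# If the router's route to the tunnel already has the right MTU, iptables
# can derive the value itself: use "--clamp-mss-to-pmtu" instead of --set-mss.
```

Printing rather than applying keeps the sketch safe to run anywhere; the arithmetic is the part worth checking before it touches a production router.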
I have lost more hours to this exact bug than I would like to admit, across more jobs than one, and the lesson never changes. When a connection works for small things and hangs for big things, stop reading application logs. It is the MTU. It is always the MTU.