Ramblings of an aging IT geek
← Ramblings of an aging IT geek
debugging

when in doubt, put it on the wire

A service that "couldn't reach" its database turned out to be a path MTU problem, found by watching the actual packets rather than the logs.

A terminal full of packet captures over a tangle of cables

Every few months I am reminded that the application logs are a story the application tells about itself, and the packet capture is what actually happened. This week the two disagreed, and tcpdump was right, as it nearly always is.

The symptom was a service that intermittently "couldn't reach" its database. Not a clean refusal, not a timeout you could set your watch by, just the occasional query that hung for thirty seconds and then died with a connection reset. The database was fine. Its logs showed the connections arriving, doing a little work, and then going quiet. The application's logs showed a healthy connection that suddenly wasn't. Two healthy ends and a sick middle: classic.

Reading the logs got me nowhere

I spent longer than I would like admitting staring at application logs, because that is the comfortable thing to do. The logs told me the connection failed. They did not tell me why, because the application has no idea why. From its point of view a socket it trusted simply stopped answering. Everything in the log was a downstream consequence of something happening a layer below, where the application cannot see.

So I stopped reading the story and went to look at the wire.

$ sudo tcpdump -ni eth0 -w db.pcap host 10.0.4.12 and port 5432

I let that run until the problem happened again, which obligingly it did within a few minutes, then pulled the capture into a tool that could show me the conversation rather than a list of packets.

A packet trace with the failing exchange highlighted

The packets told a different story

The handshake was clean. The query went out. The first chunk of the response came back, then nothing. The application sat there waiting, the database sat there assuming it had sent everything, and in between there was a gap exactly where a large packet should have been. Small responses worked every time. Large ones died. That shape is a path MTU problem with a sign on it, lit up in neon.

Somewhere between the two hosts a link had a smaller MTU than either end believed, and the ICMP "fragmentation needed" messages that are supposed to tell the sender to back off were being dropped by an overly enthusiastic firewall. So the sender kept cheerfully transmitting full-size frames that quietly fell on the floor, and neither end ever found out. This is the textbook PMTU black hole, and it is miserable precisely because both ends look perfectly healthy.

The capture made it obvious. You can see the large segments going out, you can see no acknowledgement coming back, and you can see the retransmissions hammering away at the same doomed size until the connection gives up. No log on either machine would ever show you that, because no application is involved in the failure. It is happening below them both.

The retransmission pattern in the same trace

The fix, and the lesson I keep relearning

The immediate fix was to clamp the MSS so the two ends negotiated a segment size that fit through the narrow link:

iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
  -j TCPMSS --clamp-mss-to-pmtu

The proper fix was to stop the firewall eating the ICMP messages that exist precisely to prevent this, but that lived in someone else's remit and took a fortnight. The clamp held the line in the meantime, and large responses started arriving whole.

The lesson is the one I apparently need re-teaching myself on a schedule. When a problem sits between two things that both look healthy, the logs of either thing will lead you in circles, because neither thing knows what is wrong. The wire knows. tcpdump is not glamorous and the output is a wall of hex until you squint at it the right way, but it does not lie to you and it does not have a point of view. It just shows you what crossed the cable.

I keep a little alias for the common case so there is no friction in reaching for it: alias tcpd='sudo tcpdump -ni eth0 -w /tmp/cap.pcap'. The cheaper it is to start a capture, the sooner you start one, and the sooner you start one the less time you waste reading stories the machine is telling about itself. The point is not that tcpdump is clever. The point is that it is honest, and on a bad day honesty is the thing in shortest supply.