Ramblings of an aging IT geek
← Ramblings of an aging IT geek
debugging

when the logs lie, the wire doesn't

A service that swore it was sending requests and a backend that swore it received none, resolved in ten minutes with tcpdump showing the truth that neither application log would admit.

A terminal showing a captured packet trace

Two services, both lying. The frontend's logs said it was dutifully sending requests to the backend. The backend's logs said it had received nothing at all. Both teams were confident, both had logs to wave, and the meeting was heading exactly where these meetings go: a slow circular argument about whose code was wrong, conducted entirely through the lens of two applications each reporting their own version of events.

The thing is, application logs tell you what the application thinks happened. They don't tell you what actually went down the wire. And when two sides disagree about whether traffic exists, the only honest witness is the wire itself. So I stopped reading logs and started watching packets.

tcpdump -nn -i any 'host 10.0.4.12 and port 8080'

-nn to skip name and port resolution, because I want addresses not guesses and I don't want a slow reverse lookup hiding a fast problem. host and port to cut the firehose down to just the conversation in question. Then I poked the frontend and watched.

A close-up of code on a screen

The packets were there. SYN going out from the frontend, arriving at the backend host. And then a RST coming straight back. The frontend was sending, exactly as its logs claimed. The backend host was receiving and actively refusing the connection, which is why the backend application logged nothing: the application never saw it. The kernel rejected the connection before anything got as far as the listening process.

A RST in answer to a SYN means nothing is listening on that port, or something is blocking it. The application was up, but it had bound to 127.0.0.1:8080 rather than 0.0.0.0:8080 after a config change nobody flagged. So it was listening, just not on an address anything other than itself could reach. From the application's point of view it was healthy and idle. From the frontend's point of view it was sending into a wall. Both logs were technically true and jointly useless.

The whole thing took ten minutes, and nine of those were arguing before I opened a terminal. The lesson isn't new and I'll relearn it again next year: when two applications disagree about whether traffic is flowing, neither of them is a reliable witness to the network. tcpdump doesn't have an opinion about whose code is correct. It just shows you the bytes, and the bytes don't take sides. The argument ends the moment you stop debating logs and start watching the wire.