Ramblings of an aging IT geek
debugging

when in doubt, watch the wire

An intermittent connection timeout turned out to be a caching DNS resolver serving a stale record, which only tcpdump made obvious.


A service was timing out talking to an internal API. Intermittently, which is the worst kind. The logs said connection refused, sometimes. Running curl from the box worked. Running it from inside the container did not, except when it did. Everyone had a theory.

I stopped guessing and ran the only tool that doesn't have an opinion:

tcpdump -ni any port 53 or host 10.20.0.5

There it was. The DNS lookup for the API was going out and getting an answer, but some of the answers were an old address from a stale record, pointing at a host that had since been recycled. Half the queries hit a caching resolver with the new record, half hit one with the old. The application was fine. The network was fine. The names were lying.
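Once a capture points at DNS, one way to confirm it is to ask each resolver directly and compare what comes back. This is a rough sketch, not what I ran that day: it assumes dig and awk are installed, and the resolver addresses and hostname are placeholders I made up for illustration.

```shell
#!/bin/sh
# Hypothetical values: substitute your own resolvers and hostname.
RESOLVERS="10.20.0.2 10.20.0.3"
NAME="api.internal.example"

# Ask each resolver directly instead of letting the system pick one.
# Prints one line per resolver: "<resolver> <answer> [<answer> ...]"
query_all() {
    for r in $RESOLVERS; do
        # +short prints only the record data; sort keeps multi-record
        # answers comparable regardless of the order they arrive in.
        answer=$(dig +short "@$r" "$NAME" A | sort | tr '\n' ' ')
        printf '%s %s\n' "$r" "$answer"
    done
}

# Reads "<resolver> <answer...>" lines on stdin, echoes them back, and
# reports whether every resolver gave the same answer.
# Exit status is 1 when they disagree.
check_agreement() {
    awk '{
             r = $1; $1 = ""          # drop the resolver, keep the answer
             seen[$0]++
             print r ":" $0
         }
         END {
             n = 0
             for (a in seen) n++      # count distinct answers
             if (n > 1) { print "DISAGREE"; exit 1 }
             print "agree"
         }'
}

# Typical use (left commented out because it needs network access):
# query_all | check_agreement
```

If the last line says DISAGREE, at least one resolver is serving a different record than the others, which is the pattern the capture showed here. It will not tell you which answer is the correct one; that still takes a look at the authoritative record.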

Packets don't argue. Logs tell you what the program thought happened; tcpdump tells you what actually went down the wire, and the gap between those two is where the bug usually lives. Five minutes of capture beat an hour of theories, again. I keep saying I'll learn this lesson permanently, and I keep being grateful that the tool is right there when I forget.