Ramblings of an aging IT geek
networking

the day i broke dns and blamed everything else first

How a single stale upstream in my Pi-hole config took the whole house offline, and why I spent an hour blaming the ISP first.

The house went dark on a Tuesday evening, in the way that only DNS can manage. Nothing was "down" in the sense of a server being off. Pings to IP addresses worked fine. But every name failed: no BBC iPlayer, no Spotify, no Home Assistant, and a very unimpressed audience asking why the internet was broken.

My first instinct, naturally, was that it was someone else's fault. I checked the ISP status page (which loaded, by IP, eventually). I rebooted the router. I muttered about Virgin Media. All of this took the better part of an hour, and all of it was wrong.

The actual cause: I'd been tidying my Pi-hole the night before and "cleaned up" the upstream resolvers. I'd left a single upstream pointing at an internal recursive resolver I had half-decommissioned. So Pi-hole was happily forwarding every query into a black hole and timing out. The clue was right there in dig:

$ dig @192.168.1.2 example.com
;; connection timed out; no servers could be reached

A resolver that can't reach its upstream is just an expensive way to wait. Pi-hole still answered cached and local names by itself, which is exactly why the outage felt intermittent rather than total. Anything I'd visited recently still resolved from cache; everything else fell off a cliff as the TTLs expired.
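A minimal sketch of that check, for anyone who wants it in a reusable form. It assumes dig is installed, and probe is a hypothetical helper name, not a Pi-hole command. It asks one resolver one question and reports whether it answered, answered with an error, or said nothing at all.

```shell
#!/bin/sh
# probe SERVER [NAME]: hypothetical helper, not part of Pi-hole.
# Reports how SERVER responds to a single A query for NAME.
probe() {
    server=$1
    name=${2:-example.com}
    # +noall +comments prints only the header comments, which carry the
    # response status. dig exits non-zero when nothing replies, so the
    # "|| true" keeps the function going in that case.
    out=$(dig +time=2 +tries=1 +noall +comments "@$server" "$name" A 2>&1) || true
    case $out in
        *"status: NOERROR"*)
            echo "$server: answered" ;;
        *"status: "*)
            # A reply came back, but it was an error (SERVFAIL, REFUSED, ...).
            rcode=$(printf '%s\n' "$out" | sed -n 's/.*status: \([A-Z]*\).*/\1/p' | head -n 1)
            echo "$server: replied $rcode" ;;
        *)
            echo "$server: no reply" ;;
    esac
}
```

Running probe 192.168.1.2 and then probe 1.1.1.1 shows which hop has gone quiet. Trying a name you visited recently and one you never have is a rough way to see the cache effect described above.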

The fix was thirty seconds of work once I stopped flailing: point the upstreams back at something that actually exists.

# /etc/pihole/setupVars.conf
PIHOLE_DNS_1=1.1.1.1
PIHOLE_DNS_2=9.9.9.9

Then pihole restartdns and the house came back to life, accompanied by the sound of three paused streams resuming at once.
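A pre-flight check along these lines would catch a dead upstream before the restart rather than after. This is a sketch under assumptions: list_upstreams and check_upstreams are hypothetical names, the file is assumed to use the PIHOLE_DNS_n=address layout shown above (optionally with a #port suffix), and dig is assumed to be installed.

```shell
#!/bin/sh
# Print the upstream resolvers listed in a setupVars.conf-style file.
list_upstreams() {
    sed -n 's/^PIHOLE_DNS_[0-9][0-9]*=//p' "$1"
}

# Query each configured upstream once; return non-zero if any stays silent.
check_upstreams() {
    conf=${1:-/etc/pihole/setupVars.conf}
    failed=0
    for upstream in $(list_upstreams "$conf"); do
        # Split an optional "#port" suffix off the address.
        host=${upstream%%#*}
        port=53
        case $upstream in *"#"*) port=${upstream##*#} ;; esac
        # dig exits 0 on any reply and non-zero when no server answers.
        if dig +time=2 +tries=1 +short -p "$port" "@$host" example.com A >/dev/null 2>&1; then
            echo "ok $upstream"
        else
            echo "FAILED $upstream"
            failed=1
        fi
    done
    return "$failed"
}
```

Something like check_upstreams && pihole restartdns then refuses to restart onto a resolver that isn't there. Note that it only detects silence: an upstream that replies with an error still counts as "ok" here.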

The lesson isn't really about Pi-hole. It's that DNS failures feel like everything is broken, because in a sense everything is, and that makes you reach for the biggest, most external explanation first. The boring truth is usually that you changed something. I now keep one host pinned to a second resolver that runs on a different box entirely, precisely so that when I fat-finger the main one there's a working name server to dig against and embarrass me quickly. An hour of blaming the ISP is an hour I'd quite like back.
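The "blame yourself first" comparison can be sketched as a tiny script. The names replies and blame are hypothetical, the second resolver's address is whatever yours happens to be, and dig is assumed to be installed.

```shell
#!/bin/sh
# Succeeds if the resolver sends any reply at all within two seconds.
# dig exits 0 on any reply (even an error reply) and non-zero on silence.
replies() {
    dig +time=2 +tries=1 +short "@$1" "${2:-example.com}" A >/dev/null 2>&1
}

# blame MAIN SECOND: compare the main resolver with an independent one.
blame() {
    if replies "$1"; then
        echo "main replies: look elsewhere"
    elif replies "$2"; then
        echo "main silent, second replies: suspect your own changes"
    else
        echo "neither replies: check the link first"
    fi
}
```

Called as blame 192.168.1.2 followed by the second resolver's address, the middle answer is the one that would have saved the hour. The wording of the verdicts is only a hint at where to look next, not a diagnosis.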