Ramblings of an aging IT geek
networking

the day my own resolver broke and dnssec was right to

A self-inflicted DNS outage on a home Unbound resolver, traced to a stale trust anchor and a clock that had drifted.

I've run my own recursive resolver for a while now, and it had been boringly reliable, which is the only kind of reliable worth having. Then one morning, nothing resolved. Not some things. Nothing. Every lookup came back SERVFAIL, the house went quiet, and I had nobody to blame but the person who'd insisted on doing DNS himself.

The instinct is to assume the resolver is broken. The harder, more useful assumption is that the resolver is doing exactly what you told it, and what you told it was wrong.

following the SERVFAIL

SERVFAIL with DNSSEC validation on is usually one of two things: the data really is bad, or your validator can't trust it. I ran the query by hand and asked Unbound to tell me why:

dig @192.168.1.2 example.com +dnssec
unbound-host -D -v example.com

The verbose output said the validation was failing on a signature whose inception time was in the future. A signature can't be valid before it's been made. Which meant either the entire DNS had travelled back in time, or my box's clock had.
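
When the verbose output is less forthcoming, a common cross-check is dig's +cd flag (checking disabled), which asks the resolver to return the answer without validating it. If the normal query fails and the +cd query succeeds, validation is the likely culprit. The sketch below is only an illustration: rcode_of and classify_servfail are hypothetical helper names, and it assumes dig's usual header line containing "status: ...".

```shell
# Hypothetical helper: extract the response code (NOERROR, SERVFAIL, ...)
# from the ";; ->>HEADER<<- ... status: X, ..." line that dig prints.
rcode_of() {
    dig "$@" | sed -n 's/.*status: \([A-Z]*\),.*/\1/p'
}

# Hypothetical helper: compare the rcode of a normal query with the rcode
# of the same query sent with +cd (checking disabled).
classify_servfail() {
    normal_rcode=$1
    cd_rcode=$2
    if [ "$normal_rcode" != "SERVFAIL" ]; then
        echo "not a SERVFAIL"
    elif [ "$cd_rcode" = "SERVFAIL" ]; then
        echo "fails even without validation"
    else
        echo "validation failure"
    fi
}

# Example, using the resolver address from the commands above:
#   classify_servfail "$(rcode_of @192.168.1.2 example.com)" \
#                     "$(rcode_of @192.168.1.2 example.com +cd)"
```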

it was the clock, of course

The little resolver box had lost NTP at some point and quietly drifted. Not by much, but every DNSSEC signature is only valid between a stated inception and expiration time, and "not much" was enough to leave the system clock sitting before the inception time of freshly rotated signatures. The cryptography was working perfectly. It was correctly refusing to trust records that, as far as my wrong clock could tell, hadn't been signed yet.
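
To make that concrete, here is a minimal sketch of the time comparison involved. It assumes the YYYYMMDDHHMMSS timestamps that dig prints in an RRSIG record (expiration first, then inception). The name rrsig_window_status is hypothetical and the timestamps in the example are made up. Real validators also handle 32-bit time wraparound and may tolerate some clock skew, both of which this ignores.

```shell
# Hypothetical helper: classify a UTC timestamp against an RRSIG validity
# window. All three arguments are YYYYMMDDHHMMSS strings, given in the
# order dig prints them: expiration, then inception, then the clock time.
rrsig_window_status() {
    expiration=$1
    inception=$2
    now=$3
    if [ "$now" -lt "$inception" ]; then
        echo "not-yet-valid"
    elif [ "$now" -gt "$expiration" ]; then
        echo "expired"
    else
        echo "valid"
    fi
}

# Made-up example: a clock ten minutes behind a signature's inception time.
rrsig_window_status 20240314000000 20240229000000 20240228235000
# prints: not-yet-valid
```

Passing "$(date -u +%Y%m%d%H%M%S)" as the third argument compares against the box's own clock, which is exactly the value that was wrong here.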

Fixing the clock fixed everything in seconds:

systemctl restart systemd-timesyncd
timedatectl status
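
Restarting the daemon doesn't by itself prove the clock has converged. A small check along these lines can confirm it, assuming a systemd version whose timedatectl supports the show subcommand and exposes the NTPSynchronized property. The name clock_is_synced is a hypothetical helper.

```shell
# Hypothetical helper: succeed only when systemd reports the system clock
# as synchronized. Assumes `timedatectl show` is available on this box.
clock_is_synced() {
    [ "$(timedatectl show --property=NTPSynchronized --value)" = "yes" ]
}

# Example:
#   clock_is_synced || echo "clock not synchronized; check NTP first" >&2
```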

The second issue, which I found whilst I was in there, was a trust anchor I'd half-managed by hand. Unbound can maintain the root key itself via RFC 5011, provided you let it own the anchor file and point auto-trust-anchor-file at a path it can write:

server:
    auto-trust-anchor-file: "/var/lib/unbound/root.key"

I'd been treating that file as read-only config. It isn't. It's state, and the resolver needs to update it as the root key rolls.
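
One way to catch this mistake early is to check, as the user Unbound runs as, that both the anchor file and its directory are writable. This is a sketch under assumptions: anchor_is_writable is a hypothetical helper, and the directory check rests on the assumption that Unbound writes a temporary file next to the anchor before renaming it into place.

```shell
# Hypothetical helper: succeed only if the anchor file exists and both it
# and its parent directory are writable by the current user.
anchor_is_writable() {
    anchor=$1
    [ -f "$anchor" ] && [ -w "$anchor" ] && [ -w "$(dirname "$anchor")" ]
}

# Example, run as the resolver's own user rather than root:
#   anchor_is_writable /var/lib/unbound/root.key || echo "anchor not writable" >&2
```

Running the check as root would defeat the purpose, since root can write almost anywhere regardless of file permissions.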

the lesson I keep relearning

DNSSEC didn't fail me. It did its job and refused to lie. The failure was upstream of the cryptography, in the dull infrastructure underneath: a clock and a key file. When you run your own resolver you also own its time, its trust anchors, and the privilege of breaking your entire house with a drifted clock. I'd still do it. I just check NTP first now, every single time.