The thing that finally pushed me over the edge was the browser warnings. Every self-hosted service on my network had its own self-signed certificate, which meant every visit was a click-through, and click-through warnings train you to ignore the one that actually matters. So I gave up on per-service certs and put one reverse proxy in front of everything.
The point of this post: you can have real, trusted HTTPS on internal-only hostnames without exposing anything to the internet. The trick is the DNS-01 challenge, which proves you own the domain by writing a TXT record rather than by serving a file on port 80. No inbound ports, no public service, just a proxy that knows how to talk to my DNS provider's API.
the shape of it
Everything lives behind a single Traefik instance on the homelab box. Each service gets a hostname under a real domain I own, something.lab.example.com, with an A record pointing at the proxy's internal IP. Public DNS resolves the name, but the IP is RFC1918, so it only works from inside the house or over the VPN. That's exactly what I want.
Traefik handles certificate issuance itself. I point it at Let's Encrypt with the DNS challenge configured for my provider, hand it an API token through an environment variable, and it does the rest. The first time a new router comes up, it requests a cert, writes the TXT record, waits for propagation, and stores the result. After that it renews quietly in the background.
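As a rough sketch of the token handoff, here is what a Compose service for the proxy could look like. This assumes Docker Compose, Cloudflare as the DNS provider, and a token exported in the shell or an .env file. The image tag, file paths, and port mapping are illustrative assumptions, not a copy of my actual file. CF_DNS_API_TOKEN is the variable name Traefik's Cloudflare integration reads.

```yaml
services:
  traefik:
    image: traefik:v3.1          # assumed tag; pin whatever version you run
    ports:
      - "443:443"                # reachable on the LAN only; nothing is forwarded from the internet
    environment:
      # API token with permission to edit DNS records for the zone
      - CF_DNS_API_TOKEN=${CF_DNS_API_TOKEN}
    volumes:
      - ./traefik.yml:/etc/traefik/traefik.yml:ro       # static config
      - ./acme.json:/etc/traefik/acme.json              # issued certs persist here
      - /var/run/docker.sock:/var/run/docker.sock:ro    # lets Traefik read container labels
```

Keeping acme.json on a mounted volume is the important design choice here: if the file lives only inside the container, every restart triggers fresh issuance.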
The static config is short. The interesting part is the certificate resolver:
    certificatesResolvers:
      le:
        acme:
          email: [email protected]
          storage: /etc/traefik/acme.json
          dnsChallenge:
            provider: cloudflare
            resolvers:
              - "1.1.1.1:53"
              - "8.8.8.8:53"
The resolvers list matters more than it looks. Traefik checks that the TXT record has actually propagated before it tells Let's Encrypt to validate. If you let it ask your own internal resolver, it can keep seeing a cached negative answer until it times out and gives up. Pointing it at public resolvers for that check saved me a frustrating evening.
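For context, the rest of a static config along these lines can be very small. The fragment below is a minimal sketch that assumes the Docker provider and an entry point named websecure. Both names are my illustration here, so adjust them to whatever your setup uses.

```yaml
entryPoints:
  websecure:
    address: ":443"            # the only port the proxy needs for HTTPS

providers:
  docker:
    exposedByDefault: false    # containers are ignored unless they opt in with a label
```

Note that there is no port 80 entry point for the challenge. DNS-01 validation happens entirely through the DNS provider's API.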
Per service, the config is just a few labels telling the proxy which hostname routes where and which resolver to use. Adding a new service is now: bring up the container, add three lines of router config, add an A record. The cert appears on its own.
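A hedged example of those router lines, written as Docker Compose labels. The service name whoami, the traefik/whoami demo image, and the websecure entry point are assumptions for illustration. The resolver name le matches the certificate resolver defined in the static config.

```yaml
services:
  whoami:
    image: traefik/whoami
    labels:
      # Only needed when the Docker provider is set to ignore containers by default
      - traefik.enable=true
      # The three router lines: hostname, entry point, certificate resolver
      - traefik.http.routers.whoami.rule=Host(`whoami.lab.example.com`)
      - traefik.http.routers.whoami.entrypoints=websecure
      - traefik.http.routers.whoami.tls.certresolver=le
```

The router name (whoami in the label keys) is arbitrary, but it has to be the same across all three lines so Traefik treats them as one router.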
the bits that bit me
Two things caught me out. First, acme.json must be mode 0600 (chmod 600 acme.json). If the permissions are any looser, Traefik refuses to use the file, logs an error that is easy to overlook, and falls back to its default self-signed cert, which is exactly the warning I was trying to escape. It's easy to miss because everything looks like it's working until you actually load the page.
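Second, rate limits. Let's Encrypt allows a generous but finite number of certs per registered domain per week (50 at the time of writing, with a much lower cap of 5 on reissuing a cert for the same hostname), and every name under lab.example.com counts against the same registered domain. While you're fiddling, it's easy to burn through them by tearing services up and down. Use the staging endpoint while you're getting the plumbing right, then switch to production once a cert issues cleanly there. Staging certs aren't trusted by browsers, so the warnings stick around during testing, but a single trusted lock icon at the end is worth the extra step.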
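Switching to staging is a one-line change in the resolver. The sketch below shows only the keys that differ from the production resolver. The separate storage filename is my assumption, a simple way to keep untrusted staging certs from lingering in the production file after the switch back.

```yaml
certificatesResolvers:
  le:
    acme:
      # Let's Encrypt staging directory; delete this line to go back to production
      caServer: https://acme-staging-v02.api.letsencrypt.org/directory
      # Assumed filename: keeps staging certs out of the production store
      storage: /etc/traefik/acme-staging.json
      # email and dnsChallenge settings stay the same as in production
```

When you return to production, point storage back at the original file. The new staging file still needs mode 0600, just like the production one.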
The result is undramatic, which is the highest praise I can give infrastructure. Every internal service is https://, every cert is valid, nothing is exposed, and I haven't clicked through a warning in months. When the lock icon means something again, you start trusting it again, and that was the whole point.