For about a year my homelab ran on a pile of docker run commands I'd typed by hand and then forgotten. Pi-hole here, a Plex container there, a Unifi controller I was faintly afraid to touch, Grafana, an MQTT broker, a Syncthing node. Each one started with a long incantation full of -v and -p and --restart=always, none of it written down anywhere except my shell history, which I'd lost twice. When the box rebooted, half of them came back and half didn't, and working out which was which involved a lot of docker ps and squinting.
The breaking point was a power cut. The NUC came back up, Docker started, the --restart=always containers mostly returned, and then I spent an hour reconstructing the two that didn't because I genuinely could not remember how I'd launched them. That's not a homelab, that's a hostage situation. So I sat down and moved the whole lot into a single docker-compose.yml, and it's the best afternoon of admin I've done all year.
why compose, and not something cleverer
I know about Kubernetes. I have run Kubernetes. For one box in a cupboard under the stairs, Kubernetes is using a forklift to move a kettle. Compose is the right size: it's declarative enough that the file is the documentation, it's plain YAML, and docker-compose up -d brings everything back exactly as described. No control plane to babysit, no etcd to corrupt, nothing to learn beyond what I already half-knew.
The whole thing lives in one directory, under git, on the host and pushed to a private repo. The directory is the source of truth:
homelab/
  docker-compose.yml
  .env
  pihole/
  grafana/
  traefik/
  data/    # gitignored, this is the actual state
The .env file holds the handful of things I don't want in the repo, mostly the host's timezone, a couple of API tokens, and the path to my data volume. Everything else is in the compose file, in the open, where future-me can read it.
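For illustration, a .env along these lines would satisfy the ${TZ} and ${PIHOLE_PASSWORD} references in the compose file further down. Both values are placeholders assumed for the example; the API tokens and data path mentioned above would follow the same KEY=value pattern, under whatever names the compose file uses for them.

```shell
# .env, read automatically by docker-compose from the project directory.
# Placeholder values for illustration only; keep the real file out of git.
TZ=Europe/London
PIHOLE_PASSWORD=change-me
```

Values are left unquoted on purpose, since older docker-compose releases pass quote characters through literally.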
the shape of the file
Here's the spine of it, trimmed down. The real one has a dozen services but they all follow the same pattern.
version: "3.7"

networks:
  proxy:
  internal:

services:
  traefik:
    image: traefik:1.7
    command:
      - "--docker"
      - "--docker.exposedbydefault=false"
      - "--entrypoints=Name:http Address::80"
    ports:
      - "80:80"
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - proxy
    restart: unless-stopped

  pihole:
    image: pihole/pihole:latest
    environment:
      TZ: ${TZ}
      WEBPASSWORD: ${PIHOLE_PASSWORD}
    volumes:
      - ./data/pihole/etc:/etc/pihole
      - ./data/pihole/dnsmasq:/etc/dnsmasq.d
    ports:
      - "53:53/tcp"
      - "53:53/udp"
    networks:
      - proxy
      - internal
    labels:
      - "traefik.enable=true"
      - "traefik.frontend.rule=Host:pihole.house.lan"
      - "traefik.port=80"
    restart: unless-stopped

  grafana:
    image: grafana/grafana:5.2.2
    user: "1000"
    volumes:
      - ./data/grafana:/var/lib/grafana
    networks:
      - internal
      - proxy
    labels:
      - "traefik.enable=true"
      - "traefik.frontend.rule=Host:grafana.house.lan"
      - "traefik.port=3000"
    restart: unless-stopped
Two things changed how the whole house feels to run.
The first is the reverse proxy. I put Traefik in front of everything, and now instead of remembering that Grafana is on :3000 and the Unifi controller is on :8443 and Pi-hole's admin is on /admin, I have names. grafana.house.lan, pihole.house.lan, unifi.house.lan, all on port 80, all resolved by a wildcard record in Pi-hole itself, which is a pleasingly circular arrangement. Adding a new service is three labels. Traefik watches the Docker socket and picks up new containers automatically, so there's no separate config to edit.
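A quick way to confirm Traefik has picked up a container, without waiting on DNS, is to send the request to the host and set the Host header by hand. This is a minimal sketch: check_route is a hypothetical helper name, and the default of http://127.0.0.1 assumes it is run on the Docker host itself.

```shell
# check_route HOSTNAME [PROXY_URL]
# Sends one request to the proxy with the given Host header and prints
# only the HTTP status code. PROXY_URL defaults to the local host.
check_route() {
  host="$1"
  proxy="${2:-http://127.0.0.1}"
  curl -s -o /dev/null -w '%{http_code}\n' -H "Host: $host" "$proxy/"
}

# Usage, on the Docker host:
#   check_route grafana.house.lan
```

A 404 here usually means Traefik answered but no frontend rule matched the name, which points at the labels rather than at the service.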
The second is predictable bind mounts instead of Docker-managed volumes. Every service's persistent state lives under ./data/<service>, relative to the compose file. That one decision means my backup is now trivial: stop the stack, tar up ./data, ship it offsite, start the stack. No hunting through /var/lib/docker/volumes for a hash-named directory that may or may not be the one I want.
restart policy, and the lie I'd been telling myself
I'd been using --restart=always everywhere, which sounds robust and is actually slightly wrong. With always, a container you deliberately stopped stays down only until the Docker daemon next restarts; after a reboot or a daemon upgrade it comes straight back, which fights you whenever you've taken something down for maintenance. I switched the lot to unless-stopped, which restarts on crash and on boot but leaves alone anything I stopped by hand with docker-compose stop. Small thing, but it stopped Docker and me arguing about who was in charge.
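To see which policy each container actually has, something like the following works. It is a sketch: list_restart_policies is a hypothetical helper name, and it relies only on docker ps and docker inspect with Go-template format strings.

```shell
# Print every container's name and restart policy, one per line,
# including containers that are currently stopped.
list_restart_policies() {
  docker ps -a --format '{{.Names}}' | while read -r name; do
    policy=$(docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' "$name")
    printf '%s\t%s\n' "$name" "$policy"
  done
}

# A container started by hand can be switched in place with:
#   docker update --restart unless-stopped CONTAINER_NAME
```

For anything managed by the compose file, changing the restart: line and running docker-compose up -d is enough.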
There's a subtlety worth flagging. The version: "3.7" at the top isn't decorative; it pins the compose file format and gates which features are available, and I'd been bitten before by copying a snippet that quietly needed a newer schema than the one I'd declared. I keep it explicit and bump it deliberately rather than by accident.
Likewise the two networks, proxy and internal: only services that need to be reachable through Traefik join proxy, and the database-style backends sit on internal, where Traefik has no route to them and they publish no ports of their own. It's a small amount of segmentation, but it means a misconfigured label can't accidentally expose a service that was meant to stay private. Pi-hole has to publish port 53 for DNS, but otherwise the proxy is the only thing with a published port; everything else talks over the Docker networks by service name, which is its own little piece of magic, because grafana can reach prometheus just by that name with no IPs to wire up.
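The name resolution can be checked from inside a container. This is a sketch: resolve_from is a hypothetical helper name, it assumes the image ships getent (most Debian- and Alpine-based images do, but not all), and it has to be run from the directory holding docker-compose.yml.

```shell
# resolve_from SERVICE NAME
# Runs getent inside SERVICE's container to look up NAME via Docker's
# embedded DNS. -T disables TTY allocation so it also works from scripts.
resolve_from() {
  docker-compose exec -T "$1" getent hosts "$2"
}

# Usage:
#   resolve_from grafana prometheus
```

If the two services share no network, the lookup fails, which is a handy way to confirm the segmentation is doing what the file says.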
the backup, because a homelab without one is a countdown
The bit I'm most pleased with is the least clever. A cron job, on the host, that does exactly what I'd do by hand:
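#!/bin/bash
set -euo pipefail
cd /home/john/homelab
# If anything below fails, still bring the stack back before exiting.
trap 'docker-compose start' EXIT
docker-compose stop
tar czf "/mnt/backup/homelab-$(date +%F).tar.gz" data/
docker-compose start
trap - EXIT
# Prune archives older than a fortnight.
find /mnt/backup -name 'homelab-*.tar.gz' -mtime +14 -delete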
It stops the stack so nothing is mid-write, archives the state, brings it back, and prunes anything older than a fortnight. The downtime is about forty seconds at 04:00, which nobody in this house is awake to notice. I've tested a restore onto a spare Pi: untar the data directory, docker-compose up -d, and the whole house comes back, Pi-hole's blocklists and all.
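The restore side can be sketched the same way. restore_homelab is a hypothetical helper name, and it assumes the target directory already holds the git checkout plus a recreated .env, since that file is kept out of the repo.

```shell
# restore_homelab ARCHIVE TARGET_DIR
# Unpacks a backup tarball (which contains data/) into TARGET_DIR and
# starts the stack. Refuses to run if either piece is missing.
restore_homelab() {
  archive="$1"
  target="$2"
  [ -f "$archive" ] || { echo "no such archive: $archive" >&2; return 1; }
  [ -f "$target/docker-compose.yml" ] || { echo "no compose file in $target" >&2; return 1; }
  tar xzf "$archive" -C "$target" || return 1
  (cd "$target" && docker-compose up -d)
}

# Usage, with an illustrative date in the file name:
#   restore_homelab /mnt/backup/homelab-2018-09-01.tar.gz /home/john/homelab
```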
was it worth it
Unreservedly. The migration took an afternoon, most of which was reconstructing the two containers I'd lost, and the payoff is that my homelab is now a thing I can read, version, move to another box, and rebuild from a git clone and a tarball. When something breaks I edit a file and run one command. When I add a service I copy a block and change three labels. And when the power goes again, which around here it will, the whole house comes back on its own, in the right order, exactly as written down. That's the entire point. The cleverness isn't in the tooling, it's in finally having the thing be boring.