I have eleven Grafana dashboards. I am not proud of this. There's one for the NAS, one for the network, one for "node exporter, everything," one I built to watch the Pi-hole, and several whose purpose I genuinely cannot recall without clicking in.
What I do not have is a single alert that would have told me a disk was at 97% last week. I found out the old-fashioned way, when a service stopped writing and fell over. All those beautiful gauges were showing me the problem in real time, in a tab I wasn't looking at.
This is the trap with self-hosted monitoring. Dashboards are fun to make. They give you that pleasant sense of being on top of things, all those sparklines ticking along. But a dashboard only works if a human is staring at it, and no human stares at eleven of them. An alert works while you're asleep.
So this weekend's job is subtraction, not addition. Three dashboards I'll actually open, and a handful of Prometheus alerting rules, delivered through Alertmanager, for the things that genuinely ruin my evening: disk above 90%, a host down for more than five minutes, the backup job not finishing. Everything else can stay a pretty graph I visit once a month and feel briefly clever about.
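For the curious, here is a rough sketch of what those three rules might look like as a Prometheus rules file. Treat it as a starting point, not a finished config. It assumes node exporter is being scraped (the `node_filesystem_*` metrics and `up` come with that setup), and that the backup script publishes the time of its last successful run. That last metric, `backup_last_success_timestamp_seconds`, is a name I made up for illustration, as are the group name, alert names, the `for:` durations on the disk and backup rules, and the 26-hour window, which assumes a nightly backup.

```yaml
groups:
  - name: evening-ruiners   # hypothetical group name
    rules:
      # Disk above 90% used, ignoring in-memory and container filesystems.
      - alert: DiskAlmostFull
        expr: |
          (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
             / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) * 100 > 90
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "{{ $labels.instance }} {{ $labels.mountpoint }} is over 90% full"

      # A scrape target has been unreachable for more than five minutes.
      - alert: HostDown
        expr: up == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "{{ $labels.instance }} has been down for 5 minutes"

      # Backup has not succeeded in 26 hours (assumes a nightly job).
      # The metric name below is an assumption, not a standard exporter metric.
      - alert: BackupStale
        expr: time() - backup_last_success_timestamp_seconds > 26 * 3600
        for: 15m
        labels:
          severity: page
        annotations:
          summary: "Last successful backup was more than 26 hours ago"
```

One gap worth knowing about: if the backup metric never shows up at all, the third expression returns nothing and stays silent, so a separate `absent(backup_last_success_timestamp_seconds)` rule is probably worth adding next to it.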