Ramblings of an aging IT geek
news

when the dashboard is green and the site is down

A provider-side networking wobble this month was a useful reminder that a green status page and a working service are not the same thing, and you should monitor accordingly.

A bank of monitors showing status dashboards

There was another regional networking wobble at a large provider this month, the sort that drops a chunk of traffic for twenty minutes, lights up Twitter, and is quietly resolved before the post-mortem is written. Nothing catastrophic. But it caught a service of mine, and it caught it in the most annoying way: the provider's status page stayed green throughout.

That's the lesson, and it's an old one I clearly hadn't learned hard enough. A status page is the provider telling you what they've noticed and chosen to admit, on their schedule. It is not a measurement of whether your thing works. Those are different questions, and only one of them matters to your users at 3pm on a Tuesday.

The fix is boring, and it's the same fix every time. Monitor from outside, against the actual user-facing endpoint, from somewhere outside the cloud you're hosting in. I had external checks, but they were running in the same region that was wobbling, which is a bit like asking the patient to take their own pulse. I've moved them to a different provider entirely now. If both my host and my monitoring go down together, I'd quite like that to require two unrelated companies to fail at once, not one.
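For the curious, here's a minimal sketch of that sort of outside-in probe, in Python with only the standard library. It assumes the check runs on a machine hosted with a different provider from the service, scheduled by cron or something similar. The names `check_endpoint` and `ProbeResult` and the `User-Agent` string are hypothetical. They're illustrative names I've made up, not any monitoring product's API. The probe fetches the real user-facing URL and only reports success if the status code and, optionally, a snippet of expected page text both match. That way a load balancer cheerfully serving an error page still counts as down.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


# Hypothetical result type; field names are illustrative, not from any tool.
@dataclass
class ProbeResult:
    ok: bool               # True only if status and body matched expectations
    status: Optional[int]  # HTTP status code, or None if no response arrived
    latency_s: float       # wall-clock time for the whole attempt, in seconds
    error: Optional[str]   # short reason when ok is False


def check_endpoint(url: str, timeout: float = 10.0,
                   expect_status: int = 200,
                   expect_text: Optional[str] = None) -> ProbeResult:
    """Fetch the user-facing URL once and judge it the way a user would."""
    start = time.monotonic()
    req = urllib.request.Request(url, headers={"User-Agent": "external-probe/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx; that is still a response, so keep the code.
        return ProbeResult(False, exc.code, time.monotonic() - start,
                           f"HTTP {exc.code}")
    except (urllib.error.URLError, TimeoutError, OSError) as exc:
        # DNS failure, refused connection, timeout: no response at all.
        return ProbeResult(False, None, time.monotonic() - start, str(exc))

    elapsed = time.monotonic() - start
    if status != expect_status:
        return ProbeResult(False, status, elapsed,
                           f"expected {expect_status}, got {status}")
    if expect_text is not None and expect_text not in body:
        # A 200 serving the wrong page is still a failure from the user's side.
        return ProbeResult(False, status, elapsed, "expected text missing from body")
    return ProbeResult(True, status, elapsed, None)
```

As an example, `check_endpoint("https://www.example.com/", expect_text="Sign in")` returns a `ProbeResult` whose `ok` field you could alert on. The URL and text there are placeholders. Alerting only after two or three consecutive failures is a common way to avoid being woken by a single dropped request. Where the alert goes is up to you, as long as it doesn't route through the provider you're watching.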

Don't trust the green light. Trust your own probe.