There was a fairly grim run of cloud outages earlier this month, the kind where half the web has a bad afternoon because one provider's control plane has a worse one. I am being deliberately vague on the exact post-mortem because the specifics never matter as much as we pretend they do. A region wobbles, identity or networking takes the brunt, and a long tail of services that swore they were independent discover they all leaned on the same load-bearing wall. We have watched this film several times now and it always has the same twist: the thing that broke you was a dependency you had forgotten you had.
What got me this week was not the headline outage but a smaller, quieter cousin of it. A managed service we use posted a deprecation notice with a cutover date, and the cutover date arrived, and a job that had run untouched for two years simply stopped. No drama, no alert from the provider that morning, just a silent change of behaviour on a thing nobody had looked at since it was built. The email had gone out. It had gone to a distribution list that, somewhere in a reorg, had stopped pointing at anyone who would act on it.
That is the failure mode that actually bites, and it is almost never technical. The technical part of a deprecation is trivial: change the endpoint, bump the version, redeploy. The hard part is the human relay. Does the notice reach a person? Does that person own the system? Do they have time before the date? Every quiet outage I have ever cleaned up traces back to a break somewhere in that chain, not to the change itself.
So I did the unglamorous thing. I went looking for our own forgotten walls. Anything pinned to a deprecated API version, anything reading a config from a service with a sunset date, anything where the runbook says "ask Dave" and Dave left in 2023. It is not a sexy afternoon. It is grep and a spreadsheet and a lot of "oh, that's still running?"
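For what it is worth, the grep half of that afternoon fits in a few lines. What follows is a rough sketch rather than the script I actually ran: the patterns and sunset dates in `SUNSETS` are made up for illustration, and `scan_text` and `scan_tree` are hypothetical helper names. Swap in the markers and dates from your own deprecation notices.

```python
import re
from datetime import date
from pathlib import Path

# Hypothetical markers mapped to sunset dates; these values are placeholders,
# not real notices. Fill them in from the deprecation emails you do have.
SUNSETS = {
    r"api/v1\b": date(2025, 6, 30),
    r"legacy-config\.internal": date(2025, 9, 1),
}


def scan_text(name, text, sunsets=SUNSETS):
    """Return (name, line_no, pattern, sunset) for every line that matches a marker."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for pattern, sunset in sunsets.items():
            if re.search(pattern, line):
                hits.append((name, line_no, pattern, sunset))
    return hits


def scan_tree(root, suffixes=(".py", ".yaml", ".yml", ".json", ".sh")):
    """Walk a directory, scan files with the given suffixes, soonest sunset first."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            hits.extend(scan_text(str(path), path.read_text(errors="ignore")))
    return sorted(hits, key=lambda hit: hit[3])
```

Calling `scan_tree(".")` from a repository root and pasting the rows into the spreadsheet is the whole workflow. The script finds the walls; it does not tell you who owns them, which is still the part that needs a human.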
The uncomfortable conclusion is that resilience is mostly bookkeeping. You cannot out-architect a deprecation you never heard about. The clever distributed-systems work is real and necessary, but the thing that takes you down on a Tuesday is usually a notice that landed in a dead inbox about a dependency you stopped thinking about the day after you shipped it. The outages make the news. The slow rot of who-owns-what is what actually gets you, and it does it without a status page.