Ramblings of an aging IT geek

the book that taught me to stop blaming the operator

How reading about how complex systems fail rewired the way I run postmortems and think about resilience.

I came to Sidney Dekker's writing on human error late, and I rather wish I'd found it a decade earlier. The short version of the idea: when something goes wrong in a complex system, "human error" is not an explanation, it's the place you stopped asking questions.

For years my instinct after an outage was to find the change, find the person, and feel quietly relieved it wasn't me. Dekker's argument is that the person who pushed the button was almost always doing something reasonable given what they could see at the time. The interesting question is why the system made the wrong action look right. That reframe is annoyingly hard to unlearn once you've absorbed it.

It changed how I write postmortems. Less "operator ran the wrong command", more "why did the runbook and the dashboard agree that this was safe". The blameless thing isn't soft, it's just more accurate, and it gets you fixes that actually hold rather than a stern note about being careful next time.

I think about it on the bad days, when something I built bites someone. The honest answer is usually that the sharp edge was there all along, and they were the unlucky one who reached for it. Worth a read if you run anything that pages people at 3am.