Ramblings of an aging IT geek
personal

the book that taught me failure is normal, not exceptional

How Charles Perrow's Normal Accidents changed the way I think about complex systems and the inevitability of certain failures.

Most of the books I read about systems are written by engineers for engineers, and they're useful, but they share an unspoken assumption: that with enough care, failure is avoidable. Charles Perrow's Normal Accidents takes a sledgehammer to that, and it's the most quietly unsettling thing I've read in ages. Perrow was a sociologist studying things like Three Mile Island, and his argument is that in systems with enough complexity and tight coupling, certain accidents aren't bad luck or negligence. They're a property of the system. They are, in his word, normal.

The two ideas that stuck with me are complexity and coupling, and crucially that they're different axes. Complexity is the number of ways parts can interact that you didn't foresee. Coupling is how little slack there is between them, how fast a problem propagates before anyone can intervene. A system that's both complex and tightly coupled is one where small faults combine in unexpected ways and then race through it faster than humans can respond. Sound like anything you've operated?

I think about my own incidents differently now. The worst ones were never a single big mistake. They were two or three small, individually reasonable things that interacted in a way nobody had ever sketched on a diagram, and then propagated before anyone could pull the cord. Perrow's gift isn't despair; it's permission to stop hunting for the one guilty root cause and instead ask whether the system was built in a way that makes that class of accident normal. The fix is usually to add slack, to loosen the coupling, to buy yourself the seconds a human needs to notice. It hasn't made me a better coder. It's made me a calmer one, which on a bad night is worth more.