Ramblings of an aging IT geek

The Book That Made Me Stop Drawing Boxes and Start Drawing Loops

How Donella Meadows' Thinking in Systems changed the way I reason about feedback, delays and the systems I actually run.

I have read a lot of books that promised to change how I think and quietly didn't. Donella Meadows' Thinking in Systems is the one that actually stuck, and it stuck in an inconvenient way: I cannot un-see it now. Every dashboard, every retro, every bit of capacity planning, I find myself reaching for the same handful of ideas. It is a slim book, written for a general audience, and it has more practical use to me as an engineer than most of the architecture tomes on the shelf above it.

The short pitch is that a system is not its parts. It is the parts plus the connections plus the behaviour those connections produce over time. We are trained, especially in this trade, to break things into components and reason about each one. Meadows' argument is that the interesting behaviour almost never lives in a component. It lives in the loops between them.

Stocks, flows and the things that actually move

The first thing the book gives you is a vocabulary, and it turns out to be a good one. A stock is something that accumulates: water in a bath, money in an account, messages in a queue, unprocessed work in a backlog. A flow is the rate at which a stock changes: the tap and the drain. That is almost embarrassingly simple, and then you start noticing that a huge amount of operational pain is just stocks and flows behaving exactly as they must.

A queue backing up is a stock filling because inflow exceeds outflow. Obvious when you say it like that. But the framing changes what you do about it. You stop staring at the instantaneous depth and start asking about the two rates and the gap between them. You stop treating a full queue as a surprise and start treating it as arithmetic.
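To make the arithmetic concrete, here is a minimal sketch in Python. The function names and the numbers are made up for illustration and assume constant rates, which no real queue has; the point is only that depth is whatever the two flows leave behind.

```python
def simulate_queue(inflow_per_s, outflow_per_s, seconds, initial_depth=0.0):
    """Return the queue depth at the end of each second, for constant rates."""
    depth = initial_depth
    history = []
    for _ in range(seconds):
        # A stock changes only through its flows, and it cannot go below empty.
        depth = max(0.0, depth + inflow_per_s - outflow_per_s)
        history.append(depth)
    return history


def seconds_until_full(inflow_per_s, outflow_per_s, capacity, depth=0.0):
    """Time until the queue reaches capacity, or None if it never will."""
    gap = inflow_per_s - outflow_per_s
    if gap <= 0:
        return None  # outflow keeps up, so the stock does not grow
    return (capacity - depth) / gap


# 120 messages/s in, 100 messages/s out, room for 10,000 messages.
print(seconds_until_full(120, 100, 10_000))  # 500.0
```

A gap of 20 messages a second looks harmless on a dashboard, and it fills a 10,000-message queue in a little over eight minutes. Nothing about the instantaneous depth tells you that; the two rates do.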

The bit that genuinely rewired me was delays. Every real system has lag between an action and its effect, and we are terrible at reasoning about lag. Meadows uses the example of a shower with a slow boiler: you turn the tap, nothing happens, so you turn it further, and then a minute later you are scalded and you overcorrect the other way. That oscillation is not because anyone is stupid. It is the structure of the system meeting a human who cannot perceive the delay.

I have been that person at the shower tap, except the tap was an autoscaler. Traffic rises and the scaler reacts, but the new capacity takes two minutes to be ready. For those two minutes the signal keeps screaming and the scaler keeps adding, and five minutes later you are paying for three times the fleet you need while it scales back down into the next dip. The fix was not a cleverer threshold. It was respecting the delay: longer windows, cooldowns, and accepting that a system with lag will always overshoot if you ask it to react instantly.
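The overshoot is easy to reproduce in a toy model. The sketch below is not any real autoscaler's algorithm; `simulate_autoscaler` and all its parameters are hypothetical, it only scales up, and it assumes a deliberately naive controller that looks at ready instances and ignores the ones still booting.

```python
import math


def simulate_autoscaler(demand, per_instance=100, boot_delay=4, cooldown=0, start=2):
    """Return the number of ready instances at each tick.

    demand: requests/s observed at each tick.
    boot_delay: ticks between ordering an instance and it being ready.
    cooldown: minimum ticks between two scale-up decisions.
    """
    ready = start
    pending = []  # (tick at which the instances become ready, how many)
    last_scale_tick = None
    fleet_history = []
    for tick, load in enumerate(demand):
        # Capacity ordered earlier finishes booting.
        ready += sum(count for ready_at, count in pending if ready_at <= tick)
        pending = [(ready_at, count) for ready_at, count in pending if ready_at > tick]

        # The naive controller compares demand with ready capacity only.
        shortfall = math.ceil(load / per_instance) - ready
        cooled_down = last_scale_tick is None or tick - last_scale_tick >= cooldown
        if shortfall > 0 and cooled_down:
            pending.append((tick + boot_delay, shortfall))
            last_scale_tick = tick

        fleet_history.append(ready)
    return fleet_history


step = [600] * 10  # demand jumps to 600 req/s, which needs 6 instances
print(max(simulate_autoscaler(step)))              # 18
print(max(simulate_autoscaler(step, cooldown=4)))  # 6
```

With no cooldown the controller re-orders the same four instances on every tick until the first batch arrives, and ends up with eighteen where six would do. Making it wait out the boot delay before deciding again is enough to stop that, and the threshold never changed.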

Where to push

The part of the book everyone quotes is the list of leverage points, the places where a small change produces a large effect. I will not reproduce the whole list, partly because Meadows herself was wary of it being treated as a tidy ranking. The shape of the idea is what matters. The leverage points we reach for most readily, parameters and numbers, tuning a constant here and a buffer there, are the weakest. The ones with real power, the structure of the feedback loops, the goals of the system, the rules, the very paradigm the thing is built around, are the hardest to touch and the ones we mostly leave alone.

That lands hard if you have spent a career adjusting numbers. So much of what I have done under pressure is twiddle a parameter: bump a timeout, raise a limit, add a replica. It works, sometimes, and it is the weakest possible intervention. The reason a service keeps falling over is rarely that one constant was slightly wrong. It is usually that the loop is structured to drive it over, and no amount of constant-twiddling changes the loop.

The flip side is encouraging. When I have managed to make a system genuinely calmer, in hindsight it was nearly always a structural change rather than a tuning one. Adding backpressure so that an overloaded service slows its callers instead of falling on its face. Changing a retry policy so that failure does not amplify into a storm. Those are loop changes. They are the leverage points the book points at, and they are durable in a way that a tuned threshold never is.
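Both of those changes are small in code. What follows is a rough sketch rather than a recommendation: `try_accept` and `backoff_delays` are hypothetical helpers, the defaults are arbitrary, and the retry schedule is capped exponential backoff with random jitter, which is one common policy among several.

```python
import queue
import random


def try_accept(work_queue, item):
    """Admit work only if there is room; otherwise tell the caller to back off."""
    try:
        work_queue.put_nowait(item)
        return True
    except queue.Full:
        # Backpressure: refuse now instead of accepting work we cannot finish.
        return False


def backoff_delays(attempts, base=0.1, cap=10.0, rng=random):
    """Yield one sleep duration (seconds) per retry attempt.

    The ceiling doubles each attempt up to `cap`, and the actual delay is
    drawn at random below it so that failed callers do not retry in lockstep.
    """
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng.uniform(0.0, ceiling)


inbox = queue.Queue(maxsize=2)
print([try_accept(inbox, n) for n in range(3)])  # [True, True, False]
```

The bounded queue turns overload into a signal the caller can see. The jittered backoff stops that signal from being answered by every caller at the same instant, which is how a failure becomes a storm.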

Why systems resist you

The chapter I think about most is the one on why systems surprise us, and on the ways they defend themselves against being fixed. Policy resistance is the one I see constantly: you push on a problem, the system pushes back, and the symptom you fought to suppress reappears somewhere else. You cap one queue and the pressure simply relocates to the next one upstream. The system has a goal, often an emergent one nobody chose deliberately, and it will defend that goal against your tinkering.
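The queue example can be played out with a toy two-stage pipeline. This is a sketch under simple assumptions (constant arrivals, fixed per-tick rates), and `simulate_pipeline` and its parameters are invented for illustration.

```python
def simulate_pipeline(arrivals, rate_a, rate_b, cap_b, ticks):
    """Return (backlog waiting for stage A, backlog waiting for stage B).

    cap_b is the maximum depth of stage B's queue, or None for no cap.
    """
    queue_a = 0
    queue_b = 0
    for _ in range(ticks):
        queue_a += arrivals
        # Stage B works through what it already holds.
        queue_b -= min(queue_b, rate_b)
        # Stage A can hand over only what fits under B's cap.
        room = float("inf") if cap_b is None else cap_b - queue_b
        moved = min(queue_a, rate_a, room)
        queue_a -= moved
        queue_b += moved
    return queue_a, queue_b


# 10 items/tick arrive, stage A handles 10/tick, stage B only 6/tick.
print(simulate_pipeline(10, 10, 6, cap_b=None, ticks=10))  # (0, 46)
print(simulate_pipeline(10, 10, 6, cap_b=10, ticks=10))    # (36, 10)
```

The cap makes the second queue look healthy, and the total backlog is 46 items either way. The only thing that changed is which queue it sits in, because the loop that matters, arrivals outrunning the slow stage, was never touched.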

There is a humility in that which I have come to value. It is very easy in engineering to believe that a system is misbehaving and needs correcting, when in fact it is behaving exactly as its structure demands and the structure is the thing at fault. Meadows is gentle but firm on this. She talks about working with systems rather than bullying them, about watching their behaviour for long enough to understand the loops before you reach in and start changing things. Most of my worst operational decisions came from reaching in too fast.

She also has a wonderful, slightly unsettling line about bounded rationality: that people behave rationally given the information they can actually see from where they sit, and that if you put a different person in the same position with the same limited view, they will mostly make the same choices. That is the systems-thinking version of blamelessness, and it arrives at it from first principles rather than from kindness. The operator who made the bad call was responding sensibly to the signals available. Change the signals, change the loop, and you change the behaviour. Replace the operator and you change nothing.

What I actually do differently

I draw differently now. When I sketched a system on a whiteboard I used to draw boxes and arrows, components and the calls between them. Now I find myself drawing loops, marking where things accumulate and where the delays are, and asking what behaviour the structure will produce over time rather than just what calls what. It is a small change in notation that turns out to be a large change in what I notice.

I am more patient with slow systems and more suspicious of fast fixes. I assume there is a delay I have not accounted for. I assume the obvious lever is the weak one. And I try, not always successfully, to watch a system's behaviour for a while before deciding I know what is wrong with it.

It is not a long book and it is not a technical one, which is exactly why I keep pressing it on people. The ideas are general enough to apply to a queue, a team, a budget or a climate, and that generality is the point. If you run anything that has feedback in it, which is everything worth running, it is worth the afternoon. Mine has more than earned its place on the shelf, and the loops it taught me to see are still, annoyingly, everywhere I look.