Ramblings of an aging IT geek
← Ramblings of an aging IT geek
linux

the one panic that didn't waste my week

A kernel panic that reproduced on demand the moment a particular filesystem operation ran, and why a deterministic crash is a small mercy.

A terminal displaying a kernel stack trace

I wrote yesterday about a panic that only showed up under load. Today's was its better-behaved cousin: it reproduced the instant a specific operation ran, every time, no warm-up required. After a career of chasing ghosts, a crash with a clear trigger feels like cheating.

The trigger was a particular rsync against a freshly mounted volume. Run it, and the box would panic within seconds with a trace pointing at the filesystem layer. Don't run it, and the machine would happily stay up for weeks. There's nothing quite like a one-line reproduction case to lift the mood of a debugging session.

What made it deterministic was that the bug lived in a code path the operation hit immediately, rather than one that only fired when some buffer filled or some race lined up. No timing, no load threshold, no phase of the moon. Just: do the thing, get the panic. I could bisect kernels against it in an afternoon instead of waiting days between each "well, it hasn't crashed yet".

The fix, once I'd narrowed it down, was a kernel upgrade that carried the relevant patch. Unglamorous. But I'll take a boring fix to a reproducible bug over a clever theory about an intermittent one any day of the week.