Most kernel panics I've chased have been ghosts. The box falls over at three in the morning, nothing useful makes it to disk because the disk subsystem is exactly what died, and you're left reading tea leaves in a half-flushed log. So when one of my homelab machines started panicking and I found I could make it panic on command, I was almost cheerful about it. A bug you can summon is a bug you can kill.
The machine was an old box I'd repurposed as a NAS, running a stable distro kernel, nothing exotic. It would lock hard and reboot, sometimes after days, sometimes after hours. The first job was getting the panic message to survive the panic, because a hard lock that drags the whole machine down with it tells you nothing. I set up netconsole so the dying kernel would spit its last words at another box over UDP before everything went quiet.
# on the NAS, in the log once netconsole is loaded
kernel: netconsole: network logging started
# on the receiver (traditional netcat wants: nc -u -l -p 6666)
nc -u -l 6666
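The sending side comes down to one module parameter, in the form `src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac`. Here is a minimal sketch of how that string is put together. Every address, port, MAC, and interface name in it is a placeholder, not a value from the machine in this story, and it only prints the `modprobe` line instead of running it.

```shell
#!/bin/sh
# Hypothetical values for illustration only; substitute your own.
SRC_PORT=6665                 # UDP source port on the panicking box
SRC_IP=192.168.1.10           # address of the panicking box (the NAS)
DEV=eth0                      # interface the messages leave through
TGT_PORT=6666                 # port the receiver listens on
TGT_IP=192.168.1.20           # address of the receiving box
TGT_MAC=00:11:22:33:44:55     # MAC of the receiver (or of the gateway)

# Assemble the netconsole parameter: src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac
netconsole_param() {
    printf '%s@%s/%s,%s@%s/%s\n' \
        "$SRC_PORT" "$SRC_IP" "$DEV" "$TGT_PORT" "$TGT_IP" "$TGT_MAC"
}

# Print the command instead of running it, so it can be checked first.
echo "modprobe netconsole netconsole=$(netconsole_param)"
```

Run the printed command as root on the box that panics. If the receiver stays silent, the console log level may be filtering messages; `dmesg -n 8` raises it so that everything gets through.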
With that in place I caught the trace, and it pointed squarely at the network driver under load: r8169, the driver for Realtek NICs, the kind of cheap onboard gigabit that's in half the consumer hardware ever made. The panic landed in the driver's receive path. That gave me something to poke at, and poking is where reproducibility earns its keep.
I threw traffic at it. A sustained iperf3 run from another machine, pushing the link as hard as it would go, and within a couple of minutes the box would go down every single time. That's the moment a frustrating intermittent fault turns into an engineering problem. I could change one thing, run the test, and get a clear yes or no in two minutes. No more waiting days to find out whether a guess was right.
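A change-one-thing-and-rerun loop like that is easy to wrap in a script. What follows is a rough sketch, not the exact invocation I used: the host address, duration, and stream count are assumed values, and it presumes `iperf3 -s` is already running on the NAS. It defaults to a dry run that only prints the command.

```shell
#!/bin/sh
# Hypothetical defaults; override from the environment as needed.
NAS_HOST=${NAS_HOST:-192.168.1.10}   # machine under test, running `iperf3 -s`
DURATION=${DURATION:-120}            # seconds of sustained load per trial
STREAMS=${STREAMS:-4}                # parallel TCP streams
DRY_RUN=${DRY_RUN:-1}                # 1 = only print what would be run

# The load command as a string, for display and logging.
load_cmd() {
    printf 'iperf3 -c %s -t %s -P %s\n' "$NAS_HOST" "$DURATION" "$STREAMS"
}

# run_trial LABEL: run one load test and report whether the target survived.
run_trial() {
    label=$1
    if [ "$DRY_RUN" = 1 ]; then
        echo "$label: would run: $(load_cmd)"
        return 0
    fi
    # A hard-locked target can leave the client hanging, so cap the run
    # with timeout and treat a hang or a non-zero exit as a failure.
    if timeout "$((DURATION + 30))" \
        iperf3 -c "$NAS_HOST" -t "$DURATION" -P "$STREAMS" >/dev/null 2>&1
    then
        echo "$label: survived ${DURATION}s of load"
    else
        echo "$label: FAILED (target went away or test errored)"
    fi
}

run_trial baseline
```

Set `DRY_RUN=0` to drive real traffic, and give each trial a label that names the one thing you changed, so the yes-or-no results stay tied to their causes.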
The trail led where these things often lead with that family of NICs: a combination of an offload feature and the driver mishandling a particular packet pattern under pressure. I didn't need to fix the driver; I needed to stop standing on the broken bit. Turning off the segmentation and receive offloads (TSO, GSO and GRO) with ethtool took the load off the path that was crashing.
ethtool -K eth0 tso off gso off gro off
After that, the same iperf3 run that reliably killed the box in two minutes ran flat out for an hour without a flicker. Throughput dropped a touch because the CPU was now doing work the NIC had been doing badly, but a slightly slower NAS that stays up beats a fast one that reboots itself. I made the ethtool settings persistent and moved on.
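Persistent settings have a way of quietly not persisting, so it's worth checking after a reboot that the offloads really are still off. Here is a small sketch that reads `ethtool -k` output (lowercase `-k` shows the current state) and prints any of the three features that isn't reported as off. The sample text below is made up for illustration, and `eth0` is an assumed interface name.

```shell
#!/bin/sh
# offloads_still_on: read `ethtool -k` output on stdin and print the name of
# each of the three offloads that is not reported as "off".
offloads_still_on() {
    awk -F': ' '
        $1 == "tcp-segmentation-offload" ||
        $1 == "generic-segmentation-offload" ||
        $1 == "generic-receive-offload" {
            # Values can carry suffixes such as "off [fixed]", so match the prefix.
            if ($2 !~ /^off/) print $1
        }'
}

# Illustrative sample only, not captured from a real machine.
sample_output() {
    cat <<'EOF'
Features for eth0:
tcp-segmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: on
EOF
}

sample_output | offloads_still_on   # prints: generic-receive-offload
```

On a real machine the check is `ethtool -k eth0 | offloads_still_on`; no output means all three are off.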
The real takeaway isn't about Realtek NICs, though if you run them you'll meet this eventually. It's that reproducibility is most of the fight. The panic itself was unremarkable once I could see it. What turned it from a haunting into a half-afternoon's work was getting the trace off the dying box with netconsole, and finding a load test that triggered it on demand. Once a bug shows up when you ask it to, you've basically already won; the rest is just turning knobs and reading the result. The ones that stay scary are the ones you can't summon.