Ramblings of an aging IT geek
← Ramblings of an aging IT geek
debugging

The Bug Was in My Head, Not the Function

A timestamp that looked wrong for three hours turned out to be exactly right, and the bug was the assumption I refused to question.

A terminal showing a debugger session

The point first, because I wasted half a day on this and you needn't: the function was correct the whole time. The bug was a thing I believed so firmly that I never thought to check it. Specifically, I was convinced our event timestamps were UTC. They weren't. They had been local time since a config change nobody mentioned, eighteen months before I joined.

The symptom was that a batch job emitted records that looked exactly one hour into the future. Reasonable people, presented with "data from the future", go hunting for an off-by-one in a date calculation. So I did. I read the parsing code. I read the formatting code. I read the bit in the middle that I'd written myself and was therefore certain was fine. All correct. The numbers still came out an hour ahead.

A close-up of code on a screen

What I did wrong was treat one fact as an axiom. "The timestamps are UTC" wasn't something I'd verified that morning, it was something I'd absorbed from a README and never tested. Every hypothesis I formed sat on top of it. When your foundational assumption is wrong, every correct deduction built on it lands you somewhere wrong, and the maddening part is that the logic feels airtight the whole way down.

The thing that broke it open was boring. I stopped reasoning about the code and printed the raw bytes coming off the queue, before any of my code touched them. There it was: a +01:00 offset, sitting in the payload, plain as day. The producer was stamping local time. My code was doing exactly what it should with the value it was given. The hour wasn't a bug I'd introduced, it was the actual offset between Europe/London and UTC in late January, which is to say, an hour, because we were not yet in summer time.

I've started keeping a small discipline about this. When I'm stuck for more than about twenty minutes, I write down the assumptions the bug "can't" be in, the ones I haven't actually looked at because they're obviously true. Then I go and look at one of them. Half the time the bug is sitting there, smug, in the first thing on the list. It's humbling how often "obviously true" means "last verified by someone else, possibly years ago, possibly never".

The fix was a one-line change to normalise at ingest and a slightly embarrassing note in the postmortem. The lesson was worth more than the fix. Code is honest in a way that beliefs aren't. The code will do precisely what you told it. Your assumptions will tell you whatever lets you stop thinking. So when the evidence and your mental model disagree, the evidence is winning, and the sooner you let it, the sooner you go home.