Ramblings of an aging IT geek
← Ramblings of an aging IT geek
debugging

the off-by-one three of us missed

A batching loop that silently dropped the last record, approved by two reviewers, and how it finally surfaced.

A terminal with a stack trace and highlighted line

It shipped with two approvals on the pull request. Mine was one of them. The code paged records in batches and the loop bound looked right at a glance:

for i := 0; i < len(records); i += batchSize {
    end := i + batchSize
    if end > len(records) {
        end = len(records)
    }
    flush(records[i:end])
}

That bit is fine, actually. The bug was upstream, where we computed total := count - 1 to "skip the header" on input that didn't always have a header. When there was no header we lost the last real record on every run. Small enough that nobody noticed for weeks, because the missing record was usually the most recent one and got re-sent on the next poll anyway. It only became visible when a customer ran a one-shot export and came up exactly one short.

Three sets of eyes looked at count - 1 and all three brains read it as obviously correct, because it usually is. That's the trap with off-by-ones: they hide inside idioms you've typed a thousand times. The diff was clean, the variable was well named, the intent was clear. The intent was just wrong for one input shape.

What caught it in the end wasn't a sharper reviewer. It was a test that asserted on the actual count out, fed both with and without a header. We had tests, plenty of them, but every fixture happened to start with a header line. The reviewers couldn't have caught what the fixtures never exercised. I trust property checks and boundary fixtures over a careful read now. My careful read approved this one.