the off-by-one four of us read and none of us saw

A terminal showing a stack of code under inspection

Support raised a ticket that a customer's export was "missing some rows". Not all of them, which would have been a clear failure, just some. The maddening kind. When a thing is mostly right, your brain wants to file it under "user error" and move on, and that instinct is exactly how this one survived as long as it did.

I pulled the data. The export was a paginated job, fetching a page of records at a time and writing them out. Total expected: a few thousand rows. Total written: a few dozen short. Then the pattern landed. The number missing was equal to the number of pages. One row per page, gone.

When the count of missing things matches the count of pages, batches or iterations, you have an off-by-one at a boundary. Stop reading the data and read the loop.

Here is roughly what the page loop looked like, lightly disguised.

for i in range(0, len(page) - 1):
    writer.write(page[i])

There it is. range(0, len(page) - 1) stops one short of the end, so the last record on every page silently never gets written. On a single page you'd notice. Spread across a hundred pages, you lose a hundred rows out of thousands, the totals look plausible, and nobody counts.

A close-up of a loop boundary highlighted in source

The fix is the most boring diff imaginable.

for record in page:
    writer.write(record)

What I keep chewing on is why four of us signed off on the original. This went through review. I read it, two others read it, the author read it more than anyone. And the reason it passed is that len(x) - 1 is a shape your eye has seen a thousand times in correct code: indexing the last element, computing a final boundary, all the legitimate places that subtraction belongs. It pattern-matches as "this person knows what they're doing with indices". Review catches code that looks wrong. This looked right. That's the trap with off-by-ones: they're not careless, they're plausible.

A few things came out of it. We stopped hand-rolling index loops where iterating the collection directly would do, because the version with no index has no boundary to get wrong. We added a cheap assertion to the export that the rows written equalled the rows fetched, the sort of total-counts check that would have screamed on the first run. And in review I now treat any + 1 or - 1 near a loop bound as a thing to actively justify out loud, not skim past.

The bug was trivial. The lesson wasn't: the dangerous defects aren't the ones that look wrong, they're the ones that look exactly like the correct code sitting right next to them.