Ramblings of an aging IT geek
debugging

the off-by-one that three of us missed

An off-by-one error in a pagination loop that passed code review, passed tests, and only showed up at a record count nobody had tried.

New Year's Day, and I am thinking about a bug from a few weeks ago that I still cannot quite forgive myself for. An off-by-one. The oldest mistake in the book, and it sailed through review with two approvals and a green test suite.

The code paginated through a result set in batches. It computed how many pages it needed, then looped over them fetching each batch. Read it out loud and it is obviously correct. Three of us read it, more or less out loud, and all three of us agreed it was obviously correct.

Where the boundary bit

The page count was computed with a plain integer division. If you have 100 records and a batch size of 25, you get 4 pages, and everything is fine. The loop runs 0, 1, 2, 3 and you have your 100 records.

The trouble is the case where the total does not divide evenly. With 105 records and a batch of 25, integer division gives you 4. The loop fetches pages 0 through 3, that is 100 records, and the last 5 are silently left behind. No error, no warning, no crash. Just five rows that quietly never made it.
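To make the arithmetic concrete, here is a minimal reproduction. The in-memory `records` list and the `fetch_page` function are hypothetical stand-ins, since the real data source is not shown here; only the floor division matches what we actually had.

```python
# Hypothetical stand-in for the real data source: 105 records in memory.
records = list(range(105))
batch_size = 25


def fetch_page(page, size):
    """Return one batch of records (hypothetical helper, not the original code)."""
    start = page * size
    return records[start:start + size]


# Floor division, as in the buggy version: 105 // 25 == 4.
pages = len(records) // batch_size

fetched = []
for page in range(pages):  # visits pages 0, 1, 2, 3
    fetched.extend(fetch_page(page, batch_size))

missing = len(records) - len(fetched)
print(pages, len(fetched), missing)  # prints: 4 100 5
```

Nothing in that run raises or logs anything, which is exactly why the five missing rows went unnoticed.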

# what we had
pages = total // batch_size

# what it should have been
pages = -(-total // batch_size)   # ceiling division

That ceiling-division trick with the double negation is ugly, and I would normally reach for a clearer helper, but the point stands: we needed to round up, and we rounded down.
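For what it is worth, a clearer helper might look something like the sketch below. The name `ceil_div` and the input checks are my own assumptions, not anything from the original code; for a positive batch size it gives the same result as the double-negation version.

```python
def ceil_div(total, batch_size):
    """Number of pages needed to hold `total` records (hypothetical helper name)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if total < 0:
        raise ValueError("total must be non-negative")
    # Adding batch_size - 1 before flooring rounds any partial page up.
    return (total + batch_size - 1) // batch_size


print(ceil_div(100, 25))  # 4
print(ceil_div(105, 25))  # 5
print(ceil_div(0, 25))    # 0
```

Staying in integer arithmetic avoids the float rounding that `math.ceil(total / batch_size)` can run into with very large counts, and the name says what the line is for.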

Why review and tests both waved it through

The tests passed because every fixture had a record count that happened to be a clean multiple of the batch size. Of course it did. When you write the test you reach for a tidy round number, and tidy round numbers are exactly the inputs that hide this class of bug. The test was not testing the boundary; it was confirming the happy path and looking confident about it.
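A test that would have caught it sweeps record counts on either side of a multiple of the batch size instead of sitting on one. This is only a sketch with plain assertions: `paginate` is a hypothetical stand-in for the real loop, and the list of counts is my own choice.

```python
def paginate(records, batch_size):
    """Collect all records page by page (hypothetical stand-in for the real loop)."""
    pages = -(-len(records) // batch_size)  # ceiling division
    out = []
    for page in range(pages):
        start = page * batch_size
        out.extend(records[start:start + batch_size])
    return out


def check_boundaries(batch_size=25):
    # Counts just below, at, and just above multiples of the batch size,
    # plus the empty and single-record cases.
    awkward = [0, 1, batch_size - 1, batch_size, batch_size + 1,
               4 * batch_size, 4 * batch_size + 5]
    for total in awkward:
        records = list(range(total))
        assert paginate(records, batch_size) == records, total


check_boundaries()
```

Swap the ceiling division in `paginate` for a plain `//` and the check fails on the very first non-multiple, which is the failure the round-number fixtures never produced.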

Review missed it because all of us, fresh eyes included, read the code at the level of intent rather than arithmetic. We all saw "loop over the pages" and nodded. Nobody sat down and asked what happens when total mod batch is non-zero, because the code reads as if that case cannot exist.

The fix took thirty seconds. The lesson took longer to land. Off-by-ones do not survive because they are subtle; they survive because they hide in the one input you did not think to try, and your tests share your blind spots exactly. Now when I see integer division near a loop bound, I deliberately pick an awkward number and check the edges. It is not cleverness. It is just refusing to trust the round number.