The bug was one line. The review had four approvers, counting me. The test suite was green. And for about three weeks, every paginated export whose row count was not an exact multiple of the page size quietly dropped its final partial page, which on a 250-row export at a hundred rows per page is fifty records, and on a finance report is a very awkward conversation.
I want to write this one down properly, not because off-by-ones are interesting (they are the oldest joke in the trade) but because of how thoroughly it got past us. Four people read this code with intent to find faults. It had tests. It still shipped. That is the part worth understanding.
the code
The function paginated a result set for export. Take a page number and a page size, slice the right window out of the data. Here is the shape of it, lightly anonymised:
func pageSlice(rows []Record, page, size int) []Record {
	start := page * size
	end := start + size
	if end > len(rows) {
		end = len(rows)
	}
	if start >= len(rows) {
		return nil
	}
	return rows[start:end]
}
Look at it. It is fine. It is, in fact, correct. The off-by-one was not here, which is exactly why it survived. We all stared at this function, the obvious suspect, nodded, and moved on. The bug was one layer up, in the caller that decided how many pages there were.
// the actual culprit
totalPages := len(rows) / pageSize // integer division, no rounding up
for page := 0; page < totalPages; page++ {
	export(pageSlice(rows, page, pageSize))
}
With 250 rows and a page size of 100, totalPages is 2. Pages 0 and 1 export rows 0–199. Rows 200–249 are simply never asked for. The slice function would have handed them over perfectly happily; nobody ever called it for page 2. The data wasn't corrupted, it wasn't logged as an error, it just quietly wasn't there.
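To make the arithmetic concrete, here is a minimal sketch that replays the buggy loop and counts what it would have exported. The helper name exportedCount and the use of int in place of Record are simplifications for illustration, not the production code.

```go
package main

import "fmt"

// pageSlice mirrors the function above, with int standing in for Record.
func pageSlice(rows []int, page, size int) []int {
	start := page * size
	end := start + size
	if end > len(rows) {
		end = len(rows)
	}
	if start >= len(rows) {
		return nil
	}
	return rows[start:end]
}

// exportedCount (hypothetical helper) replays the buggy caller and
// returns how many rows it would have passed to export.
func exportedCount(rows []int, pageSize int) int {
	exported := 0
	totalPages := len(rows) / pageSize // floor division: the bug
	for page := 0; page < totalPages; page++ {
		exported += len(pageSlice(rows, page, pageSize))
	}
	return exported
}

func main() {
	rows := make([]int, 250)
	got := exportedCount(rows, 100)
	// prints "exported 200 of 250 rows, 50 missing"
	fmt.Printf("exported %d of %d rows, %d missing\n", got, len(rows), len(rows)-got)
}
```

Nothing in the run fails or panics, which matches what we saw in production: the loop simply ends one page early.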
why the tests were green
Here is the genuinely uncomfortable bit. There were tests. They tested pageSlice thoroughly, with empty inputs, single rows, exact multiples, the lot. And they tested the export loop, but the fixture had 200 rows and a page size of 100. Two hundred divides by a hundred exactly. There was no remainder, so there was no missing final page, so the bug was structurally invisible to the test.
We had tested the part that worked and chosen, by accident, the one input that hid the part that didn't. The data that triggers an off-by-one is the data that doesn't divide evenly, and our fixture divided evenly because round numbers feel tidy and a human picked it.
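The blind spot is easy to see if you compare the floor and ceiling page counts across a few fixture sizes. This is a rough sketch; floorPages, ceilPages and exposesBug are names invented here for illustration, and the fixture sizes other than 200 and 250 are arbitrary picks.

```go
package main

import "fmt"

// floorPages is what the buggy caller computed.
func floorPages(n, size int) int { return n / size }

// ceilPages is the page count actually needed (assumes size > 0, n >= 0).
func ceilPages(n, size int) int { return (n + size - 1) / size }

// exposesBug reports whether a fixture of n rows makes the two counts
// disagree, i.e. whether a test using it could have caught the bug.
func exposesBug(n, size int) bool {
	return floorPages(n, size) != ceilPages(n, size)
}

func main() {
	const size = 100
	for _, n := range []int{0, 1, 99, 100, 200, 250} {
		fmt.Printf("rows=%3d floor=%d ceil=%d exposes bug: %v\n",
			n, floorPages(n, size), ceilPages(n, size), exposesBug(n, size))
	}
}
```

Only the sizes that leave a remainder (1, 99 and 250 here) make the counts differ. Our 200-row fixture sat squarely among the ones that do not.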
why four reviewers missed it
I have thought about this more than is healthy. A few reasons, none flattering.
- We all looked at the same suspect. The slicing function looks like where a pagination bug would live, so attention pooled there. The arithmetic in the caller looked like setup, like glue, like the boring bit, and boring code does not get read, it gets skimmed.
- Integer division doesn't look like a bug. len(rows) / pageSize reads as plain English: the number of pages. The trap is that it silently rounds down, and rounding down is exactly wrong for "how many pages do I need". You want a ceiling, not a floor. But nothing on the line warns you; it is correct-looking code that does the wrong thing.
- Approval is contagious. By the time the third reviewer arrived, two people had approved. I would love to say that didn't soften my attention. It did.
the fix and the lesson
The fix is the standard ceiling-division idiom, with a comment so the next person doesn't "simplify" it back:
// ceiling division: we need a final page for the remainder
totalPages := (len(rows) + pageSize - 1) / pageSize
The deeper fix was the test. The new fixture has 250 rows against a page size of 100, deliberately not a clean multiple, and asserts that the last partial page comes through with all fifty of its rows. A test that only uses round numbers is testing the easy half of the problem.
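For the shape of that check, here is a standalone sketch rather than the real test, which lives in our suite and uses the testing package. The pages helper and the int rows are stand-ins I have made up for illustration.

```go
package main

import "fmt"

// pageSlice mirrors the function from earlier, with int standing in for Record.
func pageSlice(rows []int, page, size int) []int {
	start := page * size
	end := start + size
	if end > len(rows) {
		end = len(rows)
	}
	if start >= len(rows) {
		return nil
	}
	return rows[start:end]
}

// pages (hypothetical helper) runs the fixed loop and returns every page
// it would export, so a test can inspect the last one.
func pages(rows []int, pageSize int) [][]int {
	// ceiling division: we need a final page for the remainder
	totalPages := (len(rows) + pageSize - 1) / pageSize
	out := make([][]int, 0, totalPages)
	for page := 0; page < totalPages; page++ {
		out = append(out, pageSlice(rows, page, pageSize))
	}
	return out
}

func main() {
	// 250 rows against a page size of 100: deliberately not a clean multiple.
	rows := make([]int, 250)
	for i := range rows {
		rows[i] = i
	}
	got := pages(rows, 100)
	last := got[len(got)-1]
	// prints "pages=3 last page rows=50 (200..249)"
	fmt.Printf("pages=%d last page rows=%d (%d..%d)\n",
		len(got), len(last), last[0], last[len(last)-1])
}
```

The assertions that matter are the page count and the length of the last page; checking the first and last row of that page also guards against the window being shifted.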
What I took away is less about off-by-ones and more about review. We are good at scrutinising the code that looks dangerous and we wave through the code that looks like plumbing, and bugs know this, so they live in the plumbing. The arithmetic that decides how many times a loop runs deserves at least as much suspicion as the body of the loop. And if a fixture's numbers are suspiciously tidy, that tidiness is doing you a quiet disservice. A missing last page of a finance report is the kind of quiet wrong that nobody notices for three weeks and everybody remembers afterwards.