the off-by-one that three of us signed off on

A terminal showing a failing assertion in red

The bug was a classic off-by-one, the kind we all claim we'd never write. It dropped exactly one record from every page of a paginated export. Not the first page, not a crash, nothing loud: just a single row quietly missing per page, every page, forever. It went out in a release that three engineers reviewed, that had tests, and that sat in staging for a week. I was one of the three. So this is partly a confession.

The point I want to land before the story runs away with me: the arithmetic was trivial and the bug was obvious in hindsight, but it survived because every layer of our process was looking at the change the same way the author was. Review isn't a second pair of eyes if the second pair reads the diff in the same direction the first pair wrote it.

what the code did

The export feature pulled records out of the database in pages and streamed them to a CSV. Someone refactored the pagination from a clumsy offset-based loop into something tidier that tracked a cursor. Here's the shape of the change, simplified:

def fetch_page(cursor, page_size):
    rows = db.query(
        "SELECT id, name, amount FROM records "
        "WHERE id > %s ORDER BY id ASC LIMIT %s",
        (cursor, page_size),
    )
    next_cursor = rows[-1].id if rows else None
    return rows, next_cursor

That looks fine, and on its own it is fine. The bug wasn't here. The bug was in the loop that called it:

cursor = 0
while True:
    rows, cursor = fetch_page(cursor, PAGE_SIZE)
    if not rows:
        break
    for row in rows[1:]:   # <- here
        writer.writerow(row)

rows[1:]. Every page, the first row got skipped. The author had been wrestling with an earlier version where the cursor row was being returned again at the top of the next page, a genuine off-by-one in the boundary condition, and they'd "fixed" it by slicing off the first row of every page rather than by fixing the boundary. The real bug was a > that should have stayed > while the cursor was seeded correctly; instead they patched the symptom one layer up and the two errors didn't quite cancel.

So on page one you lost record one. On page two you lost the first record of page two. And so on. With a page size of five hundred and a few hundred thousand records, you silently dropped several hundred rows out of an export that finance used to reconcile payments. Nobody noticed for weeks because the totals were close enough to look right and nobody had a known-good number to check against.

why the tests didn't catch it

We had tests. That's the uncomfortable part. There was a unit test for fetch_page that asserted it returned the right rows for a given cursor, and it passed, because fetch_page was correct. The bug lived in the caller, and the caller's test used a dataset of three records with a page size of ten. One page, three rows, slice off the first, assert you got two. Except the assertion had been written against the buggy output, so it asserted you got two rows and called that correct.

That's the trap with tests written after the code: they encode the behaviour you have, not the behaviour you want. The test author looked at the function, saw it returned two of three rows, decided that looked plausible, and froze it. A test that asserts the bug is worse than no test, because it actively defends the bug from the next person who might fix it.

A diff with an added line slicing the first element off a list

why review didn't catch it

Three of us approved the pull request. I've thought about why, because "we weren't paying attention" is too easy and not actually true. We were paying attention. We were just paying attention to the wrong thing.

The diff was a refactor from offset pagination to cursor pagination, and it was a nice refactor. The cursor approach is genuinely better: it doesn't drift when rows are inserted mid-export, and it doesn't get slower as the offset grows. So all three reviewers read the diff as "is this a good refactor?" and the answer was yes. We were admiring the new structure. The rows[1:] was a single small line in a much larger and more interesting change, and it read as deliberate, because it was deliberate. The author meant to write it. It just shouldn't have been there.

That's the thing about review that I keep relearning. We're good at catching code that looks wrong. We're terrible at catching code that looks right and is wrong, especially when it's surrounded by code that's genuinely an improvement. A confident, tidy diff lowers your guard. The bug rode in on the coat-tails of a good change.

how it finally surfaced

Finance flagged a reconciliation that was off by a few hundred quid across a month. They assumed it was their end and spent a day on it before someone thought to compare the export's row count against a straight SELECT COUNT(*) over the same date range. The counts didn't match. The export had fewer rows than the table. Once you know rows are missing, the pattern jumps out instantly: the gaps were evenly spaced, one every page-size records, and from there it's a thirty-second read of the loop to spot the slice.

The fix was to delete [1:] and correct the cursor seeding so the boundary was handled in exactly one place. Seed the cursor below the lowest possible id, use a strict >, and never return the cursor row twice in the first place. One source of truth for the boundary, instead of two errors arranged to almost cancel.

what actually changed afterwards

The code fix took two minutes. The process changes took longer and mattered more.

First, the export now logs the count it produced and the count it expected, and screams if they differ by more than zero. A reconciliation job that can't reconcile itself is no use. If a future refactor drops rows again, it'll be a failed job at the time, not a confused email a month later.

Second, we changed how we review boundary code specifically. When a diff touches pagination, ranges, slicing, or any loop with an index, the reviewer is asked to do one concrete thing: pick the smallest interesting dataset, two or three records spanning a page boundary, and trace it by hand. Not read the code and nod. Trace the data through it. It's slower and it's tedious and it catches exactly the class of bug that reading-for-structure misses.

Third, and this is the one I push hardest, we treat tests written after the implementation with suspicion. If a test was written to match code that already exists, it can only ever assert "this code does what this code does." The interesting tests are the ones that encode a requirement from outside the code: the export must contain every record in the range. That test, written against the requirement rather than the function, would have failed on day one.

The arithmetic in an off-by-one is never the hard part. A child can spot a fencepost error once you point at it. The hard part is that it hides inside a change everyone approves of, defended by a test that was taught to expect it, in a diff that's otherwise a genuine improvement. The fix is one character. The lesson is to read code against the data it'll see, not against the story the author tells you about it, and that one's taken me years and several confessions like this one to halfway learn.