Finance noticed before the monitoring did, which is never a comfortable order for these things to happen in. A daily aggregation job, fed a start date and an end date, was producing totals that were consistently a touch low. Not wildly wrong, not obviously broken, just a bit under what the source data said it should be. The kind of discrepancy that gets blamed on rounding for a fortnight before someone insists on actually counting.
The job iterated a date range and summed a value for each day. The range came in as a start and an end, and the loop walked from one to the other. Here's the shape of it:
def iter_days(start, end):
d = start
while d < end: # the bug is this one character
yield d
d += timedelta(days=1)
The loop condition is d < end. Strictly less than. So a range of the 1st to the 5th yields the 1st, 2nd, 3rd and 4th, and never the 5th. The last day of every range was silently dropped. For a month-long report you process thirty days instead of thirty-one and the total comes out about three percent light, which is precisely the size of error that looks like rounding and isn't.
Now, d < end is not wrong in the abstract. A half-open interval, start inclusive and end exclusive, is a perfectly respectable convention and arguably the right default. The bug was a mismatch of conventions. The callers passed end dates they thought were inclusive, because that's how a human reads "from the 1st to the 5th", and the loop treated end as exclusive. Both sides were internally consistent and they disagreed at the boundary, which is where off-by-ones always live: not inside the range, but at the edge nobody's test bothered to check.
And the tests, naturally, didn't catch it. They checked a single day, start and end the same date, expecting one row out. With a half-open loop a single-day range where start equals end yields nothing at all, so you'd think that test would have screamed. It didn't, because the test had been written to pass: it set end to the day after start, "to get one day out", which quietly encoded the off-by-one into the test itself. The test and the code agreed because they shared the same misunderstanding. Two wrongs making a passing suite.
The fix was a decision, not just a character. I made the interval half-open everywhere, documented it on the function, and changed the callers to pass an exclusive end. The alternative, flipping the loop to d <= end, would have worked too, but inclusive ranges have their own quiet traps once you start subtracting them or chaining them together, and I'd rather have one convention I can trust than two I have to keep straight in my head. Then I rewrote the test to span several days and assert the exact set of dates yielded, both ends included in the assertion. A boundary bug needs a test that names the boundary. Anything vaguer just learns the same wrong answer the code already knows.