Ramblings of an aging IT geek
← Ramblings of an aging IT geek
performance

the query that was fine until it wasn't

A report query that ran in milliseconds for a year and then started taking thirty seconds, traced to a missing index that only mattered once a table grew past a million rows.

A monitoring graph showing query latency climbing over time

The query had been in production for a year and nobody had ever thought about it, which is exactly the problem. It backed a report page that loaded instantly when we launched, so it never showed up in any slow-query log, never tripped an alert, never got a second look. Then one Tuesday the page started spinning for half a minute and the support queue filled up.

EXPLAIN ANALYZE told the whole story in one line: a sequential scan over a table that had quietly crossed a million rows. The query filtered on a column that had no index, because back when the table held a few thousand rows a full scan was free. It cost nothing right up until it cost everything, and there is no graceful warning in between. The line on the latency graph just bends upward and then it is a Tuesday.

The fix was a single index and the report dropped back under fifty milliseconds.

CREATE INDEX CONCURRENTLY idx_events_account_created
  ON events (account_id, created_at);

What I took from it was not "add indexes", everyone knows that. It was that performance bugs grow in the dark. A query that is fast on launch-day data tells you nothing about how it behaves at a hundred times the volume. These days when I write a query that filters or sorts, I ask what the table looks like in a year and run EXPLAIN against something the right size, not the cosy empty thing in front of me.