Ramblings of an aging IT geek
← Ramblings of an aging IT geek
performance

the orm that hid a thousand queries

An N+1 query problem buried inside an ORM that turned one list view into hundreds of round trips, and the eager-loading fix that killed it.

A latency graph spiking above a server rack

The page that was slow had no business being slow. It listed users, maybe fifty of them, with a little badge next to each showing how many open tickets they had. That's it. A list and a count. Yet it crept up over months from snappy to sluggish to "why is this taking three seconds", and nobody could point at a single query and say, there, that one's the problem. Because there wasn't one query. There were several hundred.

This is the N+1 problem, and if you use an ORM you will meet it eventually. It hides well because each individual query is fast. The damage is in the count, not the cost.

How it looked

The code was unremarkable. Fetch the users, then for each user, read their ticket count. In the ORM, that second step looked like an innocent attribute access, the kind of line you'd never think to question:

users = User.objects.filter(active=True)
for user in users:
    rows.append({
        "name": user.name,
        "open_tickets": user.tickets.filter(status="open").count(),
    })

The first line is one query. The loop is one more query per user, every iteration, quietly, behind that user.tickets access. Fifty users, fifty-one queries. When the active user count grew past a few hundred, so did the query count, and each one carried the fixed cost of a round trip to the database whether it returned data or not. The query log told the whole story once I bothered to read it: page after page of near-identical SELECT COUNT(*) statements, each taking two milliseconds, adding up to seconds.

A wall of near-identical query log lines

Finding it

The trick was turning on query logging for a single request and counting. I added a bit of middleware that logged the number of database queries per request in development, and the number for that page came back as 312. For a page showing one list. That number is the smell. Any request issuing hundreds of queries to render one screen is almost certainly doing N+1 somewhere, and the fix is nearly always the same shape.

You don't even need fancy tooling. Tail the database's own slow query log, or flip on general query logging for a minute in a quiet environment, load the page once, and count the lines. If one page action produces a flat wall of structurally identical queries that differ only by an id, you've found it.

The fix

You tell the ORM to fetch the related data in bulk, up front, instead of lazily per row. The exact incantation varies by framework, but the idea is universal: one query for the users, one more query for all their tickets, joined or grouped in the database where that work is cheap.

from django.db.models import Count, Q

users = User.objects.filter(active=True).annotate(
    open_tickets=Count("tickets", filter=Q(tickets__status="open")),
)
for user in users:
    rows.append({"name": user.name, "open_tickets": user.open_tickets})

Now the count is computed by the database in a single aggregated query alongside the user fetch. Two queries total instead of three hundred. The page dropped from three seconds to about forty milliseconds, and most of that forty was template rendering, not the database.

A flame graph collapsing after an optimisation

What I took from it

The ORM didn't lie to me, exactly. It did precisely what I asked, which was to lazily fetch related objects on demand. The problem was that "on demand" inside a loop means "once per iteration", and the abstraction made that invisible. A line that reads like a property access was actually a network call.

So now I'm suspicious of any loop that touches a related object, and I keep that query-counting middleware switched on in development permanently. It's a single number on every page, and the day it jumps from single digits into the hundreds, I know exactly what I did and where to look. The slow page never alerted anyone, never threw an error, never did anything but get gradually worse. Those are the ones that get you, and a query counter is cheap insurance against the next one.