Ramblings of an aging IT geek

threading context.Context through everything, and why it's worth it

A practical account of adopting context.Context properly in Go for cancellation and deadlines, including the mistakes I made along the way.

For a long time I treated context.Context as a thing Go made me pass around to keep the compiler and the linters happy. It was the first argument to every function that talked to a database or made an HTTP call, I dutifully threaded it through, and I had only the haziest idea of what it was actually doing. Then I had an outage that context would have prevented, and I went and learned it properly.

The point of context, the thing it is genuinely for, is cancellation and deadlines that propagate. When a request comes in and the client gives up, you want every piece of work spawned on that request's behalf to give up too: the database query, the call to the downstream service, the goroutine you fired off to do something in parallel. Without context, all of that work carries on, oblivious, doing nothing useful and holding resources, long after anyone cares about the answer. That is exactly the outage I had: a downstream service slowed to a crawl, requests piled up, and because nothing was cancelled, the goroutines and connections accumulated until the whole thing fell over.

the mistakes I made first

My early use of context was cargo-culted, and it showed.

The first mistake was storing things in context that had no business being there. Context has a Value method, and it is tempting to use it as a general-purpose bag for passing data down the call stack. Request-scoped values, you tell yourself. In practice I was using it to avoid adding proper function parameters, and the result was code where you could not tell what a function actually depended on without reading its whole body. The rule I now hold to: context carries request-scoped data that crosses API boundaries, like a trace ID or auth info, and nothing else. Business data goes in explicit parameters where the type system can see it.

The second mistake was passing context.Background() everywhere because I could not be bothered to thread the real one through. This quietly defeats the entire mechanism. Background() is the root context that is never cancelled and has no deadline. Every time you reach for it deep in a call chain instead of accepting the context that was handed to you, you sever the chain of cancellation. The work below that point can no longer be told to stop. I had a function six layers down that did this, and it was precisely the function holding the database connection open during my outage.

doing it properly

The mental model that finally made it click: context flows downwards, like water. It enters at the top, usually from the incoming request, and every function that does work on behalf of that request takes it as the first argument and passes it on. You never store it in a struct. You never pass nil. You pass the one you were given, and you only create a new one when you have a genuine reason to add a deadline or a cancellation of your own.

Here is the shape of it that I now reach for without thinking:

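func (s *Service) Fetch(ctx context.Context, id string) (*Record, error) {
    // Derive from the caller's ctx so its cancellation still applies,
    // and cap this call at two seconds.
    ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
    defer cancel() // always release the timer, even on success

    // The driver gets the derived ctx and can abandon the query once it is done.
    row := s.db.QueryRowContext(ctx, "SELECT ... WHERE id = $1", id)

    var r Record
    if err := row.Scan(&r.ID, &r.Name); err != nil {
        return nil, fmt.Errorf("fetch %s: %w", id, err)
    }
    return &r, nil
}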

Two things matter in that small block. First, WithTimeout derives a new context from the incoming one, so it inherits the parent's cancellation and adds a two-second ceiling of its own. If the caller gives up, this gives up too; if two seconds pass, this gives up regardless. Second, the defer cancel() is not optional. Even when the operation completes normally, you must call cancel to release the timer and the bookkeeping that tie the derived context to its parent. Forget it and those linger until the deadline fires, which adds up quickly on a busy service. The lostcancel check in go vet will shout at you about this, and for once the linter is right.

The other half of doing it properly is actually respecting cancellation when you receive it. Passing the context to QueryRowContext means the database driver will abandon the query if the context is cancelled. But if you have your own loop doing work, you have to check for it yourself:

for _, item := range items {
    select {
    case <-ctx.Done():
        return ctx.Err()
    default:
        process(item)
    }
}

Without that select, your loop will grind happily through ten thousand items even though the caller hung up after the second one. Cancellation is cooperative. The context can tell you to stop, but it cannot make you.

was it worth it

Threading context through everything is a small, persistent tax. Every function signature gets a bit longer, every call site gets a bit noisier, and the first time you retrofit it into a codebase that grew up without it, you will touch an unreasonable number of files. I did, and it was tedious.

It was also entirely worth it. The next time a downstream service got slow, the deadlines fired, the requests failed fast with a clear timeout error, the goroutines unwound, and the connections went back to the pool. The pile-up that took me down the first time simply did not happen, because the work that nobody was waiting for any more got told to stop. That is the whole promise of context, and once you have seen it actually save you, the tax stops feeling like a tax. It is just how you write Go.