Ramblings of an aging IT geek
golang

where the context goes, the cancellation follows

How context.Context actually earns its keep in Go services for cancellation and deadlines, and the habits that keep it from becoming clutter.

For my first year of writing Go I treated context.Context as a tax. It was the thing you put as the first argument because the linter wanted it and the standard library wanted it, and then you passed it down and mostly ignored it. I knew it carried cancellation. I did not really feel why that mattered until a service I'd written kept doing work for requests that had hung up minutes ago.

That's the whole point of context, and it took a production incident to make it land. A Context is the thread you pull to make everything downstream stop. When a request is cancelled, when a deadline passes, when the caller gives up and walks away, the context is how that decision propagates all the way down through every function call and every goroutine you spawned, so they can stop too instead of grinding away on work that nobody is waiting for any more.

the failure that taught me

The service fanned out: a request came in, and to answer it I'd call three backend services, then aggregate. I spawned a goroutine per backend and collected the results. It worked beautifully until one backend got slow. The client's HTTP timeout fired at 30 seconds, the client disconnected, and my service carried on. The goroutines kept waiting on the slow backend. The aggregation kept holding a connection. Under load, this meant thousands of goroutines doing work for clients that had left the building, and memory climbed until something fell over.

The fix was not clever. It was threading the request's context through to the backend calls and actually respecting it.

When the HTTP server cancels the request context on disconnect, that cancellation now flows into every http.NewRequestWithContext call I make downstream. The Go HTTP client honours it: the moment the parent context is done, the in-flight request is torn down. The slow backend call returns an error, the goroutine exits, the connection is released. Same code shape, one parameter threaded properly, and the goroutine leak simply stopped happening.
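For the curious, here is a minimal sketch of that shape, not the service's real code. The names fetch, fanOut and slowDemo are made up for illustration, the slow backend is an httptest server, and a 100ms timeout stands in for the client hanging up. In a real handler the context would be r.Context() from the incoming request.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// result carries one backend's answer, or its error, back to the aggregator.
type result struct {
	body string
	err  error
}

// fetch issues one GET bound to ctx. When ctx is done, the HTTP client
// abandons the in-flight request and Do returns an error.
func fetch(ctx context.Context, client *http.Client, url string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

// fanOut calls every URL concurrently with the same ctx and returns the
// bodies, or the first error it sees.
func fanOut(ctx context.Context, client *http.Client, urls []string) ([]string, error) {
	// Buffered so a goroutine can always deliver its result and exit,
	// even if fanOut has already returned on an earlier error.
	results := make(chan result, len(urls))
	for _, u := range urls {
		go func(u string) {
			body, err := fetch(ctx, client, u)
			results <- result{body, err}
		}(u)
	}
	out := make([]string, 0, len(urls))
	for range urls {
		r := <-results
		if r.err != nil {
			return nil, r.err
		}
		out = append(out, r.body)
	}
	return out, nil
}

// slowDemo (hypothetical) points fanOut at a deliberately slow test backend
// with a context that gives up after 100ms, and reports how long the call
// took and what error came back.
func slowDemo() (time.Duration, error) {
	slow := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-r.Context().Done(): // the caller went away
		case <-time.After(2 * time.Second):
			fmt.Fprint(w, "late answer")
		}
	}))
	defer slow.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	start := time.Now()
	_, err := fanOut(ctx, slow.Client(), []string{slow.URL, slow.URL, slow.URL})
	return time.Since(start), err
}

func main() {
	took, err := slowDemo()
	fmt.Printf("gave up after roughly %v: %v\n", took.Round(100*time.Millisecond), err)
}
```

The only line doing the real work is http.NewRequestWithContext. Swap it for a request built without the context and the same program sits there for the backend's full two seconds.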

the rules I follow now

A few habits, learned the hard way, that keep context useful instead of decorative:

  • Pass it, never store it. Context is a parameter, always the first one, named ctx. It is not a struct field. The instant you stash a context in a struct you've broken the chain, because that stored context belongs to whichever call happened to create the struct, not to the call that's using it now. This is the mistake I made most often early on.

  • Derive, don't replace. When you need a tighter deadline for one downstream call, you derive a child:

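// Derive a child of the incoming ctx that allows this one call two seconds.
ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
defer cancel() // release the timer as soon as we return

// req is an *http.Request built earlier; WithContext returns a copy bound to ctx.
resp, err := client.Do(req.WithContext(ctx))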

The child inherits the parent's cancellation and adds its own deadline. If the parent dies, the child dies, and if the parent's own deadline is less than two seconds away, the sooner one wins. The two-second timeout is a ceiling on this one call, not a floor under the whole request.

  • Always call cancel. WithTimeout, WithDeadline and WithCancel return a cancel function and you must call it, even if the operation finished cleanly. defer cancel() on the next line, every time. Skip it and the child stays registered with its parent, with its timer still armed, until the deadline fires or the parent is cancelled; under a long-lived parent that is a leak. go vet's lostcancel check will shout at you, and it's right to.

  • Check ctx.Err() in loops. If you're iterating over a lot of work, a long batch or a stream, check whether the context is done between iterations and bail if it is. There's no point processing item 9,000 of 10,000 for a caller who left at item 12.

for _, item := range items {
    // ctx.Err() is nil until the context is cancelled or its deadline passes.
    if err := ctx.Err(); err != nil {
        return err // context.Canceled or context.DeadlineExceeded
    }
    process(item)
}

  • context.Value is for request-scoped data, sparingly. Trace IDs, the authenticated user, a request-scoped logger: fine. Optional function parameters smuggled in to avoid changing a signature: no. Values are untyped and invisible, so use a private key type and keep it to genuinely cross-cutting concerns. Most of the time you want a real parameter, not a value bag.
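
The private key type in that last habit is easier to show than to describe. This is a small sketch rather than a prescribed layout; ctxKey, WithTraceID, TraceID, newRequestContext and handle are names I've invented for the example.

```go
package main

import (
	"context"
	"fmt"
)

// ctxKey is unexported, so code outside this package cannot construct an
// equal key and collide with, or overwrite, the value stored under it.
type ctxKey struct{}

// WithTraceID returns a child context carrying the trace ID.
func WithTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, ctxKey{}, id)
}

// TraceID returns the trace ID and whether one was set. The typed accessor
// keeps the untyped lookup and the type assertion in one place.
func TraceID(ctx context.Context) (string, bool) {
	id, ok := ctx.Value(ctxKey{}).(string)
	return id, ok
}

// newRequestContext (hypothetical) stands in for middleware that tags each
// incoming request with an ID.
func newRequestContext(id string) context.Context {
	return WithTraceID(context.Background(), id)
}

// handle (hypothetical) stands in for code deep in the call stack that
// wants the ID for logging.
func handle(ctx context.Context) string {
	if id, ok := TraceID(ctx); ok {
		return "trace=" + id
	}
	return "trace=none"
}

func main() {
	fmt.Println(handle(newRequestContext("req-123")))
	fmt.Println(handle(context.Background()))
}
```

Callers only ever see WithTraceID and TraceID, so the value is typed at the edges even though the bag underneath is not.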

the mental model that finally stuck

I stopped thinking of context as "the cancellation argument" and started thinking of it as the request's lifeline. Every operation that happens on behalf of a request hangs off it. Cut the lifeline at the top, by a timeout, a disconnect, a shutdown, and everything hanging off it falls away cleanly. The job of threading context through your code is the job of making sure nothing is doing work that's become orphaned.
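
The lifeline picture can be sketched in a few lines. Treat this as an illustration under made-up names (worker, cutLifeline), not a pattern lifted from the service: each worker gets its own derived child with a generous one-minute timeout, only the root is cancelled, and every worker still stops.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// worker pretends to do request-scoped work until its context is done,
// then reports why it stopped.
func worker(ctx context.Context, wg *sync.WaitGroup, stopped chan<- error) {
	defer wg.Done()
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			stopped <- ctx.Err()
			return
		case <-ticker.C:
			// one unit of pretend work
		}
	}
}

// cutLifeline starts n workers, each on its own child of a single root
// context, cancels only the root, and returns the reason each worker gave.
func cutLifeline(n int) []error {
	root, cancelRoot := context.WithCancel(context.Background())
	defer cancelRoot()

	var wg sync.WaitGroup
	stopped := make(chan error, n)
	for i := 0; i < n; i++ {
		// A child with a one-minute timeout of its own; it never gets that far.
		child, cancelChild := context.WithTimeout(root, time.Minute)
		defer cancelChild() // deferred in a loop, acceptable here because n is small
		wg.Add(1)
		go worker(child, &wg, stopped)
	}

	cancelRoot() // cut the lifeline at the top
	wg.Wait()    // returns only once every worker has noticed
	close(stopped)

	var errs []error
	for err := range stopped {
		errs = append(errs, err)
	}
	return errs
}

func main() {
	for _, err := range cutLifeline(3) {
		fmt.Println("worker stopped:", err) // prints "worker stopped: context canceled"
	}
}
```

Nobody told the children anything directly. They were derived from the root, so the root's cancellation was theirs too.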

The reason it feels like boilerplate at first is that the benefit is invisible when things go well. A request that completes in 50ms never exercises any of this. It's only when something hangs, when a backend stalls or a client vanishes or you're trying to shut down gracefully and there are goroutines that won't die, that the context you threaded so dutifully turns out to be the difference between a clean stop and a slow leak.

So now I thread it through everything, first argument, derived not stored, cancel always deferred. It is genuinely a small amount of discipline for a service that stops doing pointless work the moment the work becomes pointless. The tax turned out to be insurance, and I'd already paid the excess once.