Ramblings of an aging IT geek
golang

the goroutine leak that hid in plain sight

A slowly climbing goroutine count traced to workers blocked forever on a channel nobody would ever read from again, and the missing cancellation that caused it.

The memory graph had a slope to it. Not a cliff, nothing that paged anyone, just a gentle, patient climb over days until a restart reset it and the climb began again. The kind of leak you can ignore for months because the deploy cadence keeps quietly papering over it.

The culprit wasn't memory directly; it was goroutines. runtime.NumGoroutine() told the story: a number that only ever went up. Each incoming request spawned a worker goroutine that did some work and then sent its result on a channel. Most of the time the caller read the channel and everyone went home. But when the caller timed out and moved on, it stopped reading, and the worker, having finished its work, sat there forever trying to send on a channel with no receiver. Blocked on the send, holding its stack, never collected.

result := make(chan Foo)  // unbuffered
go func() {
    result <- doWork()  // blocks forever if nobody reads
}()

select {
case r := <-result:
    return r
case <-time.After(timeout):
    return nil  // caller gives up; the worker stays stuck on its send for eternity
}
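To watch the count climb in isolation, here is a minimal reproduction of the pattern above. It is a sketch, not the production code: the Foo type, the doWork body, the durations, and the leakyCall and countLeaked helpers are all made up for illustration. Only runtime.NumGoroutine() and the channel behavior come from the story.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// Foo stands in for whatever the real worker returns (hypothetical type).
type Foo struct{ N int }

// doWork simulates work that takes longer than the caller is willing to wait.
func doWork() Foo {
	time.Sleep(50 * time.Millisecond)
	return Foo{N: 1}
}

// leakyCall mirrors the snippet above: unbuffered channel, caller may time out.
// The bool reports whether a result arrived before the timeout.
func leakyCall(timeout time.Duration) (Foo, bool) {
	result := make(chan Foo) // unbuffered
	go func() {
		result <- doWork() // blocks forever once the caller has returned
	}()

	select {
	case r := <-result:
		return r, true
	case <-time.After(timeout):
		return Foo{}, false
	}
}

// countLeaked makes n calls that time out, waits for the workers to finish
// their work, and reports how many goroutines exist beyond the starting count.
func countLeaked(n int) int {
	before := runtime.NumGoroutine()
	for i := 0; i < n; i++ {
		leakyCall(time.Millisecond)
	}
	// Give every worker time to finish doWork and reach its send.
	time.Sleep(200 * time.Millisecond)
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("goroutines still blocked:", countLeaked(50))
}
```

Every call that times out leaves one worker parked on its send, so the reported number tracks the number of timed-out calls rather than settling back to zero.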

The send is the leak. An unbuffered channel needs a receiver present at the moment of the send, and once the select has given up and returned, that receiver is gone for good. The fix is to give the worker a way out: either give the channel a buffer of one so the send can complete and the value is simply discarded, or, better, pass a context and have the worker watch for cancellation so it stops the actual work too. A blocked goroutine never shows up as an error. It just quietly costs you its stack, forever, and waits for the slope to do the talking.
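Both ways out can be sketched side by side. This is an illustration under assumptions, not the code that shipped: Foo, doWork, the durations, and the callBuffered, callWithContext, and leftover helpers are hypothetical names, and doWork is written to honor cancellation, which a real worker would have to do for the context variant to save any work.

```go
package main

import (
	"context"
	"fmt"
	"runtime"
	"time"
)

// Foo stands in for the worker's result (hypothetical type).
type Foo struct{ N int }

// doWork simulates slow work that stops early when ctx is cancelled.
func doWork(ctx context.Context) (Foo, error) {
	select {
	case <-time.After(50 * time.Millisecond):
		return Foo{N: 1}, nil
	case <-ctx.Done():
		return Foo{}, ctx.Err()
	}
}

// callBuffered is the first fix: a buffer of one lets the send complete
// even when nobody is receiving, so the worker can exit.
func callBuffered(timeout time.Duration) (Foo, error) {
	result := make(chan Foo, 1)
	go func() {
		r, _ := doWork(context.Background()) // the work still runs to completion
		result <- r                          // never blocks: one slot, one value
	}()
	select {
	case r := <-result:
		return r, nil
	case <-time.After(timeout):
		return Foo{}, context.DeadlineExceeded
	}
}

// callWithContext is the second fix: the worker watches for cancellation,
// so it abandons both the work and the send when the caller gives up.
func callWithContext(parent context.Context, timeout time.Duration) (Foo, error) {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	result := make(chan Foo) // unbuffered is fine here
	go func() {
		r, err := doWork(ctx)
		if err != nil {
			return // cancelled mid-work: nothing to send
		}
		select {
		case result <- r:
		case <-ctx.Done(): // caller left between the work finishing and the send
		}
	}()

	select {
	case r := <-result:
		return r, nil
	case <-ctx.Done():
		return Foo{}, ctx.Err()
	}
}

// leftover runs n timed-out calls and reports goroutines beyond the baseline.
func leftover(n int, useContext bool) int {
	before := runtime.NumGoroutine()
	for i := 0; i < n; i++ {
		if useContext {
			callWithContext(context.Background(), time.Millisecond)
		} else {
			callBuffered(time.Millisecond)
		}
	}
	time.Sleep(200 * time.Millisecond) // let the workers wind down
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("buffered, goroutines left over:", leftover(50, false))
	fmt.Println("context, goroutines left over:", leftover(50, true))
}
```

The expectation is that neither variant leaves workers behind. The difference is what happens in the meantime: the buffered version still pays for the full doWork on every abandoned request, while the context version lets the worker bail out as soon as the caller stops caring.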