the average latency is lying to you

A dashboard once told me the service averaged 40ms response time. Users told me it was slow. Both were correct, and the gap between them is the whole reason I stopped trusting averages.

The mean is a single number doing a job that needs a distribution. If 95 requests come back in 10ms and five take two seconds, the average works out to 109.5ms and looks fine on a graph. Meanwhile one request in twenty leaves somebody sat there waiting two seconds, and those people are the ones who tell their colleagues your thing is rubbish. The slow requests are not noise to be averaged away. They are the experience for the unlucky.
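A quick check of that arithmetic, using the invented numbers from the example (95 requests at 10ms, five at 2,000ms):

```python
from statistics import mean

# Invented sample from the example: 95 fast requests and 5 slow ones, in ms.
latencies_ms = [10] * 95 + [2000] * 5

avg = mean(latencies_ms)
# Fraction of requests that took the full two seconds.
slow_share = sum(1 for x in latencies_ms if x >= 2000) / len(latencies_ms)

print(f"mean: {avg:.1f}ms")              # mean: 109.5ms
print(f"share at 2s: {slow_share:.0%}")  # share at 2s: 5%
```

Nothing in the 109.5ms figure tells you that 5% of requests took two full seconds.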

Percentiles fix this because they refuse to hide the tail. p50 is your median, the typical case. p99 is the promise that 99 out of 100 requests are at least this fast, which is the number that actually maps to "did it feel quick". p99.9 is where the truly nasty surprises live: lock contention, a cold cache, a GC pause, that one endpoint nobody profiled.
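A minimal sketch of how p50, p99 and p99.9 fall out of raw samples. It uses the nearest-rank definition, which is only one of several; many monitoring tools interpolate or approximate instead. The `percentile` helper and the sample data are hypothetical, made up for illustration.

```python
import math
from fractions import Fraction

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Fraction(str(p)) keeps values like 99.9 exact, avoiding float rounding in the rank.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

# Invented sample of 1,000 request latencies in ms.
latencies_ms = [12] * 985 + [300] * 13 + [2500] * 2

for p in (50, 99, 99.9):
    print(f"p{p}: {percentile(latencies_ms, p)}ms")
# p50: 12ms
# p99: 300ms
# p99.9: 2500ms
```

The mean of this sample is about 21ms, which says nothing about the two requests that took 2.5 seconds; p99.9 does.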

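The other quiet lie is averaging percentiles across hosts or time buckets. You cannot take the p99 from ten servers and average them into a meaningful p99. Percentiles do not add up like that: each one is a position in its own sorted population, so averaging them gives a quiet host the same weight as a busy one and produces a number that describes no real set of requests. Your monitoring tool will let you do it anyway. If you want the real figure you need the underlying histograms: merge the bucket counts from every host, then compute the percentile across the whole population.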
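A minimal sketch of the difference, assuming each host exports a latency histogram with the same bucket boundaries. The hosts, buckets, counts and the `histogram_percentile` helper are all hypothetical. Averaging the two per-host p99s gives 275ms; summing the bucket counts first and computing p99 over all 1,100 requests gives 500ms.

```python
import math
from collections import Counter
from fractions import Fraction

# Hypothetical per-host histograms: bucket upper bound (ms) -> request count.
host_a = Counter({10: 980, 50: 15, 500: 5})  # busy host, mostly healthy
host_b = Counter({10: 50, 50: 30, 500: 20})  # quiet host with a bad tail

def histogram_percentile(hist, p):
    """Return the upper bound of the bucket that holds the nearest-rank
    p-th percentile. Precision is limited to the bucket boundaries."""
    total = sum(hist.values())
    rank = math.ceil(Fraction(str(p)) * total / 100)  # 1-based rank
    seen = 0
    for upper in sorted(hist):
        seen += hist[upper]
        if seen >= rank:
            return upper
    raise ValueError("empty histogram")

p99_a = histogram_percentile(host_a, 99)
p99_b = histogram_percentile(host_b, 99)
averaged = (p99_a + p99_b) / 2                      # the tempting, wrong number
merged = histogram_percentile(host_a + host_b, 99)  # sum bucket counts, then compute

print(p99_a, p99_b, averaged, merged)  # 50 500 275.0 500
```

The averaged figure gives host B the same say as host A even though A served ten times the traffic. Merging only works when the histograms share bucket boundaries, and the result is only as precise as those buckets.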

So I watch p99 and p99.9 now, with the average relegated to a sanity check. When someone shows me a single "average latency" number and asks why users are complaining, I already know where to look. The answer is hiding in the tail, where it always is.