A mean latency of 40ms sounds healthy right up until you remember what an average does to a tail. If 99 requests come back in 20ms and the hundredth takes two seconds, your average is just under 40ms (39.8ms, to be exact) and looks fine, while one user in a hundred just sat there watching a spinner. The average didn't lie, exactly; it just quietly buried the only number that mattered.
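To check the arithmetic, here is a minimal sketch using exactly the numbers from the paragraph above. The variable names and the one-second threshold for "slow" are illustrative choices, not anything standard.

```python
# 99 fast requests at 20 ms plus one slow request at 2000 ms,
# the scenario described in the text.
latencies_ms = [20] * 99 + [2000]

mean_ms = sum(latencies_ms) / len(latencies_ms)

# Share of requests slower than one second (threshold is an arbitrary choice).
slow_share = sum(1 for x in latencies_ms if x >= 1000) / len(latencies_ms)

print(f"mean: {mean_ms:.1f} ms")               # mean: 39.8 ms
print(f"requests over 1 s: {slow_share:.0%}")  # requests over 1 s: 1%
```

The mean lands on a value that no request actually experienced: 99 of them were twice as fast, and one was fifty times slower.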
Watch percentiles instead. The p50 tells you what a typical request feels like; the p99 tells you what your unluckiest one in a hundred feels like, and at any real traffic that's a lot of unlucky people: at 1,000 requests per second, it's 600 of them every minute. The gap between them is where your GC pauses, your cold caches, your lock contention and your occasional retried connection all live. A flat p50 with a climbing p99 is a system that's mostly fine and getting worse at exactly the edge nobody's graphing.
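A rough sketch of what that looks like in numbers, assuming a made-up sample in which 2% of 1,000 requests hit a slow path. The `percentile` helper below uses the nearest-rank definition, which is only one of several in common use; your metrics system may interpolate and report slightly different values.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least
    p percent of all samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical data: 980 requests at 20 ms, 20 requests at 2000 ms.
latencies_ms = [20] * 980 + [2000] * 20

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean: {mean_ms:.1f} ms")               # mean: 59.6 ms
print(f"p50:  {p50_ms} ms")                    # p50:  20 ms
print(f"p99:  {p99_ms} ms")                    # p99:  2000 ms
print(f"p99 - p50 gap: {p99_ms - p50_ms} ms")  # p99 - p50 gap: 1980 ms
```

The mean moves by a few tens of milliseconds, the p50 doesn't move at all, and the p99 is the only one of the three that shows the two-second requests.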
And don't average your percentiles across hosts, because that's just a worse average wearing a better number's clothes: the mean of three hosts' p99s is not the p99 of their combined traffic. If you can, compute the p99 over all the requests together and plot that. The mean is the most reassuring metric on the dashboard, which is precisely why it's the most dangerous one to trust.
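Here is a small illustration of why averaging percentiles across hosts misleads, assuming three hypothetical hosts with equal traffic, one of which is struggling. The host names, the data, and the nearest-rank `percentile` helper are all invented for the example.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile (one common definition among several)."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical fleet: two healthy hosts, one where 10% of requests are slow.
hosts = {
    "host_a": [20] * 1000,
    "host_b": [20] * 1000,
    "host_c": [20] * 900 + [2000] * 100,
}

# The tempting shortcut: take each host's p99, then average them.
per_host_p99 = {name: percentile(lat, 99) for name, lat in hosts.items()}
averaged_p99 = sum(per_host_p99.values()) / len(per_host_p99)

# The honest version: pool every request, then take one p99.
pooled = [x for lat in hosts.values() for x in lat]
pooled_p99 = percentile(pooled, 99)

print(per_host_p99)  # {'host_a': 20, 'host_b': 20, 'host_c': 2000}
print(f"average of per-host p99s: {averaged_p99:.0f} ms")  # 680 ms
print(f"p99 of pooled requests:   {pooled_p99} ms")        # 2000 ms
```

In this sketch the averaged figure, 680ms, is a latency no request ever had, and it understates the real p99 by about two thirds. Pooling raw samples is not always practical at scale; merging per-host histograms before taking the percentile is a common substitute, though the details depend on your metrics system.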