p90 vs p99: Why Not Both?

by Lauren Rother

Ask people whether they primarily look at p50, p90, or p99 to figure out where their app is slow, and the answers will vary from person to person. They will also vary by team, by endpoint, and maybe even by week.

Previously, the Akita Metrics and Errors page only showed p90. While having a p90 for every API endpoint is certainly better than nothing, it doesn’t give you all the information you might want when exploring slow endpoints. We’re happy to announce that we’ve extended our latency reporting to include p95, p99, and even ✨p99.9✨ with a new latency explorer.

In this post, we’ll go through a quick primer on latency metrics, talk about why you might want to look at p95, p99, and p99.9 in addition to p90, and walk through what being able to explore across latency metrics in Akita means for your app. This post may be helpful for anyone trying to make sense of latency metrics for their own performance monitoring.

A quick primer on latency metrics

One of the most common questions about a production system is, “What are my slowest endpoints and is there anything I should do about them?”

Before anyone can answer that, we need to settle on what it means for an endpoint to be “slow.” We measure endpoint slowness in terms of latency. Latency characterizes the time a request takes to process. (At Akita, we’re measuring the time from when the last bytes of the request arrive to when the first bytes of the response are sent.) But endpoints get many calls, each of them processing at different speeds.

So how many calls need to be slow for us to consider the endpoint slow? Half? Most? One really slow call? When talking about latency, you could talk about properties like the maximum (what’s the slowest?), minimum (what’s the fastest?), average, or median (what’s the latency that half of the calls are faster than?). The maximum gives you an idea of the worst case, while the minimum gives you an idea of the best case. A low average latency is great, but a high average latency could be caused by all calls being somewhat slow, or by a small number of calls being really slow. Similarly, a high median latency is definitely bad, but a low median doesn’t tell you whether almost half the calls are secretly very slow!
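To make the point about averages concrete, here is a small Python sketch. The two endpoints and their latencies are invented for illustration and are not Akita data; they are chosen so that both have the same average even though they behave very differently.

```python
import statistics


def summarize(latencies_ms):
    """Return the basic summary statistics discussed above."""
    return {
        "min": min(latencies_ms),
        "max": max(latencies_ms),
        "mean": statistics.mean(latencies_ms),
        "median": statistics.median(latencies_ms),
    }


# Hypothetical endpoint A: every call is somewhat slow.
uniformly_slow = [200] * 100

# Hypothetical endpoint B: 95 fast calls and 5 very slow ones.
mostly_fast = [10] * 95 + [3810] * 5

a = summarize(uniformly_slow)
b = summarize(mostly_fast)

for name, stats in (("A", a), ("B", b)):
    print(name, stats)
```

Both endpoints report a mean of 200ms, but endpoint B has a median of 10ms and a maximum of 3810ms. The average alone cannot tell these two situations apart.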

Latency percentiles help us pin down the question of “what percentage of calls do you care about being slow.” When you look at the pXY number for an endpoint, you’re seeing the latency that XY% of calls to that endpoint came in under. The p90 latency of an endpoint is a latency number higher than 90% of calls to that endpoint. The p50 latency is the median latency, so it is higher than half the call latencies to that endpoint. A higher percentile, say p99, means that 99% of calls to that endpoint were faster than this number. Targeting a p90 of 10ms means that you want 90% of calls to be faster than 10ms and you’re okay with 10% of calls having a response time higher than 10ms.
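If you want to compute these numbers yourself, one simple approach is the nearest-rank method sketched below. This is an assumption for illustration: the post does not say which percentile method Akita uses, and other conventions (for example, ones that interpolate between samples) give slightly different values on small data sets. The `percentile` helper and the sample data are hypothetical.

```python
import math
from fractions import Fraction


def percentile(latencies_ms, p):
    """Nearest-rank percentile.

    Returns the smallest observed latency such that at least p% of the
    samples are at or below it. `p` may be fractional, e.g. 99.9.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(latencies_ms)
    # Fraction(str(p)) keeps the rank arithmetic exact for inputs like 99.9.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical sample: 100 calls taking 1ms, 2ms, ..., 100ms.
samples = list(range(1, 101))

p50 = percentile(samples, 50)
p90 = percentile(samples, 90)
p99 = percentile(samples, 99)
p999 = percentile(samples, 99.9)
print(p50, p90, p99, p999)
```

With only 100 samples, p99.9 simply lands on the slowest call. Very high percentiles only become informative once an endpoint has enough traffic.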

Why care about the other pesky p’s?

The p90 latency metric is great for giving you a general sense of the health of your endpoint, so you might wonder why you’d bother with the other pesky p’s.

Let’s say an endpoint gets 1 million calls and just under 10% of them spike to 8000ms. Because fewer than one call in ten is slow, your p90 could very well be 10ms and not reflect those spikes at all, so considering only p90 means you are letting nearly 100,000 slow calls go unnoticed. If each user makes 100 calls, that is the equivalent of roughly 1,000 users who may not be able to successfully interact with your app. Relying on p90 alone effectively means you are okay dropping up to 10% of your user experience on the floor. While this may generally be okay, there may be some endpoints (for instance payments or auth) where you have a higher usability threshold.
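The arithmetic in this scenario is easy to check. The sketch below builds a synthetic data set of 1 million calls in which 99,999 (just under 10%) take 8000ms and the rest take 10ms, then computes p90 and p99 with a nearest-rank percentile. The exact counts and the `nearest_rank` helper are assumptions made for illustration.

```python
import math


def nearest_rank(latencies_ms, p):
    """Nearest-rank percentile for a whole-number p between 1 and 100."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


TOTAL_CALLS = 1_000_000
SLOW_CALLS = 99_999  # just under 10% of all calls

# Synthetic traffic: most calls take 10ms, the spikes take 8000ms.
latencies = [10] * (TOTAL_CALLS - SLOW_CALLS) + [8000] * SLOW_CALLS

p90 = nearest_rank(latencies, 90)
p99 = nearest_rank(latencies, 99)

print(f"p90 = {p90}ms, p99 = {p99}ms")
print(f"calls hidden from p90: {SLOW_CALLS}")
```

Here p90 stays at 10ms while p99 jumps to 8000ms: the p90 view looks healthy even though 99,999 calls took eight seconds.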

Previously in Akita, if you wanted to get a sense of the worst of the worst, you were just out of luck. But now you can get much closer to your worst case by viewing the p99 of your endpoints directly:

p90 vs p99 on the Akita Metrics and Errors dashboard.

If you’re getting lots of calls to an endpoint, certain slow calls may happen less than 1% of the time! In this case, you’ll want to explore latencies slower than 99.9% of calls to your endpoint, to really dig into how your app may be dropping some users on the floor and find out how badly they are being dropped.

Compare this (p99.9):

To this (p90):

In many cases, you may not start out with a strong opinion of what percentage of slow calls you might care about. The percentages may vary based on the endpoint or the total volume of calls, and they may change over time as your app and API change. In these cases, what you really want is to compare endpoint performance across latency metrics to understand how slowness is distributed across calls:

See how that one endpoint seemed to have a really mild spike when we considered p90, but had a much larger spike when we considered p99.9? This level of granularity lets you discover and then home in on potential problems to decide how important they are before they decide for you by causing issues with important users.
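The same kind of comparison can be sketched outside of a dashboard. The Python example below computes p50, p90, p99, and p99.9 for two endpoints and reports how much larger p99.9 is than p90. The endpoint names, the traffic, and the `tail_ratio` measure are all hypothetical, and the nearest-rank percentile is an assumed method rather than a description of how Akita computes its numbers.

```python
import math
from fractions import Fraction

PERCENTILES = (50, 90, 99, 99.9)


def percentile(latencies_ms, p):
    """Nearest-rank percentile; Fraction keeps the rank exact for 99.9."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[rank - 1]


def latency_profile(latencies_ms):
    """Map each percentile of interest to its latency in milliseconds."""
    return {p: percentile(latencies_ms, p) for p in PERCENTILES}


def tail_ratio(profile):
    """How many times slower p99.9 is than p90 (hypothetical measure)."""
    return profile[99.9] / profile[90]


# Hypothetical traffic: 10,000 calls per endpoint.
traffic = {
    "GET /health": [5] * 10_000,
    "POST /payments": [20] * 9_950 + [2_000] * 50,
}

profiles = {name: latency_profile(calls) for name, calls in traffic.items()}

for name, profile in profiles.items():
    print(name, profile, f"tail ratio: {tail_ratio(profile):.0f}x")
```

In this made-up data, `POST /payments` looks identical at p50, p90, and p99 (20ms) and only reveals its 2000ms calls at p99.9, which is exactly the kind of endpoint that a single percentile would miss.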

Do you know how your app is slow?

No matter what, we believe it’s incredibly informative to explore your app’s performance across different latency metrics. By passively watching your API traffic, Akita makes it possible to do this easily across each of your API endpoints, without requiring you to make any code changes or even include any SDKs.

Want to see how your app is doing? Drop us into your stack and we’ll let you know! Sign up to try us out.