Plug-and-Play Endpoint Views for Metrics & Errors

by Jean Yang

As production systems get more complex, it’s often hard for developers to quickly identify the source of high latencies and errors.

To help with this, we’re excited to present our new Metrics & Errors page. You can simply set up Akita to watch your API traffic to quickly answer questions about your app’s behavior.

In this blog post, we talk about why it’s so hard to see what’s wrong with your app, Akita’s approach, and what you now get with the launch of our new Metrics & Errors page.

What’s wrong with my app? And why aren’t my other tools telling me?

Say you’re responsible for a web app. Maybe users are telling you that they’re getting errors, but you don’t know where. Or maybe you know that some of the endpoints you care about are slow, but you can’t quite pinpoint the issue.

If you’re like our users, you’re probably already using at least one monitoring tool. You’ve probably got good visibility on a handful of endpoints where you took the time to set up the monitoring and configure the dashboards. But you’re having a hard time figuring out what’s going on with this issue.

You’re not alone. I’m a big fan of many of the other tools out there, but here are some ways we’ve seen other monitoring tools miss issues:

CloudWatch. CloudWatch is often the natural first monitoring solution because, if you’re using AWS, all you have to do is turn it on! But the coarse granularity can leave something to be desired. CloudWatch gives you aggregate metrics, which means you may not get notified about one bad endpoint ruining your user experience if overall latency is fine. And knowing there are errors somewhere in your system doesn’t help if you only care about errors in specific places.
Datadog or New Relic. Many regard Datadog and New Relic as good entry-level application performance monitoring tools. In fact, Datadog is what I recommend to friends as a first monitoring tool. But there are two main issues. First, even the “drop-in” libraries may involve updating other code and dependencies. Second, there’s nontrivial work in making dashboards that teams can use productively. On our own team, it took our most senior engineer a week to “drop in” Datadog’s APM library, due to having to update dependencies. It’s also taken work to figure out normal ranges for metrics, organize our metrics, and understand which customers are impacted by errors.
Honeycomb or Lightstep. Both are excellent power tools that can tell you exactly what you need to know to find almost any application bug. I’m a big fan myself! But a common complaint is that if you haven’t instrumented your system for a particular kind of bug before, or if you haven’t instrumented a particular service or endpoint, you’re not going to have visibility over issues until you know to anticipate them.
Grafana. Many people love Grafana! Its visualization capabilities are unparalleled. But did you set up a dashboard for the bug you had? The answer is often “no.”
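To make the aggregate-metrics point concrete, here is a minimal sketch of how one slow endpoint can hide inside a healthy-looking overall average. The endpoint names and latency numbers are invented for illustration and do not come from any real system or tool.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical request log: (endpoint, latency in ms). All values are made up.
requests = [("GET /home", 40)] * 95 + [("POST /checkout", 900)] * 5


def mean_latency_by_endpoint(records):
    """Group latencies by endpoint and return the mean latency for each."""
    grouped = defaultdict(list)
    for endpoint, latency_ms in records:
        grouped[endpoint].append(latency_ms)
    return {endpoint: mean(values) for endpoint, values in grouped.items()}


# The aggregate view: one number for the whole app.
overall = mean(latency_ms for _, latency_ms in requests)

# The per-endpoint view: one number per endpoint.
per_endpoint = mean_latency_by_endpoint(requests)

print(f"overall mean latency: {overall} ms")
for endpoint, latency_ms in sorted(per_endpoint.items()):
    print(f"{endpoint}: {latency_ms} ms")
```

In this made-up data the overall mean is 83 ms, which looks fine, while the checkout endpoint averages 900 ms. An alert on the aggregate alone would likely never fire.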

We’re not saying you shouldn’t use other monitoring tools, or that they are bad! But every tool has tradeoffs, so it’s good to recognize what they are. For many of the solutions on the market, you get either ease of setup or customizability, but not both.

Drop-in monitoring with Akita

To give software teams an easier option, we at Akita are building a solution for understanding production that requires no code changes and no custom dashboards. Our vision is for users to drop Akita into their systems and instantly start getting data and insights about their app’s production behavior.

Today, Akita is the fastest, easiest way to see what endpoints you have, which endpoints are slow, and which endpoints are throwing errors. Our solution is powered by a technology called eBPF for passively listening to traffic. Because of how we built our solution, our agent does not need to sit in the flow of traffic and the Akita cloud does not see payload data. Akita isn’t the only drop-in solution and it’s not the only eBPF-based solution, but what makes Akita different is our ability to make system information accessible after we collect it. Akita is the most powerful solution for automatically inferring API endpoint structure from traffic. Akita makes it possible to explore and search API models rather than logs, helping with API discovery and more.

Because Akita automatically infers endpoint structure, many of our users have found our solution useful for endpoint-level monitoring across all of their endpoints. But until now, the user experience for monitoring was limited: users had to go through their endpoints manually any time they had a question. The data was there, but we knew we had a lot of work to do to help teams answer the questions they cared about.

Plug-and-play views for Metrics & Errors

With our latest release, you get plug-and-play views over your endpoints, all without making code changes or custom dashboards:

Quickly explore API behavior across your endpoints. Akita automatically infers endpoint structure (including path parameters!) and lets you sort your endpoints by return code, latency, and throughput. This makes it easy to quickly explore the behavior of your slowest endpoints, endpoints with errors, and busiest/quietest endpoints.
Build the endpoint views you care about using search and filters. You can either search for specific endpoints, or filter endpoints by HTTP method, return code, or host name.
Save and share custom views of your Metrics & Errors. You can now send custom views of your Akita dashboards, with filters and endpoints pre-set. Simply share the URL with your teammates on Akita and they will be able to see what you see.

Additionally, you can explore your endpoints across latency metrics and across time ranges.
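For readers curious what “inferring path parameters” and “sorting endpoints by latency” can look like in practice, here is a deliberately naive sketch. It is not Akita’s algorithm. It assumes that any path segment that is all digits or a UUID is a parameter, and the names `to_template` and `slowest_endpoints` are hypothetical helpers made up for this example.

```python
import re
from collections import defaultdict
from statistics import median

# Assumption for this sketch: a segment that is all digits, or looks like a
# UUID, is treated as a path parameter. Real inference would need to be
# much more careful than this.
_PARAM_SEGMENT = re.compile(
    r"^(\d+|[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})$"
)


def to_template(path):
    """Collapse a concrete path like /users/42 into /users/{param}."""
    segments = path.strip("/").split("/")
    return "/" + "/".join(
        "{param}" if _PARAM_SEGMENT.match(segment) else segment
        for segment in segments
    )


def slowest_endpoints(observations):
    """Rank (method, template) pairs by median latency, slowest first.

    observations is an iterable of (method, path, latency_ms) tuples.
    """
    grouped = defaultdict(list)
    for method, path, latency_ms in observations:
        grouped[(method, to_template(path))].append(latency_ms)
    ranked = [
        (method, template, median(latencies))
        for (method, template), latencies in grouped.items()
    ]
    return sorted(ranked, key=lambda row: row[2], reverse=True)


# Hypothetical observed traffic.
observed = [
    ("GET", "/users/1", 30),
    ("GET", "/users/2", 50),
    ("POST", "/orders", 200),
]
ranking = slowest_endpoints(observed)
```

Here the two `/users/...` requests collapse into a single `GET /users/{param}` endpoint, so its latency can be summarized and ranked alongside `POST /orders` instead of showing up as two unrelated paths.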

Try us out

This is all a work in progress: Akita is currently in beta! Sign up here to try it out. We’d love to get your feedback as we build the monitoring tool you’ll love to use.