August 11, 2021
Observability Isn't About Logs, Metrics, and Traces
“How can you be an observability tool that doesn’t focus on logs, metrics, or traces?”
At Akita, we’ve gotten this question a lot. You see, we’re an observability company that didn’t start out focusing on logs, metrics, or traces. Instead, we’re watching API traffic and building models to understand API behavior.
It’s been hard to answer this question because most people think about observability in terms of logs, metrics, or traces. But, as I’ve said in this Tweet, here’s how I see things. Saying observability is about logs, metrics, and traces is like saying programming is about manipulating assembly instructions. Observability is about building models of system behavior and how they change. Today, state-of-the-art tools give you logs, metrics, and traces—and then people build those models in their heads.
Especially as people have begun generalizing observability to other domains, for instance data science, it’s important for people to understand observability in terms of its goals, rather than its implementation. In this post, we’ll talk about how observability tools are really about system understanding, how logs, metrics, and traces are the implementation strategy, and what it might look like to raise the level of abstraction.
So, what is observability?
To understand the high-level goals of observability, let’s look at the websites of two companies that are at the forefront of the devops observability movement, Lightstep and Honeycomb. Here’s what they have to say about the benefits of observability:
“Release features faster” (Honeycomb). If you don’t have a good handle on your system, implementing intended features takes longer because you don’t know how potential changes might affect your system. If you have a good handle on your underlying system, you can move faster.
“Complete system context at scale” (Lightstep). A big part of why you can release features faster is that devops observability tools give you context about what your system is doing. This makes it easier to implement features, maintain code, and respond to incidents.
“Have confidence in production” (Honeycomb). In the days of shrink-wrapped software, it was both expensive to make a software error and possible to test your entire, closed software system before shipping out your discs. Today, the fact that software is delivered as a service makes it much cheaper to update, and the fact that it is built from services internally makes it much harder for unit and integration tests to explore the full space of software behavior. As a result, it’s important to have a way to understand production behavior quickly and easily.
“A better way to manage change” (Lightstep). An important part of understanding your systems is understanding how they change. Changes are the place to prioritize if your system is too complex to understand entirely. Plus, changes are most likely to take down a system that wasn’t down before!
To generalize, observability is about helping you build models of your system so you can:
Understand your system better to move faster.
Get visibility into how your system is actually running.
Keep an eye on how your system changes.
Given these goals, the less tools are about logs, metrics, and traces, the better. I like what Michael Hibay says here in response to my Tweet: the less tools require the user to build models in their heads, the more observability is possible.
What logs, metrics, and traces have to do with it
But today, people everywhere are still thinking about observability in terms of logs, metrics, and traces. For instance, here’s an article about how observability has become a critical concern for data teams as well as devops teams, with an excerpt below:
And talking about logs, metrics, and traces as the “three pillars of observability” is consistent with the messaging from observability products. For instance, this is from Datadog’s observability page:
And saying that observability’s key elements are logs, metrics, and traces is not wrong. Today, observability tools make it possible to get channels of visibility into system function in exchange for code instrumentation. People who know how they want to understand their systems can get the support they need to collect and query the information to do it.
But saying observability’s “pillars” are metrics, traces, and logs is like saying programming’s key elements are storing data, moving data, and arithmetic operations on data. To build on this analogy: today, compilers and interpreters do most of the job of expressing computations in terms of stores, loads, and arithmetic operations, while programmers get to work with the languages and paradigms that have gotten built on top. I believe this is where observability is headed as well. This leads us to the question: what are the appropriate abstractions on top of logs, metrics, and traces?
API-centric observability: one form of abstraction
Just as different programming languages have evolved for different communities and tasks, there are going to be different tools built on top of logs, metrics, and traces, each making its own set of observability tasks easier.
At Akita, we’ve set out the following challenge for ourselves: what would it look like to build a tool that abstracts over logs, metrics, and traces the way Python abstracts over assembly? Our strategy has been to take an API-centric view, supporting the following use cases:
API-centric monitoring. We provide logs and metrics, without requiring instrumentation, in a per-endpoint way, to make it easier to answer questions having to do with your API endpoints, for instance “What endpoint is slow?” and “What endpoint is sending a lot of data?”
Search over API behavior. Akita supports search over API behavior, rather than over code or logs. This is the most effective way to find, for instance, all uses of certain data types (for instance, sensitive data) or patterns (for instance, use of CSRF tokens without authentication tokens). Users can search over API behavior from test, staging, or production.
Catch breaking API changes. Akita automatically detects added and removed endpoints, fields, and data types, as well as changes to response codes and authentication. Users can run these checks on pull requests in CI/CD, or in staging and production deployments.
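To make the last use case a little more concrete, here is a minimal sketch of what comparing two API behavior models could look like. This is not Akita's implementation or model format. The dictionary layout, the `diff_api_models` function, the change labels, and the example endpoints are all hypothetical, chosen only to illustrate the idea of diffing endpoints, fields, data types, and response codes.

```python
# Hypothetical model format: endpoint -> {"fields": {name: type}, "response_codes": {codes}}.
def diff_api_models(old, new):
    """Return a sorted list of (kind, detail) tuples describing how `new` differs from `old`."""
    changes = []
    for endpoint in old.keys() - new.keys():
        changes.append(("endpoint_removed", endpoint))
    for endpoint in new.keys() - old.keys():
        changes.append(("endpoint_added", endpoint))
    for endpoint in old.keys() & new.keys():
        old_fields = old[endpoint]["fields"]
        new_fields = new[endpoint]["fields"]
        for name in old_fields.keys() - new_fields.keys():
            changes.append(("field_removed", f"{endpoint} {name}"))
        for name in new_fields.keys() - old_fields.keys():
            changes.append(("field_added", f"{endpoint} {name}"))
        for name in old_fields.keys() & new_fields.keys():
            if old_fields[name] != new_fields[name]:
                detail = f"{endpoint} {name}: {old_fields[name]} -> {new_fields[name]}"
                changes.append(("type_changed", detail))
        if old[endpoint]["response_codes"] != new[endpoint]["response_codes"]:
            changes.append(("response_codes_changed", endpoint))
    return sorted(changes)


# Change kinds that this sketch assumes could break existing clients.
BREAKING_KINDS = {"endpoint_removed", "field_removed", "type_changed", "response_codes_changed"}

# Made-up models, e.g. one from the main branch and one from a pull request.
old_model = {
    "GET /users/{id}": {
        "fields": {"id": "int", "email": "string", "phone": "string"},
        "response_codes": {200, 404},
    },
    "DELETE /users/{id}": {"fields": {}, "response_codes": {204}},
}
new_model = {
    "GET /users/{id}": {
        "fields": {"id": "string", "email": "string"},
        "response_codes": {200, 404},
    },
    "GET /users/{id}/orders": {"fields": {"order_ids": "list"}, "response_codes": {200}},
}

changes = diff_api_models(old_model, new_model)
breaking = [change for change in changes if change[0] in BREAKING_KINDS]
for kind, detail in changes:
    print(kind, detail)
```

In a CI/CD setting, a check like this might fail the build whenever `breaking` is non-empty. Which kinds of change count as breaking is a policy decision, and the set above is only one plausible choice.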
The approach we’re taking at Akita is a combination of passively watching API traffic and building API behavior models that understand endpoints, data types, and per-endpoint performance. Passively watching API traffic makes it possible for our solution to not require code instrumentation: you don’t get every single log or trace you might want, but you can drop our solution anywhere there’s network traffic and get results quickly. Automatically modeling API behavior makes it possible for users to quickly answer endpoint-centric questions. You can’t answer every question you might want to know about your API behavior, but the idea is that you can now easily answer many common questions.
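As a rough illustration of the modeling step, here is a small sketch that folds observed requests into a per-endpoint model and then answers the two monitoring questions from earlier. It is not how Akita works internally. The record shape, the `path_template` heuristic, the `EndpointStats` class, and the sample traffic are assumptions made up for this example.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EndpointStats:
    """Aggregate behavior for one (method, path template) pair."""
    calls: int = 0
    total_latency_ms: float = 0.0
    total_response_bytes: int = 0
    status_codes: set = field(default_factory=set)

    @property
    def mean_latency_ms(self) -> float:
        return self.total_latency_ms / self.calls if self.calls else 0.0


def path_template(path: str) -> str:
    """Collapse purely numeric path segments into a placeholder (a crude heuristic)."""
    return re.sub(r"/\d+(?=/|$)", "/{id}", path)


def build_model(observations):
    """Fold (method, path, status, latency_ms, response_bytes) tuples into per-endpoint stats."""
    model = defaultdict(EndpointStats)
    for method, path, status, latency_ms, response_bytes in observations:
        stats = model[(method, path_template(path))]
        stats.calls += 1
        stats.total_latency_ms += latency_ms
        stats.total_response_bytes += response_bytes
        stats.status_codes.add(status)
    return dict(model)


# Hypothetical traffic, as a passive observer might record it.
observed = [
    ("GET", "/users/1", 200, 12.0, 512),
    ("GET", "/users/2", 200, 18.0, 640),
    ("GET", "/users/3/orders", 200, 240.0, 20480),
    ("POST", "/users", 201, 35.0, 128),
    ("GET", "/users/999", 404, 9.0, 64),
]

model = build_model(observed)

# "What endpoint is slow?" and "What endpoint is sending a lot of data?"
slowest = max(model, key=lambda key: model[key].mean_latency_ms)
heaviest = max(model, key=lambda key: model[key].total_response_bytes)
print("slowest:", slowest)
print("heaviest:", heaviest)
```

The point of the sketch is the shape of the abstraction: once traffic is grouped by endpoint, endpoint-centric questions become simple lookups. Inferring path templates and data types from real traffic is a much harder problem than the regular expression here suggests.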
With Akita, as with high-level programming languages like Python, you get a 90% solution on Day One. You might not have all the control of getting full logs, metrics, and traces of today’s observability tools, but you get a drop-in solution that takes you a lot of the way quickly, centered around questions you might have around APIs.
What to expect from the future
Especially for those of you out there thinking about how observability applies outside of devops, here’s the main takeaway: observability isn’t about logs, metrics, and traces. It’s really about getting the tools you need to understand your systems and how they change. Logs, metrics, and traces are implementation strategies and a good place to start, but you might want to build for higher-level concepts.
What we’re doing at Akita is just one kind of abstraction one can build on top of logs, metrics, and traces. Just as we saw programming languages evolve and make different tasks go from expert-only to very easy in the last few decades, I believe we’re going to see the same with system understanding. We’re just getting to the point where people are understanding logs, metrics, and traces for observability, so we’re really at the beginning of this conversation.
These are exciting times! We’d love to hear your thoughts on what abstractions would be most helpful to build on top. And, if you’re interested in API-centric observability, we’d love to have you try out our beta.