Towards One-Click Observability with API Models

by

Jean Yang

People often ask me what the heck we’re building at Akita.

Back when we were stealthy, I got a lot of inquiries about whether we were making a new programming language, as I did in my previous life. We’re not. No, we’re not building a new kind of static analysis either. We’re finally ready to explain our previously indescribable technical approach.

Ineffable — How it used to go when we explained to people what we're doing.

The short of it is this. Modern software systems are complex and heterogeneous. Traditional techniques for taming and understanding programs have become less and less effective. Today, observability solutions are our best bet for understanding our systems, but they are “expert tools.” Today’s observability tools require developers to make sense of logs, metrics, and traces on their own. At Akita, we believe we can fix this by using program analysis techniques to power a better developer experience for observability.

In this post, I’ll explain why we need a new kind of software analysis that works across services and APIs, give an overview of other approaches for understanding software, and show how building API behavior models from traffic is The Way.

Motivation: how developers got left behind

For anyone who missed my other blog posts (see here and here), or who wants to hear me rehash my main life motivations using a new analogy, here’s why our software tools are leaving developers behind.

I’ll start by referencing a Tweet from 2013, about how debugging is like being the detective in a crime movie where you are also the murderer. (I’m a sucker for programming analogies and “murder shows,” so I love this.)

Filipe Fortes on debugging — One of the best Tweets about the experience of debugging.

In 2021, debugging is like being the detective of a crime where the murderer is either you or one hundred of your friends you let into your house. What’s happened is that the rise of SaaS and APIs means that your code is constantly interacting with other code from outside your team or even your company. Anyone can break anybody else. Everything happens at around the same time. It’s like one of those detective stories where everybody was in the room at the same time, everybody had a motive, and everybody had access to the relevant weapon.

You’re going to need much better forensics (and help!) to solve this kind of crime. We’ll explain how the solution lies in new software analyses at the API level.

A brief taxonomy of what developers get today

So why build a whole new approach? Let me lay out the state of the art today.

Linting/code search. Probably the most common way developers are finding issues with their code today is with linters, which scan code for common bad patterns and warn on them.

You might recognize: pylint; Sourcegraph; semgrep.
Fans like: lightweight; intuitive to use; eliminates entire classes of bugs.
Haters hate: at the end of the day, souped-up regular expressions are limited in the issues they can find.
Why can't they save us? You’re going to need an intractable amount of arbitrarily intricate patterns to capture all of the things that could go wrong across services.
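
To make the "souped-up regular expressions" point concrete, here is a minimal sketch of a pattern-based linter. The rule names and the sample snippet are made up for illustration and are not taken from pylint, Sourcegraph, or semgrep. It catches a local bad pattern, but nothing in the file tells it that another service might rename a field.

```python
import re

# Hypothetical rule set: each rule is a (name, regex) pair, for illustration only.
RULES = [
    ("compare-to-none", re.compile(r"[=!]=\s*None\b")),
    ("bare-except", re.compile(r"^\s*except\s*:")),
]

def lint(source):
    """Return (line_number, rule_name) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in RULES:
            if pattern.search(line):
                findings.append((lineno, name))
    return findings

# Made-up client code: line 2 has a lintable pattern, line 4 has a
# cross-service dependency that no pattern over this file can check.
SNIPPET = '''\
resp = client.get("/users/42")
if resp == None:
    raise RuntimeError("no response")
name = resp["user_name"]  # breaks if the other service renames this field
'''

print(lint(SNIPPET))
```

The linter flags the comparison on line 2 and stays silent about line 4, because whether `user_name` still exists is a fact about a different codebase.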

Static program analysis. Okay, you might be wondering: what if we went deeper than pattern matching? Like that responsible friend or watchful parent, static analyses build models of your program in order to analyze them for all possible bugs.

You might recognize: static type-checkers (Rust, Scala, etc.); Veracode; Sonarqube; Facebook’s internal efforts.
Fans like: being able to rule out entire classes of bugs at compile time. This works particularly well if all of the code is written in the same, preferably statically typed, language.
Haters hate: reporting on everything that could go wrong means a lot of potential noise.
Why can't they save us? Noise increases exponentially with the number of reconfigurable software components. Static analyses also typically fall down at network and language boundaries.

Dynamic program analysis. Does only watching what bugs actually happen make things better? Let’s look at dynamic analyses, which run real or virtual processors to examine actual program behavior at runtime.

You might recognize: valgrind; test coverage analyses; Google Thread Sanitizer.
Fans like: only reports on bugs that actually happen for observed runs of the code.
Haters hate: if a bug exists in the code, a dynamic analysis isn’t guaranteed to find it. Performance overheads can also make them prohibitive to run in production.
Why can't they save us? Maybe because of the overheads, they’ve primarily run in test environments. Existing dynamic analyses don’t tend to work across services.

Devops observability. The one class of tools that has made it possible to find and fix large numbers of bugs in complex systems is devops observability: tools that help collect metrics, logs, and traces describing how incoming calls relate to outgoing calls.

Fans like: getting metrics, logs, traces, and events is the only reasonable way to understand all the emergent behaviors in modern systems!
Haters hate: the high programmer effort to instrument code for maximum benefit. Many say the learning curve for these tools is steep; these are “expert tools.”
Why they’ve been our only hope. Today, production is the only place to make sense of cross-service behaviors and devops observability tools are the only way to make sense of what happens in production.

In summary, devops observability is the one class of tool providing visibility into the emergent behaviors across services, but it’s low-level! Metrics, logs, and traces are an assembly language for understanding system behavior. Developers end up having to do a lot of detective work to figure out what’s causing an issue and how to fix it. What if we could bring the power of application-level modeling to this visibility?

What it feels like today for a developer to understand their systems.

Supercharging observability with API models

A lot of people think programming languages research is about designing new programming languages, but it’s ultimately about creating powerful and yet usable techniques for modeling complex systems. Programming language techniques let you do everything from type-checking a program to automatically detecting memory errors to mathematically proving the functional correctness of your operating system implementation.

Even though it’s more complex, there’s no reason why we can’t apply our methods to understanding system behavior across APIs. Here were our goals when we set out to build this new kind of analysis:

Reflects actual program behavior. This rules out static analyses. First, static analyses today completely fall down at language and network boundaries, meaning they are seeing an increasingly small part of the whole picture. Even if this weren’t the case, static analyses are primarily good for reporting on all that’s possible—and most of what’s possible in systems with lots of services and/or SaaS APIs isn’t actually likely. We’re going to need some kind of dynamic analysis and it’s going to need to be able to handle cross-service behavior.
Can find issues without too much noise. If all we wanted to do was detect any behavior changes at all, that would not be too hard. The trickier part is in identifying which changes actually lead to issues.
Requires as little developer effort as possible. One way to cut down noise is to have developers help us out by telling us what they care about. But if there’s one thing we’ve learned from types not getting widely adopted until type inference got better, it’s that developers don’t like to write down anything that doesn’t help build more functionality.

Given these goals, it's clear that existing techniques aren't good enough.

Here’s where our new notion of API models comes in:

Dynamic analysis of API traffic. In order to satisfy our goal of reflecting actual program behavior, we need some way to capture system behavior, including and especially cross-service behavior. It turns out that watching communication across APIs is a great way to do this.
API-centric modeling. In order to satisfy our goal of finding issues without too much noise, we need to model API behavior in a way that abstracts over the details that don’t matter. It turns out that API specifications (for instance, OpenAPI and gRPC), which contain information about endpoints, fields, data types, and contracts, provide a good starting abstraction. Here’s an example of the properties that we’re able to determine from modeling.

Identify regressions by comparing API models. The next best thing to knowing what a developer intended as the specification is knowing how a previous run of the system behaved. Inferring transparently structured API models (as opposed to something more opaque, like neural networks) makes it possible to automatically identify when properties of interest have changed. This has allowed us to automatically identify potential breaking changes, for example, added and removed fields, modified data formats, and more.
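
As a rough illustration of the three ideas above, here is a minimal sketch that infers a tiny model (endpoints, fields, observed types) from recorded responses and then diffs two such models. This is not Akita's implementation. The traffic format, the function names `infer_model` and `diff_models`, and the sample payloads are all assumptions made for the example.

```python
from collections import defaultdict

def infer_model(traffic):
    """Build {endpoint: {field: set of observed type names}} from observed calls.

    `traffic` is an iterable of (method, path, response_body) tuples, a
    made-up stand-in for captured API traffic.
    """
    model = defaultdict(lambda: defaultdict(set))
    for method, path, body in traffic:
        endpoint = f"{method} {path}"
        for field, value in body.items():
            model[endpoint][field].add(type(value).__name__)
    return {ep: dict(fields) for ep, fields in model.items()}

def diff_models(old, new):
    """List potential breaking changes between two inferred models."""
    changes = []
    for endpoint in sorted(set(old) | set(new)):
        old_fields = old.get(endpoint, {})
        new_fields = new.get(endpoint, {})
        for field in sorted(set(old_fields) | set(new_fields)):
            if field not in new_fields:
                changes.append((endpoint, field, "removed"))
            elif field not in old_fields:
                changes.append((endpoint, field, "added"))
            elif old_fields[field] != new_fields[field]:
                changes.append((endpoint, field, "type changed"))
    return changes

# Hypothetical traffic from two runs of the same service.
before = [("GET", "/users/{id}", {"id": 42, "name": "Ada", "phone": "555-0100"})]
after = [("GET", "/users/{id}", {"id": "42", "name": "Ada", "email": "ada@example.com"})]

for change in diff_models(infer_model(before), infer_model(after)):
    print(change)
```

Because the model is a plain, structured dictionary rather than something opaque, the diff can name exactly which field on which endpoint was added, removed, or changed type. A real model would also need to handle nested objects, path parameters, and request bodies, which this sketch ignores.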

What you can do with an Akita API model diff.

At Akita, we’re building API models by passively watching API traffic (in test, staging, and production environments). Our approach takes inspiration from program synthesis approaches that learn programs by watching interactions with a system (for instance, see DemoMatch). We’ve also taken inspiration from invariant generation approaches (for instance, see Daikon) that automatically learn properties about programs by guessing and testing those properties. Akita innovates on these techniques by analyzing traces of network calls instead of program traces, in order to generate candidate invariants about API behavior. We believe that this kind of modeling makes it possible to super-power observability by reducing the amount of manual cross-service sleuthing a developer needs to do: when checking in code, when monitoring the running system, and when root causing issues.
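
The guess-and-test idea can be sketched in a few lines. The following is a simplified illustration in the spirit of invariant generation, not Daikon's algorithm and not Akita's. The candidate properties, the function name `infer_invariants`, and the sample responses are assumptions chosen for the example.

```python
import re

# Candidate properties to guess for every field. Each guess is then tested
# against all observed responses. These names and checks are illustrative only.
CANDIDATES = {
    "is_int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "is_non_negative": lambda v: (
        isinstance(v, (int, float)) and not isinstance(v, bool) and v >= 0
    ),
    "is_non_empty_string": lambda v: isinstance(v, str) and len(v) > 0,
    "looks_like_iso_date": lambda v: (
        isinstance(v, str) and re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) is not None
    ),
}

def infer_invariants(responses):
    """Keep each (field, property) guess that no observed response falsifies."""
    fields = set().union(*(r.keys() for r in responses))
    surviving = set()
    for field in fields:
        for name, check in CANDIDATES.items():
            # A guess survives only if the field is present and passes everywhere.
            if all(field in r and check(r[field]) for r in responses):
                surviving.add((field, name))
    return surviving

# Hypothetical responses observed for a single endpoint.
observed = [
    {"order_id": 1, "total": 19.99, "created": "2021-03-01"},
    {"order_id": 2, "total": 0, "created": "2021-03-02"},
    {"order_id": 3, "total": 5.5, "created": "2021-03-05", "coupon": "SPRING"},
]

for invariant in sorted(infer_invariants(observed)):
    print(invariant)
```

The surviving guesses are only candidate invariants: they hold for the traffic seen so far, and more traffic can falsify them. That is also what makes them useful for comparison, since a property that held in an earlier run and fails in a later one is a signal worth surfacing.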

Crime show image — The future with one-click observability.

This is just the beginning

As software becomes more complex, we’re going to need many kinds of solutions based on inferring properties about our systems. I believe that our approach for modeling API behavior is going to be an important part of a growing ecosystem of related techniques. Especially since these are the early days of these kinds of tools, we’d love to hear from you about your use cases for this kind of analysis, how you would explain this kind of technique to people, and other thoughts you might have.

And if you’re interested in trying Akita out for endpoint and data visibility, API monitoring, or catching breaking changes—we’d love to have you join our beta.

With thanks to Nelson Elhage for comments.