June 3, 2021

Modeling API Traffic to Catch Breaking Changes

by
Jean Yang

On December 11, 1998, NASA launched the Mars Climate Orbiter. On September 23, 1999, communication with the spacecraft was lost forever.

The reason? Miscommunication between software systems. NASA’s navigation software expected thruster data in metric units, while software from spacecraft builder Lockheed Martin produced it in “English” (imperial) units.

At an estimated cost of $200 million, some call this NASA’s most expensive mistake.

With the rise of SaaS and APIs, software architectures are shifting toward reusable components that communicate across the network, and miscommunication between those components is a growing issue. While each individual error may not be as expensive as NASA’s, breaking API changes are a mounting source of production outages and lost developer productivity.

At Akita, we’ve been working on making it easier to catch breaking changes across heterogeneous systems: internal, external, and third-party APIs alike. When we started, we had two goals: low performance overhead and low developer effort. We achieved the first by passively listening to network traffic in non-invasive ways (see here and here), but low developer effort was harder. How do you tell developers what breaking changes they might have without introducing too much noise, and without asking them to write specifications about what’s supposed to happen?

In this post, we’ll show our brand new UI for helping developers catch breaking changes, with minimal developer effort. We’ll talk about what kinds of changes Akita helps flag and explain how Akita catches breaking changes by passively modeling API traffic. If this sounds interesting to you, we’d love to have you join our beta to help us catch breaking changes better!

Exploring Breaking Changes with Akita

When we surveyed developers about the bugs that keep them up at night, we were surprised by how non-scary they sounded. Small syntax discrepancies (for instance, adding an ‘s’) were causing nontrivial production outages! Tiny data format modifications (for instance, from one RFC of date/time to another) were causing production failures! (This matched other data; according to a recent study about cloud production bugs, 21% of cloud production outages come from data format issues.)

Without something like what we’re building at Akita, catching small changes across APIs requires a lot of manual effort. Unit tests, integration tests, and linters don’t catch even simple cross-service issues. These bugs make it all the way to production, where the developer gets an alert that the service is down. It then takes time to sift through logs (and, if you’re lucky, metrics and traces) to narrow down the service(s) where the issue is occurring, and even more time to determine the root cause.

By passively modeling API traffic, Akita is able to catch cross-service issues as early as CI/CD. Say you add a new `alert_me` endpoint that sends SMS messages, but to make it work you need to change a field from an international-formatted phone number to a US-formatted one. The service you’re changing is unaffected, and in fact all of its tests pass. What you might not realize is that your response now returns the field in a different format, which can have a domino effect if your service’s consumers expect the original one.
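To make the failure mode concrete, here’s a toy sketch. All field and function names are hypothetical (this is not Akita code or a real service); it just shows how a consumer that assumed the old format breaks while the producer’s own tests stay green:

```python
# Toy sketch of the failure mode above. All names are hypothetical;
# this is not Akita code or a real service.

# Response shape before the change: E.164 (international) phone format.
old_response = {"user": "alice", "phone": "+14155550123"}
# Response shape after the `alert_me` work: US-formatted phone number.
new_response = {"user": "alice", "phone": "(415) 555-0123"}

def consumer_country_code(payload: dict) -> str:
    """A downstream consumer that assumes E.164: '+' then country code."""
    phone = payload["phone"]
    if not phone.startswith("+"):
        raise ValueError("expected E.164 phone number")
    return phone[1]  # naive: single-digit country code

print(consumer_country_code(old_response))  # prints "1"
try:
    consumer_country_code(new_response)
except ValueError as err:
    print("consumer broke:", err)  # the producer's tests never saw this
```

Nothing in the producing service reads the field the old way, so its test suite passes; the breakage only surfaces in the consumer, in production.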

Akita helps quickly identify potential breaking changes by calling out observed changes in API behavior:

The Akita Console highlighting a breaking change.

Here, Akita has identified that the `/api/users` endpoint has been changed in a potentially breaking way, due to a change in the data format. Drilling down into the endpoint gives more information:

Zooming in on what Akita shows for the data format change.

Integrating Akita into CI/CD means that it’s possible to get a summary of these reports as a GitHub comment on every pull request, making it much more likely that a developer introducing or reviewing a change will catch these kinds of issues before they hit customers. Read more here about running Akita on every pull request.

Akita GitHub comment summarizing changes.

Someone reading one of these API model diffs might not be interested in every change Akita identified. The donut charts on this screen let you filter to the changes you care about:

Using the Akita filters to show only changes having to do with auth.

The structured way Akita models APIs makes it possible to filter specifically on the breaking changes you care about:

Showing the different types of breaking changes Akita can help you catch.

And as for what’s coming next: we’ve been working our way through the list in Shopify’s blog post about the breaking changes they are careful to catch before rolling out API changes. 

How Akita Infers API Models By Watching Traffic

Cool, you might be thinking: if I’ve spent so much time annotating my APIs with specs and data types, the least someone (or some software) could do is help me detect differences.

But here’s where the magic comes in: Akita infers these properties about your API simply by watching traffic. The Akita agent uses pcap filters to passively watch traffic going across the network. Akita then infers API models representing endpoints, fields, data formats, latency, and more. Users can download these models as annotated API specs. Akita’s passive traffic-watching model is flexible enough to run as early as test or as late as production. (No network tests? Check out our test integrations.)
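As a rough illustration of what “passively watching traffic with a pcap filter” means, here is a minimal Python sketch using scapy. This is an assumption-laden stand-in, not the Akita agent itself (which is a Go binary with proper TCP stream reassembly and HTTP parsing), and it requires scapy plus capture privileges:

```python
# A minimal sketch of passive capture with a BPF ("pcap") filter, using
# scapy. Illustrates the general technique only; not the Akita agent.
# Requires scapy and capture privileges (e.g., run as root).
from scapy.all import Raw, sniff

def handle_packet(pkt):
    # Only look at packets that carry an application-layer payload.
    if pkt.haslayer(Raw):
        payload = bytes(pkt[Raw].load)
        # A real agent would reassemble TCP streams and parse HTTP here;
        # we just peek at the first line of the payload.
        print(payload.split(b"\r\n", 1)[0][:80])

# BPF filter: watch HTTP traffic on port 8080 without proxying it.
sniff(filter="tcp port 8080", prn=handle_packet, store=False, count=20)
```

The key property is that the application under observation is never modified or proxied; capture happens beside it.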

Today, Akita’s inference is akin to type inference in a compiler. Normally, a compiler parses code written in a language into an abstract syntax tree (AST). The structure of the AST, combined with observations about how values get used, lets the compiler narrow down possible types. With Akita, the starting point is request/response traces rather than a program, so Akita’s API model inference happens in two steps: 1) inferring an “AST” for the API itself and then 2) inferring types, data formats, and other properties about the API endpoints.
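Here’s a toy version of that two-step pipeline, purely to illustrate the idea. The trace shape and format rules are invented for this example; Akita’s real inference is far richer:

```python
# Toy two-step inference over request/response traces. The trace shape
# and format rules are invented for illustration; not Akita's pipeline.
import re
from collections import defaultdict

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def infer_format(value):
    """Step 2: narrow an observed value to a type/format guess."""
    if isinstance(value, bool):  # check bool before int: bool is an int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, str):
        return "string(email)" if EMAIL_RE.match(value) else "string"
    return type(value).__name__

def infer_model(traces):
    """Step 1: group traces by (method, path) into endpoint shapes,
    recording every format observed for each response field."""
    model = defaultdict(lambda: defaultdict(set))
    for method, path, response in traces:
        for field, value in response.items():
            model[(method, path)][field].add(infer_format(value))
    return model

traces = [
    ("GET", "/api/users", {"id": 1, "contact": "a@example.com"}),
    ("GET", "/api/users", {"id": 2, "contact": "b@example.com"}),
]
print(dict(infer_model(traces)[("GET", "/api/users")]))
# {'id': {'int'}, 'contact': {'string(email)'}}
```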

It turns out that knowing what the structure of an API endpoint should look like is very powerful: we’re able to do all of our inference so far without needing to use any machine learning. The implication is that our “learning” can work accurately even if you have a small number of API requests/responses. But because there isn’t any statistical extrapolation right now, Akita can’t infer anything about API calls it doesn’t see.

What Akita Helps You Catch

The fact that seemingly simple, avoidable bugs are a major issue was good news to us, as these bugs are much more straightforward to programmatically catch than, say, subtle concurrency issues.

Here are some examples of useful properties Akita’s API models can help you catch automatically. These are all breaking changes that our users have told us are tricky to catch using source diffs or manual inspection of APIs.

Data format changes

The Akita CLI is able to automatically detect a growing set of data formats, including different datetime formats, different phone number formats, email addresses, and more. This allows Akita to detect the following kinds of changes (a toy sketch of the diffing idea follows the list):

  • Subtle data format changes. Akita is able to detect, for instance, if a field that used to be an email is now a more generic string. These usually fly under the radar of typed IDLs and hand-documented APIs because people tend to use the type `string` liberally.
  • Accidental uses of the wrong type, or accidental introduction of a second type. In non-statically typed languages, it is easy to introduce bugs by, for instance, using a `bool` or `int` instead of a string. Akita will point out if a field used to be `string` but is now both `string` and something else. (For the OpenAPI nerds out there, these show up as `oneOf` types.)
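Here’s the toy diff sketch promised above: given two per-endpoint maps of field-to-observed-formats (like the ones inferred in the earlier sketch), it flags exactly these kinds of changes. Again, illustrative only, not Akita’s algorithm:

```python
# Toy diff over the {field: formats} maps inferred above. Illustrative
# only; Akita's actual diffing is more sophisticated.
def diff_field_formats(old_fields, new_fields):
    for field in sorted(old_fields.keys() | new_fields.keys()):
        old = old_fields.get(field, set())
        new = new_fields.get(field, set())
        if not old:
            yield f"added field {field!r}: {sorted(new)}"
        elif not new:
            yield f"removed field {field!r} (was {sorted(old)})"
        elif old != new:
            # {'string(email)'} -> {'string'} is a subtle format loss;
            # {'string'} -> {'int', 'string'} is what OpenAPI renders
            # as a oneOf type.
            yield f"field {field!r} changed: {sorted(old)} -> {sorted(new)}"

old = {"contact": {"string(email)"}}
new = {"contact": {"string"}, "flags": {"string", "int"}}
for change in diff_field_formats(old, new):
    print(change)
```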

Changes to API path arguments

Another kind of breaking API change can come from expectations around endpoint structure. In addition to data formats, Akita API models also infer path arguments, for instance `/api/{arg0}/super_fun_data`. Suppose, for instance, that an endpoint starts accepting universally unique identifiers (UUIDs) instead of generic strings for `{arg0}`. Akita is able to detect and surface the fact that `{arg0}` now takes UUIDs, helping prevent breaking changes that might occur from sending generic strings there.
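As an illustration (hypothetical code, not Akita’s), path templating plus argument classification can be sketched like this: collapse segments that vary across observed paths into arguments, then classify what each argument accepts, so a shift from generic strings to UUIDs becomes visible.

```python
# Toy path-template inference (hypothetical code, not Akita's).
import re

UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)

def classify_segment(seg):
    if UUID_RE.match(seg):
        return "uuid"
    if seg.isdigit():
        return "int"
    return "string"

def infer_template(paths):
    """Build a template like /api/{arg0}/super_fun_data from concrete
    paths, with the set of formats observed for each argument."""
    split = [p.strip("/").split("/") for p in paths]
    template, arg_formats = [], {}
    for segments in zip(*split):  # assumes equal-length paths
        if len(set(segments)) == 1:
            template.append(segments[0])          # constant segment
        else:
            arg = f"arg{len(arg_formats)}"        # varying => argument
            template.append("{" + arg + "}")
            arg_formats[arg] = {classify_segment(s) for s in segments}
    return "/" + "/".join(template), arg_formats

print(infer_template([
    "/api/0b8e6c1a-8c2f-4f5e-9a3d-2f1e4b5c6d7e/super_fun_data",
    "/api/1c9f7d2b-9d3a-4a6f-8b4e-3a2f5c6d7e8f/super_fun_data",
]))  # ('/api/{arg0}/super_fun_data', {'arg0': {'uuid'}})
```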

New sensitive data

Something else many of our users are interested in is added fields or endpoints containing sensitive data. While these don’t break functionality, they can certainly break privacy guarantees to users if they leak data! Akita detects both additions of and changes to fields involving sensitive data, so these changes are straightforward to surface.
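A hedged sketch of the idea, using invented patterns and field names (real sensitive-data detection needs far more care than a couple of regexes):

```python
# Illustrative sketch (not Akita's detector) of flagging newly added
# response fields whose observed values look sensitive.
import re

SENSITIVE_PATTERNS = {
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "credit_card": re.compile(r"^\d{4}([ -]?\d{4}){3}$"),
}

def sensitive_formats(value):
    if not isinstance(value, str):
        return set()
    return {name for name, rx in SENSITIVE_PATTERNS.items() if rx.match(value)}

def new_sensitive_fields(old_response, new_response):
    """Flag fields that are new in the response and carry
    sensitive-looking values."""
    added = new_response.keys() - old_response.keys()
    return {f: sensitive_formats(new_response[f])
            for f in added if sensitive_formats(new_response[f])}

print(new_sensitive_fields(
    {"user": "alice"},
    {"user": "alice", "ssn": "123-45-6789"},
))  # {'ssn': {'us_ssn'}}
```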

Help Us Help You Find Breaking Changes

So far, we’ve built out API behavior diffs for REST APIs and support the generation of OpenAPI specs for them. Many people have been asking about GraphQL and gRPC: those are on our roadmap. We’d love to hear from you about what else you’d like to see.

And this is just the beginning of what we’re able to model by watching API traffic. While we’re not quite building the Mars Climate Orbiter, we are building something technically deep that we’re just at the beginning of. There are a lot more properties that are possible to infer passively by watching API traffic, for instance relationships between API calls. Stay tuned!

We’re actively expanding our diffing functionality, so we’d love to hear from you with questions and requests. And, of course, we’d love to have you sign up for our beta and try us out. 😊


With thanks to Cole Schlesinger and Jed Liu for help on this post.
