July 22, 2021
September 29, 2020
Taking Types to the Next Level: Stop API Bugs By Inferring Data Formats
If you work on web apps, you’ve probably been bitten by a sneaky bug. You know, the kind that takes a long time to debug but is not glamorous to explain. The whitespace errors; the data format errors. The kind of bug that might torment you for a whole weekend, and from which you emerge with no victorious war story to tell. This is the kind of bug that type checkers like TypeScript and mypy have largely solved within an application… but that still lurks at the boundaries of APIs.
In our last blog post, we talked about how to catch cross-service bugs by watching API traffic. In this blog post, we show specifically how checking data formats across APIs can catch some nasty bugs.
😈 An Error that Evades Any Type Checker
You finish implementing a new feature. Your unit tests pass and you get that rush of excitement. The integration tests go green; you think today is a good day. Your co-worker and #bff Aki gives your changes a +1 and your pull request gets merged. For one moment, everything seems right with the world.
And then, hours later, DISASTER! You’re playing online games with Aki—and doing better than you ever have before—when you get a page. Customer support creates an incident that has to do with your new code. Customers are irate that they can’t log into your website.
You rush home and, after spending the rest of your night combing through logs and writing more tests, you figure it out. It turns out that you accidentally used phone.ToString() instead of phone.ToInternationalString(), causing you to send a domestic phone number instead of an international phone number as a string to a third-party API. The integration tests didn’t cover this because they mocked out this third-party API. But now you’ve lost your rare chance to beat Aki at Fall Guys—and you’ve also tripped up some of your customers.
After the incident, you do a postmortem and you become worried. Aki points out that even if you adopt language-level type checking (like TypeScript or mypy) to catch simple errors, application-level type checking can’t prevent these kinds of errors because it doesn’t check across APIs. And once the type-checked values hit the wire they all get flattened to the same thing, so you would need some sort of magical tool to check across APIs. But what if such a tool existed?
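To make Aki's point concrete, here is a minimal Python sketch of the bug. The `Phone` class, its two methods, and `send_to_third_party` are hypothetical stand-ins for the code in the story, not a real library. Both methods return a plain string, so a type checker has nothing to object to.

```python
from dataclasses import dataclass


@dataclass
class Phone:
    """Hypothetical phone number type, mirroring the one in the story."""
    country_code: str
    national_number: str

    def to_string(self) -> str:
        # Domestic format: the country code is dropped.
        return self.national_number

    def to_international_string(self) -> str:
        # International format: "+" followed by country code and number.
        return f"+{self.country_code}{self.national_number}"


def send_to_third_party(phone_number: str) -> str:
    """Stand-in for the third-party API call; all it ever sees is a string."""
    return phone_number


phone = Phone(country_code="1", national_number="8001112222")

# Both calls type-check, because each argument is a `str`.
buggy = send_to_third_party(phone.to_string())
intended = send_to_third_party(phone.to_international_string())
```

The two values differ only in format, which is exactly the information that is lost once everything is "just a string" on the wire.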
✅ Find Bugs by Inferring Data Formats
Worry no more! We’ve designed Akita to solve exactly this problem of spotting mismatched data formats across APIs. With Akita, you can use our data format detection to catch issues like this phone number change, without requiring code changes or proxies, simply by allowing Akita to watch your API.
Because Akita doesn’t require code changes or proxying, our client is flexible enough to run in either production or test, allowing you to compare cross-API data formats in production with those in test. In this case, Akita would alert you that the data format it observed differed from what it had been observing, so you could catch the change before it ever hit production.
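As a rough illustration of that comparison (not Akita's actual implementation), the sketch below diffs the formats inferred for each field in two environments. The field names and format labels such as `international_phone` are made up for the example.

```python
from typing import Dict, Set, Tuple

# Maps a field name to the set of data formats inferred for it.
FormatMap = Dict[str, Set[str]]


def diff_formats(baseline: FormatMap, observed: FormatMap) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Return the fields whose inferred formats differ between two runs."""
    changes = {}
    # Only compare fields that were seen in both runs.
    for field in baseline.keys() & observed.keys():
        if baseline[field] != observed[field]:
            changes[field] = (baseline[field], observed[field])
    return changes


# Hypothetical inferred formats: what production has been sending so far,
# versus what the new code sends in test.
production = {"phone": {"international_phone"}, "email": {"email"}}
test_run = {"phone": {"us_phone"}, "email": {"email"}}

changes = diff_formats(production, test_run)
```

Here `changes` contains only the `phone` field, paired with its old and new formats, which is the signal you would want before merging.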
Akita is able to identify:
Simple Types - Strings, Integers, Booleans, Floats
Countries - 2 and 3 Letter Country Codes, Names and TLDs
Currencies - Names and Abbreviations
Dates and Times - ISO 8601, RFC 822, RFC 3339, RFC 850, Unix Timestamps and many more
Email Addresses - RFC 5322 Addresses and Names
Languages - Language Names and ISO 639 2 and 3 Letter Abbreviations
Phone Numbers - International and US formatting
URLs - HTTP and HTTPS URLs
Akita works for traffic both to other internal services and for third-party SaaS APIs you might call, like Stripe or Twilio. We’ll talk more in an upcoming blog post about Akita’s specific mechanism for watching outbound API traffic.
⚡️ Powered By API-Level Data Format Inference
Under the hood, Akita is automatically inferring data formats from the API traffic that it sees. When Akita sees a request hit your service, it infers the data format for each argument as part of the automated API spec generation we described in the last post.
It turns out inferring data formats is not as simple as just watching each argument. There might be multiple data formats that a field is eligible for. For instance, `18001112222` could be either a phone number or a timestamp, but a second call to the same endpoint with `1-800-333-4444` makes it clear that the parameter is a phone number. Conversely, a field that accepts arbitrary strings may see the occasional email address without being an email field. To infer the most accurate type, Akita compares data from all requests to the same endpoint and identifies the data formats common across all calls. We also use a bit of secret sauce to compare data formats across the data flows we detect in your API, but that's the topic of another post.
Of course, there's always the chance that your API successfully accepts, say, phone numbers and timestamps (or email addresses, country codes, and so on) in the same parameter. In that case, we'll let you know all the data formats that fit the data.
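The "formats common across all calls" idea can be sketched as a set intersection. This is a simplified illustration, not Akita's actual inference: the detector names and regular expressions below are deliberately loose assumptions chosen for the example.

```python
import re
from typing import Callable, Dict, Iterable, Optional, Set

# Hypothetical, intentionally simple detectors; real ones would be stricter.
DETECTORS: Dict[str, Callable[[str], bool]] = {
    "phone_number": lambda v: re.fullmatch(
        r"\+?1?[-. ]?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}", v) is not None,
    "unix_timestamp": lambda v: re.fullmatch(r"\d{9,13}", v) is not None,
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
}


def candidate_formats(value: str) -> Set[str]:
    """Every format that a single observed value is eligible for."""
    return {name for name, matches in DETECTORS.items() if matches(value)}


def infer_formats(values: Iterable[str]) -> Set[str]:
    """Keep only the formats that fit every value seen for a parameter."""
    common: Optional[Set[str]] = None
    for value in values:
        formats = candidate_formats(value)
        common = formats if common is None else common & formats
    # If no specific format fits every value, fall back to a plain string.
    return common or {"string"}


one_call = infer_formats(["18001112222"])
two_calls = infer_formats(["18001112222", "1-800-333-4444"])
mixed = infer_formats(["hello", "aki@example.com"])
```

With one call, `one_call` holds both `phone_number` and `unix_timestamp`, since both fit the data. The second call narrows `two_calls` to `phone_number` alone, and `mixed` falls back to `string` because the occasional email does not make the field an email field.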
There are a couple of cool things about how we’re inferring data formats. First, we’re using type inference to infer types at the API level, rather than analyzing source code. Second, we’re inferring specific data formats, with the ability to tell the difference between different phone number formats, on top of simple types like string or int. A bonus is that since Akita automatically infers the entire API spec under the hood, our type inference can use the structure of that spec. More on this in a later post!
👀 What’s next?
We’ve recently released type inference and would love to have you try it out. We believe we have a one-of-a-kind tool and would love your help in making it as useful as possible in helping developers build great software.