Jan 4, 2022
Apr 14, 2021

How Akita Generates HAR Files to Understand Flask APIs

Cole Schlesinger
Akita dog watching across fence. Photo by Hermes Rivera on Unsplash.

Last year, one of our users, Sébastien Portebois, wrote a blog post about using Akita to generate API models from Flask integration tests. At the time, Akita needed to see network traffic to generate API models. Sébastien’s integration tests did not generate network traffic, so he extended the Flask test client to echo API traffic across the network.

Last month, I built on this to finally release Akita’s “official” Flask test integration (docs here). Our integration also instruments the Flask test client, but instead of echoing traffic, it produces HTTP Archive (HAR) files. The Akita command-line interface then processes these HAR files to generate API models that know about endpoints, fields, types, data formats, and more. To use this integration, you simply swap your Flask test client for the Akita test client.

In this blog post, I’ll give an overview of how Akita works, what’s hard about getting Akita to work with non-network integration tests, and how our test client integration works. We’ll dig into the details of extending Flask’s test client. This post may be interesting to anyone looking to add request logging or pre- or post-processing to REST framework test clients.

Note: Flask is just one of the frameworks we’ve built integrations for. Take a look at our docs to see the rest.

How Akita came to deal in HAR files

Last summer, we released the first version of an agent-based tool that automatically generates API models by watching traffic to and from REST services. Our agent used PCAP filters to collect traffic, obfuscated the request/response data, and sent that data to the Akita cloud for generating API models. Users could directly turn API models into API specs, or detect breaking changes by diffing API models generated across different pull requests.

Akita's original architecture.

As our users started making requests to fit their integration needs, our team expanded the ways to collect API traces, which are sequences of obfuscated requests and responses. We introduced a way to build API models from HTTP Archive (HAR) files, JSON-formatted files typically used to record the exchange between a web browser and a website for performance analysis or other debugging.
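To make the format concrete, here is a minimal sketch of a HAR document built as a plain Python dictionary. The field names follow the HAR 1.2 layout; the creator name, URL, and payload are made up for illustration and are not taken from Akita's output.

```python
import json


def make_har(entries):
    """Wrap a list of HAR entries in the top-level HAR 1.2 'log' object."""
    return {
        "log": {
            "version": "1.2",
            # Hypothetical creator; a real tool would put its own name here.
            "creator": {"name": "example-har-builder", "version": "0.0.1"},
            "entries": entries,
        }
    }


body = '{"id": 42}'  # illustrative response payload

entry = {
    "startedDateTime": "2021-04-14T12:00:00.000+00:00",
    "time": 12,  # total elapsed time in milliseconds
    "request": {
        "method": "GET",
        "url": "http://localhost/users/42",
        "httpVersion": "HTTP/1.1",
        "cookies": [],
        "headers": [{"name": "Accept", "value": "application/json"}],
        "queryString": [],
        "headersSize": -1,  # -1 means "not available"
        "bodySize": 0,
    },
    "response": {
        "status": 200,
        "statusText": "OK",
        "httpVersion": "HTTP/1.1",
        "cookies": [],
        "headers": [{"name": "Content-Type", "value": "application/json"}],
        "content": {"size": len(body), "mimeType": "application/json", "text": body},
        "redirectURL": "",
        "headersSize": -1,
        "bodySize": len(body),
    },
    "cache": {},
    "timings": {"send": 0, "wait": 12, "receive": 0},
}

har = make_har([entry])
har_text = json.dumps(har, indent=2)  # this string is what a .har file contains
```

Each request/response pair is one element of `entries`, which is why the format works for any traffic source that can describe an HTTP exchange.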

After building browser-based HAR ingest, we realized that HAR is a great format for storing data from all of our traffic sources. Given that HTTP is the lingua franca of today’s web, HAR files turn out to be a great intermediate representation for request/response data. It also turns out that most browser developer tools and proxies support the generation of HAR files. For these two reasons, we re-architected our system to use HAR files to send obfuscated traces between the Akita command-line interface and our cloud.

Akita's architecture today.

What about traffic that doesn’t go over the network?

Agent-based packet capture works well for getting API traffic from tests that generate network traffic, as well as from staging and production deployments. But our users started asking: what happens with integration tests that don’t generate network traffic for API calls? We realized that we could use HAR files here as well.

Many of our users are building APIs via popular web frameworks, such as Flask for Python and Express for Node.js. When testing in these frameworks, it is common to skip the network entirely and call HTTP handlers directly, because this is more efficient. To generate Akita API models from these non-network tests, we needed some way of instrumenting these frameworks to capture requests and responses and store them in HAR files.
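To illustrate what "calling HTTP handlers directly" means, the sketch below invokes a tiny WSGI application in process using only the Python standard library. The `app` function is a stand-in for a Flask service and `call_in_process` is a hypothetical helper; Flask's own test client does considerably more, but the key point is the same: no socket is opened, so a packet-capture agent has nothing to observe.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A tiny WSGI application standing in for a Flask service."""
    body = b'{"status": "ok"}'
    start_response(
        "200 OK",
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]


def call_in_process(wsgi_app, method, path):
    """Invoke the WSGI callable directly, without any network traffic."""
    environ = {"REQUEST_METHOD": method, "PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_in_process(app, "GET", "/health")
```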

What about middleware, you ask? Middleware is a common design pattern to wrap a service with plugins that intercept each request and response handled by the service. While instrumenting middleware would be the natural approach (and, in fact, we built Rails middleware to do exactly that), Flask does not run middleware as part of its integration tests. We could ask developers to manually log every request and response in every test, but that’s a lot of work. Instead, we created a wrapper around Flask’s test client, making it a one-line change to use the Akita test client to run your tests and also output a HAR file containing your tests’ traffic.

How Akita’s test client integration works

The Flask test client abstracts over calls to your service with methods like get(), put(), and post(). This design makes it possible for us to instrument the test client, rather than needing the user to instrument every test. (This is also the approach that Sébastien took with the Akita Flask test workaround.)

As a user, this means you can change one line to import Akita’s HarClient instead of Flask’s FlaskClient and you’re good to go. (You can additionally change a second line to initialize the HarClient with the name of the HAR file to output.) Read on to see how I extended the Flask test client to write request and response data to HAR files. You can see the code for the HarClient class here.

The Akita test client works by extending the Flask test client, FlaskClient, to additionally write to a HAR file when requests are made. The constructor takes an additional argument specifying where to write the HAR file.

The main action happens in the open function. FlaskClient mimics a REST client, offering get, put, post, and other methods corresponding to RESTful operations. In each of these methods, FlaskClient calls open after doing some preprocessing on the arguments.

The open function is responsible for packaging up the arguments into a request and dispatching it to your service, returning a response object. In that sense, open is a narrow waist. Every request and subsequent response passes through it.
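The narrow-waist idea can be sketched without any framework at all. In the example below, `BaseClient` is a simplified stand-in for a test client (it is not Flask's real FlaskClient, whose `open` accepts many more arguments), and `RecordingClient` is a hypothetical subclass that takes one extra constructor argument and overrides only `open`. Because `get` and `post` both delegate to `open`, a single override sees every request.

```python
class BaseClient:
    """Simplified stand-in for a framework test client."""

    def __init__(self, handler):
        # handler is a callable: (method, path, body) -> (status_code, body)
        self.handler = handler

    def open(self, path, method="GET", body=None):
        """The narrow waist: every convenience method funnels through here."""
        return self.handler(method, path, body)

    def get(self, path, **kwargs):
        return self.open(path, method="GET", **kwargs)

    def post(self, path, **kwargs):
        return self.open(path, method="POST", **kwargs)


class RecordingClient(BaseClient):
    """Hypothetical subclass that records each request/response pair."""

    def __init__(self, handler, record_to):
        super().__init__(handler)
        self.record_to = record_to  # extra argument: where recordings go

    def open(self, path, method="GET", body=None):
        response = super().open(path, method=method, body=body)
        self.record_to.append(
            {"method": method, "path": path, "status": response[0]}
        )
        return response


def echo_handler(method, path, body):
    """Toy service: 201 for POST, 200 for everything else."""
    return (201 if method == "POST" else 200, body)


log = []
client = RecordingClient(echo_handler, record_to=log)
client.get("/users")
client.post("/users", body='{"name": "a"}')
```

Tests written against `get` and `post` do not change at all; only the class used to construct the client does.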

We extended this function to encode each request and response as an entry in the HAR file, using self.har_writer.write_entry. The heavy lifting happens in the _create_wsgi_request function, which mimics the implementation of FlaskClient.open to create a Request object from the arguments passed to open. The tricky bit is in the wsgi_to_har_entry function, which extracts details from WSGI Request and Response objects and converts them into a format for HAR entries. Different frameworks use different representations for requests and responses. If you want to do this for another framework, it will take some sleuthing to find accessors for extracting all the necessary information.
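As a rough illustration of the conversion step, the sketch below turns a raw WSGI environ dictionary plus a status line, header list, and body into a HAR entry. The function names `environ_to_har_request` and `to_har_entry` are hypothetical, and this is not Akita's wsgi_to_har_entry: it ignores cookies, request bodies, and encodings that a real integration would need to handle.

```python
from urllib.parse import parse_qsl


def environ_to_har_request(environ):
    """Build a HAR 'request' object from a WSGI environ dict."""
    headers = []
    for key, value in environ.items():
        if key.startswith("HTTP_"):
            # WSGI stores 'User-Agent' as 'HTTP_USER_AGENT'; undo that mapping.
            name = key[len("HTTP_"):].replace("_", "-").title()
            headers.append({"name": name, "value": value})
    query = environ.get("QUERY_STRING", "")
    url = "{}://{}{}".format(
        environ.get("wsgi.url_scheme", "http"),
        environ.get("HTTP_HOST", "localhost"),
        environ.get("PATH_INFO", "/"),
    )
    if query:
        url += "?" + query
    return {
        "method": environ["REQUEST_METHOD"],
        "url": url,
        "httpVersion": environ.get("SERVER_PROTOCOL", "HTTP/1.1"),
        "cookies": [],
        "headers": headers,
        "queryString": [{"name": k, "value": v} for k, v in parse_qsl(query)],
        "headersSize": -1,
        "bodySize": 0,  # request bodies are out of scope for this sketch
    }


def to_har_entry(environ, status, response_headers, response_body,
                 started_iso, elapsed_ms):
    """Combine request and response details into one HAR entry."""
    code, _, reason = status.partition(" ")
    mime = next(
        (v for k, v in response_headers if k.lower() == "content-type"), ""
    )
    return {
        "startedDateTime": started_iso,
        "time": elapsed_ms,
        "request": environ_to_har_request(environ),
        "response": {
            "status": int(code),
            "statusText": reason,
            "httpVersion": environ.get("SERVER_PROTOCOL", "HTTP/1.1"),
            "cookies": [],
            "headers": [{"name": k, "value": v} for k, v in response_headers],
            "content": {
                "size": len(response_body),
                "mimeType": mime,
                "text": response_body.decode("utf-8", errors="replace"),
            },
            "redirectURL": "",
            "headersSize": -1,
            "bodySize": len(response_body),
        },
        "cache": {},
        "timings": {"send": 0, "wait": elapsed_ms, "receive": 0},
    }


sample_environ = {
    "REQUEST_METHOD": "GET",
    "PATH_INFO": "/users",
    "QUERY_STRING": "limit=10",
    "HTTP_HOST": "localhost",
    "HTTP_ACCEPT": "application/json",
    "SERVER_PROTOCOL": "HTTP/1.1",
    "wsgi.url_scheme": "http",
}
sample_entry = to_har_entry(
    sample_environ,
    "200 OK",
    [("Content-Type", "application/json")],
    b"[]",
    "2021-04-14T12:00:00.000+00:00",
    3,
)
```

Framework request objects usually wrap an environ like this one, so the sleuthing mostly consists of finding which accessor exposes each of these fields.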

Managing writes to the HAR file is a final, important detail, because many test frameworks run tests concurrently. We created a HarWriter class to encapsulate a thread-safe writer; you can see the definition in the akita_har package here. Under the hood, it enqueues HAR entries from HarWriter.write_entry into a thread-safe queue and uses a background thread to dequeue and write each entry to the target HAR file.
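The queue-plus-background-thread pattern described here can be sketched with the standard library alone. `QueuedHarWriter` below is a hypothetical illustration and not the HarWriter from the akita_har package: it assumes entries are JSON-serializable dictionaries and that `close` is called once all producers have finished.

```python
import json
import os
import queue
import tempfile
import threading


class QueuedHarWriter:
    """Hypothetical thread-safe writer: many producers, one writing thread."""

    _STOP = object()  # sentinel telling the background thread to finish

    def __init__(self, path):
        self._path = path
        self._queue = queue.Queue()  # queue.Queue is safe across threads
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def write_entry(self, entry):
        """Called from any test thread; only enqueues, never touches the file."""
        self._queue.put(entry)

    def _drain(self):
        """Runs on the background thread, the only code that writes the file."""
        creator = json.dumps({"name": "example-writer", "version": "0.0.1"})
        with open(self._path, "w", encoding="utf-8") as f:
            f.write('{"log": {"version": "1.2", "creator": %s, "entries": [' % creator)
            first = True
            while True:
                item = self._queue.get()
                if item is self._STOP:
                    break
                if not first:
                    f.write(", ")
                f.write(json.dumps(item))
                first = False
            f.write("]}}")  # close the entries array and the log object

    def close(self):
        """Flush remaining entries and finish the file."""
        self._queue.put(self._STOP)
        self._thread.join()


# Usage: four threads write 25 entries each to the same file.
har_path = os.path.join(tempfile.mkdtemp(), "trace.har")
writer = QueuedHarWriter(har_path)


def produce(worker_id):
    for i in range(25):
        writer.write_entry({"worker": worker_id, "seq": i})


workers = [threading.Thread(target=produce, args=(n,)) for n in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
writer.close()

with open(har_path, encoding="utf-8") as f:
    loaded = json.load(f)
```

Funneling all file access through one thread means the test threads never contend for a file lock, and entries cannot interleave mid-write.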

What’s next?

If you read this post because you’re interested in this method of extending test clients, you may be interested to know that you can use the same pattern I showed here for other frameworks that treat test clients the same way. Check out the code for our FastAPI test client here and our Django test client here.

If you read this post because you’re interested in how Akita works, you can read about our other test integrations here. We have several of our repositories available on GitHub, including the open-sourced version of our CLI. If building Akita API models to generate specs and catch behavior changes sounds interesting to you, we’d love to get your feedback on our private beta.

With thanks to Nelson Elhage, Mark Gritter, Jed Liu, and Jean Yang for edits. Cover photo by Hermes Rivera on Unsplash.
