Network protocols start their lives as specifications. There are clear rules about the format of the data, its interpretation, and an allowable range of variation.
But if a protocol is successful, programmers will inevitably adapt it to their own immediate needs, or make false assumptions that become embedded in the system. Specifications also get revised, so what used to be true may now be false.
The focus of this post is on HTTP and JSON, the two protocols providing the foundation of modern APIs—and heck, of the modern Internet. Because they are so common, “everybody knows” how these protocols work. But they are rife with local variances and unexpected corner cases! The foundations of the modern Internet are shakier than we would like to believe. This impacts tool creators and, in turn, anyone who uses APIs or the Internet.
I have written some observations for people building on top of HTTP and JSON, as well as anybody who wonders why the Internet can’t “just work.”
How I became a network protocol archaeologist
I’m a Founding Engineer at Akita, a startup with the goal of making it as easy as possible for developers to understand their web apps. We’re building an API observability tool that passively watches network traffic, reassembles the data, and interprets the higher-level protocols – today, mainly HTTP and JSON. From this we infer API endpoint structures and automatically monitor latency, errors, and more.
Like other teams, we started with a “standard” implementation of parsers for HTTP and JSON: the ones in the Go standard library. When I turned on telemetry that lets us see the parsing errors that our agent encounters, I was surprised at the volume of errors we saw from these components. We had implicitly assumed that everybody was following the specifications for HTTP and JSON. But that’s not really how the world works.
At Akita, we’re building for developers who don’t have the luxury of investing lots of time into their tech stacks. They are mainly interested in getting their jobs done, not ensuring compliance with a specification. If it turns out that they need to use a service that’s not fully compliant, they probably don’t have the time, or the authority, to make a fix. Instead, they’ll adapt and make things work.
On the other hand, just because a team doesn’t use a feature of a particular protocol does not banish it from existence. This can lead to security vulnerabilities when an attacker makes unexpected use of the protocol. Or, a new library or a new service might be difficult to roll out because it does make use of features that were previously ignored. This problem can arise in the other direction too: developers may well assume that an invariant is true when it holds only most of the time.
And here’s where this causes problems: somebody working only with the spec, or coming in from another team, is blind to the local variances of network protocols. This mismatch can cause hiccups in onboarding onto new APIs or tools, interoperability problems between different projects, or errors that prevent a developer’s favorite client library from being used.
Developed by Sir Tim Berners-Lee and his team in 1989, the Hypertext Transfer Protocol (HTTP) has become ubiquitous on the Internet. It’s the protocol used by web browsers, and the most common protocol for web APIs, both internally and for software-as-a-service. New APIs and new services usually start on top of HTTP instead of developing their own conventions and formats. This is due not just to its familiarity but also to the supporting infrastructure of deployed proxies, client software libraries, and shared expectations.
But, familiarity and ubiquity do not imply homogeneity. Let’s look at a few examples from real life where there is a mismatch between how HTTP behaves, what the specification requires, and what developers assume.
Fiction: Private and extension headers added to HTTP should use an “X-” prefix.
HTTP is a plain-text format; its metadata is encoded in key-value pairs called “headers” that indicate properties of the request and response, such as length or type. Developers often believe that if you’re adding a nonstandard HTTP header, you “ought to” mark it with an “X-” prefix to indicate that it’s an extension or private header. (If you look at the traffic sent by the Akita web console, you’ll see a few X-Akita-* headers there too!)
In theory, this convention has been deprecated since 2012, in RFC 6648. The standard says not to do it. However, the desire to use “X-” as a prefix to indicate private extensions has stuck. It’s become part of our idiom for using HTTP.
Some experiments or extensions become standard. But not everybody changes their code simultaneously, so clients and servers have to pay attention to both the “X-” and non-“X-” versions of a header. Once introduced, a header has to be supported pretty close to forever. This is the reason that the IETF decided to no longer try to specially mark headers in this way.
On the other hand, “X-” itself is no guarantee that the header can be safely ignored, or will not collide with other uses! If everybody uses “X-” plus a short name, then two different organizations picking the same name is quite likely. If you’re adding a new header for private use, you should probably pick a header string that includes your company name or other distinguishing label.
Some standard headers even continue to use the prefix. For example, X-Frame-Options was published in 2013, post-dating the decision to deprecate the “X-” prefix. As a result, both programmers using HTTP and those building tools to understand HTTP cannot ascribe any meaning to the “X-” prefix alone. It might be a nonstandard extension, or an old version of something that is now standard, or a de facto standard that just kept the “X-” anyway!
Fiction: A request will have only one Host header.
The “Host” header identifies which “virtual host” is being accessed: the host and port number of the server to which a request is being sent. This is a “virtual host” because the same server might handle requests for many different domain names. The standard requires that a server reject any HTTP/1.1 request that doesn’t contain exactly one Host header. Even when the full URL including domain name appears in an HTTP request, this header is supposed to be there.
When we turned on telemetry to report parsing errors in our client, imagine our surprise to see that this assumption was false! More than one of our customers was reporting errors due to the presence of multiple HTTP Host headers, which the Go HTTP parser refuses to accept.
Unfortunately, we don’t actually know the root cause of multiple headers yet. It might be a subtle attack on the HTTP server, although the volume of errors seems too large for this explanation to be correct. It might be an occasional error in some client software, and the server is correctly rejecting such requests. Or it might be that the application our user is monitoring is working fine even with the invalid structure!
We believe we’ll find many cases like this where systems are operating correctly, despite violating the technical requirements of the specification. The lesson we learned as tool builders is that we need to improve our traffic processing capabilities to handle even “incorrect” requests and responses.
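One way for an observer to tolerate such requests is to parse headers without enforcing the one-Host rule. This Go sketch uses only the standard library: net/textproto reads the request line and header block and keeps every repeated Host value. hostHeaders is a hypothetical helper, and this is an illustration rather than a description of Akita’s agent.

```go
package main

import (
	"bufio"
	"fmt"
	"net/textproto"
	"strings"
)

// hostHeaders (hypothetical helper) parses only the request line and header
// block of a raw HTTP/1.1 request and returns every Host value, without
// enforcing the "exactly one Host" rule. Body handling is deliberately omitted.
func hostHeaders(raw string) ([]string, error) {
	tp := textproto.NewReader(bufio.NewReader(strings.NewReader(raw)))
	if _, err := tp.ReadLine(); err != nil { // e.g. "GET / HTTP/1.1"
		return nil, err
	}
	hdr, err := tp.ReadMIMEHeader() // keys are canonicalized; repeats are kept
	if err != nil {
		return nil, err
	}
	return hdr.Values("Host"), nil
}

func main() {
	raw := "GET /api HTTP/1.1\r\nHost: a.example\r\nHost: b.example\r\n\r\n"
	hosts, err := hostHeaders(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	if len(hosts) != 1 {
		// A spec-conforming server would answer 400 here; an observer can
		// record the anomaly and keep going.
		fmt.Printf("anomaly: %d Host headers %q\n", len(hosts), hosts)
	}
}
```

Because the duplicates are preserved instead of rejected, the caller decides what to do with the anomaly, for example recording it rather than dropping the whole request.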
Fiction: The Content-Type header identifies the MIME type of the HTTP response body.
It would be great if the HTTP response came with a clear indication of its type: is it an image that should be rendered? (And if so, which image format?) Is it a PDF file? An HTML document? A compressed archive?
This is why we have the Content-Type header, which is supposed to express this information. Unfortunately, the Content-Type header may frequently be absent, incorrect, or misleading. Worse, it may appear multiple times due to bugs or incorrect understanding of the protocol, like the Host header mentioned above.
Content-Type is one of the original sins of the modern Internet. Many file systems have no way to attach a MIME type directly to a file, and even where one exists, most applications don’t set it. As a result, file types were often “guessed” by an HTTP server based on file extensions or a configured lookup table.
Applications will frequently send incorrect MIME types, or one that is unknown and unhelpful to the client. A JSON response might be dressed up as an application/* type (but omit the +json suffix that would give it the correct treatment), or mislabeled as text/plain or text/html. Few programming languages have a type system that forbids such errors.
Our telemetry tells us that instead of omitting the Content-Type header, which would be acceptable according to the HTTP specification, lots of software uses the empty string “” as a content type! But the empty string is not a MIME type.
It’s no surprise that browser creators quickly adopted a technique of “sniffing” the received data to try to determine the real type of the data. But this created the demand for a way to disable this behavior, and now a server can specify “X-Content-Type-Options: nosniff” to request that the browser believe it and leave the MIME type unchanged. No doubt in another twenty years, we’ll need an “X-Content-Type-Options: nosniff-i-mean-it-this-time” as browser and client programmers work around bugs with faulty “nosniff” headers.
This long-standing confusion around Content-Type means that you may find yourself working with an API that has its types wrong. It may require that you send an expected MIME type in requests that have a body – even if it does not match your data – or send an “incorrect” MIME type back. If a server you use has this bug, you’ll probably just have to adapt, like everybody else. But be aware that the Content-Type header can’t be depended upon!
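A client that cannot trust Content-Type might combine the declared type with a look at the body. The following Go sketch is one possible heuristic, assuming JSON is the only type you care about; looksLikeJSON is a hypothetical name, built on the standard mime.ParseMediaType and json.Valid functions.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"mime"
	"strings"
)

// looksLikeJSON (hypothetical helper) trusts a Content-Type that declares
// JSON (application/json or any "+json" suffix) and otherwise falls back to
// sniffing the body. Empty or unparseable values are treated as unknown.
func looksLikeJSON(contentType string, body []byte) bool {
	if mediaType, _, err := mime.ParseMediaType(contentType); err == nil {
		if mediaType == "application/json" || strings.HasSuffix(mediaType, "+json") {
			return true
		}
	}
	// Sniff: accept bodies that parse as a JSON object or array.
	trimmed := bytes.TrimSpace(body)
	if len(trimmed) == 0 || (trimmed[0] != '{' && trimmed[0] != '[') {
		return false
	}
	return json.Valid(trimmed)
}

func main() {
	// An empty Content-Type, as seen in the telemetry described above.
	fmt.Println(looksLikeJSON("", []byte(`{"ok":true}`)))
}
```

Note the design choice: a declared JSON type is trusted even if the body does not parse, which a stricter client might reverse. This is also the opposite of what nosniff asks a browser to do, so it only makes sense where your own code is the consumer.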
Fiction: HTTP is reliable
The biggest lie developers tell themselves about HTTP is that it is reliable: data won’t be corrupted or altered. But it’s only sort of reliable.
HTTP is usually (but not always!) carried on TCP, which is a “reliable protocol.” But that means it has features that attempt to correct for missing data, not that it achieves a particular level of data integrity. TCP places a checksum in every packet, which is validated by the receiver before accepting that packet.
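To make the checksum concrete, here is a Go sketch of the 16-bit ones’-complement “Internet checksum” from RFC 1071, the algorithm TCP uses. It is simplified: real TCP also covers a pseudo-header of IP addresses and lengths, which this sketch omits. Because the checksum is a sum, it cannot notice 16-bit words that have been reordered.

```go
package main

import "fmt"

// internetChecksum computes the 16-bit ones'-complement checksum described in
// RFC 1071. TCP applies it over a pseudo-header plus the segment; this sketch
// checksums only the bytes it is given.
func internetChecksum(data []byte) uint16 {
	var sum uint64
	for i := 0; i+1 < len(data); i += 2 {
		sum += uint64(data[i])<<8 | uint64(data[i+1]) // big-endian 16-bit words
	}
	if len(data)%2 == 1 {
		sum += uint64(data[len(data)-1]) << 8 // pad a trailing odd byte with zero
	}
	for sum>>16 != 0 {
		sum = (sum & 0xffff) + (sum >> 16) // fold carries back into 16 bits
	}
	return ^uint16(sum)
}

func main() {
	original := []byte("abcdefgh")
	swapped := []byte("cdabefgh") // first two 16-bit words exchanged
	// Addition is order-independent, so this corruption goes unnoticed.
	fmt.Println(internetChecksum(original) == internetChecksum(swapped)) // true
}
```

A receiver verifies by summing the data together with the transmitted checksum; for even-length data the result folds to zero when nothing was detected as wrong.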
But, sources of error can and do creep in. Bugs and memory corruption can affect data after the checksum has been verified. For example, if the TCP checksum is verified by an offload engine on the network device (NIC), then the transfer into main memory is unprotected by any higher-level checksum. Unfortunately, the device’s DMA engines performing this transfer may occasionally mess up.
Packets on the wire are usually protected both by the TCP checksum and by a link-layer checksum (such as Ethernet’s CRC). But a TCP packet may reside in faulty buffer memory, or be corrupted on the way in or out of a network device.
TCP’s checksum is limited to 16 bits, which is really not very much for protecting the large amounts of data sent over TCP every day. If a packet is corrupted on-the-wire but passes the Ethernet CRC, or is corrupted in a location where the Ethernet CRC is absent, then there is a 1 out of 65,536 chance that the TCP checksum looks good anyway. To put it another way, for every gigabyte of data that is corrupted in transit, we should expect about 10 errors to go undetected.
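The “about 10” figure is a back-of-the-envelope estimate. As a sketch, assume 1 GB means 10^9 bytes, packets of roughly 1,500 bytes, every packet in that gigabyte corrupted once, and each corrupted packet’s checksum equally likely to land on any of the 2^16 possible values:

$$
\frac{10^{9}\ \text{bytes}}{1500\ \text{bytes/packet}} \approx 6.7\times 10^{5}\ \text{packets},
\qquad
\frac{6.7\times 10^{5}\ \text{corrupted packets}}{2^{16}} = \frac{6.7\times 10^{5}}{65{,}536} \approx 10\ \text{undetected errors}
$$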
And the rate may be far higher in practice. My ex-colleague Jonathan Stone and his co-author Craig Partridge did a study called When the CRC and TCP checksum disagree, which looked at examples where the Ethernet CRC was correct but the TCP checksum failed. From this, they could find real-life examples of errors that were not caught by the link-layer checksums, and from that build a model of how likely it was that the TCP checksum actually detected such errors. Their estimate was that “the [TCP] checksum will fail to detect errors for roughly 1 in 16 million to 10 billion packets.” For every terabyte of data that is sent, we might see as many as 40 errors not caught by checksums! (At the pessimistic end of that range, a terabyte sent in 1,500-byte packets is about 670 million packets, and 670 million divided by 16 million is roughly 40.)
This class of problem is normally invisible. Errors that occur are usually corrected without any higher-level indication that there is a latent problem. But the potential for uncaught errors is why many large downloads include a cryptographic hash, such as a SHA-256 digest, to verify the integrity of the downloaded image. Fortunately, a lot of HTTP traffic is now sent over TLS, which offers much stronger checks than just the TCP checksum. But if you’re sending unencrypted HTTP traffic between services, the odds are that you will sooner or later run into data that has been corrupted in transit, even on this “reliable” protocol. Protocols built on top of HTTP should have higher-level checks in place to verify that the data transferred between systems is free of error.
There are more fictions we could explore. There’s the way Unicode should be encoded in a path or query parameter, and the way it actually happens. There’s more than one place authorization information shows up, despite the perfectly good Authorization header. And the HTTP host doesn’t necessarily identify a particular “host” at all! But let’s instead move on to the other half of common APIs: JSON.
JSON and its variants
Fiction: There’s only one JSON
JSON is defined in at least seven different documents, six of them official standards! There are four different IETF documents: RFC 8259, RFC 7159, RFC 7158, and RFC 4627. The ECMAScript standard also defines JSON, and ECMA-404 is a separate document just for JSON. Finally, json.org has its own definition, which is a reference for many programmers who’ve never heard of any of the other documents.
Unfortunately, as we’ll see below, this diversity of documents also leads to a diversity of implementation corner cases, not all of which are handled in the same way.
As an example of the differences between standards, the most recent RFC permits standalone string literals or numeric literals to parse as JSON. But the earliest RFC, RFC 4627, does not, and some parsers may still refuse to accept the string literal “abc” as valid JSON.
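The difference is easy to see in code. In this Go sketch, the standard json.Valid follows the newer rule and accepts any value at the top level; validRFC4627 is a hypothetical wrapper that additionally demands an object or array, approximating the older grammar.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// validModern follows RFC 7159/8259: any JSON value may appear at top level.
// Go's json.Valid already behaves this way.
func validModern(data []byte) bool {
	return json.Valid(data)
}

// validRFC4627 (hypothetical helper) also requires the top-level value to be
// an object or array, as the original RFC 4627 grammar did.
func validRFC4627(data []byte) bool {
	trimmed := bytes.TrimLeft(data, " \t\r\n") // JSON's four whitespace characters
	if len(trimmed) == 0 || (trimmed[0] != '{' && trimmed[0] != '[') {
		return false
	}
	return json.Valid(data)
}

func main() {
	for _, in := range []string{`"abc"`, `42`, `{"a":1}`} {
		fmt.Printf("%-8s modern=%v rfc4627=%v\n", in, validModern([]byte(in)), validRFC4627([]byte(in)))
	}
}
```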
Unfortunately, APIs will typically be imprecise as to whether the JSON they are emitting is plain JSON, a variant, or the JSON5 Data Interchange Format. If JSON, the API may have quirks due to the choice of standard and encoder. A programmer will have to be prepared to debug why the JSON they get is not parsed by the JSON library they’ve chosen.
Even if the encoding is correct, sometimes local conventions will leak into public APIs. Vanilla JSON does not support comments (JSON5 does). As a result, some JSON files or even API responses will include field names like “//” or “__comment” to meet this very real human need to explain what’s going on. It’s all too easy for these meta-conventions to leak out and start being treated as real data instead of annotations.
When differences like these arise within a team, it’s often a negotiation about who will budge first. The “standard” (or choice of standard) usually gives way to whatever is easiest to make work. If you’re using a public API, then sometimes the JSON will have to be pre-processed before parsing, or post-processed after parsing, so that your client can make use of the payload.
For a toolmaker, the decision can be harder. Do you offer the widest possible range of support, which could harm compatibility with less-capable libraries? Do you add a lot of flexibility in which convention to follow, possibly confusing programmers with a lot of seemingly-irrelevant choices? Sticking strictly to one version of the standard is no guarantee of success.
Fiction: JSON is UTF-8 encoded.
Among the confusing parts of JSON’s evolution is its character encoding. Current JSON must be encoded as UTF-8 (according to the IETF standard, RFC 8259). But earlier documents permitted UTF-16 or UTF-32. Few implementations made use of these choices; as the standard stuffily notes:
However, the vast majority of JSON-based software implementations have chosen to use the UTF-8 encoding, to the extent that it is the only encoding that achieves interoperability.
In the real world, you may not be so lucky: you may be working with a “legacy” system that chose UTF-16 encoding. Many parsers fail to handle it, so you might have to switch tools or transcode the data first.
However, the handling of invalid Unicode is even more confusing. The grammar given in RFC 8259 lets a JSON document contain \u escape sequences that do not identify a valid Unicode character, such as a lone surrogate like “\uDEAD”. It notes:
The behavior of software that receives JSON texts containing such values is unpredictable; for example, implementations might return different values for the length of a string value or even suffer fatal runtime exceptions.
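How a parser resolves this is implementation-specific. In Go’s encoding/json, a lone surrogate escape is replaced with U+FFFD, the Unicode replacement character, while a valid surrogate pair decodes normally; other parsers may reject the input or preserve the raw surrogate instead. decodeJSONString is a hypothetical wrapper used only to show this.

```go
package main

import (
	"encoding/json"
	"fmt"
	"unicode/utf8"
)

// decodeJSONString (hypothetical helper) decodes a single JSON string literal
// with encoding/json.
func decodeJSONString(literal string) (string, error) {
	var s string
	err := json.Unmarshal([]byte(literal), &s)
	return s, err
}

func main() {
	// \uDEAD is a lone low surrogate: grammatical JSON, but not a character.
	// A raw string literal is needed because Go itself rejects "\uDEAD".
	s, err := decodeJSONString(`"\uDEAD"`)
	fmt.Printf("%q %v valid-utf8=%v\n", s, err, utf8.ValidString(s))
}
```

The substitution is lossy: after decoding, you can no longer tell a replaced surrogate from a literal U+FFFD in the input.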
In addition to the challenges of handling even valid Unicode correctly, you may also have to worry that an attacker will try invalid Unicode sequences to search for a security vulnerability. Validating input data before using it is always good practice, and the fact that JSON parses correctly does not amount to complete validation.
Fiction: JSON is easy to parse.
A 2016 study (updated in 2018) entitled Parsing JSON is a Minefield compared the behavior of more than 75 different parsers on a collection of test cases. Most showed significant differences in what they accepted and in which edge cases they handled correctly. Even the JSON checker provided by json.org at the time turned out to disagree with the grammar published on the same site! For example, it rejected [0e1], which is a valid JSON array containing the number 0.
A more serious class of errors involves arbitrarily deep nesting. An implementation is allowed to restrict the maximum nesting depth (or indeed other settings, like the maximum length of a string literal), but many fail to do so. This can lead to stack overflows if a recursive-descent parser attempts to follow thousands of opening brackets or braces. Even Go’s standard encoding/json package suffered this problem from its initial release in 2010 until version 1.15, released in 2020. See Go issue #31789, “encoding/json: provide a way to limit recursion depth,” where I reported the error. (This was particularly egregious in Go, as there is no way to recover from stack exhaustion.)
JSON parsers can be vulnerable to many errors that cause crashes or even security vulnerabilities. The language is not, in fact, easy to parse. If you’re parsing JSON, make sure that you’re using a parser that has at least been fuzz-tested. Your parser should also permit you to limit the amount of resources that an attacker can cause you to use by sending a malicious JSON payload.
One of Akita Software’s future goals is to find a way to tell users about variance in how they use JSON, to alert them to possible compatibility or security issues in their JSON-based APIs. Unfortunately, our Go-based agent uses the standard encoding/json package, which simply rejects anything that doesn’t pass Go’s idea of what is valid JSON.
The common theme here is that even these well-known protocols, HTTP and JSON, can exhibit strange behavior or unexpected cases. These are not necessarily “bugs” that prevent the system from operating as intended. Programmers are good at making things work and overcoming obstacles! But often a programmer can only control one end of their connection, and so it doesn’t matter what the spec says – the way they get their job done is by adapting the code to the truth of what goes over the wire.
Application developers need to know what’s actually happening, not what they think should be happening. Unexpected differences might prevent their preferred library or tool from working in a new environment or with a legacy system. Knowing where the difference actually resides will be the first step towards making a good decision about what to do. This might be re-processing data to make it compliant. It might be relaxing a restriction that doesn’t matter in practice. It could involve switching to a different library that more closely matches the conventions in use. Or, yes, it could mean fixing the other side of the connection to match the specification.
We at Akita believe this is the right way of observing systems: by looking at the truth of what’s actually happening, and describing that in a way that developers can understand. Our tools need to be robust enough to deal with all the complexity that real-world implementations encounter. And if you’re looking for a tool to help do this, try out our beta.
Photo by Yves Tessier. Shared under CC BY-SA 4.0.