Oops! Something went wrong while submitting the form.
July 22, 2021
Where My Specs At: From the Front to the Back
Share This Article
What do no-code, low-code, infrastructure-as-code, and the other hot software trends have in common?
They’re all about figuring out the details based on specifications about what the programmer wants. You probably don’t think about it this way. And you probably don’t care.
In the last few years, there’s been a lot of excitement over automating away more and more of software engineering. You can now make an app from a spreadsheet! Instead of hiring an ops person, you can simulate having one via code! You can now call an API for anything!
But, in part thanks to what all this automation enables, software systems have been quietly getting more and more complex. One place where automation tools can’t help you is with the emergent properties of these systems, which now have lives of their own.
This post is about how, if we turn specs on their head, they can help us through both automation and complexity. I’ll talk about what specs are, why they’re good, where they fall down, and how inferring specs is the way to scale automated reasoning about complex systems.
What’s in a specification?
Here’s a quick primer on what a specification is, versus an implementation:
A specification says what software should do.
The implementation contains the details of how to do it.
There are a lot of philosophical discussions about what’s a specification, versus what’s implementation. We don’t have time for philosophy in this post, so we’re going to move on.
You might have first learned about specifications in school as “design documents” that you had to write before starting coding projects. You might still have to write them at work, sitting through meeting after meeting where you wonder how much the actual implementation will drift. You might also wonder, “Isn’t that what XML was about?” That too.
I’m going to spend most of this post on my favorite kind of specification: specs that are tied to implementations. The simplest of these is a type like `int` or `string`. Slightly fancier are types like “positive int” or “string that user Alice has access to see.” Specifications can be super rich in their descriptive power: the first paper I ever wrote was about using first-order logic to specify type safety and memory safety properties about assembly code, for the purpose of proving properties about an operating system. The cool thing about new no-code and low-code tools is that they’re also getting a spec from the developer, but in an implicit, ergonomic way.
Specs help you scale
Tying specifications to implementations has two main benefits for scaling systems.
First, specifications make it possible to automatically check parts of your system, enabling you to more easily grow the system larger. For instance, in any software system, knowing that a value is guaranteed to be a given type (type safety) saves developers time, even if someone hasn’t proven type safety down to the assembly. The help with scaling is a big part of why Facebook, Google, and Microsoft have put so much work into statically type-checking dynamically typed languages.
Specifications also make it possible to automatically implement parts of your systems, making it easier to build bigger systems. What’s closest to the no-code and low-code tools is work in program synthesis, which studies how to automatically generate programs from program fragments, high-level specifications, types, and more. (To give an example: some coauthors and I recently published a program synthesis paper, about automatically rewriting code to insert information flow security checks by using the type-checker to figure out where checks are missing.) You might also be familiar with directly executing logical specifications in languages like Prolog and Datalog. And with some of these new low-code tools, you can get functionality in minutes from using a spreadsheet that would have taken days or weeks to code up otherwise.
But specs have a scaling problem
But specifications can only help you if the specification is known. For instance, with the rise of SaaS and APIs, needing to know the specification is becoming a bigger and bigger problem.
Here’s specs start breaking down in modern microservices. If you’re calling one other service, or maybe a third-party API like Stripe or Twilio, you have a decent idea of what you expect it to do. But more and more, you’re likely to find yourself calling services using inputs to your own API, or outputs from other APIs, you’re from one service to pass to another service. There might be a specification—explicit or implicit—for what happens across individual APIs, but what happens across all the services and APIs enters the realm of emergent behaviors.
It’s not that there aren’t ways to specify what happens across a big, messy distributed system. Here are a couple of approaches that would make it much easier to plug into tools and automation across software systems, but the barrier is getting programmers writing them:
API specifications. In the last few years, there has been a lot of standardization around the protocols for expressing specifications for individual APIs, for instance HTTP/REST, gRPC, and GraphQL. But what my team and I have found through our user research with software teams is that even API specs can be hard to come by! Many developers, especially those building internal APIs and/or part of fast-moving companies, don’t keep their API specifications up-to-date.
Formal model of distributed systems. You might have also heard of TLA+, a modeling language that people have used for proving the correctness of distributed consensus protocols and such. But come on, this is web programming, where people didn’t use types until they didn’t have to write them explicitly anymore!! And the systems that people modeled with TLA+ are a lot less complex than the “microservice death stars” we’re seeing pop up.
This leaves us stuck not having specifications for properties like how services are connected across the API graph, which APIs might be interacting to cause performance bottlenecks, and how data flows across services.
From the front to the back
It turns out that there is a way to get specs from these complex systems with emergent behaviors. We just have to change our expectations.
Here are the main obstacles to getting specs that talk about our complex systems that are full of services and APIs:
The specification hesitancy problem. People don’t like to write specifications. They don’t like to write types. There’s a long history of this throughout the history of computing.
The lack-of-specification problem. There’s the additional problem that, in these complex, accidentally distributed systems, people don’t even know what the cross-service behavior should be, so they couldn’t write specifications if they wanted to.
In short, people don’t want to write specs and wouldn’t be able to even if they wanted to.
My one grad school mentor said that how he got all of his ideas was thinking about what assumptions everybody else had, that didn’t need to be assumptions. Here’s what we assume with the two problems from before:
Developers need to write specifications. We’ve assumed this because, up to now, where else would you get the specification? But the rising prevalence of type inference should give us hope.
A specification needs to be something that drives an implementation. Many of the communities that have been enthusiastic about specs have preached a specification-first way of building. But, to be honest, even as a specs enthusiast I’d often prefer to write them afterward because I don’t have the details to specify everything up front.
If we overturn both assumptions, we’re back in business.
It turns out that a lot of people have figured out how to infer specifications from program executions. Check out, for instance, Ras Bodik et al’s work on specification mining. The Daikon invariant detector has shown just how far you can get by the “guess and check” method of assigning program properties that are always true. One of my favorite papers, about the Clearview system, shows how to use Daikon-style invariant detection for anomaly detection good enough to defend against a real-life red-team. And Facebook has been using invariant detection effectively at scale, for example for learning authorization rules.
At Akita, we’ve been working on taking this idea of inferring specifications and applying it to system behavior. This is a blog post about specs and general, so I’ll just say a couple of things to convince you that it’s possible and leave the rest for another post:
We treat services in a blackbox way and focus instead on cross-service interactions.
Instead of looking at program executions, we look at API traffic traces.
We’re starting with invariants like endpoint structures and types (what you’d find in an API spec) and building up from there to do more invariants and also properties of the API graph (more like what you’d find in a TLA+ model).
While some may hope that deep learning can solve all of our programming problems, people are going to want transparency and control somewhere for problems of sufficient complexity on systems of some scale. I believe this transparency and control is going to come in the form of inferring specifications. At Akita, for instance, we’re all in on inferring API-centric specifications towards one-click software observability.
What we’re doing at Akita is just one example of what you can do by inferring specifications. Specifications are super powerful—and they’re so much more than XML, fancy types, or TLA+.
I’d love to see more people thinking about specifications that get inferred “in the back” of implementation. And I’d love to hear about the kinds of problems people would like the solve this way, whether or not they have to do with API-centric observability!