Every Call You Make: Why Watching Traffic (and eBPF) Is the Future of Developer Tools

by Jean Yang

In the last few years, we’ve seen a proliferation of tools based on watching network traffic. Maybe you’ve heard the buzz around eBPF.

Recently, eBPF expert Brendan Gregg tweeted that eBPF-based tools are going to shake up the observability industry because developers love ease of use. For background: eBPF (extended Berkeley Packet Filter) lets sandboxed programs run inside the operating system kernel without loading kernel modules or modifying kernel source code. eBPF enables drop-in solutions and is, among other things, the way to build a tool that drops into a system to watch network traffic.

An eBPF expert's take on how eBPF will "shake up" observability.

I agree with Brendan. But I’ll raise him one. My prediction: traffic-based tools are going to shake up the entire developer tools industry. Developers love to write code that works. It’s getting harder and harder to do this without understanding your production behavior.

By the end of this post, I hope to have convinced you that the fastest way to understand production behavior is by watching API traffic. And that insights from traffic will improve tooling across software development.

Drop-in solutions for understanding prod means watching traffic

In the last decade, we’ve seen a bifurcation in what people see as the “source of truth” in an application. There has been a “shift left” movement to push more system understanding into technologies around testing and static (compile-time) code analysis. At the same time, we’ve seen the growing popularity of the “test in production” philosophy, centered around software processes to better understand production behavior.

After years of working on “shift left” solutions, it’s no secret that I’m now firmly on the side of “test in production.” As an undergraduate, I once attended a guest lecture by MIT professor Gerald Jay Sussman (co-author of the famed Structure and Interpretation of Computer Programs), who said that people become programmers because “it’s fun to play God.” That might have been true Back in the Day. But the increase in codebase complexity, combined with the rise of microservices and SaaS, means that programmers are now playing the role of systems biologist, or even archaeologist.

It’s also no secret that I remain unsatisfied with the logging and tracing that “test in production” evangelizes today. Today’s best practices for testing in production involve disciplined instrumentation of code for logs and traces. When the team writing the code is the one who cares about the observability coming out of it, this works great. But there are also many situations where this doesn’t work so well, especially on the “99% developer” teams that I’ve encountered. Employee churn, code longevity, and third-party integrations are just a few of the reasons that cause this approach to fall down. More and more, nontechnical teams across the company will need to understand what software is doing in production. We’re going to need solutions that don’t require product managers to instrument code or look at dashboards.

Now let me walk you through the series of deductions that led me to conclude drop-in means watching traffic. If we agree that we want a drop-in solution, then we can’t require the user to change code. To achieve this, it’s possible to either use a language-based or framework-based analysis, or a blackbox solution that lives at the level of the network or operating system (cough, eBPF). There are two obstacles standing in the way of analyzing the code: dynamically typed languages (caused by people not wanting to write types) and massive tech stack heterogeneity (caused, in part, by the rise of microservices and SaaS). But! The rise of microservices and SaaS has also meant that more and more of the action in a system goes across APIs. So looking at that traffic is becoming a better and better idea. ∎

Drop-in traffic-based solutions have worked well for API security

To understand the benefits of watching traffic, let’s look at a space where traffic-based solutions have worked well.

For a while now, security teams have been quietly and effectively using traffic to understand the systems they’re responsible for. API security companies like Salt Security and Noname Security, for instance, offer solutions that drop in and watch API traffic to detect potentially malicious activity. Mark O’Neill, a VP Analyst and Chief of Research for Software Engineering at Gartner, replied to one of my tweets that there is a lot of demand from buyers in the security space to understand their systems in a traffic-centric way.

Straight from the expert: security teams want to discover APIs—and API issues—via traffic.

It makes sense that security teams are among the earliest adopters of traffic-based solutions. They need to figure out a lot of what’s going on across the entire company without involving app developers, who are often incentivized to spend as little time as possible on security. Security teams are often highly technical and wouldn’t have trouble learning to use tools based on traffic.

Security teams had to adopt traffic-based solutions because they don’t have full understanding of or visibility into the code, yet they lose their weekends and holidays if something goes wrong in it. Software teams are increasingly finding themselves in this situation as the modernization and evolution of codebases make them lose visibility into their functionality. Among software teams, however, traffic-based tools haven’t yet become standard practice.

Why traffic-based solutions aren’t (yet) as popular in dev tools

What does it take for software teams to accept traffic-based tools? My take: a mindset shift among developers, combined with better developer experience for eBPF-based tools.

To give an idea of how many developers regard eBPF, here’s a (very reasonable and intelligent) response I once received to a cold recruiting outreach:

A reasonable and fairly common view on eBPF-based tools.

In some ways, this person is completely right. Most developers are missing basic tools for understanding their systems. And most of today’s eBPF-based observability tools are power tools for experts that aren’t solving these simple problems. This person and I both agree that “99% developers” need easier solutions to observability. Where we disagree: I believe eBPF is the way to get there.

Now for what I mean by the mindset shift. Most developers, and most developer tools, operate on the assumption that app teams are in control of their code: they have full access to the source code and they understand what it is doing. Under this assumption, logging and log analysis make a lot of sense as the most straightforward way to deliver monitoring value. But teams who don’t have full access to all the source code (as is the case when using third-party services), who don’t fully understand the source code (as is the case with any legacy code), or who have lost track of the nuances of how their features interact with each other will want a more drop-in solution, one that doesn’t require modifying or recompiling the code to get the right logging in place.

Then there’s the other piece: the developer experience of eBPF-based tools. As I mentioned, most tools have used eBPF as a way to help developers “miss no signal” from their systems. When you’re building for a software team that is in control of its code, it makes sense to build a “power tool” that gives developers a fire hose of everything they might want to know. But a solution for developers who aren’t experts in the current system, and who might not have the time to become experts, can’t just dump every event and expect the developer to make sense of it. I agree with my correspondent that what we need is more automated analysis, but built on top of the data we get through eBPF.

A vision for improving developer experience for traffic-based tools

Either software complexity will cause software development to fold in on itself, or our industry will find ways to make it easier for any team to quickly understand what they need to know about a complex, heterogeneous software system whether or not they wrote it.

In my version of this future, traffic is the new code and eBPF is the new assembly language. By exposing the ability to drop into systems and watch behavior, eBPF enables the drop-in solutions we so desperately need. But the power of these tools will come not from being able to observe all properties of the system, but from building abstractions on top of the ability to watch everything. Wanted: the low-code and no-code equivalents of monitoring and observability.

Here’s what I believe will enable a better developer experience for drop-in, eBPF-based tools, and what I would like to see more tool developers work on:

Automated traffic analysis. Shift right: analyze traffic instead of code! Tools that can automatically make sense of large volumes of traffic to pull out the interesting parts will go a long way in helping developers understand what is going on. And the further "left" (earlier in the development cycle) tools can push this information, the more effective developers can be at building in a way that anticipates production behavior. (My biased take: leveraging the decades of work on analyzing the structure of programs and program traces will give more mileage here than a straight-up AI-based solution.)
Combining automation with developer insight. Today, tools expect developers to do either everything (logging and instrumentation) or nothing (rely on the AI). The most productive user interactions here will leverage the insights that developers do have about the system, without requiring them to understand the entire system. The ideal way to do this likely helps the developer understand production implications while coding. I would also like to see more logging and instrumentation become automatically inferred.
Better ways of communicating with the developer. Today, most tools are set up for the developer to give direction to the machine. But as software systems develop more emergent behaviors, it’s increasingly critical to figure out good ways for software to communicate back to the developer, in ways outside of the logging and instrumentation that the developer has already defined. Again, the further "left" it's possible to push this information, the more developer headache it saves. (Our take at Akita: use API specs as the foundation. More here.)
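
To make "use API specs as the foundation" a little more concrete, here is a minimal sketch (not Akita's implementation) of one way software could report back to the developer: compare observed calls against a declared spec and surface the ones the spec does not describe. The spec format, the function names, and the sample paths are all hypothetical, chosen only for illustration.

```python
import re

def template_to_regex(template):
    # Turn a hypothetical spec template such as "/users/{id}" into a regex
    # in which each {placeholder} matches exactly one path segment.
    parts = re.split(r"\{[^/{}]+\}", template)
    return re.compile("[^/]+".join(re.escape(p) for p in parts))

def undocumented_calls(spec, observed):
    """Return observed (method, path) pairs that no spec entry describes.

    spec:     set of (method, path_template) pairs (assumed format)
    observed: iterable of (method, path) pairs seen in traffic
    """
    matchers = [(m.upper(), template_to_regex(t)) for m, t in spec]
    missing = []
    for method, path in observed:
        path = path.split("?", 1)[0]  # ignore the query string
        described = any(
            m == method.upper() and rx.fullmatch(path) for m, rx in matchers
        )
        if not described:
            missing.append((method.upper(), path))
    return missing
```

For example, with a spec of `{("GET", "/users/{id}"), ("POST", "/orders")}`, an observed `DELETE /users/42` would be reported while `GET /users/42` would not. A real tool would need to handle far more than this sketch does (headers, bodies, versioning), but the shape of the feedback is the point.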

As they mature, these tools won’t be confined to monitoring and observability for devops purposes. Data from traffic will come to power all parts of the software development process, from planning to development to testing. And yes, eBPF itself needs innovation, as is the point of Brendan Gregg's thread, but much of the increase in impact will come from innovation in the usability of eBPF and traffic-based tools. I’m excited to see investment in more (and more automated) tools for using production data to improve developer experience across all tools. And I’d love to see you join the movement. 👷🏻

P.S. If you’re curious about what it looks like to use eBPF to help understand API traffic, check out what we’re doing at Akita. (Docs here.) Our solution provides drop-in monitoring for HTTP APIs by automatically inferring endpoint structures to support near real-time, per-endpoint error and latency monitoring. We’d love to get your feedback from trying out our beta.
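
As a rough illustration of what "inferring endpoint structures" for per-endpoint monitoring could look like, here is a minimal sketch under stated assumptions. It is not how Akita does it: the ID-detection heuristic, the record format `(method, path, status, latency_ms)`, and the function names are all hypothetical.

```python
import re
from collections import defaultdict
from statistics import median

# Assumed heuristic: a path segment that is all digits or a UUID is treated
# as a variable and collapsed to "{id}".
_ID_SEGMENT = re.compile(
    r"\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

def infer_endpoint(method, path):
    # Drop the query string, then replace ID-like segments with a placeholder.
    segments = path.split("?", 1)[0].strip("/").split("/")
    templated = ["{id}" if _ID_SEGMENT.fullmatch(s) else s for s in segments]
    return f"{method.upper()} /" + "/".join(templated)

def summarize(requests):
    """Group observed requests by inferred endpoint and compute basic stats.

    requests: iterable of (method, path, status, latency_ms) tuples
    """
    grouped = defaultdict(list)
    for method, path, status, latency_ms in requests:
        grouped[infer_endpoint(method, path)].append((status, latency_ms))

    summary = {}
    for endpoint, calls in grouped.items():
        errors = sum(1 for status, _ in calls if status >= 500)
        summary[endpoint] = {
            "count": len(calls),
            "error_rate": errors / len(calls),
            "median_latency_ms": median(lat for _, lat in calls),
        }
    return summary
```

Here `GET /users/42` and `GET /users/7?verbose=1` would both be counted under `GET /users/{id}`, so error rate and latency are reported per endpoint rather than per raw URL. The developer never has to declare the endpoints; they fall out of the traffic.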