This is a guest post by Guilherme Mori, CTO at zMatch. Guilherme has previously worked in startups for over twelve years in development and devops roles. If you are interested in building the future of electric mobility and clean energy in Brazil, zMatch is hiring!
You might think that, as the CTO of an early-stage clean energy startup, I get to build a lot from scratch.
For better or for worse, this isn’t the case. I spend a lot of my time solving the problems of the past. I know that this isn’t the story that people like to tell. It’s far more popular to talk about sexy new tech. But I suspect that a lot of you out there are like me. I believe that talking more about these issues is the best way to make things better.
This blog post is about how legacy code haunts me even at my small startup, the problems that keep my team from moving faster, what I want to see in a practical solution, and what I’ve been doing in the meantime.
How my startup ended up with a legacy subsystem
First, a little about me. I’m Guilherme, the CTO of zMatch, a clean energy startup in Brazil. We are passionate about machines powered by clean, sustainable energy. zMatch is looking to electrify Brazil's fleet by matching clean energy with smart electric mobility. As an early-stage company, our main goal right now is to understand consumers and the market well enough to offer the best mobility solution, which the startup world often calls product-market fit (PMF). My job as CTO is to ensure that the technology, product discovery, and strategy are aligned with what we believe the product should be, and that we deliver as quickly as possible.
I joined zMatch a few months ago as the first full-time technical person. Previously, the team consisted of both founders (co-CEOs), the CFO and the three heads (operations, energy and mobility), with third-party contractors writing the code. When I joined, I inherited a refurbished and reused monolithic code base, with lots of unused and commented-out dependent code. My job was to get the code in shape in preparation for launch.
Today, zMatch is post-launch (check us out if you want to find incredible machines and/or clean energy in Brazil!) and I run a team of three engineers, a product manager and a product designer. In the last few months, my team and I have broken several microservices out of the main monolith and written new ones. Even though we have spent the past two months rewriting our old code, many of our problems still come from having to solve the problems of the past. While starting out with contractor code certainly made things harder for us, it was also the only option available to the team when there was no one in-house to oversee software development.
What keeps my team from moving faster
Here’s what a week in my life looks like. We have weekly planning meetings to avoid misdirection and make sure we stay on track with all the products being developed. Although we spend at least half of our time migrating and maintaining existing code, we generate value by shipping new features at least once every two weeks. However, this is not enough to achieve PMF. We need to keep moving forward to get to a product the market loves and buys. For us, getting to PMF is existential: the goal is to show we can dominate the Brazilian clean energy and electric mobility markets, and the market does not care about our technical debt. My job is to make sure the engineering and product team can give people access to clean energy, provide the best electric vehicle (EV) sharing experience, and offer the best knowledge base about electric cars and their infrastructure.
Because we have not had time to stop the world and clean up our technical debt, we need to find ways to move fast on top of a less-than-perfect code base, where it’s not always easy to understand what’s going on. The legacy code has duplicated code, unused code, a lack of uniformity in patterns, a lack of tests, and more, and we are working to address the biggest of these issues. We’ve been decommissioning the old code in pieces by transforming the monolith into smaller microservices. It’s been hard because the implementations are not clear enough: the same data has multiple points of entry and is processed differently depending on which one it comes through. The lack of consistency makes maintenance and migration harder and more time-consuming than anticipated.
How tools can help developers like me
Here’s the issue I’ve found with many of today’s tools. Either there’s no configurability, or they assume we can sink a lot more time into configuration than my team and I are able to spare. For instance, we started out using AWS CloudWatch for alerts in our Fargate/Pulumi system, but we were only getting aggregate alerts. There are certain API endpoints we care more about than others, so these alerts were noisy. We ended up having to turn them off.
It’s not that I don’t know what our code base “should” look like. It’s that, for the stage of our company, the existing code we have, and our current team size, I know it’s not possible for us to even get close to the ideal that many developer tools advertise being able to help us get to. (And I know many other companies that are in the same position!) When I look for tools for my team, I look for tools that:
- Help me understand the existing code we have. A major source of our problems is moving fast with the system we have. For instance, when we update code, we want confidence that the new code is interacting with our existing systems as expected. Any tool that doesn’t assume I can write everything from scratch, and that additionally helps me work with my existing code, is helpful in my book.
- Take little work to set up, maintain and learn. Many of our team’s issues come from not understanding our existing code as well as we would like, so any tool that requires us to do instrumentation or make custom dashboards requires a deeper knowledge of our systems than we currently have. The tool also has to be easy to learn, with output that is easy to understand, because we don’t have time to look into everything.
- Help me focus on the parts of my system I care most about. We’re moving fast and I know that there are endpoints that aren’t as fast as other ones. As people do at any startup, we’ve judiciously cut corners based on the tradeoffs we needed to make. As a result, aggregate tools like CloudWatch aren’t helpful for us. I know that not all of my endpoints are fast, and looking at aggregated data hides the ones that should be as fast as possible!
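To make the aggregate-versus-per-endpoint point concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the endpoint names, the latencies, the threshold, and the helper functions are hypothetical, and this is not how CloudWatch or Akita compute anything. It only shows how a slow but low-traffic endpoint can disappear inside an aggregate percentile.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def per_endpoint_p95(requests):
    """Group (endpoint, latency_ms) pairs by endpoint and return each endpoint's p95."""
    by_endpoint = defaultdict(list)
    for endpoint, latency_ms in requests:
        by_endpoint[endpoint].append(latency_ms)
    return {endpoint: percentile(latencies, 95) for endpoint, latencies in by_endpoint.items()}


def breached(requests, thresholds_ms):
    """Return the endpoints we care about whose p95 is above their own threshold."""
    p95 = per_endpoint_p95(requests)
    return sorted(
        endpoint
        for endpoint, limit in thresholds_ms.items()
        if endpoint in p95 and p95[endpoint] > limit
    )


# Hypothetical traffic: a cheap, chatty endpoint and a slow, business-critical one.
SAMPLE_REQUESTS = [("/health", 5)] * 960 + [("/bookings", 900)] * 40

# Hypothetical thresholds, set only for the endpoints that matter most.
THRESHOLDS_MS = {"/bookings": 300}

aggregate_p95 = percentile([latency for _, latency in SAMPLE_REQUESTS], 95)

if __name__ == "__main__":
    print("aggregate p95 (ms):", aggregate_p95)
    print("per-endpoint p95 (ms):", per_endpoint_p95(SAMPLE_REQUESTS))
    print("endpoints over threshold:", breached(SAMPLE_REQUESTS, THRESHOLDS_MS))
```

With this sample data the aggregate p95 looks healthy because the chatty endpoint dominates the traffic, while the per-endpoint view flags the one endpoint that actually matters. Setting thresholds only on the endpoints you care about is also what keeps this kind of alert from being noisy.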
As I’ve written in a previous blog post, I first discovered drop-in observability with Akita because I was looking for a tool to understand our existing code. Today, now that we’ve shored up a lot of our code base, we use Akita for drop-in per-endpoint monitoring. Our monitoring consists primarily of checking Akita and AWS ECS on live dashboards. We’ve been working with Akita as a design partner on their new alerts feature, which I’m looking forward to trying out.
Before zMatch, I worked at a variety of tech companies, from ideation to hypergrowth, and they all had their share of needing to solve the past, even after achieving product-market fit and while executing on a focused solution. The minute you have deadlines, or people leaving the company, you end up in a similar situation to the one I described. And unless you can devote whole teams to cleanup and migration efforts, you end up spending a lot of time reconciling your ideal tech stack with the tech stack you inherited.
I think it’s very fun and meaningful to work at an impactful startup like zMatch, especially given the pace we’re required to move at to win the market, but I also understand that it comes at the cost of not cleaning up tech debt the way I might if I were at a company like Meta or Google. I’m glad to see tools starting to build for our situation. I would love to see more. (And if moving fast while working on clean tech and smart mobility sounds fun, we’re hiring: feel free to reach out to me to talk about it!)
Photo by yeswanth M on Unsplash.