The Codebase That Lasts Twice As Long Costs Half As Much
Coding for The Messy Reality
No codebase was meant to be legacy code. And yet, here you are, plowing through 5000+ line source files so littered with undocumented side effects that touching any line could cause a failure on the other side of the repository. Even with a valiant attempt at refactoring, a codebase like this will probably deteriorate faster than you can patch it. This is life in a software debtor’s prison.
Time to start fresh and begin building a brand new system! One that is clean and modern. Version 2.0 we’ll call it. We will learn from our mistakes and get it right with an all new team.
This time it’ll be different.
The Greenfield-Bedlam Cycle
Take a deep breath.
That tangled codebase you’re looking at started its life with similar aspirations. And then reality hit: a quick fix here, a compromise there, some over-eager abstraction in the heat of the moment, and before long, it’s death by a thousand corner cuts.
With stakeholders to please and deadlines to meet, the technical debt piles up, until all that remains of the original architecture is a chalk outline in the rough shape of a strategy design pattern.
Once a codebase has been through the grinder of business for a fiscal year or two, the debt can become so expensive that a certain class of organization (that rhymes with schmenterprise) would rather just send it adrift on an ice floe and start over from scratch.
I call it the Greenfield-Bedlam Cycle of enterprise software development: a large enterprise keeps building the same software solution over and over, only to have it become a nightmare of incalculable risk within months of deployment. When this happens, the old solution is scrapped and the cycle starts over with a brand new Greenfield project.
Even in lean organizations with stronger technical vision, I have encountered the view that non-trivial codebases cannot be kept clean and maintainable over longer timeframes. At least not in fast-evolving fields like front end web development, where the same framework, library, or meta-framework will happily reinvent itself five times in ten years, each time leaving a trail of legacy code scattered across the web.
You can’t blame someone for having a cynical stance on the longevity of things when the very house they live in was built on quicksand.
Confronting The Messy Reality
How do we break the Greenfield-Bedlam Cycle? Can we somehow stop the atrophy of our codebases and create software solutions that last longer - much longer - and that remain functional, maintainable, and extensible for years or decades, without sacrificing development speed or relying on mythical 10x developers? And can we achieve this not just occasionally by happy accident, but methodically, repeatably?
I believe so, and the key to unlocking this power is acknowledging the imperfections of technology and developers, and the messy reality that your code will meet once it hits production. As a developer or software architect, you cannot opt out of reality, so you have to design around it.
Messy Reality Number 1: Requirements change
All project requirements are based on assumptions and limited information. In a matter of hours, assumptions can be invalidated or leadership priorities can change. Perhaps you simply encounter overwhelming success and need to scale beyond your wildest projections.
How would you change your design to face the reality that requirements could change at any point in almost any way?
Messy Reality Number 2: External circumstances change
Any new codebase is designed based on a version of reality that will not exist in 18 months. Technological innovation could make part of your solution obsolete, a competitor could launch a killer app that pulls the rug out from under your product, security vulnerabilities may be uncovered, or dependent libraries are abandoned by their maintainers, forcing you to replace them.
In short: reality drifts.
How would you change your design to face the reality that almost any technology it uses could require replacement on short notice?
Messy Reality Number 3: Mistakes are made
Even a stack of mind-boggling complexity can be maintainable by a highly skilled developer with in-depth knowledge of the tech, the domain, the patterns and paradigms. That person would never have an ‘off’ day, and they would always be given all the time and resources they need. Does that sound like the world of enterprise? I have it on good authority that it’s not the world of FAANG either.
In all likelihood, the developer assigned to work on your stack on a given day is going to be on a deadline or unfamiliar with some part of the domain or tech. They might be lazy (in the “move fast and break things” sense of the word), or just having a bad day. A lot of the developers who are going to work on your codebase in 18 months haven’t even been hired yet, and some are still in school or interviewing for their first consulting job.
The fact is, your code is going to see some hasty workarounds in its life, and not even the most meticulously designed software architecture survives first contact with a junior developer. That is, not unless it was built to adapt and endure – to resist bad changes and even self-heal over time.
How would you change your design to face the reality that the developers maintaining your solution will be making mistakes like they’re going out of fashion?
Principles of long-lived software design
How do we architect systems to survive and even thrive in a messy reality?
Principle 1: Know the developers
“People are part of the system. The design should match the user’s experience, expectations, and mental models.”
To achieve longevity in our solutions, we must consider the skill and time constraints of the end users (developers/maintainers) and design a system with the largest-possible Pit of Success for the average developer to stumble towards.
This often means “killing your darlings” by meeting the developers where they are, not where you wish they were. There could be a significant skill gap between where the team is today and where they need to be to maintain the system you’re proposing. Few companies will invest in a comprehensive training program to not only re-train your current developers, but also bring all future hires up to speed - including after you leave or get promoted to senior management.
Even if it breaks your heart, that cool technology you like probably has to go.
Knowing the developers can also mean creating frameworks or abstractions that remove boilerplate and flatten the learning curve, or relying on patterns and even naming conventions that are already familiar to the devs.
In 2014, I had to build a frontend framework for a team of backend developers, and rather than try to convert them all to JavaScript enthusiasts, I created an HTMX-like markup syntax and async DOM manipulation component allowing them to build and maintain interactive frontend features without ever having to leave their familiar C#.
Finally, ‘knowing the developers’ means helping the developers know. Supporting their daily workflow - by investing in rich documentation, templates, and developer tooling, and by spending time evangelizing the system - will reduce the friction of working on the system and pay dividends in the long run.
You want both existing and future team members to feel intuitively at home in the codebase, so they can make robust changes from day one.
Principle 2: Keep It Simple
Designing for longevity might sound like advocating for highly abstracted Ravioli Code, but that is not the case. For instance, in the early phases of development, you can safely assume that each iteration will be so different that any abstraction you built in the previous one will have to be scrapped anyway. The pragmatic way to design for such drastic change is to cobble together an MVP with bash scripts and bubble gum, since the next version will start from scratch either way.
“A complex system that works is invariably found to have evolved from a simple system that worked. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work. You have to start over, beginning with a working simple system.”
~ John Gall (Gall’s Law)
As a rule, abstractions should be avoided until they can pay for themselves by taming parts of the code which have become unwieldy. Pragmatic, well-timed abstraction is beneficial, whereas premature abstraction violates Gall’s Law and leads to broken software.
Start Simple. You Ain’t Gonna Need It.
Principle 3: Design for change
Try to ensure that the systems you design are easy to change.
Naming things
Make the code - and repo structure - intuitive, adhering to the Principle of Least Surprise. If a developer doesn’t find the natural place to apply a change in the first or second place they look, they might just come up with a creative solution instead. Now you have two places where authorization rules are defined. Next week it could be three. Every ‘surprise’ in your design will attract code rot over time.
Encapsulation
Identify natural domain boundaries and encapsulate them with clear contracts. Rather than a technical boundary defined by microservice gateways and deployment bundles, or an organizational boundary defined by lines of management, a ‘domain boundary’ encapsulates a family of concepts which are likely to change together because they are intrinsically coupled, such as Users and UserProfiles.
Encapsulation by good domain boundaries simplifies future refactoring and aligns with real world aspects of your business/product, which are naturally long-lived.
The paved road
Prefer the well-established stack over the ‘best’ one: a 75% ‘batteries included’ solution that just works beats the 100% solution that requires constant tinkering because the stack is unstable and the developer ecosystem doesn’t exist yet. The exception is when you’re building systems at massive scale, where the absolute value of even a small incremental improvement can dramatically outweigh the cost of a team tinkering away on the 100% solution.
For open source dependencies, consider the size of the community and the number of contributors. If the project is maintained by a single person, it makes for a risky dependency. How frequently are major versions released? That is how often breaking changes are introduced. Do the maintainers keep updating and patching older versions, or will your team be forced to migrate to whole new versions every year to address unpatched security vulnerabilities?
When it comes to cloud platforms and application frameworks, these are load-bearing structures for your application, and you should choose the most stable and well supported option available. When it comes to libraries and services, you can afford to be more experimental as long as you take the necessary precautions, which brings us to…
Loose coupling
Keep systems loosely coupled: if you have fifty services all sending their logs to some 3rd party analytics tool, then consider creating a custom facade library to wrap all communication with the 3rd party tool. That way, when the day comes that the company changes logging tools (a safe bet), you only have to change the facade library in one place rather than all fifty services.
…but be pragmatic: If you can’t achieve looser coupling without adding a ton of complexity (stomping all over the KISS principle), you are probably better off accepting a bit of tight coupling. Certain key decisions, such as your choice of cloud provider, are one-way doors with no easy way out, and while this may feel risky, it can bring great benefits; and one should never trade ten birds in the hand for one in the bush.
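The facade idea from above can be sketched in a few lines. This is a hedged illustration, not a real vendor SDK: the vendor client and its `send_event` call are stand-ins, and swapping logging tools means rewriting only the adapter class.

```python
# Sketch of the logging facade: all fifty services depend on LogSink,
# never on a vendor SDK directly. The vendor client here is a stand-in.
from abc import ABC, abstractmethod


class LogSink(ABC):
    """The one interface every service depends on."""

    @abstractmethod
    def log(self, level: str, message: str) -> None: ...


class VendorXSink(LogSink):
    """Adapter for the current vendor. Changing vendors means rewriting
    this one class, not fifty services."""

    def __init__(self, vendor_client):
        self._client = vendor_client

    def log(self, level: str, message: str) -> None:
        # Translate our contract into whatever the vendor expects.
        self._client.send_event({"severity": level, "body": message})


class InMemorySink(LogSink):
    """Handy stand-in for tests and local development."""

    def __init__(self) -> None:
        self.records: list[tuple[str, str]] = []

    def log(self, level: str, message: str) -> None:
        self.records.append((level, message))
```

The in-memory implementation is a nice side benefit: once the facade exists, tests no longer need the vendor at all.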
Open standards
Make sure that all critical business data is stored and accessible over open or de facto standards. ODBC is an open standard, as is Apache Arrow, whereas AWS S3 is an example of a de facto standard.
Stay away from closed proprietary data formats to prevent data lock-in and maximise the chances of future tool integrations just working out of the box.
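As a small illustration of the principle, here is business data exported through open, tool-agnostic formats using only the standard library. The data and file layout are arbitrary examples.

```python
# Illustrative example: critical business data written out via open
# formats (CSV and JSON) that virtually any tool can read back, instead
# of a closed proprietary binary format that locks the data in.
import csv
import io
import json

orders = [
    {"order_id": 1, "customer": "Acme", "total": 99.50},
    {"order_id": 2, "customer": "Globex", "total": 12.00},
]

# CSV: readable by spreadsheets, databases, and decades-old tooling.
csv_buf = io.StringIO()
writer = csv.DictWriter(csv_buf, fieldnames=["order_id", "customer", "total"])
writer.writeheader()
writer.writerows(orders)

# JSON: the de facto standard for APIs and integrations.
json_text = json.dumps(orders, indent=2)
```

Either output can be handed to a future tool with no migration step, which is exactly the “integrations just work out of the box” property the text describes.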
Go build things to last
“What are you waiting for?! DO IT! JUST DO IT! YES, YOU CAN! JUST DO IT! If you’re tired of starting over, STOP GIVING UP!”
~ Shia LaBeouf, possibly referring to this article
Once you begin thinking of code in terms of longevity, it changes your approach to most aspects of software development.
The value of good documentation skyrockets when code needs to outlive multiple teams of maintainers. You still take on tech debt, but you may determine that some classes of tech debt (e.g. a quick and dirty one-off script or MVP) are acceptable while others (unintuitive naming or convention-breaking folder structures) are not. Your policies and preconceptions about automated tests, encapsulation, etc. – all need to be revisited when designing systems to stay in production for a decade or longer.
In return for this, however, you get a codebase that actively resists bad code; that neatly allows for innovation or changing requirements; a codebase that stays lean while maintaining high developer productivity for years - and which doesn’t require a costly bottom-up rebuild every 2-3 years.
The result is faster development, higher quality, happier developers, and all of that at a significantly lower total cost of ownership.