How to start your code on the right track.
Goal
- Point you towards the skills you need to keep code easy to maintain
- We want to prevent the "How did this code base get so out of control?" situation
What does the problem look like?
The most common situations I have found a company in are:
- Has an existing legacy code base that is so convoluted it is too late to fix now
- Is rewriting its legacy code, fully expecting not to repeat the same mistakes, but is in reality creating a mess that is just as bad, with less familiar manifestations of those same mistakes
- Is a startup or has a greenfield project, taking full advantage of the clean slate to avoid the horrors of an overly complex codebase, while unknowingly creating a future with a convoluted legacy code base that will be too late to fix by the time the excessive complexity is noticed
The problem of complexity management with respect to a code base is a topic I have been focusing on for decades. Here I list the 3 most important principles to come out of this research, but first, let me precisely define the problem.
Problem Statement
As the number of features approaches infinity, the cost to add one additional feature should approach a constant.
Stating the problem this way emphasizes that I am only concerned with solutions that can scale to any amount of complexity. First you have to get a handle on the overall complexity of your application. Only then does it make sense to worry about choices made at the individual file level. So what are my 3 principles?
(1) Single Entry Point
Have only a single entry point, and keep logic out of entry points, including frameworks and containers. Having multiple entry points means you have logic somewhere to choose between the entry points, and that logic will be in a place that is hard to test. Choosing an entry point is a type of application behavior, and application behavior belongs in a programming language, where it is easy to test. A detailed example of what I mean by this can be found here. Included is a link to a 10 minute video that explains how and why this works.
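As a minimal sketch of the idea (in Python, chosen only for illustration; the names `run`, `main`, and the two commands are hypothetical and not taken from the linked example), the entry point below does nothing except wire in real collaborators and delegate. The choice of which command to run is ordinary code that a test can call directly.

```python
import sys
from typing import Callable, Sequence


def run(args: Sequence[str], emit: Callable[[str], None]) -> int:
    """All behavior lives here, including choosing which command to run."""
    # Hypothetical commands, used only to show that dispatch is plain code.
    commands = {
        "greet": lambda rest: f"Hello, {rest[0] if rest else 'world'}!",
        "count": lambda rest: str(len(rest)),
    }
    if not args or args[0] not in commands:
        emit("usage: app (greet|count) [args...]")
        return 1
    emit(commands[args[0]](args[1:]))
    return 0


def main() -> None:
    # The single entry point: wire real collaborators, delegate, exit.
    # It contains no decisions of its own, so there is nothing here to test.
    # A launcher script would simply call main().
    sys.exit(run(sys.argv[1:], print))
```

A test exercises `run(["greet", "Ada"], collected.append)` with a plain list standing in for the console, without starting a process or a container.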
(2) Isolate Non-Determinism
You want to keep the parts of the application you control (deterministic) separate from parts of the application you do not control (non-deterministic). You control anything that is defined by the rules of your programming language, for example how an "if" statement works, or how you iterate over a sequence. You do not control anything that depends on the operating system rather than the rules of the programming language. This includes the network, filesystem, environment variables, system clock, etc. A detailed example demonstrating how to take code with a mix of determinism and non-determinism, and convert that to code that is easy to test, can be found here. This also includes a link to a 23 minute video that walks you through every step.
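A small sketch of this separation, assuming the system clock is the non-deterministic part (the function names are hypothetical and Python is used only for illustration): the logic receives the clock as a parameter instead of reading it directly, so only the thin wiring function touches the operating system.

```python
from datetime import datetime
from typing import Callable


def greeting(now: Callable[[], datetime]) -> str:
    """Deterministic logic: the clock is passed in rather than read directly."""
    hour = now().hour
    if hour < 12:
        return "Good morning"
    if hour < 18:
        return "Good afternoon"
    return "Good evening"


def production_greeting() -> str:
    # Production wiring: the real system clock (non-deterministic).
    return greeting(datetime.now)


def fixed_clock() -> datetime:
    # Test wiring: a fixed clock makes the result repeatable.
    return datetime(2024, 1, 1, 9, 30)
```

With `fixed_clock` the result is the same on every run, while `production_greeting` is the only place that depends on the real time of day.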
(3) Use tooling to detect unintentional increases in complexity
Use tooling to detect certain types of complexity. Humans are good at deciding what makes sense within the local context of the code they are looking at and the immediate collaborators of that code. Humans are terrible at keeping track of complexity as the amount of code to maintain continually increases. This is where tooling comes in. Something has to be in charge of noticing the subtle increases in complexity with regard to the big picture while the developer is focused on smaller parts. One aspect of complexity worth paying attention to is dependency cycles. How well the stable dependencies principle has been followed is a pretty good indicator of how difficult it is to add new features over time, and the existence of a dependency cycle almost certainly indicates a violation of the stable dependencies principle. It is easy to fix cyclic dependencies if they are noticed early. Let a cycle exist in the code too long, and you invariably end up with a tangled mess of dependencies that have to be comprehended all at once to be comprehended at all. For languages targeting the Java Virtual Machine, I have created the detangler application. This allows you to fail the build once your dependency complexity gets so bad that the dependencies actually cycle back upon themselves.
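To make the idea of a build-failing cycle check concrete, here is a generic depth-first search over a dependency map. This is only a sketch under stated assumptions: it is not how detangler works internally, the `find_cycle` name and the dictionary input format are hypothetical, and Python is used purely for illustration.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of module names, or None."""
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []  # modules on the current search path

    def visit(node: str) -> Optional[List[str]]:
        state[node] = IN_PROGRESS
        path.append(node)
        for dep in deps.get(node, []):
            dep_state = state.get(dep, UNVISITED)
            if dep_state == IN_PROGRESS:
                # dep is already on the current path, so we looped back to it.
                return path[path.index(dep):] + [dep]
            if dep_state == UNVISITED:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in deps:
        if state.get(node, UNVISITED) == UNVISITED:
            found = visit(node)
            if found:
                return found
    return None
```

A build step could call `find_cycle` on the module dependency map and fail when the result is not `None`, so a cycle is reported the moment it is introduced rather than after it has spread.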
Test Driven Design
Writing your tests first and letting that drive the design of your code can help guide you according to the 3 principles listed here. Your test will force you to handle the behavior in code your test has access to, preventing multiple entry points. Testing non-determinism is hard, which will encourage you to hide that non-determinism behind some sort of contract (interface, trait, protocol). Finally, writing the tests first helps prevent you from introducing cycles, because the existence of a cycle is going to force you to jump between the tests instead of focusing on one test at a time.
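The sketch below illustrates how a test written first can push the filesystem behind a contract. The `Files` protocol, `FakeFiles`, and `count_lines` are hypothetical names invented for this example, and Python is used only for illustration.

```python
from typing import Dict, Protocol


class Files(Protocol):
    """Contract for the one filesystem operation the logic needs."""

    def read_text(self, path: str) -> str: ...


class FakeFiles:
    """Test double: serves file contents from memory."""

    def __init__(self, contents: Dict[str, str]) -> None:
        self._contents = contents

    def read_text(self, path: str) -> str:
        return self._contents[path]


def count_lines(files: Files, path: str) -> int:
    # Deterministic logic: it only sees the contract, never the real disk.
    return len(files.read_text(path).splitlines())


def test_count_lines() -> None:
    # Written first: the test needs no real file, which forces the contract.
    files = FakeFiles({"notes.txt": "one\ntwo\nthree\n"})
    assert count_lines(files, "notes.txt") == 3
```

Because the test could not touch a real disk without becoming slow and fragile, the contract appeared as a side effect of writing the test, not as an up-front design decision.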
Foundational Theory - Further Reading
- Structured Design
- Larry Constantine identifies and defines two intertwined characteristics central to understanding code maintainability
- Coupling
- Cohesion
- Object Oriented Software Construction
- Bertrand Meyer defines and coins the phrase "Design by Contract"