Why your code is so hard to understand

“What the hell was I thinking?!?”

It’s 1:30AM and I am staring at a piece of code I wrote no more than a month ago. At the time it seemed like a work of art. It all made sense. It was elegant and simple and amazing. Not anymore. I have a deadline tomorrow and discovered a bug a few hours ago. What seemed simple and logical at the time just doesn’t make sense anymore. Surely if I wrote the code I should be smart enough to understand it?

After one too many experiences like this I started thinking seriously about why my code makes perfect sense while I am writing it but looks like gibberish when I go back to it a few weeks or months later.

Problem #1, overly complex mental models.

The first step in understanding why your code is hard to read when you come back to it after a break is understanding how we mentally model problems. Almost all the code you write is trying to solve a real world problem. Before you can write any code you need to understand the problem you are trying to solve. This is often the hardest step in programming.

In order to solve any real world problem we first need to form a mental model of that problem. Think of this as the intent of your program. Next you need to form a model of a solution that will achieve your program’s intent. Let’s call this the semantic model. Never confuse the intent of your program with your solution to that intent. We tend to think primarily in terms of solutions, and often bypass the formation of a model of intent.

Your next step is to form the simplest semantic model possible. This is the second place things can go wrong. If you don’t take the time to really understand the problem you are trying to solve, you tend to stumble onto a model as you code. If, on the other hand, you really think about what you are trying to do, you can often come up with a much simpler model that is sufficient to achieve your original intent.

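The extra complexity you pick up by stumbling onto a model, rather than designing one, is accidental: it comes from your solution, not from the problem. Eliminating as much of this accidental complexity as possible is crucial if you want simple, easy-to-maintain code. The problems we are trying to solve are complex enough. Don’t add to them if you don’t have to.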

Problem #2, poor translation of semantic models into code.

Once you have formed the best semantic model you can, it’s time to translate that into code. We’ll call this the syntactic model. You are trying to translate the meaning of your semantic model into syntax that a computer can understand.

If you have an amazing semantic model but then mess it up in the translation to code, you are going to have a hard time when you need to come back and change your code at a later stage. When you have the semantic model fresh in your mind, it’s easy to map your code onto it. It’s not hard to remember that a variable named “x” is actually the date a record was created and “y” the date it was deleted. When you come back three months later you no longer have this semantic model in your head, so those same variable names make no sense.

Your task in translating a semantic model into syntax is to try and leave as many clues as possible that will allow you to rebuild the semantic model when you come back at a later time.

So how do you do this?

Class structure and names.

If you are using an OO language, try to keep your class structure and names as close to your semantic model as possible. Domain-Driven Design is a movement that places extreme importance on this practice. Even if you don’t buy into the full DDD approach you should think very carefully about class structure and names. Each class is a clue you leave for yourself and others that will help you rebuild your mental model when you return later.

Variable, parameter and method names.

Try to avoid generic variable and method names. Don’t call a method “Process” when “PaySalesCommission” makes more sense. Don’t call a variable “x” when it should be “currentContract”. Don’t have a parameter named “input” when “outstandingInvoices” is better.
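Here is a rough sketch of the difference in Python. The Contract and Invoice types, the commission calculation and both function names are hypothetical, made up purely for illustration, and the names follow Python’s snake_case convention rather than the casing used above.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Contract:
    commission_rate: Decimal  # hypothetical field, for illustration only


@dataclass
class Invoice:
    amount: Decimal


# Generic names: you have to read the whole body to learn what this does.
# ("input" also shadows a Python builtin, which is one more reason to avoid it.)
def process(x, input):
    t = Decimal("0")
    for i in input:
        t += i.amount * x.commission_rate
    return t


# Same calculation, but the names carry the semantic model.
def calculate_sales_commission(
    current_contract: Contract, outstanding_invoices: list[Invoice]
) -> Decimal:
    invoiced_total = sum(
        (invoice.amount for invoice in outstanding_invoices), Decimal("0")
    )
    return invoiced_total * current_contract.commission_rate
```

Both functions return the same number. Only the second one tells you what that number means when you come back to it months later.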

Single responsibility principle (SRP).

The SRP is one of the core Object Oriented Design Principles and ties in with good class and variable names. It states that any class or method should do one thing and one thing only. If you want to give classes and methods meaningful names they need to have a single well-defined purpose. If a single class reads from and writes to your database, calculates sales tax, notifies clients of a sale and generates an invoice, you aren’t going to have much luck giving it a good name. I often end up refactoring a class because I struggle to give it a short enough name that describes everything it does. For a longer discussion on the SRP and other OO principles have a look at my post on Object Oriented Design.
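As a minimal sketch, here is what splitting two of those responsibilities apart might look like in Python. The class names, the Sale type and the tax rate passed in are all assumptions made for the example, not code from a real system.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Sale:
    customer_email: str
    net_amount: Decimal


class SalesTaxCalculator:
    """Does one thing: works out the sales tax owed on a sale."""

    def __init__(self, rate: Decimal) -> None:
        self.rate = rate

    def tax_for(self, sale: Sale) -> Decimal:
        # Round to whole cents.
        return (sale.net_amount * self.rate).quantize(Decimal("0.01"))


class InvoiceGenerator:
    """Does one thing: renders an invoice. Tax is delegated to the calculator."""

    def __init__(self, tax_calculator: SalesTaxCalculator) -> None:
        self.tax_calculator = tax_calculator

    def generate(self, sale: Sale) -> str:
        tax = self.tax_calculator.tax_for(sale)
        total = sale.net_amount + tax
        return (
            f"Invoice for {sale.customer_email}: "
            f"net {sale.net_amount}, tax {tax}, total {total}"
        )
```

Each class is easy to name because it has a single purpose. Database access and client notification would get their own classes in the same way.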

Appropriate comments.

If you need to do something for a reason that isn’t made clear in your code, have pity on your future self and leave a note describing why you had to do it. Comments tend to get stale quickly, so I prefer to keep the code as self-describing as possible and use comments to say why something was done, not how it was done.
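To illustrate, here are two versions of the same Python function. The rounding rule in the second comment is a hypothetical business rule invented for this sketch.

```python
# A "how" comment: it restates the code and tells the reader nothing new.
def cash_total_in_cents_how(total_in_cents: int) -> int:
    # Add 2, integer-divide by 5, multiply by 5.
    return (total_in_cents + 2) // 5 * 5


# A "why" comment: it records the reason, which the code cannot express.
def cash_total_in_cents(total_in_cents: int) -> int:
    # Why: (hypothetical rule) cash payments are rounded to the nearest
    # 5 cents because smaller coins are no longer in circulation.
    return (total_in_cents + 2) // 5 * 5
```

The two bodies are identical. Only the second comment would help you decide, a year from now, whether it is safe to remove the rounding.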

Problem #3, not enough chunking.

Chunking in psychology is defined as the grouping of information as a single entity. So how does this apply to programming? As you gain experience as a developer you start to see repeating patterns that crop up over and over again in your solutions. The highly influential Design Patterns: Elements of Reusable Object-Oriented Software was one of the first books to list and explain some of these patterns. Chunking doesn’t only apply to design patterns and OO though. In functional programming (FP) there are a number of well-known standard functions that serve the same purpose. Algorithms are another form of chunking (more on this later).

When you use chunking (design patterns, algorithms and standard functions) appropriately it allows you to stop thinking about how the code you write does something and instead think about what it does. This reduces the distance between your syntactic model (your code) and the semantic model (the model in your head). The shorter this distance the easier it is to re-build your mental model when you return to your code at a later stage.
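A small Python sketch of the idea, using the standard filter and sum functions as the chunks. The function names and the order data are assumptions made for the example.

```python
# Without chunking: the reader has to trace the index bookkeeping
# to work out what is being computed.
def total_of_large_orders_loop(order_totals, threshold):
    result = 0
    index = 0
    while index < len(order_totals):
        if order_totals[index] >= threshold:
            result = result + order_totals[index]
        index = index + 1
    return result


# With chunking: "filter" and "sum" are patterns the reader already knows,
# so the line reads as what it does: sum the orders at or above the threshold.
def total_of_large_orders(order_totals, threshold):
    return sum(filter(lambda total: total >= threshold, order_totals))
```

Both versions compute the same value. The second is closer to how you would describe the calculation out loud.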

If you are interested in learning more about the functions used in FP have a look at my article on functional programming for web developers.

Problem #4, obscured usage.

Up to now we have mainly spoken about how to structure your classes, methods and variable names. Another important part of your mental model is understanding how these methods are supposed to be used. Once again this is quite clear when you initially form your mental model. When you come back later it’s often quite difficult to reconstruct all the intended uses of your classes and methods. Usually this is because different usages are scattered throughout the rest of your program. Sometimes even across many different projects.

This is where I find test cases to be very useful. Besides the obvious benefit of knowing whether a change broke your code, tests provide a full set of example use cases for your code. Instead of having to trawl through a hundred files looking for references, you can get a full picture just by looking at your tests.
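As a sketch of tests doubling as usage documentation, here is a hypothetical helper, split_full_name, with a Python unittest case for each intended use. The helper and its behaviour are invented for this example.

```python
import unittest


def split_full_name(full_name: str) -> tuple[str, str]:
    """Split a full name into (first_name, rest). Hypothetical helper."""
    parts = full_name.split(None, 1)
    if not parts:
        raise ValueError("full_name must not be blank")
    first_name = parts[0]
    last_name = parts[1].strip() if len(parts) > 1 else ""
    return first_name, last_name


class SplitFullNameUsageTests(unittest.TestCase):
    """Each test name describes one way the function is meant to be used."""

    def test_first_and_last_name(self):
        self.assertEqual(split_full_name("Ada Lovelace"), ("Ada", "Lovelace"))

    def test_everything_after_the_first_word_is_the_last_name(self):
        self.assertEqual(
            split_full_name("Jean Claude Van Damme"), ("Jean", "Claude Van Damme")
        )

    def test_single_name_has_empty_last_name(self):
        self.assertEqual(split_full_name("Cher"), ("Cher", ""))

    def test_surrounding_whitespace_is_ignored(self):
        self.assertEqual(split_full_name("  Ada   Lovelace "), ("Ada", "Lovelace"))

    def test_blank_name_is_rejected(self):
        with self.assertRaises(ValueError):
            split_full_name("   ")
```

Saved in a file, these can be run with python -m unittest. Reading the five test names gives you the intended uses without opening a single caller.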

Bear in mind that in order for this to be useful you need to have a complete set of test cases. If your tests only cover some of your intended uses you are going to be in trouble later on if you assume the tests are complete.

Problem #5, no clear path between the different models.

Often your code is technically very good, and extremely elegant, but there is a very unnatural jump from program intent to semantic model to code. It’s important to consider the transparency of the stack of models you select. The journey from the program intent to semantic model to code needs to be as smooth as possible. You should be able to see all the way through each model to the problem. It may at times be better to choose a particular class structure or algorithm not for its elegance in isolation, but for its ability to connect the various models and leave a natural path towards reconstructing intent. As you go from abstract program intent to concrete code the choices you make should be driven by the clarity with which you’re able to represent the more abstract model below it.

Problem #6, inventing algorithms.

Often we as programmers think we are inventing algorithms to solve our problems. This is hardly ever the case. In almost all cases there are existing algorithms, such as Dijkstra’s algorithm, Levenshtein distance or Voronoi tessellations, that can be put together to solve your problem. Programming for the most part consists of choosing existing algorithms in the right combination to solve your problem. If you are inventing new algorithms you either don’t know the right algorithm or are working on your PhD thesis.
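For example, suggesting the closest valid command for a mistyped one does not need a home-grown similarity measure. Below is a sketch in Python of the textbook dynamic-programming formulation of Levenshtein distance. The closest_match helper and the command list are hypothetical and only there to show the algorithm being reused.

```python
def levenshtein_distance(source: str, target: str) -> int:
    """Minimum number of single-character edits turning source into target."""
    # previous_row[j] holds the distance between the processed prefix of
    # source and the first j characters of target.
    previous_row = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current_row = [i]
        for j, target_char in enumerate(target, start=1):
            substitution_cost = 0 if source_char == target_char else 1
            current_row.append(
                min(
                    previous_row[j] + 1,  # deletion
                    current_row[j - 1] + 1,  # insertion
                    previous_row[j - 1] + substitution_cost,  # substitution
                )
            )
        previous_row = current_row
    return previous_row[-1]


def closest_match(word: str, candidates: list[str]) -> str:
    """Hypothetical helper: the candidate with the smallest edit distance."""
    return min(candidates, key=lambda candidate: levenshtein_distance(word, candidate))
```

Because the function is named after a well-known algorithm, a future reader can look up how it works instead of reverse engineering it from the loops.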

Conclusion.

In the end it boils down to this: as a programmer your goal is to construct the simplest possible semantic model that would solve your problem. Translate that semantic model as closely as possible into a syntactic model (code) and provide as many clues as possible so that whoever looks at your code after you can re-create the same semantic model you originally had in mind.

Imagine you are leaving breadcrumbs behind you as you walk through the brightly lit forest of your code. Trust me, when you need to find your way back later on, that forest is going to seem dark and misty and foreboding.

It sounds simple but in reality it is very difficult to do well.

A special thanks to Nic Young and Ulvi Guliyev for their input on this article.

• • •

Originally published at syoung.org on November 3, 2014.