Useful Abstractions - /*Commented Out*/

“Everything should be made as simple as possible, but no simpler.” So goes the quote attributed to Einstein. This is excellent advice for programmers, and the last part deserves particular attention. How do you know when you’ve reached the point at which things can’t be made simpler? And once you know, or think you know, how simple is simple enough, how do you achieve that simplicity? My answer to the first question is that you have to build a mental model of the space of program design, a process that takes time, trial and error, and many lines of code written, run and debugged. My answer to the second, when it comes to programming, is abstraction.

Every line of code comes with a cost. Every class, method and variable adds functionality but also contributes to the total amount of code that must be maintained, tested, and understood by future custodians. Abstraction is essential to keeping down both the number of lines of code needed and the mental effort required to understand the code that is there. Any codebase of significant size will contain more code than any individual can keep in his or her head, and the trick is to cut down on the number of things that the programmer has to think about at one time. Object-oriented abstractions allow you to hide details in the definition of a type and combine types by composition or inheritance; functional abstractions allow you to hide details within the definition of functions and combine them through parameters and return values.
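To make the two styles concrete, here is a minimal Python sketch; the names `Formatter`, `Logger`, `strip_comment`, `normalize` and `clean` are hypothetical, chosen only for illustration. The object-oriented half hides a formatting detail inside a type and combines it with a logger by composition; the functional half hides each step inside a function and combines them by feeding one function's return value into the next.

```python
# Object-oriented: the formatting detail lives inside a type...
class Formatter:
    def __init__(self, prefix: str) -> None:
        self.prefix = prefix

    def format(self, message: str) -> str:
        return f"[{self.prefix}] {message}"


# ...and is combined with another type by composition.
class Logger:
    def __init__(self, formatter: Formatter) -> None:
        self.formatter = formatter
        self.lines: list[str] = []

    def log(self, message: str) -> None:
        # The logger neither knows nor cares how messages are formatted.
        self.lines.append(self.formatter.format(message))


# Functional: each detail lives inside a function...
def strip_comment(line: str) -> str:
    return line.split("#", 1)[0]


def normalize(line: str) -> str:
    # Collapse runs of whitespace and lowercase the result.
    return " ".join(line.split()).lower()


# ...and functions are combined through parameters and return values.
def clean(line: str) -> str:
    return normalize(strip_comment(line))
```

In both halves, a reader of `Logger.log` or `clean` can follow what happens without reading the details hidden one level down.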

And abstraction, in programming, is accomplished through indirection. Indirection is what allows you to move shared behavior up a class hierarchy so that each subclass supplies only its own details, or to defer the details until later by passing them in as a function argument.
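Both forms of indirection can be sketched in a few lines of Python. The class names and the `render_report` function are hypothetical. In the first half, the common structure of a report lives in a base class and a subclass fills in the specifics (the template method pattern); in the second, the specifics are deferred by passing them in as a function argument.

```python
from abc import ABC, abstractmethod
from typing import Callable


# Indirection through a class hierarchy: the common structure lives in the
# base class, and the details are supplied lower down.
class Report(ABC):
    def render(self) -> str:
        return f"== {self.title()} ==\n{self.body()}"

    @abstractmethod
    def title(self) -> str: ...

    @abstractmethod
    def body(self) -> str: ...


class SalesReport(Report):
    def title(self) -> str:
        return "Sales"

    def body(self) -> str:
        return "42 units sold"


# Indirection through a function argument: the caller supplies the details
# at the moment the function is called.
def render_report(title: str, body: Callable[[], str]) -> str:
    return f"== {title} ==\n{body()}"
```

Either way, `render` and `render_report` can be read without knowing what any particular report contains. The price is that finding out what a given report actually prints now means looking somewhere else.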

Complexity is the ever-present enemy that programmers must contend with; not only does it never sleep, it also comes in different disguises. We can think about horizontal and vertical complexity, both of which burden you with too much to think about at one time. The former is a matter of too much going on in one place, while the latter requires you to think too much about how one part may be affected by other parts, sometimes in obscure ways. This might mean having to be aware of the effects of global state on a piece of code, having to navigate a many-layered class hierarchy, or having to dig through a deep call stack. And the cause of each of these sources of complexity is indirection.
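A minimal illustration of the global-state case, with hypothetical names throughout: the first pricing function depends on a module-level variable that any other part of the program may change, so reading the function on its own does not tell you what it will return. The second makes the same dependency explicit as a parameter.

```python
# Hidden dependency: the result depends on state that may be changed elsewhere.
DISCOUNT_RATE = 0.0


def price_with_global(base: float) -> float:
    return base * (1 - DISCOUNT_RATE)


# Explicit dependency: everything the function needs is in its signature.
def price(base: float, discount_rate: float) -> float:
    return base * (1 - discount_rate)


def apply_holiday_sale() -> None:
    # Some distant part of the program changes the global...
    global DISCOUNT_RATE
    DISCOUNT_RATE = 0.25
```

Calling `apply_holiday_sale()` silently changes what `price_with_global(100.0)` returns, from 100.0 to 75.0, whereas `price` always gives the same answer for the same arguments, no matter what else has run.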

But wait, didn’t we say earlier that indirection is what gets us abstraction, and abstraction is what we need to keep our codebase from becoming a mess? We did, and it is, but the reality is that more levels of indirection create their own problems. So how do we reconcile these two facts?

What Makes Them Useful

Abstractions are necessary in programming, but I believe that to be useful, they have to pay for themselves. What I mean by this is that an abstraction is worth the complexity it adds only if it removes complexity, either immediately or by making the code easier to extend in the future. A new abstraction takes up a bit of extra space, needs tests and documentation that must be written and maintained, and occupies real estate in the minds of the developers working with it. Ideally, its existence should be justified by the code you are able to remove once it exists, as well as the code you won’t have to write in the future because of it. Abstracting is investing in the future of your code, and the investment pays dividends in code that is easier to modify, easier to understand, and smaller.
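As a hypothetical example of an abstraction that earns its keep, suppose several form fields were each validated by a near-identical block of code checking that the value is present and within a length limit. One copy of that block is kept below for comparison. Extracting a single helper replaces every copy, and any future field costs one entry in a table instead of another block. The field names and limits are invented for illustration.

```python
# Before: the same checks, copied once per field (one copy shown).
def validate_username_old(value: str) -> list[str]:
    errors = []
    if not value.strip():
        errors.append("username is required")
    elif len(value) > 20:
        errors.append("username must be at most 20 characters")
    return errors


# After: the repeated shape is captured once...
def validate_field(name: str, value: str, max_len: int) -> list[str]:
    """Return error messages for one required, length-limited field."""
    if not value.strip():
        return [f"{name} is required"]
    if len(value) > max_len:
        return [f"{name} must be at most {max_len} characters"]
    return []


# ...and each field, present or future, is now a single line of data.
FIELD_LIMITS = {"username": 20, "display_name": 40, "city": 60}


def validate_form(form: dict[str, str]) -> list[str]:
    errors: list[str] = []
    for name, max_len in FIELD_LIMITS.items():
        errors.extend(validate_field(name, form.get(name, ""), max_len))
    return errors
```

The helper adds one function to read and test, but it replaces as many copies of the checks as there are fields, which is the sense in which it pays for itself.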

But there are bad investments as well as good ones, and some abstractions add nothing to a project but more indirection. Sometimes what is needed is to remove a level of inheritance: collapsing a class hierarchy can let you delete code that was not providing any actual value.
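A hypothetical sketch of the kind of layer that can go: `BaseShape` and `AbstractPolygon` sit between `Shape` and the square class but contribute nothing of their own, so collapsing the hierarchy loses no behavior.

```python
# Before: two intermediate layers that only pass things through.
class Shape:
    def area(self) -> float:
        raise NotImplementedError


class BaseShape(Shape):
    pass  # adds nothing


class AbstractPolygon(BaseShape):
    pass  # adds nothing


class SquareBefore(AbstractPolygon):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


# After: the same behavior, with two fewer classes to read, test and document.
class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2
```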

This is the reason for the YAGNI principle: creating code that you think you might need at some point but which has no current use is a good way to create complexity and add code that does not pay for itself. This is also why the proliferation of Listeners, Builders, Factories, BuilderListenerFactories and the rest of the design pattern zoo, often spotted in Java code, causes eyes to roll. This is not to say that there is no place for design patterns in a programmer’s toolkit, just that their indiscriminate use adds complexity rather than reducing it.
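To make the YAGNI point concrete, here is a hypothetical contrast. The factory below was written in anticipation of other kinds of notifiers that never materialized; with only one product, it adds a class, a registry and a lookup, and removes nothing. Constructing the object directly does the same job.

```python
class EmailNotifier:
    def __init__(self, address: str) -> None:
        self.address = address

    def notify(self, message: str) -> str:
        return f"to {self.address}: {message}"


# Speculative: a factory with exactly one product and no current second use.
class NotifierFactory:
    _registry = {"email": EmailNotifier}

    @classmethod
    def create(cls, kind: str, address: str) -> EmailNotifier:
        try:
            return cls._registry[kind](address)
        except KeyError:
            raise ValueError(f"unknown notifier kind: {kind}") from None


# Direct: the same result with no extra indirection to read or maintain.
direct = EmailNotifier("ops@example.com")
via_factory = NotifierFactory.create("email", "ops@example.com")
```

If a second kind of notifier does turn up later, introducing the factory at that point, when there is real duplication for it to remove, is a small refactoring.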

Abstractions enable modularity, another key weapon in the fight against complexity. This much is easily understood, but once you sit down to write modular code, you have to choose what exactly to make modular. Just as adding indirection can either help or hinder your enterprise, depending on how judiciously it is used, code that is divided into modules may not be encapsulating anything useful. Understanding the importance of modularity is a starting point, but it is not an automatic ticket to the Promised Land, flowing with milk, honey and reusable code.

So, according to the way of looking at things I’ve laid out above, indirection enables abstraction, which enables modularity, all of which help us deal with complexity. At the same time, every indirection, abstraction and module comes along with its own contribution to the total complexity of a codebase, which is why the right modules have to be created based on the right abstractions. So then, how do we know when we are on the right track?

Useful Abstractions and Where to Find Them

While I will not say that a top-down, preconceived architecture cannot capture useful abstractions, it is my experience that they tend to bubble up out of your code by themselves, as long as you keep your eyes open for them. Before you write a line of code, you have to have some general sense of what that code is going to do and how it fits into a bigger picture, even if it is the first line and currently the whole of your program. After you have written many lines of code, patterns will inevitably appear, and this is when you can start developing abstractions of those patterns which will eventually pay off in reduced complexity.

This is why refactoring is so important: the ultimate shape of your program, and what needs to be abstracted, is not likely to be obvious before you have some working code in place. When an opportunity presents itself to abstract some functionality, you need to be ready to make those changes, if only to keep the constant accretion of code from ending up about as well organized as a municipal landfill.

Code and Pasta

Code has been compared to various types of pasta, and given how many varieties there are, I wonder when someone is going to come up with a way to distinguish not only between spaghetti and lasagna code, but linguine, fettuccine and penne. In any case, while pasta is an apt metaphor, there are other metaphors we can use to reason about what good code is like, and one that I’d like to propose is that good code is like an easily navigable map. Having a set of good abstractions to work with helps you create a mental topology of the program. You will have to deal with the low-level details in the routine maintenance of any program, but you will be able to start out from broad generalizations of functionality and high-level abstractions, from which the valleys of the specific details can be viewed, rather than floating in a sea of specifics with no clear relations.