The Art of Abstraction

Written on 2019-06-17

“Software's Primary Technical Imperative has to be managing complexity”, says Steve McConnell in his book on software construction. In the first article of this series, I said that reducing complexity makes software simultaneously more reliable, understandable, and extendable. Now we are going to look at how that is possible.

The key concepts here are encapsulation, abstraction, and modularisation. Unfortunately, these terms are used semi-interchangeably in the literature, even though they have slightly different connotations. For the purposes of this article, I'm going to refer to encapsulation as the overarching principle, with the other two as more specific applications of the basic idea. Let's look at each in turn.



“Simple is better than complex. Complex is better than complicated.”

The diagram above and the accompanying quote from the Zen of Python illustrate what encapsulation does. Encapsulation is all about taking a complicated system (center) and splitting it up into a group of maximally self-contained subsystems (right).

There are two main reasons for doing this:

  1. Each subsystem can be designed, developed, and debugged in isolation. You don't have to keep the entire program in mind while working on any one part.

  2. Subsystems can be considered as “black boxes”: as long as you know what they do, how they do it is irrelevant. This means you can think about the entire program without having to keep in mind all the details of each individual component – it helps you see the big picture.

It should be fairly obvious how this makes thinking about a large system much easier. It is also worth noting that encapsulation can take place at multiple levels, such as the function, the class, or the package.

(Another positive side effect of encapsulation is that it makes code reuse easier. Ideally, you may even be able to take an old subsystem/class/function and copy-paste it straight into a new project.)

So how do you go about splitting up a system?


The first way is to cut it up into layers. This “vertical encapsulation” is usually referred to as abstraction.

It is something we do every day. Consider the sentence “Flocks of pigeons flew over the city.” This statement treats a flock as a single object (although it's really composed of hundreds of individual birds), as it does the city (even though this too is composed of hundreds of buildings). The word “flock” abstracts the concept of a group of birds into a single term, making it easier to talk about. If you're a molecular biologist, the word “bird” may in itself already be an abstraction for a conglomerate of different organs, tissues, and cells – and so forth back to first principles.

But it is rather cumbersome to talk about “a loose association of avian organisms in powered aerial motion over a municipal collection of buildings”. In natural language, as in programming, you want to be using an appropriate degree of abstraction. Programming is about solving problems, and problems are easiest to think and write about if you do so at the highest possible level of abstraction. Therefore, abstraction in programming is all about hiding the implementation details.

Consider the following diagram of the functions in a simple graphics library:


I wrote this library a year ago to help some colleagues of mine who were using LED panels to investigate butterfly navigation behaviour. The library itself (inside the box) consists of a collection of extended shape functions (upper section), which are implemented using two core drawing functions (lower section). Most user code only has to use the extended functions, and the extended functions only ever call the core functions.

This is good for users of the library, because they can simply combine shapes and don't have to worry about the nitty-gritty, like how to draw a smooth circle on a gridded screen. But even better, this design makes the library highly portable. If you need to support a new type of display, the only things you have to change are the core functions – because they are the only parts of the code that need to know anything about the hardware. Everything else will run just as it always has.
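To make the layering concrete, here is a minimal Python sketch of the same idea. It is not the code of the library described above: the class CharScreen and the functions draw_hline, draw_vline, and draw_rect are hypothetical names invented for this illustration, and a character grid stands in for the real display.

```python
# Hypothetical sketch of a two-layer graphics library.
# None of these names come from the library described in the article.

class CharScreen:
    """Core layer: the only code that knows anything about the display."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.grid = [[" "] * width for _ in range(height)]

    def set_pixel(self, x, y, on=True):
        # Points outside the display are silently ignored.
        if 0 <= x < self.width and 0 <= y < self.height:
            self.grid[y][x] = "#" if on else " "

    def render(self):
        return "\n".join("".join(row) for row in self.grid)


# Shape layer: built purely on set_pixel(), so it works with any
# screen object that offers the same method.

def draw_hline(screen, x, y, length):
    for i in range(length):
        screen.set_pixel(x + i, y)


def draw_vline(screen, x, y, length):
    for i in range(length):
        screen.set_pixel(x, y + i)


def draw_rect(screen, x, y, width, height):
    draw_hline(screen, x, y, width)
    draw_hline(screen, x, y + height - 1, width)
    draw_vline(screen, x, y, height)
    draw_vline(screen, x + width - 1, y, height)


screen = CharScreen(6, 4)
draw_rect(screen, 1, 0, 4, 3)
print(screen.render())
```

Supporting a different display in this sketch would mean writing another class with its own set_pixel and render methods. The shape functions would not need to change.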

(In fact, I developed this library with a character-based screen, added support for the LED panels, ported it from Python to Common Lisp, and have just been asked to adapt it to a new generation of LED panels. Abstraction made it easy.)

One of the most advanced forms of building abstraction layers is to actually write your own computer language, targeted right at your problem area. (These are known as Domain-Specific Languages, or DSLs.) I have used this once – to implement a text-adventure game, I created a simple language to describe game worlds in. In the right circumstances, a DSL can be a powerful technique that makes for highly succinct and readable top-level code.
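As a rough illustration of what such a language can look like, the sketch below defines a tiny world-description format and a parser for it. The syntax (the room and exit commands) and the function parse_world are assumptions made up for this example; they are not the language from the text-adventure game mentioned above.

```python
import shlex

# A hypothetical world description: one command per line.
WORLD_SOURCE = '''
# rooms
room cellar "A damp cellar."
room kitchen "A warm kitchen."
exit cellar up kitchen
exit kitchen down cellar
'''


def parse_world(source):
    """Turn a world description into a dict of rooms."""
    rooms = {}
    for line in source.splitlines():
        # shlex handles the quoted descriptions and strips '#' comments.
        words = shlex.split(line, comments=True)
        if not words:
            continue
        command, args = words[0], words[1:]
        if command == "room":
            name, description = args
            rooms[name] = {"description": description, "exits": {}}
        elif command == "exit":
            origin, direction, target = args
            rooms[origin]["exits"][direction] = target
        else:
            raise ValueError(f"unknown command: {command}")
    return rooms


world = parse_world(WORLD_SOURCE)
print(world["cellar"]["exits"])
```

The top-level description reads almost like plain English, while the details of how rooms are stored stay hidden inside parse_world.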

So, to sum up this section: the idea is to build “abstraction barriers” by hiding the implementation details. When each level of code only has to know about the level just below it, you can solve your problems more easily because you are working at the highest level of abstraction.


If abstraction is vertical encapsulation, then modularisation is encapsulation done horizontally. In this approach, the components of a software system are split up and grouped according to the tasks they perform.

The aim here is to make sure that code that is doing a similar job is close together. This makes it quick to find, easy to extend, and even easy to replace.

For example, it is common to have subsystems for file I/O (including things like logging or error handling), network code, the user interface, or the actual application logic.

One popular design concept is known as Model-View-Controller, illustrated here in the simplified class diagram of my simulator Ecologia:


The basic idea here is to keep the user interface (a GUI in this case) strictly separate from the logic. (“Logic” in this context refers to the rules governing a program's behaviour, in this instance the ecological entities and relationships that are being modelled.) To do so, there are three packages involved: model, view, and controller. The controller package starts the program and initialises both model (the logic) and view (the user interface). It also offers some code that is needed by both sides – e.g. the EcologiaIO class. Lastly, it serves as an information exchange. Using the central data repository World, it passes user commands from view to model and information about the state of the simulation back the other way. Because of this setup, the model and view packages are not directly connected at all. Each only has to deal with the controller.

Separating model from view like this prevents the program code from becoming an entangled mess of interleaved logic and interface code, which is very hard to read and understand. It has the added benefit that if you ever want to change the interface type (for example to a web or a command-line interface), you simply replace the view package and you are good to go. Nothing needs to be changed anywhere else.
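The separation can be sketched in a few lines of Python. This is a deliberately stripped-down illustration, not Ecologia's actual code: the classes Model, TextView, CsvView, and Controller, and the "step" command, are all hypothetical.

```python
# Hypothetical names throughout; this is not Ecologia's actual code.

class Model:
    """The logic: knows nothing about how it is displayed."""

    def __init__(self):
        self.population = 10

    def step(self):
        self.population *= 2


class TextView:
    """One possible interface: formats the state as plain text."""

    def show(self, state):
        return f"Population: {state['population']}"


class CsvView:
    """An alternative interface with the same show() method."""

    def show(self, state):
        return f"population,{state['population']}"


class Controller:
    """Passes commands to the model and state back to the view."""

    def __init__(self, model, view):
        self.model = model
        self.view = view

    def handle(self, command):
        if command == "step":
            self.model.step()
        state = {"population": self.model.population}
        return self.view.show(state)


print(Controller(Model(), TextView()).handle("step"))
print(Controller(Model(), CsvView()).handle("step"))
```

Model never refers to either view class, and neither view refers to Model. Swapping TextView for CsvView only changes the line that builds the Controller.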

The strict separation enforced here is a good example of the “loose coupling” concept, which states that there should be as few interactions between subsystems as possible. (Remember, encapsulation is about “maximally self-contained subsystems”.) Connecting two subsystems only on a “need to know” basis is a requirement for the first advantage of encapsulation mentioned above: the ability to consider a subsystem in isolation from the rest of the program.

One last word about modularisation and coupling. One form of coupling that should be avoided if at all possible is global data (i.e. variables that are accessible from everywhere in the code). Global data is bad for at least three reasons. First, it often means that a lot of the code has to know implementation details (e.g. that a configuration table is stored as a dict). If the implementation ever changes, you are going to have to manually change a lot of code in a lot of places. Secondly, functions that depend on global state can be a right devil to test and debug, because global state can change in unexpected ways at runtime. (This gives rise to one of the key tenets of the Functional Programming paradigm.) And thirdly, programs using global data have to be very careful that the data is always manipulated in the right order, and never in an invalid manner.

Of course, there will often be things that seem like they have to be known by every part of the code. Often, however, this can be avoided by passing these objects, or the relevant parts of them, down to the desired function via that function's parameter list. Alternatively, one can “hide” global variables behind access routines in a separate subsystem. This abstracts away the details of the implementation, and also helps to control how and when the data can be changed.
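Both techniques can be shown in a short Python sketch. Everything here is an assumption made for illustration: the setting name grid_size and the functions get_setting, set_setting, and cell_count do not come from any real project.

```python
# Hypothetical configuration module. The dict is private by convention
# (leading underscore); other code uses only the two access routines.

_settings = {"grid_size": 50}


def get_setting(name):
    """Read a setting without knowing how settings are stored."""
    return _settings[name]


def set_setting(name, value):
    """Change a setting, rejecting unknown names and invalid values."""
    if name not in _settings:
        raise KeyError(f"unknown setting: {name}")
    if value <= 0:
        raise ValueError(f"{name} must be positive")
    _settings[name] = value


def cell_count(grid_size):
    """Depends only on its parameter, so it is trivial to test."""
    return grid_size * grid_size


# The caller looks the value up once and passes it down.
print(cell_count(get_setting("grid_size")))
```

If the settings later move from a dict to a file or a database, only get_setting and set_setting have to change. The validation in set_setting is also the single place where invalid changes are stopped.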

If you're tempted to use global data, remember McConnell's injunction: “The road to programming hell is paved with global variables.” (“Although,” as the Zen of Python goes on to say, “practicality beats purity.”)

Read on

  1. Principles of Software Development

  2. Understandable Software

  3. >> The Art of Abstraction <<

  4. Dealing with Errors

  5. Programming Tools: Languages

  6. Programming Tools: Paradigms

  7. Developing in a Team

  8. Final Thoughts

Tagged as computers, programming

Unless otherwise credited all material Creative Commons License by Daniel Vedder.