Written on 2019-06-24
The Jargon File defines programming as: “A pastime similar to banging one's head against a wall, but with fewer opportunities for reward.” Every programmer knows the frustration of looking for bugs that just won't be found. In fact, the majority of a program's development cycle is usually devoted not to the original writing, but to the subsequent debugging. Somebody who is good at finding and fixing mistakes therefore not only produces more reliable code, but is also a more efficient developer. So what techniques can we use to find bugs, or, if possible, prevent them from occurring in the first place?
The good news for computational scientists is that finding software errors has many parallels to something we all know how to do: scientific research. Both begin as you notice a strange phenomenon. Looking at this phenomenon, you formulate a hypothesis as to its possible cause. The hypothesis is tested, evaluated, adapted, or replaced, until you understand the causal mechanisms underlying your original phenomenon.
The scientific method as described above is of prime importance when looking for bugs. Although it can be tempting to dive in, modifying the program more or less at random, this is never a good idea. (Said technique is known as “shotgun debugging” and frequently ends up creating more bugs than it solves…) Basically: don't guess! Understand the problem, then fix its cause.
Before you can fix a bug, you first have to find the code that causes it. This is best done with a divide-and-conquer approach. Analyse your program methodically, eliminating sections as you prove that they cannot contain the error. Iterate until you have pin-pointed the exact class/function/line/statement that is the problem.
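The same halving idea works on input data as well as on code. Below is a minimal sketch, assuming a failure that appears as soon as one offending record is included and stays once it is there. The names `find_first_bad` and `crashes` are made up for illustration; `crashes` stands in for "run the program on these records and see whether it fails".

```python
def find_first_bad(items, fails):
    """Return the index of the item that first makes `fails` true.

    Assumes fails([]) is False and that, once the offending item is
    included in the prefix, fails(...) stays True for longer prefixes.
    Returns None if the full input does not fail at all.
    """
    if not fails(items):
        return None
    lo, hi = 0, len(items)  # invariant: items[:lo] passes, items[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(items[:mid]):
            hi = mid  # the culprit is somewhere in the first half
        else:
            lo = mid  # the first half is innocent, discard it
    return hi - 1


def crashes(records):
    # Hypothetical stand-in for the real program: it "fails" on negative input.
    return any(r < 0 for r in records)
```

Each round rules out half of the remaining candidates, so even a very long input is narrowed down to one record in a handful of runs.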
In some nasty cases, you may have to stabilise the error before you can find it. Not every bug shows up on every run – some only manifest themselves with a specific configuration of the input data or the system state. If you manage to reliably reproduce such a bug, you've often gone a long way towards nailing it down.
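One common source of such now-you-see-it bugs is randomness. A possible way of stabilising them, sketched below, is to pass in a seeded random number generator and search for a seed that triggers the failure. `flaky_mean` is an invented buggy function, not anything from a real code base.

```python
import random


def flaky_mean(n, rng):
    """Hypothetical buggy function: it crashes whenever it draws a zero."""
    values = [rng.randint(0, 9) for _ in range(n)]
    return sum(1 / v for v in values) / n  # ZeroDivisionError if any v == 0


def find_failing_seed(n, max_seed=1000):
    """Try seeds in order and return the first one that makes the bug appear."""
    for seed in range(max_seed):
        try:
            flaky_mean(n, random.Random(seed))
        except ZeroDivisionError:
            return seed
    return None
```

Once you have a failing seed, calling `flaky_mean(n, random.Random(seed))` fails in exactly the same way every time, which makes the bug far easier to study.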
Perhaps the best-known debugging aid is a humble print statement. Especially in very small programs, all that may be needed to locate an error is to insert strategic print statements telling you what part of the program is currently executing, or the value of certain variables.
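As a small illustration in Python (the function `normalise` is just an invented example), the two print statements below report which function is running and what its key variables hold. Writing to standard error keeps the debug lines separate from the program's real output.

```python
import sys


def normalise(values):
    """Scale the values so that they sum to one."""
    total = sum(values)
    # Debug print: where are we, and what do the inputs look like?
    print(f"DEBUG normalise: values={values!r} total={total!r}", file=sys.stderr)
    result = [v / total for v in values]
    # Debug print: what are we about to return?
    print(f"DEBUG normalise: result={result!r}", file=sys.stderr)
    return result
```

Just remember to take the print statements out again once the bug is found.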
A slightly more advanced version of this is to utilise a log file. Instead of printing debug statements straight to the screen, have them saved to a file. This makes them easier to work with afterwards and can also be used when a program is not run from the terminal. Actually, good logging practice can sometimes help diagnose an error without having to touch the code at all. (Side note: A good idea for logging is to introduce “verbosity levels” to govern how much information is emitted, depending on some user setting. Common values are: no logging/only errors/basic runtime information/extended debugging information.)
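In Python, the standard `logging` module covers both the log file and the verbosity levels. The following is a rough sketch of one way to wire it up; the 0 to 3 verbosity scale, the logger name "myprogram" and the default file name are my own assumptions, not a convention of the module.

```python
import logging

# Hypothetical user-facing verbosity setting mapped onto logging's levels.
VERBOSITY_LEVELS = {
    0: logging.CRITICAL + 1,  # no logging: above every standard level
    1: logging.ERROR,         # only errors
    2: logging.INFO,          # basic runtime information
    3: logging.DEBUG,         # extended debugging information
}


def make_logger(verbosity, logfile="debug.log"):
    """Return a logger that writes messages at the chosen verbosity to a file."""
    logger = logging.getLogger("myprogram")
    logger.setLevel(VERBOSITY_LEVELS[verbosity])
    # Drop handlers left over from an earlier call so lines aren't duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(logfile, mode="w")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep the messages off the screen
    return logger
```

With `log = make_logger(2)`, a call to `log.info("started")` ends up in the file, while `log.debug(...)` is silently dropped until the user asks for verbosity 3.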
The most advanced debugging tools are known as (surprise surprise) debuggers. These are programs, usually language-specific, that can be used for in-depth inspections of other, running, programs. For example, one can use them to set breakpoints at which program execution should pause. Using the debugger, one can then inspect the values of all current variables, or “step through” the continued execution of the source code until one finds the misbehaving statement. Thankfully, many modern development environments come with such capabilities out of the box.
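Python, for instance, ships with a debugger called pdb, and since version 3.7 the built-in `breakpoint()` function drops you into it. Here is a small sketch; the function `running_total` and its `debug` switch are invented for the example.

```python
def running_total(values, debug=False):
    """Add up the values, optionally pausing in the debugger on negative ones."""
    total = 0
    for i, v in enumerate(values):
        if debug and v < 0:
            # Execution pauses here. At the pdb prompt, `p i, v, total` prints
            # the variables, `n` runs the next line and `c` continues.
            breakpoint()
        total += v
    return total
```

Calling `running_total([1, -2, 3], debug=True)` from a terminal should stop at the suspicious value and let you look around before carrying on.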
Generally, preventing errors in the first place is much more efficient than having to find them after they've crept in. One concept that is important in this respect is that of defensive programming.
The basic idea here is to actively ensure that things are as they should be. Oftentimes, program crashes are caused by things external to the code itself: bad user input, a file that doesn't exist, a file that exists although it shouldn't, a network connection that has been interrupted. In defensive programming, the software checks for these possibilities before relying on the external resource. In a classic example, code that carries out some calculation with user input should check that what the user inputted was actually a number.
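The classic example might look something like this in Python. It is only a sketch: `parse_number` is a name I made up, and whether to reject "nan" and "inf" is a decision that depends on your program.

```python
import math


def parse_number(text):
    """Convert user input to a float, refusing anything that isn't a usable number."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        raise ValueError(f"not a number: {text!r}") from None
    # float() happily accepts "nan" and "inf", which would poison later maths.
    if not math.isfinite(value):
        raise ValueError(f"not a finite number: {text!r}")
    return value
```

The calculation that follows can then rely on getting a real, finite number, and the error message tells the user exactly what was wrong with their input.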
If a problem has been detected, the program then needs to decide what to do. Should it try to fix the problem by itself? (For example by substituting a known valid value, or by waiting for the connection to come back.) Or should it inform the user and either continue with the problem or quit completely? Which route the developer chooses to take depends on the kind of problem that is encountered, but also on the type of software. In some applications, it is of prime importance that a program be correct, i.e. that it never produces a result that is not spot on. In other applications, it may be more important to ensure that the program keeps running, even if things aren't perfect – i.e. that it is robust.
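To make the two attitudes concrete, here is a hedged sketch of a single function that can behave either way. The `strict` flag and the `fallback` value are illustrative choices of mine, not an established pattern.

```python
import logging


def to_float(text, strict=True, fallback=0.0):
    """Parse a number, either insisting on correctness or carrying on regardless."""
    try:
        return float(text)
    except ValueError:
        if strict:
            # Correctness first: refuse to produce a result that might be wrong.
            raise
        # Robustness first: tell the user, substitute a known valid value, go on.
        logging.warning("could not parse %r, using %r instead", text, fallback)
        return fallback
```

A scientific analysis would probably want `strict=True`, because a silently substituted zero could skew the results. A long-running data logger might prefer `strict=False` so that one garbled reading doesn't bring the whole thing down.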
A second habit that helps reduce the number of errors is that of refactoring. Refactoring is about improving the quality of working code. As you work on your program, you should make it a habit to leave the code in a better state than you found it. If you see code duplication, move the duplicates into a single function. If you see ugly or difficult code, make it more readable. If you see global variables, hide them behind access routines. And so forth…
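As a tiny illustration of removing duplication (all names here are invented), compare a function that writes out the same conversion twice with a refactored version that moves it into a helper:

```python
# Before: the Fahrenheit-to-Celsius formula appears twice.
def report_before(temps_f):
    low = (min(temps_f) - 32) * 5 / 9
    high = (max(temps_f) - 32) * 5 / 9
    return f"{low:.1f} to {high:.1f} C"


# After: the formula lives in exactly one place.
def fahrenheit_to_celsius(temp_f):
    return (temp_f - 32) * 5 / 9


def report_after(temps_f):
    low = fahrenheit_to_celsius(min(temps_f))
    high = fahrenheit_to_celsius(max(temps_f))
    return f"{low:.1f} to {high:.1f} C"
```

Both versions behave identically, which is the whole point of refactoring. But if the formula ever turns out to be wrong, there is now only one place to fix it, and the helper can be tested on its own.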
The last big thing to do to decrease your bug count is to anticipate that you will make mistakes, and to test your software accordingly. (This is a huge topic, so I will only skim it very briefly here.)
At the very least, you should test every component before you add it to your system, and then test the system as a whole. The easiest way of testing components is by using a REPL (read-eval-print loop). This is an environment provided especially by interpreted languages that allows the programmer to interactively execute code. For example, you can write a function and load it by copying it into the REPL. Then you can set about testing it. Does it produce correct output with typical input values? What happens when the input isn't typical? (Too large, too small, not present at all…) How do external conditions affect its working? Once you are confident the function does what it's supposed to, you can move on to the next one.
This is a very quick method of developing, and one well suited to exploratory programming (where you don't necessarily know the end result yet). However, as the stakes increase, your testing procedures have to become more thorough. Instead of the ad-hoc approach of the REPL, it is good practice to write some functions explicitly to test other functions – these are called unit tests. The more important correctness is for your program, the greater the percentage of it that should have tests written for it.
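In Python, unit tests can be written with the standard `unittest` module. The sketch below tests a small `median` function that I wrote purely for the example. It checks typical input, a slightly different case, and input that isn't present at all.

```python
import unittest


def median(values):
    """Return the median of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("median of an empty sequence")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]  # odd length: the middle element
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: mean of the middle two


class MedianTests(unittest.TestCase):
    def test_odd_length(self):
        self.assertEqual(median([3, 1, 2]), 2)

    def test_even_length(self):
        self.assertEqual(median([4, 1, 3, 2]), 2.5)

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            median([])
```

Saved in a file, these tests can be run from the command line with `python -m unittest`, which reports every test that fails.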
Over time, the test functions accrue into a test suite, which should be run regularly to ensure recent changes haven't broken anything (a procedure known as regression testing).
In fact, there is a large school of thought that advocates writing your tests before you write your actual code. This test-driven development (TDD) helps catch errors earlier and has the added benefit of getting developers to think more rigorously about the code they are going to write.
One final thought to keep in mind during all stages of the development cycle is that some areas of a program's code tend to be more error-prone than others. I'm not entirely sure why this is (perhaps they are the most complex), but numerous studies in industry attest to the fact. So it is probably a very good idea to try and identify these error-prone areas in your own code and pay special attention to them.