Written on 2019-06-03
Having begun our series on software development with a broad look at the basic principles, let us now get down to the nitty-gritty. We said the three key aims of a developer should be software that is reliable, understandable, and extendable. As the second of these is probably the easiest, let us start with that. So how do you write software that is easy to understand?
Well, really it's very simple. All it takes is some good habits in three areas: code layout, code structure, and code documentation. Let's have a look.
Code layout is all about the visual appearance of your code on your screen. Hard-to-read code is often obvious at first glance. Likewise, eminently readable code has a certain elegance to its structure that is visible even before one has actually read it.
The most important thing here is proper indentation. (This won't be news to anybody who's been programming for a while, but I regularly have to point it out to students in our programming classes, so I'm including it anyway.) Indenting each line of code so that it is one tab (three or four spaces) further in than the opening line of its block makes a world of difference to legibility. That way, the reader knows at a glance what block the current line belongs to, like in this Java example:
// The function header is on the top level
public int longMultiply(int a, int b) {
    int result = 0; // now we are inside the function block
    while (b > 0) {
        result = result + a; // and this is the inner loop block
        b = b - 1;
    }
    return result; // now we're back to the function block
}
It gets a bit more complicated when we come to the subject of line breaks, such as when you have really long and complicated conditions in an if-clause. Here, you can use a combination of parentheses and line breaks to show which expressions belong together. Fortunately, almost all modern IDEs and editors have good indentation behaviour built in by default, so this really isn't hard. The key idea to remember is that layout should show logic.
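To make "layout should show logic" a little more concrete, here is a minimal sketch of how a long condition might be broken up. The class, the method, and all parameter names are invented for illustration; the only point is the shape of the expression, which would look the same inside an if-clause.

```java
// Hypothetical example: all names here are made up for illustration.
final class HabitatCheck {

    /** Decides whether an animal may settle on a patch. */
    static boolean canSettle(int freeSpots, double foodSupply, double foodDemand,
                             boolean isPredatorPresent, boolean isAdult) {
        // One sub-condition per line; parentheses group what belongs together.
        return (freeSpots > 0)
                && (foodSupply >= foodDemand)
                && (!isPredatorPresent || isAdult);
    }

    public static void main(String[] args) {
        // A juvenile on a patch with a predator may not settle.
        System.out.println(canSettle(3, 10.0, 4.5, true, false)); // prints "false"
    }
}
```

Each line answers one question (is there space, is there food, is it safe), so the reader can tick them off one by one instead of parsing a single 120-character line.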
Apart from indentation, there are a few other things to keep in mind. One is to limit the line length. Some projects have a strict line length limit of (traditionally) 80 characters. You don't need to be religious about this, but lines longer than 100 characters do become a lot harder to read.
Also, be liberal in your use of white space (spaces, tabs, and empty lines). Dense code is tough to wade through, so give your reader some breathing space. Of course, one can also overdo it, but you'll learn to find a good measure.
The principal way of structuring code is the use of functions. For some reason I have yet to understand, many biologists seem to be reluctant to put functions in their code – to their own detriment.
Functions are wonderful things. They enable code reuse, so you don't have to
type that list of commands over for each analysis you do – you simply stick it
in a function and call the function. Plus, they aid legibility. What's easier to
understand in a source file, eight lines of gobbledygook R code or a single
call to doANOVA()?
Functions are most useful when you keep them short. They help reduce complexity by breaking down a long program into small, manageable chunks that are easy to understand. If you need to scroll to finish reading your function, you've basically lost that advantage. An ideal length is probably no more than 20 lines. Short functions are easy to read, easy to understand, and easy to test. So break up that 70-line monster.
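As a rough sketch of what "breaking it up" can look like, here is a small Java example. The class and its three functions are hypothetical (they are not from any real analysis), and the statistics are deliberately simple; what matters is that each function does one thing and fits on the screen without scrolling.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical example: names and calculations are chosen for illustration only.
final class GrowthAnalysis {

    /** Arithmetic mean; assumes a non-empty list. */
    static double mean(List<Double> values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    /** Sample variance (n - 1 in the denominator); assumes at least two values. */
    static double variance(List<Double> values) {
        double m = mean(values);
        double squaredDeviations = 0.0;
        for (double v : values) {
            squaredDeviations += (v - m) * (v - m);
        }
        return squaredDeviations / (values.size() - 1);
    }

    /** The top-level step reads almost like prose; the details live in the helpers. */
    static String summarise(List<Double> values) {
        return String.format(Locale.ROOT, "mean=%.2f, variance=%.2f",
                mean(values), variance(values));
    }
}
```

Calling GrowthAnalysis.summarise(List.of(1.0, 2.0, 3.0)) gives "mean=2.00, variance=1.00", and each helper can be tested on its own.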
If you find your project growing well beyond the 500-lines-of-code mark, you should also seriously start to think about splitting it up into multiple files. Moving around inside very long files is cumbersome, and remembering what is where becomes increasingly difficult. Better to group related functions in separate files. This makes it much easier to develop a mental model of the whole program, and finding code becomes quicker, too.
Documenting your project sounds like a lot of work, but actually, it starts with
the small things. Proper variable names, for instance. Calling your integer
no_carnivores instead of nc, or using expressive function names like
competeHerbivores(), is really helpful. In a perfect world, you won't even need
comments, because your code will be so clean and readable that it is
self-documenting.
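To see how much difference names make, compare two versions of the same calculation. Both functions are invented for this illustration, and I've used Java's camelCase convention (noCarnivores rather than no_carnivores), which is purely a matter of language style.

```java
// Hypothetical example: both functions are made up to contrast naming styles.
final class Naming {

    // Hard to follow: what are nc, nh, and r?
    static double r(int nc, int nh) {
        return (double) nc / nh;
    }

    // The same calculation, but now the signature explains itself.
    static double carnivoresPerHerbivore(int noCarnivores, int noHerbivores) {
        return (double) noCarnivores / noHerbivores;
    }
}
```

The two functions return exactly the same value; only the second one tells you what that value means.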
Of course, we don't live in a perfect world, so comments are necessary nonetheless. At the minimum, each file, class, and function ought to have a brief header comment describing what it does. (Some trivial functions can be skipped, but that should definitely be the exception.) You can also use comments as section headers, dividing up the functions in your files or the statements in your functions into subgroups. In some cases, you may also have some difficult code that needs to be explained in plain English. However, you should first try to see whether you can rewrite the relevant section instead, so that it doesn't need to be commented anymore. (Usually, it is possible…)
Remember that comments should state intent, not mechanism. I can see what the
program does by reading the code; I don't need the comment to repeat that for me.
What the comment should tell me is why the code does what it does. I can see
that the code checks whether ttl == 0; what I need to know is that this is
used to avoid an infinite loop when an animal is searching for a free spot on a
full landscape.
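Here is a sketch of how such a comment might look in practice. Only the ttl == 0 check comes from the example above; the class, the constants, and the search strategy (picking random indices in a non-empty array) are my own assumptions for the sake of a runnable illustration.

```java
import java.util.Random;

// Hypothetical example: everything except the ttl == 0 check is assumed.
final class Landscape {

    static final int MAX_ATTEMPTS = 100;
    static final int NO_FREE_SPOT = -1;

    /** Returns the index of a free spot, or NO_FREE_SPOT if the search gives up. */
    static int findFreeSpot(boolean[] occupied, Random rng) {
        int ttl = MAX_ATTEMPTS;
        while (true) {
            // Mechanism (unhelpful): "check whether ttl is zero".
            // Intent (helpful): give up after a bounded number of tries,
            // otherwise the search would never end on a full landscape.
            if (ttl == 0) {
                return NO_FREE_SPOT;
            }
            int candidate = rng.nextInt(occupied.length);
            if (!occupied[candidate]) {
                return candidate;
            }
            ttl = ttl - 1;
        }
    }
}
```

On a completely occupied landscape, the function returns NO_FREE_SPOT after MAX_ATTEMPTS tries instead of hanging, and the comment tells the next reader that this is deliberate.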
Comments are great when one is trawling through the source code itself, but when
you first encounter a software project, you want to get a higher-level
perspective. Good practice is to include a text file called README in the top
project directory. This can either be a brief explanation of the project in
itself, or contain pointers to the rest of the documentation. Another good
tradition in the Open Source world is to include a second document called
HACKING (often in a folder called doc), with slightly more detailed
information about the architecture and conventions of your software. (The bigger
your project, the more useful this becomes.)
Lastly, having online/HTML documentation can be a big bonus. Many languages have
libraries for generating this kind of documentation from the comments in your
source code. (Some, like Java's javadoc, are already built in.) This will give
you fancy-looking, interactive doc pages detailing every function in every file
of your project – at virtually no extra cost to you. After all, you're already
commenting your functions, aren't you? 😉
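For Java, the header comments only need to follow the Javadoc format to be picked up. Below is a minimal sketch; the class name Arithmetic is made up, and the function is a multiply-by-repeated-addition routine that assumes a non-negative second argument.

```java
/**
 * Small arithmetic helpers (hypothetical class, for illustration only).
 */
final class Arithmetic {

    /**
     * Multiplies two numbers by repeated addition.
     *
     * @param a the number that is added repeatedly
     * @param b how many times to add it; assumed to be non-negative
     * @return the product of a and b
     */
    static int longMultiply(int a, int b) {
        int result = 0;
        while (b > 0) {
            result = result + a;
            b = b - 1;
        }
        return result;
    }
}
```

Saved as Arithmetic.java, running javadoc -package -d doc Arithmetic.java should write the HTML pages into a doc folder. The -package option is needed here only because the sketch isn't declared public; by default, javadoc documents just the public and protected parts of your code.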