Written on 2019-07-30
Although many scientists who write code work on their own, there will always be occasions when one becomes part of a development team. This could be because a junior colleague is joining your project (or vice versa), or because the software you are working on is so large and complex that it requires the joint efforts of several people to complete. These scenarios not only make the principles we have already discussed (readable code, good architecture, etc.) more important, they also necessitate a whole new set of procedures.
(If you already know about git, you can skip this section.)
The first issue you'll face is how to pass your source files around. Everybody needs an up-to-date version, and you mustn't break anything if two people have changed the same file – how do you do that?
For this purpose, there is what is called version control software, the most popular of which is git. In a normal git setup, the code is hosted in a central repository. Each individual developer can check out this code (i.e. download it to his own machine) and change it at will. Each batch of changes is committed, which creates a “snapshot” of the project in its current state. Finally, the developer pushes his changes back up to the central repository, from where others can pull them into their own local copies. Git automatically merges the changes from all developers wherever their edits don't overlap, and flags overlapping edits as conflicts to be resolved by hand, so that nothing is accidentally lost or overwritten. (As there are plenty of excellent tutorials out there on how to use git, I won't go into details. See here, here, here, or here for more.)
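To make the idea of merging a little more concrete, here is a toy three-way merge in Python. It is only a sketch under strong simplifying assumptions (no lines are ever added or removed, and the function name `three_way_merge` is my own invention); git's real merge machinery is considerably more sophisticated. The core idea is the same, though: both developers' versions are compared against their common ancestor (the "base"). A line changed by only one side is taken from that side, while a line changed differently by both sides is a conflict that a human has to resolve.

```python
"""Toy line-by-line three-way merge (illustration only, not git's actual algorithm)."""
from typing import List, Tuple


def three_way_merge(base: List[str], ours: List[str],
                    theirs: List[str]) -> Tuple[List[str], List[int]]:
    """Merge two edited versions of `base` line by line.

    Simplifying assumption: no lines were inserted or deleted, so line i
    corresponds across all three versions. Returns the merged lines and the
    indices (in `base`) of the lines that conflict.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this toy merge only handles equal-length versions")

    merged: List[str] = []
    conflicts: List[int] = []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree (unchanged, or changed identically)
            merged.append(o)
        elif o == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "ours" changed this line
            merged.append(o)
        else:             # both changed it differently: a human must decide
            # Conflict markers loosely modeled on the ones git writes into files.
            merged.extend(["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"])
            conflicts.append(i)
    return merged, conflicts


if __name__ == "__main__":
    base = ["x = 1", "y = 2", "z = 3"]
    alice = ["x = 10", "y = 2", "z = 3"]   # Alice edits the first line
    bob = ["x = 1", "y = 2", "z = 30"]     # Bob edits the last line
    merged, conflicts = three_way_merge(base, alice, bob)
    print(merged)      # ['x = 10', 'y = 2', 'z = 30']
    print(conflicts)   # []
```

Because Alice and Bob touched different lines, both of their changes survive without anyone having to intervene; only if they had both edited the same line would the merge stop and ask for help.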
The most popular git hosting site by far is GitHub, which you have almost certainly heard of before. It is free and allows you to easily share your code with just about anyone. A popular alternative is GitLab, which has the advantage that you can host it yourself onsite.
An important point to make is that you should be using git anyway, even if you're just working by yourself. The series of commits creates a great timeline to follow your project's progress, and the ability to jump back to a given commit means you've always got backups in case you screw something up. Also, sharing your work with others is dead easy if you've already got it sitting on GitHub.
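For the curious, here is a toy model of why commits double as backups: each commit stores a complete snapshot of the files plus a pointer to its parent commit, so any earlier state can be retrieved. This is a simplified sketch with hypothetical class and method names (`Repository`, `commit`, `files_at`, `log`); git's actual storage format is different, although it too identifies commits by a hash of their content.

```python
"""Toy model of a commit history as a chain of snapshots (illustration only)."""
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    message: str
    files: Dict[str, str]    # complete snapshot: filename -> file contents
    parent: Optional[str]    # id of the previous commit (None for the first one)

    @property
    def id(self) -> str:
        # Identify the commit by a hash of its content (git also uses content hashes).
        payload = json.dumps([self.message, self.files, self.parent], sort_keys=True)
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()[:8]


class Repository:
    def __init__(self) -> None:
        self.commits: Dict[str, Commit] = {}
        self.head: Optional[str] = None   # id of the most recent commit

    def commit(self, files: Dict[str, str], message: str) -> str:
        # Copy the files so later edits to the working copy don't alter the snapshot.
        c = Commit(message, dict(files), self.head)
        self.commits[c.id] = c
        self.head = c.id
        return c.id

    def files_at(self, commit_id: str) -> Dict[str, str]:
        """Return a copy of the files exactly as they were at the given commit."""
        return dict(self.commits[commit_id].files)

    def log(self) -> List[str]:
        """Walk back from the newest commit to the first: the project timeline."""
        entries = []
        cid = self.head
        while cid is not None:
            c = self.commits[cid]
            entries.append(f"{cid} {c.message}")
            cid = c.parent
        return entries


if __name__ == "__main__":
    repo = Repository()
    work = {"model.py": "k = 0.1"}
    good = repo.commit(work, "Initial model")
    work["model.py"] = "k = 1e9  # oops"
    repo.commit(work, "Tweak rate constant")
    print(repo.log())
    work = repo.files_at(good)   # recover the last known-good version
    print(work)                  # {'model.py': 'k = 0.1'}
```

Storing whole snapshots (rather than only the differences) keeps going back in time trivially simple in this sketch: restoring an old state is just a lookup.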
Developing software in a team offers enormous benefits, if done right. For a start, the quality of the software itself can increase dramatically, because there are more minds thinking about it and more eyes looking out for mistakes. At the same time, the team benefits too, as its members exchange ideas and share their experience and knowledge. This effect is especially strong for junior team members, who can learn rapidly from observing the way more experienced developers work.
Over the last few decades, industry practice has identified various techniques to maximise these benefits. Two of the most efficient are pair programming and code inspections, which I will briefly outline here. Both of these have been found to have an error-detection rate of around 40-70%, which is even higher than good testing achieves. (Disclaimer: I've never actually had the chance to try these out myself, sadly. I'm mostly going by McConnell's Code Complete 2 and various online sources such as this.)
Pair programming is a bit of an unusual approach in that it entails two programmers working on the same computer. One, the driver, does the actual typing at the keyboard. The other, the navigator, watches for mistakes and thinks ahead about what must be done next. The two partners keep up a conversation on what they are doing or thinking and regularly switch roles.
This technique is especially effective when working on difficult sections of the code. Pairs keep each other focussed, spot more mistakes and think of more possible approaches than an individual developer would. Together, they deliver higher-quality code in less time than either would working alone.
However, it takes a bit of practice to get used to this style of coding. Not all pairs combine well, and not all code is amenable to the technique's specific benefits. When it works, though, it works well; and developers often enjoy the team-oriented approach.
Code inspections are a highly formalised version of code reviews that are intended to find errors during the development phase. An inspection will be chaired by a designated moderator (not the author of the inspected code!). This team member sends out a copy of the code to a small number of reviewers (about one to four) and arranges a meeting for the actual discussion. Each reviewer prepares by reading the code himself and annotating any problems he finds, based on a checklist of common mistakes/good practice. During the meeting, the code is read through section by section as the reviewers give and discuss their comments. Importantly, the aim of the meeting is not to find solutions, but purely to identify problems. Each problem is recorded and the complete list is later handed to the author for fixing.
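The mechanics of an inspection boil down to some simple bookkeeping, which the following Python sketch tries to capture. Everything in it is hypothetical and purely illustrative: the class and field names, the people, and treating "about one to four reviewers" as a hard limit are my own assumptions, not a tool prescribed by McConnell or anyone else. Note that `record` takes a description of the problem but no proposed fix, mirroring the rule that the meeting identifies problems rather than solving them.

```python
"""Bookkeeping for a code inspection (hypothetical sketch, not a standard tool)."""
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Problem:
    location: str                  # where in the code, e.g. "solver.py:42"
    description: str               # what is wrong; deliberately no "fix" field
    reviewer: str                  # who raised it
    checklist_item: Optional[str]  # related checklist entry, if any


@dataclass
class Inspection:
    author: str
    moderator: str
    reviewers: List[str]
    checklist: List[str]           # common mistakes / good practice to check against
    problems: List[Problem] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The moderator chairs the inspection and must not be the code's author.
        if self.moderator == self.author:
            raise ValueError("the moderator must not be the author of the code")
        # Assumption: treat "about one to four reviewers" as a hard limit.
        if not 1 <= len(self.reviewers) <= 4:
            raise ValueError("expected one to four reviewers")

    def record(self, reviewer: str, location: str, description: str,
               checklist_item: Optional[str] = None) -> None:
        """Log a problem raised during the meeting: problems only, no solutions."""
        if reviewer not in self.reviewers:
            raise ValueError(f"{reviewer!r} is not a reviewer in this inspection")
        if checklist_item is not None and checklist_item not in self.checklist:
            raise ValueError(f"{checklist_item!r} is not on the checklist")
        self.problems.append(Problem(location, description, reviewer, checklist_item))

    def report_for_author(self) -> str:
        """The complete list that is handed to the author for fixing."""
        lines = [f"Inspection report for {self.author}, problems found: {len(self.problems)}"]
        for n, p in enumerate(self.problems, start=1):
            lines.append(f"{n}. [{p.location}] {p.description} (raised by {p.reviewer})")
        return "\n".join(lines)


if __name__ == "__main__":
    insp = Inspection(author="Ana", moderator="Ben", reviewers=["Cleo", "Dev"],
                      checklist=["Are all variables initialised?",
                                 "Are array bounds respected?"])
    insp.record("Cleo", "solver.py:42", "Loop index can run past the end of the array",
                checklist_item="Are array bounds respected?")
    print(insp.report_for_author())
```

Running it prints a two-line report naming Ana as the author and listing Cleo's single finding, which is the list the author would then work through.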
Inspections are a very thorough tool that has proved to be highly effective at finding errors. As McConnell writes:
A study of large programs found that each hour spent on inspections avoided an average of 33 hours of maintenance work and that inspections were up to 20 times more efficient than testing (Russell 1991).
Although they might seem overdone in their formality, experience has shown that less structured code reviews usually perform significantly worse. One challenge in the inspection is to keep the discussion on point and technical. This is where the moderator comes in, as it is his task to ensure that the discussion doesn't drift off-topic or become personal (attacking the author).
– – –
Postscript. The NASA space shuttle software is renowned as being the most bug-free software ever created. Through stringent development practices, its engineering team managed to reach a rate of just one error in half a million lines of code. (Industry average is 1-25 errors in 1000 LOC!) For an instructive write-up of how they do this, see here.
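To put those numbers side by side, here is the arithmetic for a hypothetical 500,000-line program. At the industry average of 1 to 25 errors per 1000 lines, it would be expected to contain somewhere between 500 and 12,500 errors; at the shuttle team's rate, about one. The function name `expected_errors` is just for illustration.

```python
"""Worked arithmetic for the error rates quoted above."""


def expected_errors(lines_of_code: int, errors_per_kloc: float) -> float:
    """Expected number of errors, given a defect density in errors per 1000 lines."""
    return lines_of_code * errors_per_kloc / 1000


SHUTTLE_ERRORS_PER_KLOC = 1 / 500   # one error per 500,000 lines = 0.002 per 1000 lines

if __name__ == "__main__":
    loc = 500_000  # hypothetical program size, matching the shuttle figure
    print(expected_errors(loc, 1))    # 500.0   (low end of the industry average)
    print(expected_errors(loc, 25))   # 12500.0 (high end of the industry average)
    print(round(expected_errors(loc, SHUTTLE_ERRORS_PER_KLOC), 3))  # 1.0
```

In other words, the shuttle software's error rate was roughly 500 to 12,500 times lower than the industry average.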
(I will link the future parts of this article series here once they are published.)