Good Coding Practices

Descriptions of good coding practices are often vague, e.g. "write good comments". Or they are specific but well known, e.g. "avoid magic numbers". Below are some specific coding practices that are less well known, and that I only realised after some time.

You need a regression test suite

You need a regression test suite. You need to be able to run it with a single command. You need to run it frequently, preferably at least before every commit. You need to maintain it.

If you don't, then when someone breaks some code, the breakage won't be discovered until after the change is committed, possibly well after. By then it won't be clear which commit caused the breakage, you won't know whose fault it is, and you won't have a clear starting point for diagnosing the problem. In contrast, with a regression test suite that demonstrates the breakage, the broken code probably won't be committed in the first place.

The regression tests define your program's envelope of known behaviour. If you have a good regression test suite, you can make any change and if all the regression tests pass, you can commit with confidence knowing that the new version is just as good as the old version, because it is still within your envelope of known behaviour. If you add a new test at the same time, you can be even more confident because you have tightened your envelope of known behaviour.

Writing and maintaining a regression test suite does take time and effort. But that time and effort will be paid back many times over later on. And it's amazing how much confidence a good test suite gives you that your program works.

If that still doesn't convince you, here's another way to look at it. What is a bug? It is a deviation between a program's expected behaviour and its actual behaviour. What defines the expected behaviour? In theory, it's a comprehensive specification written before coding began, but few real programs have such a thing. A good regression test suite can serve as this specification instead. Even better, it's a specification that can be automatically checked.
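As a rough illustration of a test suite acting as a checkable specification, here is a minimal Python sketch. The function word_count, the KNOWN_BEHAVIOUR table, and run_suite are all hypothetical names invented for this example; the only real API used is the standard unittest module.

```python
import unittest


def word_count(text):
    """Hypothetical function under test: count whitespace-separated words."""
    return len(text.split())


# Each entry pairs an input with the output the program is known to produce.
# Together the entries form a small specification that can be checked
# automatically.
KNOWN_BEHAVIOUR = [
    ("", 0),
    ("hello", 1),
    ("hello world", 2),
    ("  leading and trailing  ", 3),
]


class RegressionTests(unittest.TestCase):
    def test_known_behaviour(self):
        for text, expected in KNOWN_BEHAVIOUR:
            # subTest reports each failing entry separately.
            with self.subTest(text=text):
                self.assertEqual(word_count(text), expected)


def run_suite():
    """Run every regression test and return True if they all passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(RegressionTests)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Adding a row to KNOWN_BEHAVIOUR whenever a bug is fixed tightens the envelope of known behaviour. If the code is saved in a file, running "python -m unittest" on that file is one way to get the single command mentioned earlier.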

Some good advice:

Run your tests often.
Don't let them get stale.
Rejoice when they pass.
Rejoice when they fail.

Coding for Testability

For code to be tested automatically, it usually must be deterministic. This requirement can (and should) change the way you write your code.

For example, I once wrote a memory profiler. It produced graphs that showed a program's memory consumption. The x-axis on the graph was time, measured in milliseconds. Because timings vary from run to run, the output wasn't deterministic, so I couldn't automatically test it with a regression test suite, and as a result it was quite buggy.

Much later, I rewrote it. In doing so, I added a command-line option that controlled the time unit used for the x-axis. One of the possibilities was to use the number of allocated/deallocated bytes as a time unit. It's not a useful option for most users, but it makes the output deterministic, and therefore I was able to write a thorough regression test suite for this version of the profiler. (The test suite cannot test the code responsible for the millisecond timing, but it can test the other 99% of the code.) As a result of this change, the new version is much more reliable than the old version.
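The idea can be sketched in a few lines of Python. This is not the real profiler; the Profiler class, its time_unit parameter, and the "ms" and "bytes" values are assumptions made up for illustration. The point is only that a selectable time unit makes the recorded output reproducible.

```python
import time


class Profiler:
    """Hypothetical profiler that records (time, live_bytes) snapshots."""

    def __init__(self, time_unit="ms"):
        if time_unit not in ("ms", "bytes"):
            raise ValueError(f"unknown time unit: {time_unit}")
        self.time_unit = time_unit
        self.start = time.monotonic()
        self.total_bytes = 0  # bytes allocated plus bytes deallocated so far
        self.live_bytes = 0   # bytes currently allocated
        self.snapshots = []

    def _now(self):
        # With "bytes", time advances only when memory is allocated or
        # freed, so the same sequence of calls always gives the same output.
        if self.time_unit == "bytes":
            return self.total_bytes
        # With "ms", time is wall-clock based and varies between runs.
        return int((time.monotonic() - self.start) * 1000)

    def alloc(self, size):
        self.total_bytes += size
        self.live_bytes += size
        self.snapshots.append((self._now(), self.live_bytes))

    def free(self, size):
        self.total_bytes += size
        self.live_bytes -= size
        self.snapshots.append((self._now(), self.live_bytes))


profiler = Profiler(time_unit="bytes")
profiler.alloc(100)
profiler.alloc(50)
profiler.free(100)
print(profiler.snapshots)  # [(100, 100), (150, 150), (250, 50)]
```

In this sketch only the wall-clock branch of _now is out of reach of a regression test; everything else can be checked against exact expected values.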

Code is a historical record

Code is read by both machines and humans. For humans, it also acts as a historical record. Keep that in mind.

An example: I often see a commit whose log message lovingly explains a small change made to fix a subtle problem, but which adds no comments to the code. Don't do this! Put that careful description in a comment, where people can actually see it. (Commit logs are basically invisible; even if they are auto-emailed to all developers, they are soon forgotten, and they don't benefit people not on the email list.) That comment is not a blemish but an invaluable record of an unusual case that someone didn't anticipate. If the bug-fix was preceded by a lengthy email exchange, include some or all of that exchange if it helps.


There are two kinds of comments: high-level comments, and local comments. High-level comments are typically at the top of a module, and give an overview of how the module works. They can be invaluable.

Local comments can also be invaluable. The best local comments are those that describe something that is not obvious from the code, something that might surprise somebody. In particular, if some code looks like it could be simplified, but there's some non-obvious reason why it cannot, that code should have a comment on it explaining why. This ties in with the point above about code being a historical record -- code that has had subtle bugs fixed is often code that looks like it could be simplified.
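Here is a small Python sketch of such a comment. The function effective_timeout and the constant DEFAULT_TIMEOUT_SECONDS are hypothetical, invented only to show code that looks simplifiable but is not.

```python
DEFAULT_TIMEOUT_SECONDS = 30.0


def effective_timeout(timeout=None):
    """Return the timeout to use, falling back to the default if none given."""
    # This looks like it could be simplified to
    # `timeout or DEFAULT_TIMEOUT_SECONDS`, but it cannot. A timeout of 0 is
    # valid and means "do not wait at all". Because 0 is falsy, the shorter
    # form would silently replace it with the default.
    if timeout is None:
        return DEFAULT_TIMEOUT_SECONDS
    return timeout
```

Without the comment, a well-meaning cleanup could easily reintroduce the bug that the longer form was written to avoid.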

It can be difficult to write really good local comments when you write new code, because you understand it too well. It's easier when you forget how the code works, and then come back to it months (or years) later and have to re-understand it. That's when you realise what should be in the comments. So: improve the comments then. Then when you have to come back yet again in the future, it will be easier. It will also help others understand your code.

Also, I try to write comments that are complete sentences, because they tend to be more comprehensible. For example, "This check makes sure the addition didn't overflow." is easier to understand than "check add didn't overflow".
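In context, the full-sentence version might look like the following Python sketch. The function add_u32 and the constant U32_MAX are hypothetical; the example assumes the values are meant to behave like unsigned 32-bit integers.

```python
U32_MAX = 2**32 - 1


def add_u32(a, b):
    """Add two unsigned 32-bit values, raising OverflowError on overflow."""
    total = a + b
    # This check makes sure the addition didn't overflow. Python integers
    # never wrap around, so the sum is compared against the largest value
    # that fits in 32 bits.
    if total > U32_MAX:
        raise OverflowError(f"{a} + {b} does not fit in 32 bits")
    return total
```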

Be considerate

The most important advice: be considerate to those who will read and maintain the code afterwards. Think about what your code will look like to someone who doesn't know how it works. And remember that this person may be you in six months' time.