October 8, 2014

...Learn TDD with Codemanship

Programming Well (Part II)

This is a short series of blog posts aimed at teachers and students who are interested in progressing on to more challenging programming projects. Here we explore the disciplines programmers need to apply to make it easier to add new features and change existing features in our programs, which ultimately makes it easier for us to incorporate our learning and make our programs better.


In my previous post, we looked at why it can be very helpful to be able to test (and re-test) our programs very frequently, and how automating those tests can enable this.

We also looked at why it's very important to write code that other programmers can easily understand.

In this post, we'll talk about how we break down our programs into smaller units, and how we can organise those units to make changing our programs easier.

3. Compose programs out of small, simple pieces that do one job each

Most modern programming languages give us mechanisms for breaking down our code into smaller, simpler and more manageable "chunks". This enables us to tackle more complex programming problems without having to consider the whole problem at once, and it also enables us to make chunks of our code reusable, so we don't have to keep repeating ourselves, and ultimately we write less code.

This principle is applicable at multiple levels of code organisation:
* Blocks of code can be broken down into functions that do a specific job
* Groups of related functions - functions that are used together to do a job - can be split out into their own modules
* Groups of related modules - modules that are used together for a purpose - can be packaged together into a unit of release (e.g., an executable program file, or a code library)

If our goal is to make change easier, breaking the code down into cohesive units like this - units of code that are used together, and are likely to change together - helps to localise the impact of changes.

We balance the need for cohesion in functions, modules and packages with our need to not repeat ourselves, reusing code whenever we get the chance to minimise how much code we have to write.

And it's important that each unit of code has only one reason to change. If I write a function that does two distinct jobs, I lose the ability to change how one of those jobs is done without affecting the other. I also lose the ability to reuse the code that does each of those jobs independently of the other.

Let's take a look at a coupe of specific examples, one at the function level, another at the module level:

In this example, the Java function has two distinct jobs: it reads the scores from a file, and then it calculates the total score.

This presents us with several potential barriers to change.

Firstly, this is harder to test. If I write an automated test for this function, it could fail for a whole bunch of reasons (wrong file path, data in file not formatted correctly, total calculated incorrectly, and so on.) So if anything breaks, I'm probably going to end up in the debugger trying to figure out what actually went wrong.

It also prevents me from changing how either of these jobs is to be done without potentially affecting the other. It's all in the same function, and functions either work as a whole or they don't. Break one part, you break all of it. Ideally, functions should only have one reason to fail.

It's also a barrier to reuse. Maybe when I want to calculate a total, I don't always want scores to be read from a file. Maybe when I want scores read from a file, it's not always because I want to calculate their total. I might want to reuse either of these chunks of code independently of the other.

ON DUPLICATION OF CODE: Reuse is how programmers remove duplication from their code. Instead of, say, copying and pasting a block of code when we want new code to do something similar, we can create a function that accepts parameters and reuse that instead, and we can group functions together into reusable modules, and modules into reusable packages, and so on. The benefit of doing this is that wehn we need to make a change to duplicated logic, we only need to make that change in one place. If we copied and pasted, we'd have to change it everywhere we pasted it - effectively multiplying the effort of making that change.

When I refactor this code so that each function has a distinct responsibility - only one reason to fail - we can see how the code becomes not only easier to test, but easier to change independently and to reuse independently, which makes accomodating change even easier going forward.

In this refactored version of the code, these two distinct jobs of reading the scores from a file and adding up the total are delegated by the original function to two new functions, each with a distinct responsibility.

The function totalScore() is now just composed of calls to these other functions that do the actual work. totalScore doesn't know how the work is done, and reading scores and totalling scores know nothing about each other. As we'll see, composing our program in such a way that all the different bits - all playing their particular role - know as little about the details of how each other does their work becomes very important in making change easier.

Also notice that the two comments in the original function have disappeared? I took the opportunity to incorporate the useful information in the comments into the new functions I created. The original function now pretty much just tells the story that the comments told us. This is an example of how breaking our program down can also help us to make it easier to read and understand.

Which brings me to our second example, and...

4. Hide the details of how work is done, and make it so we can swap those details easily

In our refactored example above, the details of how scores are read from a file, and how those scores are totalled, are hidden from the totalScores() function. This gives us a degree of flexibility, allowing us to compose our program in different ways (e.g., we can total scores that weren't read from a file, or read scores for some other purpose.)

This flexibility - the ability to compose programs in differet ways while making only small changes to the code - can become a very powerful tool for making change easier. The more ways there are to compose the working units of our program, the more things we can make our program do with less effort.

In object oriented programming, using languages like Java and Python, we can go a step further in achieving greater flexibility.

In our example, even though we now have reading scores and totalling scores neatly separated, there is still rigidity in our code. What if I want to read scores from different sources? Let's say, for example, that our program is intended to be used in different settings - sometimes a desktop application that stores the scores in a file on the hard drive, sometimes a web application that stores the scores in a database like MySQL or Microsoft Access. How could we organise our code so that we can actually choose our strategy for storing scores without having to change a lot of the code?

Refactoring the code further, I can move the function that reads scores into its own class (an object oriented module from which we create instances - "objects" - that do the work), and the function that totals the scores into another class. Each class has a distinct job - a single responsibility.

Then - and this is the key to this trick - I set up an interface for each of those classes, and make it so that our RobotScoreboard only knows about the interfaces. I do this by passing in instances of the reader and the totaller in the constructor - a special function that we call when an object is created to properly initialise it before can use it - referencing only the interfaces when I do. So RobotScoreboard has no knowledge of exatly what kind of object will be doing the work. All it knows is what functions it will be able to call on these objects, which is all the interface defines.

If I wanted to use my RobotScoreboard with scores stored in a MySQL database, I write a new class that implements the same ScoresReader interface, and pass an instance of that into the constructor instead.

This gives me the ability to substitute different implementations of ScoresReader and Totaller without having to make any changes to RobotScoreboard, and to compose the program in many different ways.

This is something that's very difficult to do in programming languages that don't have a built-in mechanism for making it possible to subsitute one implementation of a function or a module with another without changing the code that uses those them. Which is why we very highly recommend that on more challenging programming projects, you use languages that support what we call polymorphism (that is, the ability for functions or modules to look the same from the outside but work differently on the inside).

DESIGNING FOR TESTABILITY: Another thing worth noting in this example is that, by introducing this extra level of flexibility, it now potentially becomes easier to write unit tests for our RobotScoreboard. Recall from the previous blog post how important it can be that we test our program as frequently as possible. For this to be possible, our tests need to run quickly. Reading data from files, or from databases, takes considerable computing resources, and tests that read files or depend on databases tend to run slowly. In our refactored version of the code, it now becomes possible for me to write a fake implementation of the ScoresReader - one that just builds a list of scores in program code, rather than reading it from a file - purely for the purposes of our tests. Making our programs easier and quicker to test automatically is another great benefit of organising our code in this sort of way.

Inexperienced programmers often have a problem with organising their programs into small, reusable, subsititible chunks like this. They see lots of new little functions and new modules and say "It's more complicated". But if you look closely, at the individual "chunks", you see that they are each simpler. when we lump all our code together into gib functions that do lots of things, and big modules with many, many functions, we lose a great deal of flexibility.

The key is to keep the individual parts as simple as we can, and to compose our programs out of them in such a way that we can more easily swap parts with new or different parts with less impact on the rest of the code.

In the next blog post, we'll look at how we manage our code and how we can more effectively work with other programmers on the same programs.

Posted 6 years, 9 months ago on October 8, 2014