March 21, 2014

...Learn TDD with Codemanship

Software Correctness - How & Why?

In recent weeks, this has been coming up regularly. So I think it's probably time we had a little chat about software correctness.

I get a sense that a lot of younger developers have skipped the theory on this, and I felt it would be good to cover that here so I can at least point people at the blog. It's sort of my FAQ.

First of all, what do we mean by "software correctness" (or "program correctness", as us old timers might call it)?

At the risk of being glib, a program - any executable body of code - is only correct if it does what we expect it to do, and only when we expect it to work.

So, a program to calculate the square root of a number is only correct if the result multiplied by itself is equal to the input. And we might expect that such a program will only work correctly if the input isn't zero or less.

Tony Hoare's major contribution to the field of software engineering is a precise definition of program correctness:

{P} C {Q}

C is the program. Q is what must be true after C has successfully executed. (The outcome.) And P is precisely when we can expect C to achieve Q. In other words, P describes when it is valid to invoke the program C.

e.g., {input > 0} Square Root {result * result = input }

These three elements - pre-condition, program/function and post-condition - taken together are called a Hoare Triple. As strikingly simple as this definition of program correctness is, it has turned out to be a very powerful logic, forming the basis of a great deal of what we think of as "software engineering".

To see how this translates into something practical, let's take a look at an example:

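(The original post showed the code as a screenshot, which isn't included here. What follows is a minimal sketch of my own, modelling the house prices as a plain double[] for brevity.)

```java
class PriceStats {
    // Calculates the average price of a collection of houses.
    // Pre-condition: prices.length > 0
    static double average(double[] prices) {
        double total = 0;
        for (double price : prices) {
            total += price;
        }
        return total / prices.length;
    }
}
```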
This is a simple algorithm for calculating the average price of a collection of houses. What might the Hoare Triple look like for this simple program (using pseudo-code)?

{ count of houses > 0 } average() { result = sum of house prices / count of houses }

You'll notice probably straight away that the outcome could form the basis of unit test assertions. Many of us are already testing post-conditions in this way. So, hooray for our side.
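For instance, a unit test for average() is really just the post-condition of the triple turned into an assertion. A sketch (written with plain assertions rather than a test framework, so it stands alone):

```java
class AverageSpec {
    static double average(double[] prices) {
        double total = 0;
        for (double price : prices) total += price;
        return total / prices.length;
    }

    // The post-condition of the Hoare triple becomes the test assertion:
    // result = sum of house prices / count of houses
    static void testAverageEqualsSumDividedByCount() {
        double[] prices = { 200_000, 250_000, 300_000 };
        double sum = 200_000 + 250_000 + 300_000;
        assert average(prices) == sum / prices.length : "post-condition broken";
    }
}
```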

But what about the pre-condition? What does it mean when we say that the house array must not be empty for the program to work?

We have three possible choices:

1. Change the post-condition to handle this scenario.

In other words, the method average() will always handle any input.

For example, here I've changed the initialised count of houses to 1 if there are zero houses. That way, even if the array is empty, the total will always be divided by at least one. (And the average for an empty array will be zero, which kind of makes sense.)
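A sketch of that widened post-condition (again, the code in the original post was an image; this is my reconstruction of the idea described above):

```java
class LenientPriceStats {
    static double average(double[] prices) {
        // Widened post-condition: if there are zero houses, initialise the
        // count to 1, so the average of an empty collection is simply 0.
        int count = prices.length == 0 ? 1 : prices.length;
        double total = 0;
        for (double price : prices) {
            total += price;
        }
        return total / count;
    }
}
```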

2. Guard the body of average() from inputs that will break it.

This approach is called Defensive Programming.
Bear in mind now that calling code will need to know how to handle this exception meaningfully.
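A defensive sketch might look like this (the exception type is my assumption; the original code isn't shown here):

```java
class DefensivePriceStats {
    static double average(double[] prices) {
        // Guard clause: reject inputs that would break the calculation,
        // before the body of the method runs.
        if (prices.length == 0) {
            throw new IllegalArgumentException("at least one price is required");
        }
        double total = 0;
        for (double price : prices) {
            total += price;
        }
        return total / prices.length;
    }
}
```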

3. Only ever call average() when there's at least one house.

This approach is called Design By Contract.

It puts the responsibility on the calling code to ensure it doesn't break the pre-condition when invoking average().

Typically, developers practicing DbC will use assertions embedded in their code, the checking of which at runtime can be switched on and off. That way, we can have assertion checking during testing, but switch it off once we're happy to release the software. The distinction between failing assertions and having our code throw exceptions when rules are broken is very clear: in Design By Contract, when an assertion fails it's because our code is wrong!
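A DbC sketch under the same assumptions as before. In Java, asserts are only checked when assertions are enabled (java -ea), so the contract checks can be switched off for a release build:

```java
class ContractPriceStats {
    static double average(double[] prices) {
        // Contract check: only runs with assertions enabled (java -ea).
        // If this fires, the *caller* has broken the pre-condition.
        assert prices.length > 0 : "pre-condition broken: caller supplied no prices";
        double total = 0;
        for (double price : prices) {
            total += price;
        }
        return total / prices.length;
    }
}

class Client {
    // The calling code takes responsibility for satisfying the
    // pre-condition before invoking average().
    static double averageIfAny(double[] prices) {
        return prices.length > 0 ? ContractPriceStats.average(prices) : 0;
    }
}
```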

The advantage of DbC is that it tends to allow us to write cleaner, simpler implementations, since we assume that pre-conditions are satisfied and don't have to write extra code to handle a bunch of extra edge cases.

Remember that in strategies 1. and 2., handling the edge case is part of the program's correct behaviour. In DbC, if that edge case ever comes up, the program is broken and needs to be fixed.

The important thing to remember is that, whether it's handled in the post-condition (e.g., average price of zero houses = 0), whether it's guarded against before the body of the program/function, or whether it's forbidden to invoke methods when pre-conditions are broken, the interaction between client code and supplier code (the caller and the callee) must be correct overall. It has to be handled meaningfully somewhere.

When it comes to assuring ourselves that our code is correct, there's a cornucopia of techniques we can employ, ranging from testing to mathematical proofs of correctness (often by proving a base case, then showing that if it holds for N, it holds for N+1, by induction).

But, regardless of the technique, a testable definition of correctness gives us the formal foundation we need to at the very least ask "What do we mean by 'correct'?", and it is the basis for almost all of them.

I'll finish off with one more example to illustrate. Let's think about how our definition of program correctness might be exploited in rigorous code inspections.

Revisit my earlier implied definition of "program"; what am I really saying? When it comes to testing and verification - especially inspections - I consider a program to be any chunk of executable code, from a software executable, down to individual functions, and even individual program statements and expressions. To me, these are all "programs" that have rules for their correctness - pre- and post-conditions.

If you're using a modern IDE like Eclipse (okay, maybe not that modern...), your refactoring tools can guide you as to where these units of executable code are. Essentially, if you can extract it into its own method (perhaps with a bit of jiggery pokery turning multiple return values into fields etc), then it has pre- and post-conditions.

Mainly for illustration, I've refactored the example program into composed methods, each containing the smallest unit of executable code from which the complete program is composed.
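A sketch of what that composed-methods refactoring might look like (the original refactored code isn't shown here; method names are mine). Each extracted method carries its own pre- and post-conditions:

```java
class ComposedPriceStats {
    // Pre-condition: prices.length > 0
    static double average(double[] prices) {
        return divide(total(prices), prices.length);
    }

    static double total(double[] prices) {
        double total = 0;
        for (double price : prices) {
            total = add(total, price);
        }
        return total;
    }

    // Pre-condition: none - addition works for any pair of doubles.
    static double add(double runningTotal, double price) {
        return runningTotal + price;
    }

    // Pre-condition: count != 0
    static double divide(double total, int count) {
        return total / count;
    }
}
```

Each of these tiny methods is a "program" in the sense above, with its own Hoare triple.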

Theoretically, every one of these little methods could be tested individually. I'm not suggesting we should write unit tests for them all, of course. But stop and think about the granularity of your testing practices. In a guided inspection, where we walk through the code, guided by test inputs, we could be asking of all these teeny-tiny blocks of code: "What must this do, and when will it work? When won't it work?" You'd be surprised how easy it is to miss pre-conditions.

So there you have it: the theoretical basis for the lion's share of software testing and verification - software correctness.
