November 19, 2010

...Learn TDD with Codemanship

Bug In Mutation Testing Example & TDD Misconceptions

Ian Calvert quite rightly points out a bug that I hadn't spotted in the recursive algorithm for generating Fibonacci numbers I used as an illustration for mutation testing in a previous blog post.

The example uses Java int's, but the largest Fibonacci numbers allowed by the requirements won't fit into a signed int. I could have used long, which fixes that problem.

Ian then raised an interesting point. He was concerned that test-driving the implementation drove out a very inefficient recursive algorithm. This is a common criticism leveled at TDD.

In this particular case, though, I deliberately didn't test-drive the implementation. I just wrote it and then wrote tests that would reach all the code but that I knew would not be testing it very thoroughly.

And I'm well aware that an iterative algorithm would have been much more efficient. I chose the recursive algorithm because it's a bit simpler and easier to follow for illustration purposes.

The question of whether TDD leads to inefficient algorithm design is interesting, because it highlights a common misconception about TDD which I hear a lot from people who aren't sold on it - namely that TDD means not thinking ahead about design.

With the Fibonacci Numbers kata, I have a roadmap in my head and I have a rough idea of where I'm headed in terms of the finished solution. Sometimes I take the recursive road, sometimes I take the iterative road. I find them both interesting and useful for practicing TDD. If I was writing commercial software and had performance requirements to worry about, I would probably go with the most efficient implementation. If I have no performance requirements, then I don't much care. Simplest is best in 95% of cases, and I'm not a great one for optimising my code in the absence of performance requirements. (Although, in fairness, the iterative vs. recursive Fibonacci algorithm is an extreme example of performance disparity of implementations.)

If someone came along and said "this program is too slow, make it faster", it would be but a trifle to substitute an iterative solution, and having tests in place makes it much safer to do that.

Of course, the tests wouldn't guarantee correctness. Even for this trivial example, it would take 45 tests to cover every input in the range F(8) -> F(50) plus out-of-range exceptions. If this code was in the kernel of a control system being sent to Mars, it may be worth writing all those tests, or using a model checker, or doing a mathematical proof of correctness.

Techniques like these can complement TDD when reliability really matters. Another great technique is inspections, which I'm presuming is how Ian found the bug that hundreds of us had missed, by examining the code on GitHub.

There are also many exploratory testing techniques, and parameterised testing techniques that can help. Microsoft's Pex might have drawn out the F(49) test case that would cause an overflow error in a C# example.

I do not, and would never claim that TDD is all you need to produce reliable code. Nor do I claim that TDD is a great tool for algorithm design. There is no replacement for thinking about design in your head. The real discipline of TDD is being able to think ahead without coding ahead. Breaking the problem down into testable steps is the art of test-driven development.

Even when I have a pretty good idea what the solution will be, I discipline myself to take baby steps and get feedback along the way. I also find it incredibly useful to follow a practice that encourages me to work in very small cycles, because that creates a natural reminder to refactor and keep your code clean. When I don't test-drive my code, I find myself refactoring less frequently and my code turns out not as clean and maintainable.

It is perhaps understandable that people new to TDD or skeptical of it draw concern from the trivial (but still perfectly possible to get wrong, as my int faux pas illustrates) examples we tend to use to illustrate the practices.

The reality is, as born out by various studies and comparisons of code bases where TDD was applied vs. code bases where it wasn't (including one done where Ian works), teams that practice TDD usually see a net benefit in code quality, with the potential project benefits and business benefits that brings.

Let's be honest, Big O notation doesn't come up in commercial software development often. In large distributed systems the performance problems tend to be, well, large and distributed. Over the last couple of decades, I've learned to make the code work first and then worry about speed and efficiency. Get the tests passing and then, if you need to, optimise using the tests as your safety net. This is one of the reasons why we have a refactoring called "Substitute Algorithm".

Inefficient algorithm design is usually the least of most developers' problems, but even when it is, the fact that we wrote the tests first no more invalidates TDD than writing it in Java invalidates Java. Tests in TDD are behavioural specifications, not implementation designs.

In the case of the Fibonacci Numbers kata, I could work through the exact same list of tests and come out the other end with an iterative algorithm. 50% of the time I practice it, I do. So if the exact same tests can "drive" two different implementations, I can only conclude that there must be some other factor that leads to an inefficient implementation. The tests tell me what my implementation must do, but not how it should do it.

Wether it's recursive or iterative, the tests for my Fibonacci Number Generator don't care. Externally they exhibit identical behaviour, even if they don't exhibit identical performance. Add a response time unit test to that (e.g., it must do it 100 times in under 5 seconds) and my recursive implementation will fail.

So I put my hands up to the bug. It is indeed a big oversight and reinforces the need to complement unit testing (since in this particular case we're not talking about TDD) with other testing and verification techniques, especially when reliability really matters. Wrists duly slapped and thanks for reporting it, Ian.

The fact that TDD is not perfect or that it does not solve design problems for you does not invalidate TDD. No technique short of mathematical proof of correctness - checked and double-checked and triple-checked, of course, because proofs can also be wrong - will absolutely guarantee bug-free code every time. That code created following TDD tends to be significantly more reliable and significantly more maintainable than code that isn't is beyond dispute, and the evidence clearly indicates that we can reap that benefit at minimal or no extra cost. To me, therefore, TDD is a proven and profitable technique and has earned it's place among the other techniques, like inspections, we can confidently apply to producing good working software.

Posted 10 years, 5 months ago on November 19, 2010