January 17, 2006


Solution Space & Evolutionary Design

Richard Dawkins' biomorph program, outlined in his book The Blind Watchmaker, has been a source of inspiration for me recently.

In his program, he models a simple evolutionary process in which biomorphs (2-dimensional forms) are drawn according to instructions given by a set of 9 numbers that represent the biomorph's "genes". The rules of the simulation are very simple. From one generation to the next, one of the 9 "genes" is allowed to vary at random, and a new generation of biomorphs is born.

He showed how the biomorphs can be mapped onto a grid representing all of the possible gene combinations, so that evolution becomes a process of "walking through biomorph space" one grid cell at a time (since each mutation changes only one variable).

To model natural selection, you could specify simple selection rules that decide whether a biomorph in any given cell of the grid survives and reproduces or dies, which brings that particular evolutionary path to an end. In biomorph space, this would give us safe cells where biomorphs can survive and danger cells where they die.
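To make the metaphor concrete, here is a minimal Python sketch of a walk through a 9-gene space with a selection rule bolted on. It is not Dawkins' program (in his, a human does the selecting): the genes start at zero, each mutation nudges one randomly chosen gene up or down by one step, and the rule `is_safe` (every gene must stay between -3 and 3) is a hypothetical stand-in for whatever selection rule you might choose. All of the function names are illustrative.

```python
import random

GENE_COUNT = 9  # one number per "gene", as in Dawkins' biomorphs

def mutate(genome, rng):
    """Return a copy of genome with one randomly chosen gene moved by +1 or -1.

    This is a single step to an adjacent cell in 9-dimensional biomorph space.
    """
    child = list(genome)
    gene = rng.randrange(len(child))
    child[gene] += rng.choice((-1, 1))
    return tuple(child)

def is_safe(genome):
    """Hypothetical selection rule: a biomorph survives only if every gene stays in -3..3."""
    return all(-3 <= g <= 3 for g in genome)

def evolve(generations, seed=0):
    """Walk from the origin, one mutation per generation, until a danger cell is hit."""
    rng = random.Random(seed)
    genome = (0,) * GENE_COUNT  # start with every gene set to zero
    path = [genome]
    for _ in range(generations):
        genome = mutate(genome, rng)
        path.append(genome)
        if not is_safe(genome):
            break  # this evolutionary path ends here
    return path

path = evolve(100, seed=42)
print(len(path), "cells visited; last cell safe?", is_safe(path[-1]))
```

Every step in `path` differs from the one before it in exactly one gene, which is the "one grid cell at a time" walk, and the walk stops as soon as it steps into a danger cell.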

And this got me thinking about the process of software development. We often hear geneticists describe DNA as a sort of computer program that carries instructions on how an organism will grow. Some organisms have successful DNA and some do not, just like the biomorphs (only with a frighteningly greater degree of complexity).

We might think of computer programs as a form of DNA. Just as the 9 "genes" carried by the biomorphs defined a set of all possible biomorphs, organised into a higher-dimensional biomorph space, a programming language defines a set of all possible programs that can be written in that language.

And computer programs are subject to a form of selection. Well, they are if we care whether they're going to be useful! The rules of selection for a computer program are defined by its requirements. We could build any kind of program, but only some of those programs will satisfy the requirements. Within the set of all possible programs there is a solution space that satisfies them: the safe cells in the infinite grid where our program does what it's supposed to do.

Test-driven development (and similar scenario-driven approaches) can be modeled using this metaphor. To begin with, we have zero program instructions (imagine Dawkins' biomorph with all 9 genes set to zero) and no rules for selection. Every cell in the infinite space of possible programs is a safe cell, and we start our evolutionary walk through that space at the origin.

We make our first move by defining one simple selection rule (one test) that defines a set of safe cells in the grid. The cell our program is currently in becomes a danger cell, because the new selection rule tells us that a program with no code at all fails the test (in TDD terms, a red light). So we have to adapt our program until it reaches the nearest safe cell (a green light).

Then we add another selection rule (test) that puts our program in a danger cell again. And so we have to adapt the program until it reaches the next nearest safe cell. And so on.
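The red/green walk can be sketched in a few lines of Python, under some heavy simplifying assumptions. The hypothetical "program space" here is every function f(x) = a * x + b with integer genes a and b between -20 and 20, the origin (0, 0) stands in for "no code", each test is an (input, expected output) pair, and "nearest" means the fewest single-gene steps, found by breadth-first search. The names `tdd_walk`, `nearest_safe` and so on are made up for this illustration.

```python
from collections import deque

# Hypothetical program space: each "program" is a pair of genes (a, b)
# computing f(x) = a * x + b, with both genes limited to -LIMIT..LIMIT.
LIMIT = 20

def run(program, x):
    a, b = program
    return a * x + b

def passes(program, tests):
    """A cell is safe if the program satisfies every selection rule (test) so far."""
    return all(run(program, x) == expected for x, expected in tests)

def neighbours(program):
    """Cells reachable by changing a single gene by one step."""
    a, b = program
    for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        na, nb = a + da, b + db
        if abs(na) <= LIMIT and abs(nb) <= LIMIT:
            yield (na, nb)

def nearest_safe(start, tests):
    """Breadth-first search outward from start to the closest safe cell."""
    seen = {start}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if passes(cell, tests):
            return cell
        for nxt in neighbours(cell):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None  # no safe cell anywhere in the bounded space

def tdd_walk(tests):
    """Add one test at a time; after each red light, move to the nearest green cell."""
    program = (0, 0)  # the origin: every gene set to zero
    tests_so_far = []
    history = [program]
    for test in tests:
        tests_so_far.append(test)
        if not passes(program, tests_so_far):                  # red light
            program = nearest_safe(program, tests_so_far)      # green light
            history.append(program)
    return program, history

program, history = tdd_walk([(0, 1), (2, 7)])
print(program)  # (3, 1), i.e. f(x) = 3x + 1
print(history)  # [(0, 0), (0, 1), (3, 1)]
```

The first test, f(0) = 1, pushes the program from (0, 0) to (0, 1); the second, f(2) = 7, leaves only one safe cell consistent with both tests, (3, 1). Each new test shrinks the set of safe cells, and the program hops to the closest one that remains.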

Refactoring is slightly different. In refactoring, you could argue that we're not adding any new selection rules, but are simply adapting the program to find a more optimal position within the existing safe zone. Okay, when I think about it, we probably are adding new selection rules, but they may be rules about design quality, for example, rather than program logic.

So we could view TDD, which is often referred to as a form of evolutionary design (quite appropriately, it seems), as a walk through solution space: each new test turns more of the grid's safe cells into danger cells, and we jump to the nearest cell that is still safe.

Thinking in terms of solution space helps with another interesting problem. Let's simplify things a little and think about a much smaller space of possibilities. Imagine we have two dice, which define a space of 6 x 6 = 36 possible states. Now let's create a selection rule: we want to throw a 7. This could be achieved by throwing 1 + 6, or 2 + 5, or 3 + 4, and so on. In fact, counting each die separately (so 1 + 6 and 6 + 1 are different cells), there are 6 ways of throwing a 7. The requirement to throw a 7 defines a solution space with 6 cells in it. There's only one way of throwing a 2 (1 + 1), so the solution space for throwing a 2 has only one cell in it. A random throw is therefore six times more likely to land on a 7 (6 chances in 36) than on a 2 (1 chance in 36).
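The dice example is small enough to enumerate outright. This Python sketch (the function names are illustrative) lists all 36 cells, picks out the solution space for a given total, and computes the chance that one random throw lands in it, using exact fractions so the ratios come out cleanly.

```python
from fractions import Fraction
from itertools import product

# Every state of two six-sided dice: 6 x 6 = 36 cells, (die1, die2).
STATES = list(product(range(1, 7), repeat=2))

def solution_space(target):
    """All cells whose total satisfies the requirement 'throw target'."""
    return [(d1, d2) for d1, d2 in STATES if d1 + d2 == target]

def chance(target):
    """Probability that a single random throw lands in the solution space."""
    return Fraction(len(solution_space(target)), len(STATES))

print(len(STATES))            # 36
print(solution_space(7))      # [(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)]
print(solution_space(2))      # [(1, 1)]
print(chance(7), chance(2))   # 1/6 1/36
print(chance(7) / chance(2))  # 6
```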

In the average software project, we will have to find countless solutions to problems big and small, and with each there's a probability that we won't find a solution in the time we have available. If we think of a software project as a succession of throws of the dice, then our chances of successfully satisfying all of the requirements can be increased by keeping the solution space as wide as possible.
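One way to put rough numbers on this is to give each requirement a fixed budget of throws and ask how likely we are to land in its solution space at least once; the project succeeds only if that happens for every requirement. This Python sketch assumes independent throws, a 36-cell space for every requirement and the same budget for each. None of that comes from a real project; it is just the dice model stretched a little further, and the function names are hypothetical.

```python
from fractions import Fraction
from math import prod

def chance_of_finding(cells, throws, space=36):
    """Probability that at least one of `throws` random throws lands in a
    solution space of `cells` cells out of `space` (throws assumed independent)."""
    miss = Fraction(space - cells, space)
    return 1 - miss ** throws

def chance_of_project(solution_sizes, throws):
    """Probability of satisfying every requirement, each with the same throw budget."""
    return prod(chance_of_finding(cells, throws) for cells in solution_sizes)

wide = chance_of_project([6] * 5, throws=10)    # five requirements, each like "throw a 7"
narrow = chance_of_project([1] * 5, throws=10)  # five requirements, each like "throw a 2"
print(f"wide: {float(wide):.3f}, narrow: {float(narrow):.5f}")
```

With five requirements and ten throws each, the "throw a 7" project succeeds roughly 41% of the time, while the "throw a 2" project succeeds less than 0.1% of the time. Narrow solution spaces compound quickly across a project.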

If we think of throwing a 7 as a requirement, we might think of throwing 3 + 4 as a design that satisfies that requirement. A problem I've come across in most projects is that many of the requirements developers are given turn out to be designs. Instead of being told "we need some way of keeping customers posted about new products", developers might find the requirements specification stating "we need to send an email to the customer when a product has been added or updated in the database". This is a much more complex example of asking for 3 + 4 when what you really want is 7.
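The difference between a requirement and a design can be expressed as two different selection rules applied to the same candidates. In this Python sketch, the three notification "designs" (email, SMS and a news feed) and both rules are hypothetical: the requirement-style rule accepts any design that tells the customer about the product, while the design-style rule only accepts the one that sends an email.

```python
# Hypothetical candidate designs for "keep customers posted about new products".
# Each one records what it sent as (channel, customer, product).
def email_design(sent, customer, product):
    sent.append(("email", customer, product))

def sms_design(sent, customer, product):
    sent.append(("sms", customer, product))

def newsfeed_design(sent, customer, product):
    sent.append(("newsfeed", customer, product))

DESIGNS = {"email": email_design, "sms": sms_design, "newsfeed": newsfeed_design}

def customer_was_told(design):
    """Rule written as a requirement: the customer hears about the product somehow."""
    sent = []
    design(sent, "alice", "widget")
    return any(c == "alice" and p == "widget" for _, c, p in sent)

def email_was_sent(design):
    """Rule written as a design: specifically, an email must go out."""
    sent = []
    design(sent, "alice", "widget")
    return ("email", "alice", "widget") in sent

def solution_space(rule):
    """Names of the candidate designs that survive a selection rule."""
    return sorted(name for name, design in DESIGNS.items() if rule(design))

print(solution_space(customer_was_told))  # ['email', 'newsfeed', 'sms']
print(solution_space(email_was_sent))     # ['email']
```

The requirement leaves three cells in the solution space; the design-as-requirement leaves one, which is the software equivalent of asking for 3 + 4 instead of 7.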

So my advice would be to keep the requirements as non-prescriptive as possible. If you're trapped in a tight solution space and you're running out of throws of the dice, a good strategy might be to revisit the requirement, take a step back and ask "why do they want this?" (and don't forget to ask them rather than making up your own answers, but that's a topic for another time!)
Posted on January 17, 2006