September 21, 2006


Why Do Defects Cluster?

One of the great mysteries of cosmology was why the Universe is "lumpy". When I mix a coloured dye in with water, the dye billows and swirls and forms all sorts of interesting shapes. But eventually it spreads out evenly (at least, evenly to the naked eye) through the water. Thermodynamics gives things a tendency to get less lumpy over time. Why hasn't the Universe done this? Advanced imaging techniques show an early Universe just after the big bang that was smooth and even. Where did planets and stars and galaxies and wotnot come from?

The first part of the answer might lie in simple probability and nonlinear dynamics.



Imagine the early Universe is made up of very small lumps. Each lump has mass m. And these tiny lumps are wandering randomly around, passing a distance d away from each other. The pull of gravity between two of these lumps is determined by m and d. If the pull of gravity is big enough, they will be pulled together to form a new lump of mass 2m.

A bigger lump has a greater chance of attracting more lumps to it, forming even bigger lumps. This creates a positive feedback loop where the bigger the lump gets, the more likely it is to get bigger. I'm fond of the term "statistical gravity" as a way to describe this effect. It's the same mechanism that makes us more likely to see a film because so many other people have seen it (or to buy an operating system because nearly everyone else is using it...).

A lump of mass 3m is less likely than a lump of mass 2m, and a lump of mass 10,000,000,000,000,000,000,000m is extremely unlikely indeed. But with enough lumps (enough throws of the dice, if you like) such huge lumps can and will appear.
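For anyone who'd like to see the feedback loop in action, here's a rough toy sketch in Python. It isn't real physics: there's no distance d in it at all, and I'm simply assuming that a wandering lump gets captured by another lump with a probability proportional to that lump's mass. The function name, parameters and default numbers are all made up for illustration.

```python
import random


def simulate_lumps(n_lumps=1000, n_merges=900, seed=1):
    """Toy model of "statistical gravity" (an assumption, not real cosmology).

    Every lump starts with mass 1. On each step a randomly chosen lump is
    captured by another lump, chosen with probability proportional to mass.
    """
    if not 0 <= n_merges < n_lumps:
        raise ValueError("n_merges must be at least 0 and less than n_lumps")
    rng = random.Random(seed)
    masses = [1] * n_lumps
    for _ in range(n_merges):
        # Pick a wandering lump uniformly at random and remove it.
        wanderer = masses.pop(rng.randrange(len(masses)))
        # Heavier lumps are more likely to capture it (the feedback loop).
        target = rng.choices(range(len(masses)), weights=masses)[0]
        masses[target] += wanderer
    return masses


if __name__ == "__main__":
    masses = simulate_lumps()
    print("lumps left:", len(masses))
    print("five biggest lumps:", sorted(masses, reverse=True)[:5])
```

Total mass is conserved, so with the defaults the 100 surviving lumps average a mass of 10. The interesting bit is how unevenly that mass ends up shared out, which you can judge by comparing the biggest lumps with that average.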

Where the very little lumps came from in the first place is another tough question. One theory relies on the weirdness of Quantum Mechanics to allow enough random splodges of mass-energy to appear just long enough for our particle soup to be possible. Like the origin of life, once the process gets going there's really no stopping it. The jury's still out on how the process got going in the first place, though. (My money's on the random quantum splodges theory.)

What I like about this model is that it's jolly helpful in explaining why code defects are also "lumpy". Putting aside where the original defects came from in the first place, let's think about the process of fixing defects. When I fix a defect, I change some code. Whenever I create or change code, there's a probability that I will introduce a new defect (or three). When I fix these new defects, I change more code, which creates a higher probability of more new defects. And there we have our positive feedback cycle - in this case, a vicious circle - that explains why defects might tend to cluster in lumps unevenly distributed throughout the code.

There are other factors that make introducing defects more likely in the first place, like the complexity of the code, for example. But even if every part of the code were identical in those qualities, we would still get defect clustering because of statistical gravity.
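Here's a minimal sketch of that vicious circle in Python, under some loudly stated assumptions: every module is identical, the original defects are scattered uniformly at random, and each fix has the same fixed chance (p_new, a number I've plucked out of the air) of introducing one new defect into the module being changed. None of the names or numbers come from real project data.

```python
import random
from collections import Counter


def simulate_defects(n_modules=50, initial_defects=50, p_new=0.4, seed=1):
    """Toy model of defect clustering across identical modules.

    Returns a Counter mapping module index -> total defects ever found there.
    p_new is an assumed probability that a fix introduces one new defect.
    """
    if not 0 <= p_new < 1:
        raise ValueError("p_new must be in [0, 1) or fixing may never finish")
    rng = random.Random(seed)
    # Scatter the original defects evenly (on average) across the modules.
    open_defects = [rng.randrange(n_modules) for _ in range(initial_defects)]
    found = Counter(open_defects)
    while open_defects:
        module = open_defects.pop()
        # Fixing a defect means changing code in that module, and every
        # change carries a chance of introducing a new defect right there.
        if rng.random() < p_new:
            open_defects.append(module)
            found[module] += 1
    return found


if __name__ == "__main__":
    found = simulate_defects()
    for module, count in found.most_common(5):
        print(f"module {module}: {count} defects")
    print("modules with no defects at all:", 50 - len(found))
```

No module is any more complex than any other in this model, so whatever lumpiness shows up in the counts comes purely from chance plus the fix-and-break feedback.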
Posted on September 21, 2006