June 22, 2007

Code Ageing - An Open Experiment

I'm always amazed at those scientists who can take away a sample of dung and come back and tell us where the animal lived, what it ate, how hot it was during the day and how often it mated.

If we consider an organism as a very complex information system, arguably we shouldn't be surprised that we can deduce so much about it just by examining its output. Information flows through an organism courtesy of its metabolism, and the lifestyle of the organism will have had a very considerable impact on that metabolism.

A computer program is another - slightly less complex (but quickly catching up) - information system that also has a kind of metabolism. Information flows through it via the actions of programmers (human or machine), and I'm going to argue - because I'm like that - that we might be able to tell a lot about the process of development just by examining the end product. In other words, we can make educated guesses as to what tools, techniques, practices and processes the developers used just by analysing their code.

Take code-and-fix development as an example. Could we look at 100,000 lines of code and say "there's been a lot of bug fixing going on here"? Do bug fixes leave footprints (or scars)?

Could we deduce the amount and frequency of refactoring that went on? Could we tell if the team integrated once a week, once every day or every few minutes?

Getting even more ambitious, could we spot an absentee customer or a wavering project sponsor? Does plan-driven development make a difference to the resulting code, and could we spot its tell-tale signature?

And if not from a snapshot of the code, perhaps from a series of snapshots showing how the code changed over time?

Who knows?

But one thing I am sure about is that it's jolly well worth finding out, don't you think?

A good start might be to collect data about the overall software/system lifecycle. My theory about code ageing suggests a growth curve that slows down, reaching a sort of plateau quite early on in the system lifecycle - just as it is with most organisms. The shape of this curve - how, and particularly how quickly, it tails off - might give an indication of a code-and-fix approach and/or a lack of attention to maintainability (e.g., not enough refactoring).
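To make that a bit more concrete, here's a minimal sketch of the kind of curve I have in mind, assuming a simple logistic growth model. The parameter values (plateau size, growth rate, inflection point) are pure invention for illustration, not measurements from any real project:

    import math

    # Hypothetical logistic model of code growth:
    #   size(t) = K / (1 + e^(-r * (t - t0)))
    # K is the eventual plateau (in KLOC), r the growth rate, and t0 the
    # inflection point (in months). All values are illustrative assumptions.
    K, r, t0 = 100.0, 0.5, 6.0

    def size(t):
        # Code size in KLOC at month t under the assumed curve.
        return K / (1 + math.exp(-r * (t - t0)))

    for month in range(0, 25, 3):
        print("month %2d: %6.1f KLOC" % (month, size(month)))

Under a healthy process we might expect the plateau to arrive later and more gently; a code-and-fix project, if my theory holds, would hit the ceiling early and hard.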

If we plotted graphs of code size over time, and of design quality over time, would we see a clear correlation? What is their relationship?
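As a sketch of what that analysis might look like - with made-up numbers standing in for real snapshots - we could compute a simple correlation between the two series:

    # Invented monthly snapshots: code size (KLOC) and a design quality
    # score (say, a 0-1 maintainability figure). Both series are fabricated
    # purely to show the shape of the analysis, not to prove the hypothesis.
    size_kloc = [5, 12, 25, 40, 52, 60, 64, 66]
    quality   = [0.90, 0.85, 0.80, 0.70, 0.62, 0.55, 0.50, 0.48]

    def pearson(xs, ys):
        # Pearson correlation coefficient between two equal-length series.
        n = len(xs)
        mx, my = sum(xs) / float(n), sum(ys) / float(n)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    print("correlation(size, quality) = %.2f" % pearson(size_kloc, quality))

A strongly negative coefficient across many real projects would be exactly the kind of evidence I'm talking about.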

I have my strong suspicions, but - like everyone else - no actual hard evidence to back them up. Just personal experience and anecdotes. Boy, would I have loved to have that data to hand in the many, many heated discussions I've been in about how much attention to design quality is "too much". I bet many of you have been there, too. You want more slack for critical ongoing refactoring, but the managers just don't get the relationship between code quality and productivity. Quality always seems to lose, and as a result, productivity drops and the customer loses, too.

In my last post I used a code analysis tool called NDepend to illustrate how a design principle could be enforced as a constraint on the code - creating an adaptive tension, if you like, that steers the evolution of the software towards a better quality of design.
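NDepend expresses such rules over .NET code; as a rough, language-neutral illustration of the same idea, here's a toy Python check that fails a build when any function exceeds an arbitrary branching threshold. The crude branch count and the limit of 10 are my own stand-ins, not NDepend's actual metrics:

    import ast
    import sys

    # Toy design constraint: no function may exceed MAX_BRANCHES branch
    # points. The node types counted here are a crude approximation of
    # complexity, chosen purely for illustration.
    MAX_BRANCHES = 10
    BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp)

    def violations(path):
        # Parse the source file and return (function name, branch count)
        # for every function over the threshold.
        tree = ast.parse(open(path).read(), filename=path)
        found = []
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                branches = sum(1 for n in ast.walk(node)
                               if isinstance(n, BRANCH_NODES))
                if branches > MAX_BRANCHES:
                    found.append((node.name, branches))
        return found

    if __name__ == "__main__":
        bad = violations(sys.argv[1])
        for name, count in bad:
            print("%s: %d branches (limit %d)" % (name, count, MAX_BRANCHES))
        sys.exit(1 if bad else 0)

Run against a source file as part of the build, a non-zero exit code vetoes the check-in - that's the adaptive tension in miniature.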

NDepend can also be used to measure a whole bunch of code metrics relating to size, complexity and design quality. Perhaps if we got a big enough sample of greenfield projects, collecting size and quality data over a period of, say, 12-24 months, a clear and irrefutable trend would emerge that links productivity to code quality.

If you've always wanted to have that sort of hard evidence to hand, then perhaps you'd like to participate?
Posted on June 22, 2007