June 18, 2007

Brownfield or Greenfield? - A Proposed Formula

More thoughts on the governance problem of whether to build on existing code or start again from scratch. I think I may be able to simplify the problem and isolate a handful of key variables that could be used to guesstimate which approach might yield more code after a fixed number of iterations.

Okay, first the caveats:

1. This is a very simplified model
2. More code != more value (well, not necessarily, anyway)

But in an imaginary scenario where two teams would write roughly the same code, one starting from scratch and the other building on existing code, it is a valid comparison of the relative output of each team. (Well, probably...)

Now brace yourself, because what follows can only be described as a mathematical formula. I know that most managers - and Agile coaches, too - run screaming for the hills when they see mathematical symbols, decrying them as rocket science, witchcraft, voodoo and suchlike.

So, I apologise in advance to those of you whose sensibilities are offended as I unveil:
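Suppose the model simply adds the legacy code that survives unchanged to the code each team delivers, and velocity falls each iteration by a fixed percentage of the previous iteration's velocity. Under those assumptions it can be sketched as the sum of a geometric series. The symbols are illustrative labels only: $C(n)$ is the total code after $n$ iterations, $R$ the reused code, $v_0$ the initial velocity (in LOC per iteration) and $d$ the fractional drop in velocity per iteration, with $0 < d < 1$.

$$
C(n) = R + \sum_{i=0}^{n-1} v_0 (1 - d)^i = R + v_0 \, \frac{1 - (1 - d)^n}{d}
$$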

This is probably going to need some explanation.

Firstly, the amount of reused code is not a measure of how much legacy code we start with. It's a measure of how much legacy code remains unchanged after we're done.

How can we know this in advance? Well, one way to put a rough figure on it is to do a percentage-change comparison between the functionality of the proposed system and the functionality of the existing system. Brace yourselves for yet more metrics nonsense, as I propose function points - yes, function points! - as a means of achieving this. Okay, I can see how agitated function points make you. Let's call them "story points" instead, shall we? Is that better?

Okay, so let's tot up the total story points for the existing system - either by going back through your planning estimates, or (if you didn't do any planning) by jotting down key system test scenarios and giving them story points.

Now let's tot up what percentage of those stories will change in the new version of the system, and - if we're being especially clever - to what extent.

So, if I have an application with 346 story points in total, and 125 story points' worth of that will have to change, then I'm left with 221 story points' worth of unchanged code - or about 64%. Well, maybe. If that application has about 100,000 lines of code, we might reuse about 64,000 lines. That's the ballpark we're probably in.
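The arithmetic above can be sketched in a few lines of Python. The function name `estimate_reused_loc` is hypothetical, and the sketch assumes that lines of code are spread evenly across story points, which is exactly where the "well, maybe" comes in.

```python
def estimate_reused_loc(total_points: float, changed_points: float,
                        total_loc: int) -> tuple[float, int]:
    """Estimate how much legacy code survives unchanged.

    Assumes (hypothetically) that code is spread evenly across story points,
    so the unchanged fraction of points approximates the unchanged fraction
    of lines of code.
    """
    if total_points <= 0:
        raise ValueError("total_points must be positive")
    unchanged_fraction = (total_points - changed_points) / total_points
    return unchanged_fraction, round(unchanged_fraction * total_loc)


# Figures from the example above: 346 points in total, 125 of which change.
fraction, reused = estimate_reused_loc(346, 125, 100_000)
print(f"{fraction:.0%} unchanged, about {reused:,} lines reused")
```

This prints "64% unchanged, about 63,873 lines reused". To capture "to what extent" a story changes, `changed_points` could be a weighted sum, for example counting half of a story's points if roughly half of it will be rewritten.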

Secondly, productivity drops as code gets more complex. It's as simple as that. Despite all the Agile community's protestations about setting an indefinitely sustainable pace, the reality is that even the best teams, who take the greatest care over their code, still find that the going gets tougher over time. This is inevitable, just as, no matter how much care we take of ourselves, we all grow old and eventually die.

But some teams - those teams that get into good habits like test-driven development and refactoring - create code that ages less rapidly. The rate at which their code ages, and at which their productivity falls as time passes, is a vital component in this calculation.
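One way to make "the rate at which code ages" concrete is to ask how many iterations it takes for a team's velocity to halve. The sketch below assumes velocity falls each iteration by a fixed percentage of the previous iteration's velocity (compounding). The helper name `half_life` is hypothetical, and the 5% and 2.5% rates are borrowed from the scenario that follows.

```python
import math


def half_life(decay: float) -> float:
    """Iterations needed for velocity to fall to half its initial value,
    assuming it drops by the fraction `decay` every iteration (0 < decay < 1)."""
    # Solve (1 - decay) ** n == 0.5 for n.
    return math.log(0.5) / math.log(1 - decay)


for decay in (0.05, 0.025):
    print(f"{decay:.1%} drop per iteration: velocity halves after "
          f"~{half_life(decay):.1f} iterations")
```

Under these assumptions a 5% drop halves velocity in about 13.5 iterations, and a 2.5% drop takes about 27.4. Halving the decay rate roughly doubles how long the team stays productive.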

To illustrate how this might turn out in a couple of scenarios, let's feed in some data. Let's split into Team Babbage and Team Gorman.

Team Babbage decide to build on existing code, reusing about 30,000 lines. Their initial velocity - measured (please forgive me) in Lines of Code - is 250 per iteration. And their velocity drops by 5% in each iteration, because they are crap and don't put much effort into maintaining code quality.

Team Gorman decide to start with a clean slate. With no legacy nastiness to contend with, their initial velocity is much higher - 1500 LOC per iteration. And, because any team with the name "Gorman" in it is going to invest considerable effort into code quality, their productivity drops at the much more genteel rate of 2.5% per iteration.
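The two scenarios can be simulated directly. This is a sketch under stated assumptions. Each team's velocity falls by a fixed percentage of the previous iteration's velocity, and the first iteration runs at full initial velocity. The `simulate` function and `TEAMS` table are hypothetical names.

```python
def simulate(reused_loc: int, initial_velocity: float, decay: float,
             iterations: int) -> tuple[float, float]:
    """Return (total LOC, velocity in the final iteration) after `iterations`.

    Assumes velocity starts at `initial_velocity` and falls by the fraction
    `decay` after every iteration (compounding).
    """
    total = float(reused_loc)
    velocity = initial_velocity
    last_velocity = 0.0
    for _ in range(iterations):
        total += velocity          # code delivered in this iteration
        last_velocity = velocity
        velocity *= 1 - decay      # the code ages, so the next iteration is slower
    return total, last_velocity


# Parameters taken from the scenario described in the post.
TEAMS = {
    "Babbage": dict(reused_loc=30_000, initial_velocity=250, decay=0.05),
    "Gorman": dict(reused_loc=0, initial_velocity=1500, decay=0.025),
}

for n in (10, 50):
    for name, params in TEAMS.items():
        total, last = simulate(iterations=n, **params)
        print(f"Team {name} after {n} iterations: ~{total:,.0f} LOC, "
              f"velocity in final iteration ~{last:.0f} LOC")
```

Under these assumptions, Team Babbage lead after 10 iterations (roughly 32,000 lines against 13,400). Team Gorman lead after 50 (roughly 43,100 against 34,600). By iteration 50, Team Babbage's velocity is down to about 20 LOC per iteration, while Team Gorman's is still over 400.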

Compare the progress made by both teams after 10 iterations:

Team Babbage After 10 Iterations

Team Gorman After 10 Iterations

In a fair fight, where the ratio of lines of code to story points is equal for both teams (although, in reality, Team Babbage - who don't refactor - are probably introducing more duplication into their code), after just 10 iterations Team Babbage come out clear leaders, and therefore the decision to reuse was credibly justified.

Team Babbage After 50 Iterations

Team Gorman After 50 Iterations

After 50 iterations, though, Team Gorman come out the winners - by a mile! So, in this instance the decision to start from scratch - and to take more care over code quality - paid dividends.

Also, after the 50th iteration, Team Gorman's productivity is much higher and their delivery curve still has some way to go before it flattens out. Team Babbage, on the other hand, have reached a plateau where their productivity has dropped to almost zero. So not only has the decision to start from scratch yielded more features, but it has left Team Gorman in a better position to add yet more new features in iterations 51 and beyond.

The final key variable in this decision is the scope of the functionality gap between the existing code and what's required in the next version. A small gap might be best served by building on the existing software. A big gap might make starting from scratch the best option. Again, a bit of jiggery-pokery using story points (or function points, God help me!) could help us guesstimate how big this gap is going to be. Is it 10 iterations' worth, or 50?
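One way to turn "is it 10 iterations' worth, or 50?" into a decision is to find the first iteration at which starting again overtakes reuse. In this hypothetical sketch, `total_code` uses the closed-form sum of a geometric series, assuming velocity falls by a fixed fraction each iteration with a decay rate above zero. `break_even_iteration` simply scans forward from the first iteration, and the team parameters come from the scenario above.

```python
from typing import Optional


def total_code(reused_loc: float, initial_velocity: float, decay: float,
               iterations: int) -> float:
    """Total LOC after `iterations`: R + v0 * (1 - (1 - d)**n) / d.

    Assumes 0 < decay < 1 (velocity falls by a fixed fraction each iteration).
    """
    return reused_loc + initial_velocity * (1 - (1 - decay) ** iterations) / decay


def break_even_iteration(reuse: dict, rewrite: dict,
                         max_iterations: int = 1000) -> Optional[int]:
    """First iteration after which the rewrite has produced more code than
    the reuse option, or None if it never catches up within max_iterations."""
    for n in range(1, max_iterations + 1):
        if total_code(iterations=n, **rewrite) > total_code(iterations=n, **reuse):
            return n
    return None


babbage = dict(reused_loc=30_000, initial_velocity=250, decay=0.05)
gorman = dict(reused_loc=0, initial_velocity=1500, decay=0.025)
print(break_even_iteration(babbage, gorman))
```

With the scenario's figures this sketch puts the break-even point at iteration 34. So, under this model, a gap of 10 iterations' worth favours reuse and a gap of 50 favours starting again. Rerunning it with different `decay` values is a quick way to see how sensitive the answer is to the rate at which productivity falls.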

Finally, we shouldn't forget just how critical the rate at which productivity falls is to this whole equation. And therefore we should never underestimate just how critical code quality is, and the good habits that help to preserve it, in the software lifecycle.
Posted on June 18, 2007