April 22, 2012
Towards A More Empirical Understanding of The Effects of TDD (That Really Matter)
There have been a handful of empirical studies of the effects of Test-driven Development over the last decade.
Looking at the state of the art in this area of research, as a long-time TDD practitioner and coach I'm left feeling less than satisfied at the way these studies were conducted, and therefore the quality of the results.
Firstly, I'm not entirely satisfied that all of the studies were conducted by people who really understood what TDD is. They tend not to mention refactoring, for example. As refactoring is roughly half of the effort in TDD, this seems like a major oversight.
Secondly, studies seem to be largely conducted on groups who are introduced to the disciplines of TDD at the start. I know from years of experience training and coaching teams in TDD that the learning curve can be formidable, and that TDD can take hundreds of hours of good practice to really get the hang of. The idea of giving a group a crash course in it, and then setting them a small exercise from which the study will derive its results, seems unrealistic. A team that's been doing it for more than a year is likely to produce measurably different results in both software quality (internal and external) and productivity.
Thirdly, studies do not account for the specific way in which TDD is being practiced within the teams. There are different schools of TDD, and a whole spectrum of possible ways it can be practiced. Team A might be writing tests at a high level, for example, and these tests may make many assertions. Team B might be rigorously applying the "tests should test one thing" rule. Both teams could be following the golden rule of TDD, namely that they don't write production code until a failing test requires it.
And finally, these studies aren't asking the important question; namely, what effect does TDD have on things that would matter to a business? What effect does it have on feature request cycle times? What effect does it have on sustainability of innovation?
Tell your CTO that TDD will reduce bug counts, or that it doesn't cost more to do, and he or she is likely to shrug their shoulders and say "so what?" Tell them that TDD can help reduce cycle times to less than a week, or that teams that do it well are able to sustain a reasonable pace of change for years on the same code base, and they may sit up and take notice.
IT managers are so used to telling businesses they'll have to wait six months to get that feature that marketing needed yesterday, or telling them that they can't have it because it's just too expensive to make changes to a legacy system, that you may be carried aloft like a conquering hero if you can offer them a way out of that.
Even though the studies are flawed, they still tend to conclude that TDD has benefits. Code that is test-driven tends to be simpler, and to have lower bug counts. And there's a real mix of results regarding productivity - so much so that it's reasonable to conclude that TDD has little impact on schedules or development costs in the short-to-medium term. And that's with sample groups who are usually just beginning with TDD.
I consider the study conducted at the BBC by Kerry Jones (now at social TV start-up Zeebox) and myself to be one of the better ones. It uses data comparisons from real-world projects over a long term (1 year), and the developers participating went through not just a crash course in TDD but a fairly rigorous six-month peer-learning exercise, with regular weekly practice and a practical TDD skills assessment which they all passed. They were all demonstrably capable of practicing TDD in roughly the same way.
Where we suffered was lack of useful data beyond the code itself. Like most organisations who do software development, teams at the BBC do not know how many person-hours go into different activities, or what the cycle time of feature requests is, or even how many bugs are reported with each release.
Anecdotally, they reported that on one project where the team practiced TDD fairly rigorously right from the start, the frequency of live releases was greater than at any time on any previous project. So frequent, in fact, that it was edging towards what we might recognise as "continuous deployment". Again, anecdotally, we heard reports that if the code passed all of the automated tests, the business was satisfied that it was fit for a release, and that lengthy acceptance testing phases were not considered necessary.
These are just anecdotes, though. We have hard evidence to support our claim that TDD improved code quality, but only the usual ghost stories to support any claims beyond that.
What was frustrating at the time, and this is usually the case, is that all the raw data we needed was probably there somewhere. Project management must surely know how many people put in how many days on each release. They must surely know when a feature was first added to the backlog, and when the working code went live.
Couple this with a bug-tracking database, a source code repository and the usual Scrum/Kanban data and I would have everything I need to tie it all together. The hardest piece of the jigsaw to find is how the code is being written. For that, you really need to see it being written. Just as it is with history, there's only so much you can learn from examining ancient artifacts. There's no substitute for a high-fidelity account from someone who was there.
If I were conducting an academic study on this now, I'd ask for several sources of useful data:
1. The source code repository containing a complete version history
2. The defect tracking database associated with that code
3. The complete project history (release/iteration plans & actuals, use case/user story estimates & actuals, backlogs, burndowns, staffing etc)
4. Something that would allow me to see code being written (e.g., screencasts made of developers working on the code, IDE session recordings)
Using this data, I could visualise the arc of a software product over its lifetime up to now and look for any correlations between TDD and other coding practices and the shape of the arc.
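As a rough illustration of the kind of number I'd want to pull out of the project history, here's a minimal sketch in Python. It assumes each feature record carries the date it was added to the backlog and the date the working code went live. The `Feature` class, its field names and the sample records are all made up for the example; they're not taken from any real project or tool. It computes feature request cycle times and groups them by release year, which is one crude way of tracing the arc.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class Feature:
    """Hypothetical record: one feature request from the project history."""
    name: str
    added_to_backlog: date
    went_live: date


def cycle_time_days(feature):
    """Days between the request entering the backlog and going live."""
    return (feature.went_live - feature.added_to_backlog).days


def median_cycle_time_by_year(features):
    """Median cycle time (in days) for the features released in each year."""
    by_year = defaultdict(list)
    for feature in features:
        by_year[feature.went_live.year].append(cycle_time_days(feature))
    return {year: median(days) for year, days in sorted(by_year.items())}


# Invented sample data, purely for illustration.
features = [
    Feature("A", date(2010, 1, 4), date(2010, 1, 18)),
    Feature("B", date(2010, 3, 1), date(2010, 3, 31)),
    Feature("C", date(2011, 2, 1), date(2011, 4, 2)),
    Feature("D", date(2011, 6, 1), date(2011, 9, 9)),
]

for year, days in median_cycle_time_by_year(features).items():
    print(year, days)
```

A rising median from one year to the next would be one sign of a code base heading for its plateau. Lining that trend up against what the screencasts and version history say about how the code was written is where any correlation with TDD would have to be looked for.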
Software has a tendency to plateau, sooner or later. At some point, the cost of changing it outweighs the benefit of doing so. At this stage, we have a legacy system: namely one that is critical to the continued operation of a business while simultaneously being a significant impediment to the evolution of that business. Like old age, every software system has this coming. And it's arguably the default state of the majority of systems in use today. Which means that it's the default state for the majority of businesses that rely on legacy systems.
But some software reaches this plateau long before others, just as some people age faster than others. If we can postpone the inevitable for longer, our software can live a more active and fulfilling life for longer, and our businesses can stay adaptive using those systems for longer.
It's my theory that business evolution exhibits a sort of punctuated equilibrium. Businesses tend to spend most of their time in prolonged phases of equilibrium, when things don't change much, and then suddenly - due to a new opportunity or threat or some other sudden change in the conditions that surround them - they frantically reinvent themselves to adapt and to stay alive through another prolonged phase of equilibrium.
Quite often, it's these short phases of organised panic that tend to give rise to the ambitious new IT projects, as businesses discover the prohibitive cost of teaching their legacy systems new tricks. It's often accompanied by major structural changes within the organisation and massive upheaval.
This isn't the best way to build a "learning organisation". The same principle applies to major "big bang" software releases and major organisational change programmes - when we change 1,001 things at once, we lose the ability to learn one lesson at a time. Maybe 499 of those changes were the wrong changes, but rolling back 499 changes without throwing the baby out with the bathwater is fiendishly difficult. Like software, businesses succeed or fail as a whole.
It falls on us to develop software in a way that supports continuous, sustainable business evolution and to help build real learning organisations - organisations that can learn one lesson at a time, and sustain the pace of learning indefinitely.
It is my belief that programming practices like TDD, refactoring and continuous integration can help to achieve this. But it's just a belief, based on wishy-washy personal experience. I have seen a ghost. Now I need to get some instruments in there, collect some hard data, and prove it to the rest of the world.