November 1, 2007


Does Automating Metrics Help For Comparing Apples With Apples?

One of the toughest aspects of measurement is consistency. Even when we're measuring code, it's surprisingly easy to apply a metric differently every time we do it.

Take something as eyewateringly simple and straightforward as unit test coverage. I'm sure we all know what I mean when I say unit test coverage equals the percentage of lines of code executed in the running of a suite of unit tests. Don't we?

Well, first of all, what do we mean by a "line of code"? Not every line of code we write is actually executable, for example, and therefore can't be executed by a unit test. So do we ignore these non-executable lines of code? That sounds reasonable enough, I think.
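To put some (entirely invented) numbers on that, here's a minimal sketch in Python of how the choice of denominator changes the answer: the same test run scores very differently depending on whether we count every line or only the executable ones.

```python
# Illustrative only: one "covered" count, two denominators, two answers.

total_lines = 1200       # every line in the module: blanks, comments,
                         # declarations, braces... (invented figure)
executable_lines = 800   # lines the coverage tool considers executable
covered_lines = 600      # executable lines actually hit by the unit tests

coverage_vs_all_lines = 100.0 * covered_lines / total_lines         # 50%
coverage_vs_executable = 100.0 * covered_lines / executable_lines   # 75%

print(f"Counting every line:       {coverage_vs_all_lines:.0f}%")
print(f"Counting executable lines: {coverage_vs_executable:.0f}%")
```

Same code, same tests, and the "coverage" figure is either 50% or 75%, depending on a definition nobody wrote down.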

And what about code that's generated by my development tools - like GUI code generated by a WYSIWYG UI editor? Do I need to test all that code, too? Perhaps we should ignore that stuff, too - rather than wasting a lot of effort testing the code generator itself.

And should I include all the executable code in the system? Should I include JavaScript designed to run in a web browser? And SQL stored procedures? And instructions stored in XML files (e.g., for object-relational mapping or a model-view-controller configuration)? Should I include anything stored anywhere in my system that could possibly go wrong?

Even when we restrict our definition of "executable code" to, say, "all the Java code in this Eclipse project", we still struggle to achieve total consistency and repeatability for our measurements.

Right now, I'm struggling with code coverage metrics for .NET projects using a tool called NCover. What I've been doing is opening up Visual Studio solutions, building the code and then using the TestDriven.NET Visual Studio add-in to run all of the unit tests and report coverage results using NCover Explorer. What could be simpler? Except the coverage reports I'm getting don't include all of the .NET projects in each solution. Projects that the tests don't touch don't appear to be included in the results. Or are they included in the numbers, but the tool just doesn't show the project itself in the list? Without further experimentation, I just can't say.
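One way to answer that question, short of more trial and error in the IDE, would be to look inside the coverage report itself and list every module it mentions along with its total visit count. The sketch below (Python, just for brevity) assumes an NCover-style XML report in which module elements contain sequence points carrying a visit count; the element and attribute names, and the file name, are illustrative assumptions rather than a documented schema.

```python
# Sketch of a quick look inside a coverage report: which modules does it
# actually mention, and do any of them show zero visits? The <module> /
# <seqpnt> / visitcount names below are assumptions about the report
# layout, not a documented schema.
import xml.etree.ElementTree as ET

def module_visit_totals(report_path="coverage.xml"):   # file name is invented
    """Map each module named in the report to its total visit count."""
    totals = {}
    for module in ET.parse(report_path).iter("module"):
        visits = sum(int(pt.get("visitcount", "0"))
                     for pt in module.iter("seqpnt"))
        totals[module.get("name", "?")] = visits
    return totals

if __name__ == "__main__":
    for name, visits in sorted(module_visit_totals().items()):
        print(f"{name}: {visits} visits")
```

If an untouched project is being counted, it ought to show up in that list with zero visits; if it's missing from the list altogether, it almost certainly isn't in the overall percentage either.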

What I can say is that I can easily imagine 5 different people measuring the same code with the same tools and coming up with 5 different answers. Just like I can imagine 5 different people deploying the exact same web solution to the exact same servers, and ending up with 5 different configurations.

Indeed, I can imagine - and have experienced first hand - the same person measuring the same code using the same tools and coming up with two different sets of figures. So when our dashboard tells us that Project X's test coverage went up 2% in the last week, we can't rule out a significant margin for error. Or when the dashboard tells us that Project X has 15% higher test coverage than Project Y, we have to take that with an equally healthy pinch of salt. It makes comparing apples with apples very difficult, and reduces my confidence in the metrics.

And I can only conclude that automation is probably part of the solution. Applied to the same code base, automated collection of metrics would at least help to ensure repeatability and consistency at the project level. Across projects, a combination of reuse - applying exactly the same automated metrics - and perhaps even automating the process for generating the scripts that automate metrics collection and reporting (e.g., generating them from a Visual Studio solution) might help.
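As a rough illustration of that last idea, here's a hypothetical sketch (again in Python) that derives the list of projects to be measured straight from the Visual Studio .sln file, so that every run, on every machine, starts from exactly the same denominator. The file names are made up, and the actual coverage command line is tool- and version-specific, so it's left out.

```python
# Sketch: generate the "what to measure" list from the solution itself,
# rather than relying on whatever the IDE happens to report.
import re

# .sln project entries look roughly like:
#   Project("{GUID}") = "MyProject", "MyProject\MyProject.csproj", "{GUID}"
# Note: solution folders appear as Project entries too; a real script
# would filter by project type GUID.
PROJECT_LINE = re.compile(r'^Project\(.*?\)\s*=\s*"([^"]+)"', re.MULTILINE)

def write_coverage_manifest(sln_path="MySolution.sln",           # invented names
                            manifest_path="coverage_manifest.txt"):
    """Write a sorted, de-duplicated list of the solution's projects."""
    with open(sln_path) as sln:
        projects = sorted(set(PROJECT_LINE.findall(sln.read())))
    with open(manifest_path, "w") as out:
        out.write("\n".join(projects) + "\n")
    return projects

if __name__ == "__main__":
    for name in write_coverage_manifest():
        print(name)
```

Drive the coverage run and the report from that manifest, and "which projects count" stops depending on which windows happened to be open in the IDE.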

But boy, doesn't that sound like a lot of work?

Posted on November 1, 2007