January 3, 2008

...Learn TDD with Codemanship

Detecting Code Smells - Requires Formal Definitions?

One of the problems with the current state of thinking in software design is that, even at the most disciplined end of the spectrum, it's still far too woolly. Take "code smells", for example. These are effectively software design anti-patterns, giving us a shorthand for describing design problems in the same way that patterns give us a shorthand for describing design solutions.

But I find a lot of my time is wasted trying to convince development teams that their code really does contain one or more design smells, and that's because the definition of these anti-patterns is ambiguous and handwavy enough that we can argue the toss over whether a chunk of code really is an example of that design smell or not.

For example, a common code smell in multi-tier business and web applications is Data Class. Cohesion is one of the goals of good software design. We want to package things that are likely to change together in the same organisational units - methods, classes, packages, applications and so on. Logic acts on data, and therefore when we change the definition of our data, the logic that applies to it is likely to change, too. So we should strive to package data and behaviour together. In simpler terms, methods that access fields in classes need to be attached to the classes that contain those fields.

So when I see a Java class with just a bunch of simple, one line getters and setters and no actual real behaviour of any consequence, I shout "Data Class!" At which point, 50% of the developers say "Oh, but there are methods, and they do access the fields in the same class." And I say, "oh, but getters and setters don't count", and they reply "says who?", and on the debate goes, long into the night.

They have a point, though: who said that getters and setters don't count? I haven't seen it described anywhere, not definitively. And at what point does a class become a "data class". If a class has 50 fields, and one significant method that accesses 5 of them, is that a data class? What if it has 10 fields and two significant methods that access 5 of them?

Right now, I'm thinking about finding ways to automatically detect code smells - so I have to figure out a way to programmatically classify code, which means I need a computable definition of each design smell. And guess what - that's not currently possible. Not unless I invent my own computable definitions. And then lots of people will disagree with my interpretations. And round and round we'll go until one of us gets dizzy or throws up into our laptop bag.

To illustrate my point, consider this precise definition - using UML and OCL - of what I think might constitute a data class.

For those of you who don't speak Object Constraint Language*, it simply states that a class is a data class if any if its fields is not accessed by at least one method that isn't a getter or setter. Or something along those lines, anyway. (My OCL is getting rustier every day, I'm afraid - so apologies if it doesn't actually say that at all...)

But is that what a data class really is? Always? Or is is not so black and white? Are there degrees of "data class-iness"? Is my first example more "data classy" than my second example?

Whatever the real answer, what might really help now is a working, testable definition, so we can at least find examples to refute it and evolve our understanding. This has been sadly lacking in recent years, and as a result, I feel that our progress in understanding good design has stalled somewhat. Time to get the wheels moving again?

I think so, anyway.

* .NET OCL tutorial is here

Posted 13 years, 7 months ago on January 3, 2008