September 13, 2007

Reused Abstractions & Probability of Change

Hey, guess what?

Just for a change, I'm going to talk about the problem of change propagation. Mmmm. That'll be nice, won't it?

Yesterday I suggested a metric for the probability of change propagation, which - because I was using a forest fire analogy - I've called flammability.

I suggested that some arrangements of the same trees might be more flammable or less flammable than others. In particular, I wondered whether having air flow away from trees that are less likely to have a burning match dropped on them would reduce overall flammability.

Relating this back to software, I'm asking if the received wisdom of having dependencies go towards types that are less likely to change - because change propagates backwards along dependencies - actually reduces the "flammability" of code, effectively minimising the ripple effect that can cause small changes to propagate throughout the software.

I don't know the answer to that, or even anything close. But I have been thinking about the question of what makes types less likely to change.

The actual received wisdom is that you should prefer to depend upon abstractions - abstract classes and interfaces. But why are these less likely to change than concrete classes?

If I write a class called Foo with 200 lines of code in it, and then declare it abstract, does it suddenly, magically become less likely to change? I don't think so. What about an interface IFoo? Well, I think that would be less likely to change, based on experience. But why?

And we're back in the forest again...

Re-imagine our original forest, but this time the trees are of different sizes. Now when we drop a match, the probability that it will land on a specific tree will depend on the amount of tree there is for it to land on. A tree twice as big will have double the probability that a match will land on it.

Code, when all is said and done, is just a big bunch of information. When we make a change to the code, we're just changing that information. When we organise information into modules of different sizes, we effectively divvy up the probability that when a change needs to be made, it'll have to be made in one of those modules. The more information in a module, the more likely it is that's where the change will have to happen.

An interface like IFoo contains far less information than our concrete Foo class, with its 200 LOC. It's a much smaller target, and therefore change is less likely to fall on it.
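To put rough numbers on that, here's a minimal sketch in Python. It assumes a change is equally likely to land on any line of code, which is a simplification. The 5-line size for IFoo is a made-up figure for illustration. Only Foo's 200 LOC comes from the example above.

```python
def change_probabilities(sizes):
    """Share of a randomly placed change that each module attracts,
    assuming every line is equally likely to be the one that changes."""
    total = sum(sizes.values())
    return {name: size / total for name, size in sizes.items()}


# Hypothetical sizes in lines of code: Foo is the 200-line class,
# IFoo is assumed (for illustration only) to be a 5-line interface.
sizes = {"Foo": 200, "IFoo": 5}
probabilities = change_probabilities(sizes)

for name, p in probabilities.items():
    print(f"{name}: {p:.3f}")
```

Under those assumptions Foo is 40 times as likely to be hit as IFoo, purely because it's 40 times the size.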

It matters less that IFoo is an interface, necessarily. What matters is that IFoo is small. If Foo was a class with a bunch of abstract ("pure virtual") methods, it would probably be every bit as likely to change as if we'd declared it as an interface. The fact that it is an interface - in the way programming languages take that to mean - probably has little bearing, then, on our principle of preferring to depend upon abstractions.

In other words, just because you've got lots of dependencies on abstract classes and interfaces, that doesn't necessarily mean that your code is less flammable. These are arbitrary distinctions we make, either to send out a signal to say "I mean this type to be less volatile", or as a convenience for some tool or framework like JMock.

That's not to suggest, though, that the concept of a subtype doesn't have any bearing on matters. If a type T defines all the possible instances of objects of that type (e.g., every possible bank account), then a subtype of T, let's call it S, defines a set of objects such that every object in S is necessarily a member of T, but not vice-versa. For example, every settlement account is necessarily a bank account, but not every bank account is necessarily a settlement account.

Here we're interested in the relative size of set membership for a type. If T has more members than S, then we'd be better off depending on T. Why?

In a pathetically simple illustration, imagine some client, C, that can be coupled either to T or to S. If C depends on S, and we change T, then - since every S is necessarily a T - we might need to change C. And in the same scenario, if we change S, we might also need to change C. With me so far? Excellent. Nearly there now...

If C is coupled to T, and we change T then C might have to change. But if we change S, C won't need to change, since every S is necessarily a T, no matter what we do to S. We can only force a change in C by changing T.
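The two arrangements can be sketched as tiny dependency graphs. This is only an illustration of the argument, assuming that a change to a type might force a change in anything that depends on it, directly or indirectly. The function name affected_by and the dictionary layout are my own inventions for the sketch.

```python
def affected_by(changed, depends_on):
    """Return every type that might need to change when `changed` changes,
    following dependencies backwards (from a type to its dependents)."""
    affected = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for dependent, targets in depends_on.items():
            if current in targets and dependent not in affected:
                affected.add(dependent)
                frontier.append(dependent)
    return affected


# S is a subtype of T, so S depends on T in both arrangements.
c_depends_on_s = {"S": {"T"}, "C": {"S"}}
c_depends_on_t = {"S": {"T"}, "C": {"T"}}

for label, graph in [("C -> S", c_depends_on_s), ("C -> T", c_depends_on_t)]:
    for changed in ("T", "S"):
        hit = sorted(affected_by(changed, graph))
        print(f"{label}: changing {changed} might affect {hit}")
```

When C depends on S, C shows up in the affected set whether we change T or S. When C depends on T, only a change to T reaches it.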

QED. Well, almost. Actually there are a couple of caveats:

1. S is smaller than T. If every instance of T is actually an instance of S, too, then we lose the benefit.
2. We are observing the Liskov Substitution design principle, namely that an instance of any type can be substituted with an instance of any of its subtypes with no ill effects. If S = T, and we don't observe this principle, we actually could break C by changing the behaviour of S.
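The second caveat is easier to see with the bank account example. This is a hedged sketch: the class names, the deposit method and the pay_in client are all hypothetical, invented here to stand in for T, S and C.

```python
class BankAccount:
    """The supertype T: every account has a balance and accepts deposits."""

    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount


class SettlementAccount(BankAccount):
    """A substitutable subtype S: a deposit still increases the balance."""

    def __init__(self):
        super().__init__()
        self.deposit_count = 0

    def deposit(self, amount):
        super().deposit(amount)
        self.deposit_count += 1


class BrokenSettlementAccount(BankAccount):
    """Hypothetical subtype that ignores the principle: deposits are dropped."""

    def deposit(self, amount):
        pass


def pay_in(account, amount):
    """The client C, written only against BankAccount."""
    account.deposit(amount)
    return account.balance
```

pay_in gets the balance it expects from a BankAccount or a SettlementAccount, but not from BrokenSettlementAccount. Even though C is coupled only to T, a change to a subtype's behaviour has reached it.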

How fascinating, I hear you cry. Well, actually, yes it is. Rather.

You see, for a few years now I've held the unjustified prejudice that it's bad to create arbitrary abstractions that are only implemented or extended once - e.g., interfaces for mocking that aren't used anywhere else. In my irrational mind, I had it that if I declared an interface, it was because there were at least two types that needed to expose the same set of operations. To me, code abstractions are mechanisms for factoring out commonality (you may know it as "duplication") across types.

So I have ranted and raved and frothed and vigorously gesticulated about a complementary design principle that I call the Reused Abstractions principle. That is to say, that - in my opinion, which I have been unable to back up or defend with data or a reasoned argument of any kind - it is better to depend upon abstractions that are implemented or extended more than once.

Now I see a genuine rational argument swinging into view. It's better to depend upon reused abstractions because that way we know the abstraction's set of instances is very probably larger than that of any one of its subtypes.
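For what it's worth, "implemented or extended more than once" is something you can check mechanically. Here's a rough sketch in Python, where every class name is hypothetical and is_reused is just my own shorthand. It only counts direct subclasses that have been loaded, so treat it as an illustration rather than a proper metric.

```python
from abc import ABC, abstractmethod


class Account(ABC):
    """Hypothetical abstraction with two implementations."""

    @abstractmethod
    def balance(self): ...


class CurrentAccount(Account):
    def balance(self):
        return 0


class SavingsAccount(Account):
    def balance(self):
        return 0


class Clock(ABC):
    """Hypothetical abstraction that exists only so it can be mocked."""

    @abstractmethod
    def now(self): ...


class SystemClock(Clock):
    def now(self):
        return 0


def is_reused(abstraction):
    """True if the abstraction is directly extended more than once."""
    return len(abstraction.__subclasses__()) > 1


for abstraction in (Account, Clock):
    print(abstraction.__name__, "reused:", is_reused(abstraction))
```

Account passes the test and Clock doesn't, which is exactly the kind of single-implementation interface I've been grumbling about.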

I bloody knew it!

(Okay - another caveat: my reasoning could turn out to be bogus. Please drop me a line if you see the hole.)
