May 10, 2013

Making The Untestable Testable With Mocks - Resist Temptation To Bake In A Bad Design

Just a quick note before my next pairing session about using mock object frameworks to make untestable code testable.

Mocking frameworks have grown in their sophistication, for sure. But I fear they may have mutated into testing tools, rather than the design aids that their originators intended.

Say, for example, you're trying to write unit tests for some legacy code that depends on a static method which accesses the file system. We want unit tests that run quickly, and reading and writing files means slow unit tests. So we want some way to invoke the methods we want to test without them calling that static method.

Enter stage right: UberMock (or whatever you're using). UberMock solves this problem with some metaprogramming jiggery-pokery that makes it possible to specify that a mock version of a static method be invoked at runtime. We write unit tests that set up expectations on that mock static method call. That is to say: we expose an internal detail that the static method - in mock form - should be invoked.

That's a legacy code "gotcha". We now have unit tests. Hoorah! But these unit tests depend on this internal design detail. And make no mistake - it's a design flaw we'll want to get rid of later.

If we decide, after we've got some tests around it, to refactor this horrid code so that we're observing the Open-Closed Principle (the "O" in "SOLID" - meaning that classes should be open for extension but closed for modification, which is not possible when we depend on static methods that can't be substituted with overridden implementations without the aforementioned metaprogramming jiggery-pokery), we cannot do so without rewriting our tests.

The tests we write that depend on internal design details of legacy code effectively bake in that legacy design, making refactoring doubly difficult at the very least.

If our ultimate aim is to invert that dependency on a static method, so that the code now relies on some dependency-injected abstraction, it tends to work out easier in the long run to put that abstraction in place first, and then use mocks to unit test that code.
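As a rough sketch of what "abstraction first" might look like, here's a small Java example. The names SettingsStore, Greeter and StubSettingsStore are made up for illustration and don't come from any real codebase. The file access sits behind an interface that's injected into the code under test, so the test needs nothing more exotic than a hand-rolled stub.

```java
// Hypothetical abstraction over what used to be a static, file-reading method.
interface SettingsStore {
    String read(String key);
}

// The code under test depends only on the abstraction it was born with.
class Greeter {
    private final SettingsStore settings;

    Greeter(SettingsStore settings) {
        this.settings = settings;
    }

    String greet(String name) {
        return settings.read("greeting") + ", " + name + "!";
    }
}

// Test double: returns canned data and never touches the file system.
class StubSettingsStore implements SettingsStore {
    @Override
    public String read(String key) {
        return "greeting".equals(key) ? "Hello" : "";
    }
}
```

A test can then call new Greeter(new StubSettingsStore()).greet("Jason") and check the result. The only design detail that test bakes in is the injected abstraction, which is the part we were planning to keep.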

Don't bake in a design that you'll later need to change

It's a little chicken-and-egg, I grant you. Ideally, we'd want unit tests around that code before we tried to introduce the abstraction, but how do we do that - without baking in the old design - until the abstraction's in place?

It's one of those situations where, I'm afraid, the answer is that you're going to have to be disciplined about it. There's usually no quick fix. You may have to rely on slow and cumbersome system tests for a while. Or even - gulp - manual testing.

But experience has taught me that, in the final reckoning, it can be well worth it to avoid pouring quick-drying cement on an already rigid and brittle design.

Ah, and I hear my next victim customer a-calling. I'm free!


January 12, 2013

The World-Famous Legacy Code Singleton Fudge

When refactoring legacy code that relies on static methods to access external systems - for example, for data access - our first goal is usually to make the code unit-testable. Therefore, we seek to invert that dependency on a static method to make it substitutable.

I demonstrated in previous blog posts how we do this in fairly simple situations. The dance is pretty straightforward: you turn the static method into an instance method and then do a "Find and Replace" to swap references to the class that method is on with "new ClassName().", so the target of invocation is now an instance. We then give client code a way to do the old polymorphic switcheroo by injecting that instance into, say, the constructor of the class where it's being used.

As easy as cake.

What often comes up, though, is a more complex and less than ideal situation. What if our static method is being accessed by many, many classes? We'd have to introduce dependency injection into every class that uses it, and then - if those classes are some way down the call stack - into every link in the chain to pass references from the top down to where they're used. This could be a lot of work, and I've not found a quick automated way of doing that. And what if - horror of horrors - the methods that access our static data access method are also static?

Take this example. Here's a pretty nasty data access class that offers static methods for updating and retrieving object state in an external database.



Imagine that - for reasons best known to themselves - the developers use these methods inside every single business object in their middle tier. Like:



Our goal is to get decent automated unit test assurance in place quickly, so we can then safely set about refactoring this whole can of worms properly.

In this situation, I've often used a fudge to get my tests in place. The fudge being to deliberately introduce a Singleton. (Gasp!)



I first extract a new class that contains the implementations of the Update and Fetch methods, and leave delegate methods in our original DataAccess class. Then I extract an interface from this new DataAccessImpl class (IDataAccess) so we can make those methods polymorphic.

Here comes the fudge - I then introduce a static method for setting the data access implementation at runtime, so our unit tests can set it as a mock or a stub, and our production code can set it once as the real McCoy.
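The steps above might come out something like this, sketched in Java. The class names DataAccess, DataAccessImpl and IDataAccess are the ones used in this post; the method signatures, the setInstance name, the Customer business object and the InMemoryDataAccess test double are my assumptions for the sake of a runnable example.

```java
import java.util.HashMap;
import java.util.Map;

// Extracted interface, so the implementation can be swapped polymorphically.
interface IDataAccess {
    void update(String id, String state);
    String fetch(String id);
}

// Extracted class that would hold the original database code (elided here).
class DataAccessImpl implements IDataAccess {
    @Override
    public void update(String id, String state) {
        throw new UnsupportedOperationException("real database call goes here");
    }

    @Override
    public String fetch(String id) {
        throw new UnsupportedOperationException("real database call goes here");
    }
}

// The original static facade: existing callers don't have to change.
class DataAccess {
    private static IDataAccess instance = new DataAccessImpl();

    // The fudge: tests set a stub or mock, production sets the real thing once.
    static void setInstance(IDataAccess dataAccess) {
        instance = dataAccess;
    }

    static void update(String id, String state) {
        instance.update(id, state);
    }

    static String fetch(String id) {
        return instance.fetch(id);
    }
}

// A hypothetical business object that still calls the static methods, as before.
class Customer {
    private final String id;

    Customer(String id) {
        this.id = id;
    }

    String name() {
        return DataAccess.fetch(id);
    }

    void rename(String newName) {
        DataAccess.update(id, newName);
    }
}

// In-memory test double, so unit tests never visit the database.
class InMemoryDataAccess implements IDataAccess {
    private final Map<String, String> rows = new HashMap<>();

    @Override
    public void update(String id, String state) {
        rows.put(id, state);
    }

    @Override
    public String fetch(String id) {
        return rows.get(id);
    }
}
```

A test would call DataAccess.setInstance(new InMemoryDataAccess()) in its set-up and then exercise Customer as normal. Being global mutable state, the instance needs resetting between tests, which is part of what makes this a fudge rather than a destination.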

Yes, pretty rank. But we've made our application logic unit-testable at a relatively low expense, and can now start writing those tests that will be the safety net for unpicking this whole mess once and for all. To unpick it first and then write the tests would be too risky, in my experience.




October 28, 2012

Refactoring Legacy Code #2 - Making Web Apps More Unit-Testable

Following on from that last post about refactoring legacy classes that depend on external systems (like a database) - which has been read by literally dozens of people, and that's no idle boast - I also get asked a lot about making web applications unit-testable.

Taking classic ASP.NET as a typical example - and again using a toy but typical example - the problem is also external dependencies. When we reference ASP.NET objects like Session and Request, we tie our code to the ASP.NET process and the lifecycle of our web forms. We can't just create an instance of a web form's class and start invoking methods on the controls on our page, because outside of ASP.NET, those objects won't be there.



Our goal in making our legacy code unit-testable is to be able to test as much of the logic of our app as possible quickly and effectively, and to do this we need to isolate as much code as we can from external dependencies like these.

I'm a big believer that server pages and web forms should do as little as possible. Really, they should just be a very thin film of glue that binds the logic of user interactions and the display - which, if we think about it, is only marginally about Session and Request and HTML controls - with the meat and potatoes separated away from knowledge of those details.

I might start to refactor this by extracting the meat and potatoes, complete with ASP.NET dependencies, into its own method.



Next, if I'm looking for some way to write the order data to the page without actually referencing the page or any of its controls, I need to extract methods that I can use to delegate this work through.



Now, for the magic. You'll like this. Not a lot, but you'll like it. If I make these helper methods for writing customer data to the web form public, I can extract an interface on the form's class, and have our controlling method speak to the form through that interface.



Next, we need to tackle that reference to Session. There are many different ways of breaking this dependency, but the simplest here might be to hide it behind another extracted helper method as a stepping stone to where I want to go next.



Now, I could just extract another interface on our form's class and pass that in. But I'm guessing we may want to have a shared abstraction we can reuse in a wider set of situations. Basically, imagine we don't want to implement SetSessionVariable (and, presumably, GetSessionVariable) on every web form. So, I'm going to extract a new class, and then extract an interface on that class.





Now we have a DisplayCustomerWithOrders method that depends only on abstractions for Session and for the web form - importantly, abstractions we control.
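To give a flavour of where this leaves us, here's a minimal sketch in Java (so method names are camelCase rather than the PascalCase of ASP.NET code). The interface names CustomerView and UserSession, the view's methods, and what gets stored in the session are all assumptions for illustration. Only SetSessionVariable, GetSessionVariable and DisplayCustomerWithOrders are named in this post.

```java
import java.util.List;

// Hypothetical abstraction over the web form: only what the logic needs to display.
interface CustomerView {
    void setCustomerName(String name);
    void addOrder(String description);
}

// Hypothetical abstraction over session state, shareable by every page.
interface UserSession {
    void setSessionVariable(String key, Object value);
    Object getSessionVariable(String key);
}

// Stand-in for the page class at this stage of the refactoring.
class CustomerPage {
    // Depends only on the two abstractions. The customer data is passed in
    // directly to keep the sketch short.
    static void displayCustomerWithOrders(String customerName, List<String> orders,
                                          CustomerView view, UserSession session) {
        session.setSessionVariable("customerName", customerName);
        view.setCustomerName(customerName);
        for (String order : orders) {
            view.addOrder(order);
        }
    }
}
```

Nothing in displayCustomerWithOrders knows about the web framework any more, so it can be called from a plain unit test with any implementations of the two interfaces.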

Next, I would extract this method into its own class. If you like, we can call it a "controller". (Let's make that one sacrifice to appease the gods of enterprise architecture.)





Now we're really getting somewhere. As it stands we could move CustomerController into a new .NET library, along with the interfaces it depends on, and this would all be unit-testable without the need to be running in the ASP.NET process.

We've got a bit of tweaking to do first, though. For starters, if we follow the rule (not blindly, but with sound reason) that objects should be born with their collaborators, then let's refactor CustomerController along those lines, so any other controller methods we add can access the userSession and the view.

And while we're about it, we should make it possible for us to inject our DataRepository, so we can write unit tests that won't hit a real database.





We now have a controller that's isolated from the front and the back end of this application, and can be unit-tested using, for example, mock objects to check that it calls for the right customer and tells the view to set the right customer field and order values on the GUI.
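Here's a rough Java sketch of such a controller and a hand-rolled mock view. CustomerController, userSession, view and DataRepository are names from this post; the interface shapes, the Customer class and RecordingView are assumptions made up so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

interface CustomerView {
    void setCustomerName(String name);
    void addOrder(String description);
}

interface UserSession {
    void setSessionVariable(String key, Object value);
}

// Injectable abstraction of the data repository (the method is hypothetical).
interface DataRepository {
    Customer getCustomer(int id);
}

class Customer {
    final String name;
    final List<String> orders;

    Customer(String name, List<String> orders) {
        this.name = name;
        this.orders = orders;
    }
}

// Born with its collaborators: view, session and repository are all injected.
class CustomerController {
    private final CustomerView view;
    private final UserSession userSession;
    private final DataRepository repository;

    CustomerController(CustomerView view, UserSession userSession, DataRepository repository) {
        this.view = view;
        this.userSession = userSession;
        this.repository = repository;
    }

    void displayCustomerWithOrders(int customerId) {
        Customer customer = repository.getCustomer(customerId);
        userSession.setSessionVariable("customerId", customerId);
        view.setCustomerName(customer.name);
        for (String order : customer.orders) {
            view.addOrder(order);
        }
    }
}

// Hand-rolled mock: records what the controller told the view to do.
class RecordingView implements CustomerView {
    final List<String> calls = new ArrayList<>();

    @Override
    public void setCustomerName(String name) {
        calls.add("name:" + name);
    }

    @Override
    public void addOrder(String description) {
        calls.add("order:" + description);
    }
}
```

A unit test constructs the controller with a RecordingView, a do-nothing session and a repository stub that returns a known customer, calls displayCustomerWithOrders, and then checks the recorded calls. A mocking framework would do the same job with less typing.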

A little bit of clean-up in our web form's class, just to tie up the loose ends...



The observant among you will have noticed that our refactored ASP.NET web form class is not smaller than it was. This is because my example is very simple in terms of business and control logic, and also because we only had one event to deal with. If this web form had multiple event handlers, and our business logic was more sophisticated, like in a real application, then the ratio of unit-testable code to web form code would normally start to tip in our favour.

It's often feasible for 90% or more of our code to end up in unit-testable classes when we abstract away the external stuff like GUIs and databases, and make them substitutable for testing and other purposes.

Again, while all this refactoring was going on, I was disciplined enough to run a basic Selenium test script after each individual step to make sure the app was still working. But at the earliest opportunity, I would start writing unit tests to check the logic. Selenium's dandy and all, but when you have 10,000 business rules to check, testing them through a web browser requires a lot of down-time.




October 27, 2012

Refactoring Legacy Code #1 - Making Classes Unit-Testable

A question that often comes up is "how can we get unit tests around our legacy code?"

The problem is usually one of dependencies that make it impossible to separate your code from external software, such as database servers and web/app servers, so your code won't work unless those external systems are there.

Here's a toy - but typical - example, inspired by some refactoring I worked with a client on recently:



We want to write unit tests for our CustomerServices class, but the calls to the static data access methods on DataRepository - which, let's presume, involve a visit to a database of some kind - mean that we can't test the code unless that database is there. We could write tests for the code as it is, but those tests will run much slower and we may have to run database scripts and wotnot to set up test data.

To write unit tests around CustomerServices that will run entirely in this application's process and won't involve external databases, we need to refactor the code so that we can substitute some kind of test double - e.g., a stub - where we're currently invoking static methods.

While we're about it, can you see why I tend to favour instance methods by default?

We can achieve this in several steps. First, let's turn these static methods into instance methods. (They're stateless, so it's pretty straightforward. A find-and-replace in both the CustomerServices module and the DataRepository will do the trick. Replace "static " with "", and in CustomerServices, replace "DataRepository" with "new DataRepository()".)
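As a minimal sketch of the result of that find-and-replace, here it is in Java rather than C#. The method names and the in-memory map are assumptions: the map stands in for the real database purely so the example runs.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the database-backed class.
class DataRepository {
    // Pretend database, shared by all instances (an assumption for this sketch).
    private static final Map<Integer, String> FAKE_DATABASE = new HashMap<>();

    // Was: static String fetchCustomerName(int id)
    String fetchCustomerName(int id) {
        return FAKE_DATABASE.get(id);
    }

    // Was: static void saveCustomerName(int id, String name)
    void saveCustomerName(int id, String name) {
        FAKE_DATABASE.put(id, name);
    }
}

class CustomerServices {
    String greetingFor(int customerId) {
        // Was: DataRepository.fetchCustomerName(customerId)
        return "Dear " + new DataRepository().fetchCustomerName(customerId);
    }

    void renameCustomer(int customerId, String newName) {
        // Was: DataRepository.saveCustomerName(customerId, newName)
        new DataRepository().saveCustomerName(customerId, newName);
    }
}
```

Behaviour is unchanged at this point. All we've gained is that the target of each call is now an object, which is what makes the later substitution possible.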



Next, because my end goal is to stub DataRepository in my tests, I want to abstract it for those purposes. So I'm going to extract an interface.



Next, I want to inject instances of IDataRepository into CustomerServices, rather than having them created internally. I could introduce them as parameters to both methods, but that seems like a kind of duplication. Better, I think, that CustomerServices should be born with its collaborators, so I'm going to inject it in through the constructor.

I'll do this in two steps. First, introduce a field of type IDataRepository that's initialised in the constructor. Then introduce it as a constructor parameter. A doddle in ReSharper.



This is what we call a dependency inversion. Our high-level module CustomerServices depended on low-level details. Now it depends only on an abstraction, IDataRepository.

So I can write unit tests that pass in a stub that implements IDataRepository that allows me to inject test data meaningfully, and the web service that uses CustomerServices can pass in a real DataRepository that will connect it to the database.
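Pulling the steps above together, the end state might look roughly like this in Java. CustomerServices, DataRepository and IDataRepository are the names used in this post; the methods and StubDataRepository are hypothetical, and the real database code is elided.

```java
import java.util.HashMap;
import java.util.Map;

// The extracted abstraction that CustomerServices now depends on.
interface IDataRepository {
    String fetchCustomerName(int id);
    void saveCustomerName(int id, String name);
}

// The real implementation, which would talk to the database (elided here).
class DataRepository implements IDataRepository {
    @Override
    public String fetchCustomerName(int id) {
        throw new UnsupportedOperationException("database access elided");
    }

    @Override
    public void saveCustomerName(int id, String name) {
        throw new UnsupportedOperationException("database access elided");
    }
}

// Born with its collaborator: the repository comes in through the constructor.
class CustomerServices {
    private final IDataRepository repository;

    CustomerServices(IDataRepository repository) {
        this.repository = repository;
    }

    String greetingFor(int customerId) {
        return "Dear " + repository.fetchCustomerName(customerId);
    }

    void renameCustomer(int customerId, String newName) {
        repository.saveCustomerName(customerId, newName);
    }
}

// Stub for tests: holds test data in memory.
class StubDataRepository implements IDataRepository {
    private final Map<Integer, String> names = new HashMap<>();

    StubDataRepository withCustomer(int id, String name) {
        names.put(id, name);
        return this;
    }

    @Override
    public String fetchCustomerName(int id) {
        return names.get(id);
    }

    @Override
    public void saveCustomerName(int id, String name) {
        names.put(id, name);
    }
}
```

A test creates new CustomerServices(new StubDataRepository().withCustomer(1, "Ada")) and checks the greeting, while production code passes in new DataRepository() instead.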



Once I've got some tests around what Michael Feathers calls an "inflection point" in his excellent book Working Effectively With Legacy Code, it's safer to do more fine-grained refactorings on the internal design.

BUT...

Was it safe to do the refactorings I did to get those tests in place in the first place? This is where the chicken meets its own egg, so to speak.

Our code couldn't be unit tested because we needed to refactor it to make that possible. But it's not safe to refactor without testing throughout to make sure we haven't borked the software.

What I would do at this delicate stage is opt for a number of potential choices to assure myself my code was still working.

I could, for example, have written those unit tests first, and just taken the hit that it would involve trips to a database until I could remove that dependency.

Or I could have chosen a higher-level inflection point and written some automated tests at that level. Often, this means system tests (e.g., GUI), or unit tests that call remote web services.

Or, if you don't have the skills or the tools necessary to automate system tests, the last resort might be - gasp! - manual testing. Yes, sometimes we just have to run the app and click some buttons.

My advice is, if you are going to do manual testing, you need to be extra disciplined about it. Write proper scripts, choose real and meaningful test data, and be vigilant as to the results, down to the smallest detail.

Yes, it's a pain in the behind. But that's why we wanted to get some unit tests in there, right?

In the next post, I'll touch on a common problem in web apps - dependencies on HTTP session and application objects.





September 25, 2012

Revisiting Unified Principles of Dependency Management (In Lieu Of 100 Tweets)

Some years ago, I published a slide deck on OO design principles that's proven to be quite popular (about 50,000 downloads) on the parlezuml.com web sites.

I'm ashamed to say, due to forgetfulness on my part, the metrics suggested for each principle have long fallen out of favour in my own work.

SOLID has formed the basis of how we explain OO design principles for probably 15 or more years, and it's easy to forget that there's nothing scientific about SOLID. The principles are not a theoretically complete explanation, nor are they scientifically tested.

We also inherited (no pun intended) different design principles to think about dependencies at different levels of code organisation.

I went into the wilderness for a couple of years and really dug deep to try and get OO design principles straight in my own mind. I wanted to examine the mechanics of it - the "physics" of dependency management, if you like.

Network models have become popular in physics to explain certain kinds of phenomena, ranging from earthquakes to the runs on the financial markets. It occurred to me that any sound principles of OO design ought to be based on models of propagation through networks.

I built simulations to explore propagation scenarios in simplified models of code dependency networks, and from that formed a set of unified dependency management principles that, I believe, apply at any level of code organisation, and not just in OO programming.

My Four Principles of Dependency Management have an order of precedence.

1. Minimise Dependencies - the simpler our code, the fewer "things" we have referring to other "things"

2. Localise Dependencies - for the code we have to write, as much as possible, "things" should be packaged - in units of code organisation - together with the "things" they depend on

3. Stabilise Dependencies - of course, we can't put our entire dependency network in the same function (that would be silly). For starters, it's at odds with minimising our dependencies, since modularity is the mechanism for removing duplication, and modularisation inevitably requires some dependencies to cross the boundaries between modules (using the most general meaning of "module" to mean a unit of code reuse - which could be a function or could be an entire system in a network of systems). When dependencies have to cross those boundaries, they should point towards things that are less likely - e.g., harder - to change. This can help to localise the spread of changes across our network of dependencies, in much the same way that a run on the banks is less likely if banks only lend to other banks that are less likely to default.

4. Abstract Dependencies - when we have to depend on something, but still need to accommodate change in the system somehow, the easiest way to do that is to make things that are depended upon easier to substitute. It's for much the same reason that we favour modular computer hardware. We can evolve and improve our computer by swapping out components with newer ones. To make this possible, computer components need to communicate through standard interfaces. These industry abstractions make it possible for me to swap out my memory with larger or faster memory, or my hard drive, or my graphics card. If ATI graphics cards had an ATI-specific interface, and NVidia cards had NVidia-specific interfaces, this would not be possible.

I've found it easier to apply these 4 principles at method, class, package and system level, and much easier to explain them. At each level of code organisation, we just need to substitute the right "things" into the formula.

Measuring how well our code follows these principles is easier, too.

1. Measuring the size or complexity of code at various levels of organisation is a doddle. Most tools will do that for you. e.g., method length, method cyclomatic complexity, class size (number of methods), package size (number of classes), and so on.

2. Let's take classes as an example: if classes have, on average, high internal cohesion - that is, the features of that class reference each other a lot - and low external coupling with features of other classes, it could be said that we have localised dependencies. It's the ratio between cohesion and coupling that paints that picture.

3. & 4. are interrelated. Robert Martin's metrics for Abstractness, Instability and Distance From The Main Sequence are a good fit, once we've generalised them to make it possible to calculate A, I and D for methods, classes, packages and systems.
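For reference, Martin's package metrics are usually defined as I = Ce / (Ca + Ce), A = abstract elements / total elements, and D = |A + I - 1|, where Ca counts incoming (afferent) dependencies and Ce counts outgoing (efferent) ones. A minimal Java sketch of those formulas follows. The class name DependencyMetrics and the choice to return 0 when a denominator is zero are my assumptions, and how you count dependencies and "abstract" elements at method or system level is left open.

```java
// Robert Martin's A, I and D, written so they can be fed counts for any unit of code.
class DependencyMetrics {
    // Instability: I = Ce / (Ca + Ce). 0 is maximally stable, 1 maximally unstable.
    static double instability(int afferent, int efferent) {
        int total = afferent + efferent;
        return total == 0 ? 0.0 : (double) efferent / total;
    }

    // Abstractness: A = abstract elements / total elements.
    static double abstractness(int abstractCount, int totalCount) {
        return totalCount == 0 ? 0.0 : (double) abstractCount / totalCount;
    }

    // Normalised distance from the main sequence: D = |A + I - 1|.
    static double distance(double abstractness, double instability) {
        return Math.abs(abstractness + instability - 1.0);
    }
}
```

For example, a unit with 3 incoming and 1 outgoing dependency has I = 0.25. If 1 of its 4 elements is abstract, A = 0.25, giving D = 0.5: heavily depended upon, but not very abstract.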

But what about Interface Segregation and Single Responsibility? The research I did for myself strongly suggested that if your code is simple, cohesive and loosely coupled and your dependencies tend to point in the right direction, these things are of little consequence. They are all sort of covered by the underlying mechanics of code dependencies and therefore these four principles. An interface that only includes methods used by a specific client is, in my opinion, more abstract than an interface that includes methods the client doesn't use. And we tend to find that when we scatter responsibilities across classes, or have classes that do too much, that's covered by 1. and 2.




September 13, 2012

We Can Learn A Lot About Collaborative Design From Aardman

Scientists have learned a great deal about humans by studying other animals and looking for similar attributes (and differences) that mark out what it means to be "human".

In particular, we've learned an enormous amount from studying our closest cousins, the Great Apes.

I've been pondering what software development's closest cousins might be, and what we could learn from them.

While watching Aardman's The Pirates In An Adventure With Scientists, it suddenly struck me that perhaps the endeavour that most closely resembles software development is animation.



We face strikingly similar problems to animators.

Firstly, we're both trying to tell compelling stories. Software, when it's done well, has a clear narrative, just like an animated movie. This narrative can be expressed in many ways, and - just as it is with animation - the process of producing working software can be thought of as telling and re-telling the story, adding more detail and refining it until the story's told in executable code.

The second similarity is that we both have to overcome the extreme difficulty of taking care of millions of tiny details without losing sight of the big picture.

Programming is inherently fiddly; far too fiddly for most people to be bothered with. What other kind of person would devote the lion's share of their lives to the kind of minutiae we do? Well, animators for one.

A single animation unit working on a film like "Pirates" might produce 4 seconds of usable action in a week. Each second of film is made up of 24 frames, each of which has to be painstakingly manipulated, with dozens of details changing from frame to frame that they have to keep track of.

And yet, working one frame at a time, tracking myriad interconnected elements, Aardman are able to produce something miraculous; something that many live action films fail to capture - comic timing.

Fight scenes, chase scenes, comedy - all of this is hard enough to get right shooting at 24 frames a second. To execute it so perfectly working one individual frame at a time requires something that, sadly, too many software teams lack - a clear vision.

The split-second timing and the exquisite dynamics of an Aardman animation are no accident. The mechanics of the overall narrative, every scene and every shot are carefully choreographed with storyboards, animatics (more animation) and with people performing the action to match the voice recordings of the actors, so that the animators can see how it should look and work towards realising that vision.

And with as many as 40 units working on different shots at any given time, this vision not only needs to be clear but it also must be a shared vision.

The rules that apply to each character - including non-living characters like the ocean and the wind - have to be clearly established so that no matter which team is animating those characters, they behave in a way that's consistent to their character. It would do little for the movie if the Pirate Captain inexplicably moved and behaved in 40 different ways through the movie, depending on who was animating him.

The objects in our software - however you choose to interpret that word - are the characters in our stories. As the design evolves and grows, it is extremely important to maintain a clear shared vision of those objects and how they behave, as well as the narratives in which those objects play a part.

Watching "Pirates", something else jumps out at me; the extraordinary consistency of quality. Aardman have very high standards, and these standards seem to have been applied across the board.

I don't doubt that there were animators working on that film with less experience than some of the others. I don't doubt that some animators were probably learning this craft on the job. Where else do they get their great animators from? That scope and quality is not evident in art and film schools. I suspect you can only really learn to make films of Aardman quality working for someone like Aardman.

But there's not a scrap of evidence for less experienced animators in the movie. Every scene and every shot is sublime. If someone was screwing up, then it must have ended up on the cutting room floor or at the back of shot where nobody noticed.

The greatest animators are masters of collaborative design. I believe there's much we could learn from companies like Aardman about telling compelling stories, about establishing a clear shared vision, about getting the tiniest details right while not losing sight of our "comic timing", and about committing to consistently high standards of quality.






September 2, 2012

Teachers, Please Don't Start Kids On SQL. kthxbye.

SQL.

Yes, the horror.

But, in the non-ideal world we live in, most programmers need to know at least some SQL. On account of businesses insisting we stick most of our data in relational databases. The swines!

But I would not recommend a SQL-centric view of software. Data-centricity is a mindset that we find tends to lead to all manner of issues, from the software designs that start at the back with tables and relationships and sort of build an application on top of that to connect the users with their data, to the maintainability problems that tend to result from data-centric architectures where blobs of data get passed around by functions that do stuff to them.

Which is why, even though I know we all need some SQL and RDBMS skills, I would heartily recommend not starting your programming career at what I consider to be the horse's arse.

So it makes my nerves jangle like so much falling cutlery when I hear teachers suggesting that SQL would make a good first language for students to learn programming with.

No.

No, no, no, no.

Oh no.

NO.

I've seen where that leads, and it doesn't lead anywhere pretty.

Software should be tackled from the front. The first question should always be "who or what will be using this program, and what will they be using it for?"

Programs - even data-intensive ones - need to be built to allow people to do useful things with them. The top-down, outside-in approach to design wasn't invented just for jolly larks and cakes.

A database is a consequence of that outside-in, usage-driven design. Just like packing a case is a consequence of your trip, not a reason for it. We decide on the destination and only then decide on what we'll need to pack.

Software designed from the database up is a vacation planned from the suitcase's point of view. Which is why database-centric designers tend to pack very heavy suitcases, because until you know what the trip will require, the tendency is to pack everything you could possibly need. Just in case.

And even then, how many projects have we worked on where the database designer packed every manner of summer clothing for what turned out, upon reading the use cases, to be an arctic expedition?

So, no. You don't start young impressionable minds with SQL.

Don't you dare!




March 28, 2012

Announcing A Powerful New Framework - Programming Language

Programming Language is a powerful new framework that enables developers to quickly and easily handle Dependency Injection, Inversion of Control, Model-View-Controller and many other common design problems.

Programming Language is easy to use and takes no time to master. There's a version of Programming Language for pretty much every platform - Java, .NET, Linux, iOS etc. You name it, there's a Programming Language for it.

Programming Language is completely object-oriented (providing you're working on an OO programming platform, of course.)

Here are just a couple of examples that illustrate the power and flexibility of the Programming Language framework.

Dependency Injection in Programming Language for Java



In this example, we use Programming Language to provide a mapping between a method parameter, declared as an instance of some interface Abstraction, and a concrete implementation of abstraction. These mappings are stored in a special, easily configurable file called a "Java class". The client method can now access features of Implementation without binding directly to it, and we can easily substitute Implementation for any other class that implements the Abstraction interface (e.g., for the purposes of mocking or stubbing it in unit tests).
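A minimal sketch of the kind of "mapping" being described, using the advanced features of Programming Language for Java. Abstraction and Implementation are the names from the description above; Client, useIt and doSomething are made up for illustration.

```java
// The interface the client method is declared against.
interface Abstraction {
    String doSomething();
}

// One concrete implementation among potentially many.
class Implementation implements Abstraction {
    @Override
    public String doSomething() {
        return "done by Implementation";
    }
}

class Client {
    // The parameter is declared as the interface, so the caller decides
    // which implementation gets "injected".
    String useIt(Abstraction abstraction) {
        return abstraction.doSomething();
    }
}
```

Calling new Client().useIt(new Implementation()) wires in the real thing, and a unit test can pass in a stub instead. No XML was harmed in the making of this example.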

Inversion of Control in Programming Language .NET



In this second example, we use the advanced IoC feature of Programming Language to define the explicit order of workflow in a user interface using a special mapping file called a "C# class". This gives us greater flexibility over the workflow. If we want to change the order of events, we simply edit the special mapping file, recompile and - bingo!

Of course, these are just two simple examples. But I'm sure even the least experienced developers among you will already see the incredible potential of the Programming Language framework.

Here's a list of some of the other powerful features accessible through Programming Language:

* Factories
* Builders
* Observers & Events
* Interpreters
* Undoable Commands
* Adaptors
* Proxies
* Persistence
* Role-based Access Control
* And many, many more.

You can download the latest stable build of Programming Language here.




January 24, 2012

Jason's Handy Guide To Evaluating Software Packages

I get asked this question a lot, but it never occurred to me to write down my usual answer.

How do we evaluate shrink-wrapped software against our needs?

Well, that's easy. You still need to do the usual business requirements analysis. Identify who will be using this system, and what their goals will be for using it. In the good old days, we called these "Use Cases". Yep, even if you're buying and not building the software, you still need use cases.

The next step is to flesh out the design of your use cases, as we might normally do, by describing how the user interacts with the software to achieve their goal.

When we're describing software we haven't built yet, this is design. When we're describing how we'll use software that already exists, this is a process of validation. Can the user achieve their goal using the software we're evaluating?

Even with the most feature-rich packages, we tend to find we don't get an exact match. It's not always possible to achieve every user goal using the software. So as we validate the software against our use cases, we may identify gaps. There are almost always gaps.

The next question we need to answer is: can we fill those gaps? Let's say we're evaluating Microsoft PowerPoint for our training business. It doesn't do everything we need out of the box. Let's pretend we have a use case where the trainer needs to populate a slide with an organisation chart showing the reporting structure of the group attending the course. She has a spreadsheet with those names listed in alphabetical order and with information about who reports to whom. Using PowerPoint's built-in scripting language, Visual Basic for Applications (VBA), it is indeed possible to take that information and automatically generate an Org Chart.

So that gap could be plugged, with some work. Write a reminder about it down on a blank index card. This is now a potential "User Story" for some programming work that would need to be done if we went the PowerPoint route.

Of course, people identify gaps in software all the time, and it's possible that someone somewhere has already found a solution to plugging some of your gaps with handy tools and utilities. Google is your friend here: search for solutions before you think about reinventing the wheel. If you find one, and there's money involved, write down roughly how much on the index card.

Finally, don't forget the non-functional requirements. A package may offer the right features, but it may not be able to handle a high-enough volume of users, or it may not be secure enough for your purposes, or it may take a long time for users to learn. Evaluate the software against these criteria, too. Be as explicit as you can. Handwavy requirements like "it must be scalable" aren't very helpful for validating software. What do you mean by "scalable" - a certain number of users at any one time, or a certain number of transactions per second, or the ability to run it on more servers?

All too often, businesses buy a solution and then validate that it does what they need - often by actually trying to roll it out. Whether buying or building, the key is to have clear, testable requirements and to validate the software against them. Don't be seduced by their sales patter and let them lead you like a donkey to the slaughter to their feature list. What their software does is far less important than what we can do with their software.




April 2, 2011

The Dependable Dependencies Principle - Draft Paper

For over 15 years we have known of design principles to help us manage dependencies in software to limit the impact of making changes. Another consequence of dependencies has been overlooked, namely the relationship between dependencies and risk of failure in our code. The more depended-upon code is, the greater the potential impact of its failure. We should therefore desire that code that is more depended-upon be more reliable. This paper explores the relationship between dependencies and reliability, and proposes a new design principle, with a first attempt at an accompanying set of metrics, to help us limit the impact of failure.

Download the full draft paper here.