July 16, 2014

...Learn TDD with Codemanship

What Level Should We Automate Most Of Our Tests At?

So this blog post has been a long time in the making. Well, a long time in the procrastinating, at any rate.

I have several clients who have hit what I call the "front-end automated test wall". This is when teams place greatest emphasis on automating acceptance tests, preferring to verify the logic of their applications at the system level - often exercised through the user interface using tools like Selenium - and rely less (or not at all, in some cases) on unit tests that exercise the code at a more fine-grained level.

What tends to happen when we do this is that we end up with large test suites that require much set-up - authentication, database stuff, stopping and starting servers to reset user sessions and application state, and all the fun stuff that comes with system testing - and run very slowly.

So cumbersome can these test suites become that they slow development down, sometimes to a crawl. If it takes half an hour to regression test your software, that's going to make the going tough for Clean Coders.

The other problem with these high-level tests is that, when they fail, it can take a while to pin down what went wrong and where it went wrong. As a general rule of thumb, it's better to have tests that only have one reason to fail, so when something breaks it's alreay pretty well pinpointed. Teams who've hit the wall tend to spend a lot time debugging.

And then there's the modularity/reuse issue: when the test for a component is captured at a much higher level, it can be tricky to take that chunk and turn it into a reusable chunk. Maybe the risk calculation component of you web application could also be a risk calculation component of a desktop app, or a smartwatch app. Who knows? But when its contracts are defined through layers of other stuff like web pages and wotnot, it can be difficult to spin it out into a product in its own right.

For all these reasons, I follow the rule of thumb: Test closest to the responsibility.

One: it's faster. Every layer of unnecessary wotsisname the tests have to go through to get an answer adds execution time and other overheads.

Two: it's easier to debug. Searching for lost car keys gets mighty complicated when your car is parked three blocks away. If it's right outside the front door, and you keep the keys in a bowl in the hallway, you should find them more easily.

Three: it's better for componentising your software. You may call them "microservices" these days, but the general principles is the same. We build our applications by wiring together discrete components that each have a distinct responsibility. The tests that check if a component fulfils its reponsibility need to travel with that components, if at all possible. If only because it can get horrendously difficult to figure out what's being tested where when we scatter rules willy nilly. The risk calculation test wants to talk to the Risk Calculator component. Don't make it play Chinese Whsipers through several layers of enterprise architecture.

Sometimes, when I suggest this, developers will argue that unit tests are not acceptance tests, because unit tests are not written from the user's perspective. I believe - and find from experience - that this is founded on an artificial distinction.

In practice, an automated acceptance test is just another program written by a programmer, just like a unit test. The programmer interprets the user's requirements in both cases. One gives us the illusion of it being the customer's test, if we want it to be. But it's all smoke and mirrors and given-when-then flim-flam in reality.

The pattern, known of old, of sucking test data provided by the users into parameterised automated tests is essentially what our acceptance test automation tools do. Take Fitnesse, for example. Customer enters their Risk Calculation inputs and expected outputs into a table on a Wiki. We write a test fixture that inserts data form the table into program code that we write to test our risk calculation logic.

We could ask the users to jot those numbers down onto a napkin, and hardcode them into our test fixture. Is it still the same test? It it still an automated acceptance test? I believe it is, to all intents and purposes.

And it's not the job of the user interface or our MVC implementation or our backend database to do the risk calculation. There's a distinct component - maybe even one class - that has that responsibility. The rest of the architecture's job is to get the inputs to that component, and marshall the results back to the user. If the Risk Calculator gets the calculation wrong, the UI will just display the wrong answer. Which is correct behaviour for the UI. It should display whatever output the Risk Calculator gives it, and display it correctly. But whether or not it's the correct output is not the UI's problem.

So I would test the risk calculation where the risk is calculated, and use the customer's data from the acceptance test to do it. And I would test that the UI displays whatever result it's given correctly, as a separate test for the UI. That's what we mean by "separation of concerns"; works for testing, too. And let's not also forget that UI-level tests are not the same thing as system or end-to-end tests. I can quite merrily unit test that a web template is rendered correctly using test data injected into it, or that an HTML button is disabled running inside a fake web browser. UI logic is UI logic.

And I know some people cry "foul" and say "but that's not acceptance testing", and "automated acceptance tests written at the UI level tend to be nearer to the user and therefore more likely to accurately reflect their requirements."

I say "not so fast".

First of all, you cannot automate user acceptance testing. The clue is in the name. The purpose of user acceptance testing is to give the user confidence that we delivered what they asked for. Since our automated tests are interpretations of those requirements - eevery bit as much as the implementations they're testing - then, if it were my money, I wouldn't settle for "well, the acceptance tests passed". I'd want to see those tests being executed with my own eyes. Indeed, I'd wanted to execute them myself, with my own hands.

So we don't automate acceptance tests to get user acceptance. We automate acceptance tests so we can cheaply and effectively re-test the software in case a change we've made has broken something that was previously working. They're automated regression tests.

The worry that the sum total of our unit tests might deviate from what the users really expected is mitigated by having them manually execute the acceptance tests themselves. If the software passes all of their acceptance tests AND passes all of the unit tests, and that's backed up by high unit test assurance - i.e., it is very unlikely that the software could be broken from the user's perspsctive without any unit tests failing - then I'm okay with that.

So I still have user acceptance test scripts - "executable specifications" - but I rely much more on unit tests for ongoing regression testing, because they're faster, cheaper and more useful in pinpointing failures.

I still happily rely on tools like Fitnesses to capture users' test data and specific examples, but the fixtures I write underneath very rarely operate at a system level.

And I still write end-to-end tests to check that the whole thing is wired together correctly and to flush out configuration and other issues. But they don't check logic. They just the engine runs when you turn the key in the ignition.

But typically I end up with a peppering of these heavyweight end-to-end tests, a feathering of tests that are specifically about display and user interaction logic, and the rest of the automated testing iceberg is under the water in the form of fast-running unit tests, many of which use example data and ask questions gleaned from the acceptance tests. Because that is how I do design. I design objects directly to do the work to pass the acceptance tests. It's not by sheer happenstance that they pass.

And if you simply cannot let go of the notion that you must start by writing an automated acceptance test and drive downwards from there, might I suggest that as new objects emerge in your design, you refactor the test assertions downwards also and push them into new tests that sit close to those new objects, so that eventually you end up with tests that only have one reason to fail?

Refactorings are supposed to be behaviour-preserving, so - if you're a disciplined refactorer - you should end up with a cluster of unit tests that are logically directly equivalent to the original high-level acceptance test.

There. I've said it.

Posted 3 years, 5 months ago on July 16, 2014