July 27, 2018
For Load-Bearing Code, Unleash The Power of Third-Generation TestingAs software "eats the world", and people rely more and more on the code we write, there's a strong case for making that code more reliable.
In popular products and services, code may get executed millions or even billions of times a day. In the face of such traffic, the much vaunted "5 nines" reliability (99.999%) just doesn't cut the mustard. Our current mainstream testing practices are arguably not up to the job where our load-bearing code's concerned.
And, yes, when I say "current mainstream practices", I'm including TDD in that. I may test-drive, say, a graph search algorithm in a dozen or so test cases, but put that code in a SatNav system and ship it 1 million cars, and suddenly a dozen tests doesn't fill me with confidence.
Whenever I raise this issue, most developers push back. "None of our code is that critical", they argue. I would suggest that's true of most of their code. But even in pretty run-of-the-mill applications, there's usually a small percentage of code that really needs to not fail. For that code, we should consider going further with our tests.
The first generation of software testing involved running the program and seeing what happens when we enter certain inputs or click certain buttons. We found this to be time-consuming. It created severe bottlenecks in our dev processes. Code needs to be re-tested every time we change it, and manual testing just takes far too long.
So we learned to write code to test our code. The second generation of software testing automated test execution, and removed the bottlenecks. This, for the majority of teams, is the state of the art.
But there are always the test cases we didn't think of. Current practice today is to perform ongoing exploratory testing, to seek out the inputs, paths, user journeys and combinations our test suites miss. This is done manually by test professionals. When they find a failing test we didn't think of, we add it to our automated suite.
But, being manual, it's slow and expensive and doesn't achieve the kind of coverage needed to go beyond the Five 9's.
Which brings me to the Third Generation of Software Testing: writing code to generate the test cases themselves. By automating exploratory testing, teams are able achieve mind-boggling levels of coverage relatively cheaply.
To illustrate, here's a parameterised unit test I wrote when test-driving an algorithm to calculate square roots:
Imagine this is going to be integrated into a flight control system. Those five tests don't give me a warm fuzzy feeling about stepping on any plane using this code.
Now, I feel I need to draw attention to this: unit test fixtures are just classes and unit tests are just methods. They can be reused. We can compose new fixtures and new tests out of them.
So I can write a new parameterised test that, for example, generates a large number of random inputs - all unique - using a library called JCheck (a Java port of the Haskell QuickCheck library).
Don't worry too much about how this works. The important thing to note is that JCheck generates 1,000 unique random inputs. So, with a few extra lines of code we're jumped from 5 test cases to 1,000 test cases.
And with a single extra character, we can leap up a further order of magnitude by simply adding a zero to the number of cases. Or two zeros for 100x more coverage. Or three, or four. Whatever we need. This illustrates the potential power of this kind of technique: we can cover massive state spaces with relatively little extra code.
(And, for those of you thinking "Yeah, but I bet it takes hours to run" - when I ran this for 1 million test cases, it took just over 10 seconds.)
The eagle-eyed among you wil have noticed that I didn't reuse the exact same MathsTest fixture listed above. When test inputs are being generated, we don't have 1,000,000 expected results. We have to generalise our assertions. I adapted the original test into a property-based test, asserting a general property that every correct square root has to have.
Our property-based test can be reused in other ways. This test, for example, generates a range of inputs from 1 to 10 at increments of 0.01.
Again, adding coverage is cheap. Maybe we want to test from 1 to 10000 at increments of 0.001? Easy as peas.
(Yes, these tests take quite a while to run - but that's down to the way JUnit handles parameterised tests, and could be optimised.)
Let's consider a different example. Imagine we have a design with a selection of UI's (Web, Android, iOS, Windows), a selection of local languages (English, French, Chinese, Spanish, Italian, German), and a selection of output formats (Excel, HTML, XML, JSON) and we want to test that every possible combination of UI, language and output works.
There are 96 possible combinations. We could write 96 tests. Or we could generate all the possible combinations with a relatively straightforward bit of code like the Combiner I knocked up in a few hours for larks.
If we added another language (e.g., Polish), we'd go from 96 combinations to 112. It's hopefully easy to see how much easier it could be to evolve the design when the test cases are generated in this way, without dropping below 100% coverage. And, yes, we could take things even further and use reflection to generate the input arrays, so our tests always keep pace with the design without having to change the test code at all. There are many, many possibilities for this kind of testing.
To repeat, I'm not suggesting we'd do this for all our code - just for the code that really has to work.
Food for thought?
Posted 2 months, 4 days ago on July 27, 2018