August 3, 2018
Keyhole APIs - Good for Microservices, But Not for Unit TestingI've been thinking a lot lately about what I call keyhole APIs.
A keyhole API is the simplest API possible, that presents the smallest "surface area" to clients for its complete use. This means there's a single function exposed, which has the smallest number of primitive input parameters - ideally one - and a single, simple output.
To illustrate, I had a crack at TDD-ing a solution to the Mars Rover kata, writing tests that only called a single method on a single public class to manipulate the rover and query the results.
You can read the code on my Github account.
This produces test code that's very loosely coupled to the rover implementation. I could have written test code that invokes multiple methods on multiple implementation classes. This would have made it easier to debug, for sure, because tests would pinpoint the source of errors more closely.
If we're writing microservices, keyhole APIs are - I believe - essential. We have to hide as much of the implementation as possible. Clients need to be as loosely coupled to the microservices they use as possible, including microservices that use other microservices.
I encourage developers to create these keyhole APIs for their components and services more and more these days. Even if they're not going to go down the microservice route, its helpful to partition our code into components that could be turned into microservices easily, shoud the need arise.
Having said all that, I don't recommend unit testing entirely through such an API. I draw a distinction there: unit tests are an internal thing, a sort of grey-box testing. Especially important is the ability to isolate units under test from their external dependencies - e.g., by using mocks or stubs - and this requires the test code to know a little about those dependencies. I deliberately avoided that in my Mars Rover tests, and so ended up with a design where dependencies weren't easily swappable in ths way.
So, in summary: keyhole APIs can be a good thing for our architectures, but keyhole developer tests... not so much.
July 27, 2018
For Load-Bearing Code, Unleash The Power of Third-Generation TestingAs software "eats the world", and people rely more and more on the code we write, there's a strong case for making that code more reliable.
In popular products and services, code may get executed millions or even billions of times a day. In the face of such traffic, the much vaunted "5 nines" reliability (99.999%) just doesn't cut the mustard. Our current mainstream testing practices are arguably not up to the job where our load-bearing code's concerned.
And, yes, when I say "current mainstream practices", I'm including TDD in that. I may test-drive, say, a graph search algorithm in a dozen or so test cases, but put that code in a SatNav system and ship it 1 million cars, and suddenly a dozen tests doesn't fill me with confidence.
Whenever I raise this issue, most developers push back. "None of our code is that critical", they argue. I would suggest that's true of most of their code. But even in pretty run-of-the-mill applications, there's usually a small percentage of code that really needs to not fail. For that code, we should consider going further with our tests.
The first generation of software testing involved running the program and seeing what happens when we enter certain inputs or click certain buttons. We found this to be time-consuming. It created severe bottlenecks in our dev processes. Code needs to be re-tested every time we change it, and manual testing just takes far too long.
So we learned to write code to test our code. The second generation of software testing automated test execution, and removed the bottlenecks. This, for the majority of teams, is the state of the art.
But there are always the test cases we didn't think of. Current practice today is to perform ongoing exploratory testing, to seek out the inputs, paths, user journeys and combinations our test suites miss. This is done manually by test professionals. When they find a failing test we didn't think of, we add it to our automated suite.
But, being manual, it's slow and expensive and doesn't achieve the kind of coverage needed to go beyond the Five 9's.
Which brings me to the Third Generation of Software Testing: writing code to generate the test cases themselves. By automating exploratory testing, teams are able achieve mind-boggling levels of coverage relatively cheaply.
To illustrate, here's a parameterised unit test I wrote when test-driving an algorithm to calculate square roots:
Imagine this is going to be integrated into a flight control system. Those five tests don't give me a warm fuzzy feeling about stepping on any plane using this code.
Now, I feel I need to draw attention to this: unit test fixtures are just classes and unit tests are just methods. They can be reused. We can compose new fixtures and new tests out of them.
So I can write a new parameterised test that, for example, generates a large number of random inputs - all unique - using a library called JCheck (a Java port of the Haskell QuickCheck library).
Don't worry too much about how this works. The important thing to note is that JCheck generates 1,000 unique random inputs. So, with a few extra lines of code we're jumped from 5 test cases to 1,000 test cases.
And with a single extra character, we can leap up a further order of magnitude by simply adding a zero to the number of cases. Or two zeros for 100x more coverage. Or three, or four. Whatever we need. This illustrates the potential power of this kind of technique: we can cover massive state spaces with relatively little extra code.
(And, for those of you thinking "Yeah, but I bet it takes hours to run" - when I ran this for 1 million test cases, it took just over 10 seconds.)
The eagle-eyed among you wil have noticed that I didn't reuse the exact same MathsTest fixture listed above. When test inputs are being generated, we don't have 1,000,000 expected results. We have to generalise our assertions. I adapted the original test into a property-based test, asserting a general property that every correct square root has to have.
Our property-based test can be reused in other ways. This test, for example, generates a range of inputs from 1 to 10 at increments of 0.01.
Again, adding coverage is cheap. Maybe we want to test from 1 to 10000 at increments of 0.001? Easy as peas.
(Yes, these tests take quite a while to run - but that's down to the way JUnit handles parameterised tests, and could be optimised.)
Let's consider a different example. Imagine we have a design with a selection of UI's (Web, Android, iOS, Windows), a selection of local languages (English, French, Chinese, Spanish, Italian, German), and a selection of output formats (Excel, HTML, XML, JSON) and we want to test that every possible combination of UI, language and output works.
There are 96 possible combinations. We could write 96 tests. Or we could generate all the possible combinations with a relatively straightforward bit of code like the Combiner I knocked up in a few hours for larks.
If we added another language (e.g., Polish), we'd go from 96 combinations to 112. It's hopefully easy to see how much easier it could be to evolve the design when the test cases are generated in this way, without dropping below 100% coverage. And, yes, we could take things even further and use reflection to generate the input arrays, so our tests always keep pace with the design without having to change the test code at all. There are many, many possibilities for this kind of testing.
To repeat, I'm not suggesting we'd do this for all our code - just for the code that really has to work.
Food for thought?
June 21, 2018
Adopting TDD - The Codemanship RoadmapI've been doing Test-Driven Development for 20 years, and helping dev teams to do it for almost as long. Over that time I've seen thousands of developers and hundreds of teams try to adopt this crucial enabling practice. So I've built a pretty clear picture of what works and what doesn't when you're adopting TDD.
TDD has a steep learning curve. It fundamentally changes the way you approach code, putting the "what" before the "how" and making us work backwards from the question. The most experienced developers, with years of test-after, find it especially difficult to rewrite their internal code to make it comfortable. It's like learning to write with your other hand.
I've seen teams charge at the edifice of this learning curve, trying to test-drive everything from Day #1. That rarely works. Productivity nosedives, and TDD gets jettisoned at the next urgent deadline.
The way to climb this mountain is to ascend via a much shallower route, with a more gentle and realistic gradient. You will most probably not be test-driving all your code in the first week. Or the first month. typically, I find it takes 4-6 months for teams to get the hang of TDD, with regular practice.
So, I have a recommended Codemanship Route To TDD which has worked for many individuals and teams over the last decade.
Week #1: For teams, an orientaton in TDD is a really good idea. It kickstarts the process, and gets everyone talking about TDD in practical detail. My 3-day TDD workshop is designed specifically with this in mind. It shortcuts a lot of conversations, clears up a bunch of misconceptions, and puts a rocket under the team's ambitions to succeed with TDD.
Week #2-#6: Find a couple of hours a week, or 20 minutes a day, to do simple TDD "katas", and focus on the basic Red-Green-Refactor cycle, doing as many micro-iterations as you can to reinforce the habits
Week #7-#11: Progress onto TDD-ing real code for 1 day a week. This could be production code you're working on, or a side project. The goal for that day is to focus on doing it right. The other 4 days of the week, you can focus on getting stuff done. So, overall, your productivity maybe only dips a bit each week. As you gain confidence, widen this "doing it right" time.
Week #12-#16: By this time, you should find TDD more comfortable, and don't struggle to remember what you're supposed to do and when. Your mind is freed up to focus on solving the problem, and TDD is becoming your default way of working. You'll be no less productive TDD-ing than you were befpre (maybe even more productive), and the code you produce will be more reliable and easier to change.
The Team Dojo: Some teams are keen to put their new TDD skills to the test. An exercise I've seen work well for this is my Team Dojo. It's a sufficiently challenging problem, and really works on those individual skills as well as collaborative skills. Afterwards, you can have a retrospective on how the team did, examining their progress (customer tests passed), code quality and the disciplie that was applied to it. Even in the most experienced experienced teams, the doj will reveal gaps that need addressing.
Graduation: TDD is hard. Learning to test-drive code involves all sorts of dev skills, and teams that succeed tell me they feel a real sense of achievement. It can be good to celebrate that achievement. Whether it's a party, or a little ceremony or presentation, when organisations celebrate the achievement with their dev teams, it shows reall commitment to them and to their craft.
Of course, you don't have to do it my way. What's important is that you start slow and burn your pancakes away from the spotlight of real projects with real deadlines. Give yourself the space and the safety to get it wrong, and over time you'll get it less and less wrong.
If you want to talk about adopting TDD on your team, drop me a line.
June 20, 2018
Design Principles Are The Key To A Testing PyramidOn the 3-day Codemanship TDD workshop, we discuss the testing pyramid and why optimising your test suites for fast execution is critical to achieving continuous delivery.
The goal with the pyramid is to be able to test as much of our software as possible as quickly as possible, so we can re-test and reassure ourselves that our code is shippable very frequently (i.e., continuously).
If our tests take hours to run, then we can only run them every few hours. Those are hours during which we don't know if the software's shippable.
So the bulk of our automated tests - the base of testing pyramid - should be fast-running "unit" tests. This typically means tests that have no external dependencies. (That's my working definition of "unit" test, for the purposes of making the argument for excluding file systems, databases, web services and the like from the majority of our tests.)
The purpose of our automated tests is to detect when code is broken. Every time we change a line of code, it can break the software. Therefore we need a test to catch every potential broken LOC.
The key to a good testing pyramid is to minimise the tests that have external dependencies, and the key to that is minimising the amount of code that has external dependencies.
I explain in the workshop how our design principles help us achieve this - and three in particular:
* Single Responsibility
* Don't Repeat Yourself
* Dependency Inversion
Take the example of a module that has a method which:
1. Formats a SQL string using data from a business object
2. Connects to a database to execute that query
3. Unpacks the response (recordset or array) into a business object
To test any part of this logic, we must include a trip to the database. If we break it up into 3 methods, each with a distinct responsibility, then it becomes possible to test 1. and 3. without including 2. That's a third as many "integration" tests.
In a similar vein, imagine we have data access objects, each like our module above. Each can format a SQL string using an object's data - e.g., CustomerDAO, InvoiceDAO, OrderDAO. Each connects to the database for fetch and save that object type's data. Each knows how to unpack the database response into the corresponding object type.
There's repetition in this design: connecting to the database. If we consolidate that code into a single module, we again reduce the number of integration tests we need.
Finally, we have to consider the call stack in which database connections are being made. Consider this poor design for a video rental system:
When we examine the code, we see that the methods that have direct external dependencies are not swappable within the overall call stack.
We cannot test pricing a video rental without paying a visit to the external video ratins service. We cannot test rentals without trips to the database, either.
To exclude these external dependencies from a set of tests for Rental, we have to turn those dependencies upside-down (make them swappable by dependency injection, basically).
This is often what people mean when they talk about "testable" code. In effect, it means there's enough "swappability" in our design to allow us to test the bulk of the logic by mocking or stubbing external dependencies. The win-win here is that we not only get a better-proportioned testing pyramid, we also get a more flexible design that can more readily accommodate change. e.g., getting video ratings from Rotten Tomatoes instead.)
June 10, 2018
Only This Week - Save Up To 65% On Codemanship Training
For one week only, we’re offering a veritable picnic of on-site code craft training at never-to-be repeated prices.
Save up to 65%, and train your developers in key skills like TDD, refactoring and OO design for as little as £40 per person per day. That’s full, action-packed hands-on days of code craft training.
Book any Codemanship training course before June 17th and save a whopping 50%. Book all four of our courses and save 65%. That’s a massive £12,000.
Find out more by visiting www.codemanship.com
June 8, 2018
The Entire Codemanship TDD Course Book - Absolutely FreeChanges are afoot with my code craft training and coaching company, Codemanship, and as part of that, I'm making my 222-page TDD course book available to download as a spiffy full-colour PDF for free.
It covers everything from the basics of Red-Green-Refactor, through software design principles to apply to your growing code, all the way up to advanced topics other TDD books and courses don't reach, like mutation testing, property-based and data-driven testing and Continuous Inspection. Many people who've read the book have commented on how straightforward and to-the-point it is. Shorter than most TDD/code craft books, but covers more, all in practical detail.
Of course, to get the best from the book, you should try the exercises.
Better still, try the exercises with the guy who wrote the book in the room to guide you.
April 28, 2018
8 Rules of Maintainable Code: A Handy Cut-Out-And-Keep ChartIf you've been on the Codemanship TDD training course, you may vaguely recall the first afternoon when we discuss design principles and how they can shape our code as it emerges.
I posit 8 principles that I ask participants to apply to the exercises, drawing from Simple Design, "Tell, Don't Ask" and S.O.L.I.D. These 8 factors are interrelated, and form a kind of virtuous - if somewhat complex - virtuous circle.
Code that's easier to change tends to be easier to test quickly. Fast-running tests make refactoring easier. Which helps us make our code easier to change. And around we go.
We don't do slides on the course (hoorah!), but I'm trying this morning to visualise these 8 principles and how they relate to each other in a single graphic.
There's the simple version:
And this is my latest iteration, to print off and hang on your toilet wall or put on a spiffy t-shirt. All non-profit uses are fine.
Going beyond maintainability, there's also a relationship between Clean code and reliability. Code that can be tested very quickly tends to have far fewer bugs. And code that's simpler and easier to understand is likely to get broken when we change it. So, it's more of a virtuous triangle, really.
April 6, 2018
Could Refactoring (& Refuctoring) Help Us Test Claims About Benefits of Clean CodeOne of the more frustrating things about teaching developers about code craft and "Clean Code" is the lack of credible hard evidence from respectable sources about the claimed benefits of it.
Not only does this make code craft a tougher sell to skeptics - and there was a time when I was one of them, decades ago - but it also calls into question whether the alleged benefits are real.
The biggest barrier to doing research in this area has been twofold:
1. The lack of data points. Most software engineering academic studies take data from a handful of projects. If this were, say, medical research, we'd never get our medicines on to the market.
2. The problem of comparing apples with apples. There are so many factors in software development that it's pretty much impossible to isolate one and rule out all others. Studies into the effects of adopting TDD can't account for the variations in experience and ability, for example. Teams new to TDD tend to have to deal with a steep learning curve before they become productive again.
When I consider some of the theories about what makes code harder to change - the central plank of the code craft thesis - some we have strong evidence to back them up, others... not so much.
I've had a bit of a brainwave in this area that might help researchers. Take a code base, then specifically vary it along a single dimension. e.g., refactor to remove duplication, or "refuctor" to introduce duplication (by inlining functions and modules). The resulting variants should all be functionally equivalent, but you could fine-grain the levels of variation. Then ask developers to make changes to the logic, and measure how much code had to be edited to achieve those changes. Automated acceptance tests would ensure that every change was logically equivalent.
I can easily envisage how refactoring (and it's evil twin, refuctoring) could be used to vary readability, complexity, duplication, coupling and cohesion (e.g., by moving methods between classes to introduce or eliminate feature envy), "swabbability" (e.g., by introducing dependency injection, or by reversing the dependency inversion by using explicit references to concrete implementations of interfaces) and a range of other code qualities. Automated tests could ensure that every variant still works exactly the same way on the outside.
And the tests themselves could be varied. For example, you could manipulate test suite execution time so that in some cases developers had to wait an hour for feedback, while others only need wait seconds for the same feedback.
I think I might be on to something. What do you think?
March 24, 2018
Code Craft: What Is It, And Why Do You Need It?One of my missions at the moment is to spread the word about the importance of code craft to organisations of all shapes and sizes.
The software craftsmanship (now "software crafters") movement may have left some observers with the impression that a bunch of prima donna programmers were throwing our toys out of the pram over "beautiful code".
For me, nothing could be further from the truth. It's always been clear in my mind - and I've tried to be clear when talking about craft - that it's not about "beautiful code", or about "masters and apprentices". It has always been about delivering software that works - does what end users need - and that can be easily changed to solve new problems.
I learned early on that iterating our designs was the ultimate requirements discipline. Any solution of any appreciable complexity is something we're unlikely to get right first time. That would be the proverbial "hole in one". We should expect to need multiple passes at it, each pass getting it less wrong.
Iterating software designs requires us to be able to keep changing the code over and over. If the code's difficult to change, then we get less throws of the dice. So there's a simple business truth here: the harder our code is to change, the less likely we are to deliver a good working solution. And, as times goes on, the less able we are to keep our working solution working, as the problem itself changes.
For me, code craft's about delivering the right thing in the short-to-medium term, and about sustaining the pace of innovation to keep our solution working in the long term.
The factors involved here are well-understood.
1. The longer it takes us to re-test our software, the bigger the cost of fixing anything we broke. This is supported by a mountain of evidence collected from thousands of projects over several decades. The cost of fixing bugs rises exponentially the longer they go undetected. So a comprehensive suite of good fast-running automated tests is an essential ingredient in minimising the cost of changing code. I see it being a major bottleneck for many organisations, and see the devastating effect long testing feedback loops can have on a business.
2. The harder it is to understand the code, the more likely it is we'll break it if we change it.
3. The more complex our code is, the harder it is to understand and the easier it is to break. More ways for it to be wrong, basically.
4. Duplication in our code multiplies the cost of changing common logic.
5. The more the different units* in our software depend on each other, the wider the potential impact of changing one unit on other units. (The "ripple effect").
6. When units aren't easily swappable, the impact of changing one unit can break other modules that interact with it.
* Where a "unit" could be a function, a module, a component, or a service. A unit of reusable code, essentially.
So, six key factors determine the cost of changing code:
* Test Assurance & Execution Time
* Abstraction of Dependencies
Add to these, a few other factors can make a big difference.
Firstly, the amount of "friction" in the delivery pipeline. I'd classify "friction" here as "steps in releasing or deploying working software into production that take a long time and/or have a high cost". Manually testing the software before a release would be one example of high friction. Manually deploying the executable files would be another.
The longer it takes, the more it costs and the more error-prone the delivery process is, the less often we can deliver. When we deliver less often, we're iterating more slowly. When we iterate more slowly, we're back to my "less throws of the dice" metaphor.
Frequency of releases is directly related also to the size of each release. Releasing changes in big batches has other drawbacks, too. Most importantly - because software either works as a whole or it doesn't - big releases incorporating many changes present us with an all-or-nothing choice. If change X is wrong, we now have to carefully rework that one thing with all the other changes still in place. So much easier to do a single release for change X by itself, and if it doesn't work, roll it back.
Another aside factor to consider is how easy it is to undo mistakes if necessary. If my big refactoring goes awry, can I easily get back to the last good state of the code? If a release goes pear-shaped, can we easily roll it back to a working version, with minimal disruption to our end customer?
Small releases help a lot in this respect, as does Version Control and Continuous Integration. VCS and CI is like seatbelts for programmers. It can significantly reduce lost time if we have a little accident.
So, I add:
* Small & Frequent Releases
* Frictionless Delivery Processes (build-test-deploy automation)
* Version Control
* Continuous Integration
To my working definition of "code craft".
Noted that there's more to delivering software than these things. There's requirements, there's UX, there's InfoSec, there's data management, and a heap of other considerations. Which is why I'm clear to disambiguate code craft and software development.
Organisations who depend on software need code that works and that can change and stay working. My belief is that anyone writing software for a living needs to get to grips with code craft.
As software continues to "eat the world", this need will grow. I've watched $multi-billion on their knees because their software and systems couldn't change fast enough. As the influence of code spreads into every facet of life, our ability to change code becomes more and more a limiting factor on what we can achieve.
To borrow from Peter McBreen's original book on software craftsmanship, there's a code craft imperative.
March 11, 2018
Proposing The xUnit "Meta-Kata"A while back I ruminated on refactoring old-fashioned "test it all with a main method" test code (like we did back in the day) into the xUnit unit test framework pattern.
It occurred to me that this might make a good code kata. TDD a well-known kata (e.g., FizzBuzz, Bowling Game, "Rock, Paper, Scissors"), but starting without a unit testing framework doing it all in a single main method.
As the code evolves, refactor the test code to remove code smells like methods testing more than one thing, classes testing more than one "unit" or "feature" (depending on how you roll), high-level modules that depend directly on low-level test fixtures, multiple tests being different examples of the same test, and so on.
It could be an interesting exercise in discovering frameworks. Perhaps it'll take multiple katas for a complete xUnit framework to reveal itself, just as so many great and useful frameworks don't really take shape until they've been reused a few times on other problems.
It might also be an exercise in applying the TDD discipline on two problems simultaneously; a test of your Craft Fu.
Really, it would be a kata within a kata: a meta-kata, if you like. And as such, I think it could be really interesting and rather challenging. I'll hopefully be giving it a go - well, probably a few goes - when I pair with my "apprentice" Will Price soon. When I think we've cracked it, I'll post a screencast and the code (with version history, so you can play it back).