February 2, 2015
Stepping Back to See The Bigger PictureSomething we tend to be bad at in the old Agile Software Development lark is the Big Picture problem.
I see time and again teams up to their necks in small details - individual users stories, specific test cases, individual components and classes - and rarely taking a step back to look at the problem as a whole.
The end result can often be a piecemeal solution that grows one detail at a time to reveal a patchwork quilt of a design, in the worst way.
While each individual piece may be perfectly rational in its design, when we pull back to get the bird's eye view, we realise that - as a whole - what we've created doesn't make much sense.
In other creative disciplines, a more mature approach is taken. Painters, for example, will often sketch out the basic composition before getting out their brushes. Film producers will go through various stages of "pre-visualisation" before any cameras (real or virtual) star rolling. Composers and music producers will rough out the structure of a song before worrying about how the kick drum should be EQ'd.
In all of these disciplines, the devil is just as much in the detail as in software development (one vocal slightly off-pitch can ruin a whole arrangement, one special effect that doesn't quite come off can ruin a movie, one eye slightly larger than the other can destroy the effect of a portrait in oil - well, unless you're Picasso, of course; and there's another way in which we can be like painters, claiming that "it's mean to be like that" when users report a bug.)
Perhaps in an overreaction to the Big Design Up-Front excesses of the 1990's, these days teams seem to skip the part where we establish an overall vision. Like a painter's initial rough pencil sketch, or a composer's basic song idea mapped out on an acoustic guitar, we really do need some rough idea of what the thing we're creating as a whole might look like. And, just as with pencil sketches and song ideas recorded on Dictaphones, we can allow it to change based on the feedback we get as we flesh it out, but still maintain an overall high-level vision. They don't take long to establish, and I would argue that if we can't sketch out our vision, then maybe - just maybe - it's because we don't have one yet.
Another thing they do that I find we usually don't (anywhere near enough) is continually step back and look at the thing as a whole while we're working on the details.
Watch a painter work, and you'll see they spend almost as much time standing back from the canvas just looking at it as they do up close making fine brush strokes. They may paint one brush stroke at a time, but the overall painting is what matters most. The vase of flowers may be rendered perfectly by itself, but if its perspective differs from its surroundings, it's going to look wrong all the same.
Movie directors, too, frequently step back and look at an edit-in-progress to check that the footage they've shot fits into a working story. They may film movies one shot at a time, but the overall movie is what matters most. An actor may perform a perfect take, but if the emotional pitch of their performance doesn't make sense at that exact point in the story, it's going to seem wrong.
When a composer writes a song, they will often play it all the way through up to the point they're working on, to see how the piece flows. A great melody or a killer riff might seem totally wrong if it's sandwiched in between two passages that it just doesn't fit in with.
This is why a recommend to the developers that they, too, routinely take a step back and look at how the detail they're currently working on fits in with the whole. Run the application, walk through user journeys that bring you to that detail and see how that feels in context. It may seem like a killer feature to you, but a killer feature in the wrong place is just a wrong feature.
November 25, 2014
Continuous Inspection II - Planning & Executing CInspIn this second blog post about Continuous Inspection (CInsp, for short), I want to look at how we might manage the CInsp process to get the most value from it.
While some develoment teams are now using CInsp tools to analyse their code to get early warnings about code quality problems when they're easier and cheaper to fix, it's fair to say that this area of the develoment discipline has to date evaded the principles that we apply to other kinds of requirements.
Typically, as a kind of work, CInsp is ad hoc, unplanned, untracked and most teams who do it have only a very vague idea of what kind of cost it has and what kind of benefits they're reaping from it.
CInsp is rarely prioritised, leaving the field wide open to waste a lot of time and effort on activities that add little or no value.
Non-functional requirements obey the same laws as functional ones, which is why we need to attack them using the same principles and techniques.
In this post, I want to examine how we plan and execute CInsp on projects starting from scratch. (In a future post, I'll talk about applying CInsp to existing code bases with a build-up of code quality issues.)
Continuous Inspection Requirements.
There are an infinite number of properties we could look for in our code, but some have value in finding and most don't. Rather than waste our time arbitrarily searching our code for "stuff", it's important we have a clear idea of what it is we're looking for and why.
Extreme Programming, for example, has a perfectly usable mechanism for describing the things we want to inspect for, and the benefits of catching those kinds of code quality problems early.
A Code Quality Story is a non-functional user story that briefly summarises a code quality "bug" we wish to avoid and the pay-off we might expect if we can avoid introducing it into our code.
Note first of all that I've chosen here to use a blue index card. This might be in a system where we write functional user stories on green cards, report bugs on red cards, and record other outcomes - "miscellaneous tasks", like setting up the build and implementing code quality gates - on blue cards.
Why do this? Well, I've found it very useful to know roughly how much of a team's time is split between delivering working features, fixing bugs (ideally, zero time), and "shaving yaks" when the yaks being shaved are sufficiently large and not part of the work of delivering specific features.
The importance of the effort split becomes apparent as time goes by and the software evolves. A healthy project is one where the proportion of effort devoted to delivering working features remains relatively constant. What typically happens on teams who set out at an unsustainable pace is that they begin development with their time devoted mostly to the green cards, and after a few months most of their time is spent tackling red cards and making a lot less progress on new features. This is a good indicator of the rising cost of change we're seeking to avoid, so we can sustain the pace of development and deliver value for longer. This information will help us better judge how well-spent the time devoted to things like CInsp is.
So we have a placeholder for our code quality requirement in the form of a blue index card. What next?
Planning Continuous Inspection
This is where I, and a lot of teams, have gone wrong in the past. What we should never, ever do is allow the customer to choose when and whether we tackle non-functional requirements. And in "customer" I include proxy customers like business analysts and project managers. The overwhelmingly common experience of development teams is that purely technical issues, like code quality, get sidelined by non-technical stakeholders.
We must not give them the chance to drop our Feature Envy story in favour of a story about, say, sorting columns in an HTML table if we strongly believe, as professionals, that avoiding Feature Envy is important. If, as the evidence suggests, care taken over code quality helps to maintain productivity and deliver greater value over time, then we risk presenting customers with a confusing false dichotomy between work that enhances quality and work that directly delivers working features.
The analogy I use is to pretend we're running a restaurant using the planning practices of Extreme Programming.
Every job that needs doing gets written on a card, and placed into a backlog of outstanding work. There will be user stories like "Take table 3's order" and "Serve french fries and beer to table 7" and "Get the bill for table 12". These are stories about work that will make the restaurant money.
There will also be stories like "Wash the dishes in the sink" and "Clean out pizza oven" and "Repaint sign over door". These are about tasks that cost money, but don't directly bring in revenue by themselves.
If we allowed our restaurant's shareholders - who themselves have never worked in a restaurant, but they have a stake in it as a business - to prioritise what stories get done at the expense of others in a world where backlogs always outweigh the available time and resources, then there's a very real danger that the kitchen will rarely get cleaned, the sign above the door will fade until nobody can see it, and we'll run out of clean plates halfway through service.
The temptation for teams who are driven solely by the priorities of non-technical stakeholders is that non-functional issues like code quality will only get tackled when a crisis emerges that blocks progress on functional requirements. i.e., we don't wash up until we run out of plates, or we don't clean the kitchen until the inspector shuts us down, or we don't repaint the sign until the customers have stopped coming in.
One thing we've learned about writing software is that it's cheaper and easier to tackle problems proactively and catch them earlier. Sadly, too many teams are left lurching from one urgent crisis to the next, never getting the chance to get ahead of the issues.
For this reason, I strongly advise against involving non-technical stakeholders in planning CInsp. (As well as other technical work.)
Now put yourself in the diner's shoes: you pick up the menu, and every dish lists all of the tasks restaurant staff have to do in order to deliver it. Let's say we charge £11 for fish and chips, with a clean grill, mopping the floors, cashing up that evening, doing the accounts, getting up early to take delivery of fresh fish, and so on.
1. If we hadn't told them, would the diner even care?
2. If we make it the diner's business, are we inviting them to negotiate the price of the fish and chips down by itemising what goes in to running the restaurant? ("I'll have the fish & chips, but I'm not paying for your trainee chef's college course" etc)
The world is full of work that needs doing, but nobody thinks they should pay for. In order for the world to keep turning, for fish & chips to appear on our dining tables, this work has to get done one way or another, and it has to be paid for.
The way a restaurant squares this circle is to build it into the cost of the meal and to not present diners with a choice. Their choice is simple: don't like the price, don't order the dish.
Likewise in software development, there's a universe of tasks that need doing that do not directly end with a working feature being delivered to the customer's table. We must build this work into the price ("feature X will take 3 days to deliver") and avoid presenting the customer with bewildering choices that, in reality, aren't choices at all.
So planning Continuous Inspection is something that happens within the team among technical stakeholders who understand the issues and will be doing the work. This is good advice for any non-functional requirements, be they about build automation, internal training or hiring developers. This is just "stuff that has to happen" so we can deliver working software reliably, economically and sustainably.
The key thing, to avoid teams disappearing up their own backsides with the technical stuff, is to make sure we're all absolutely clear about why we're doing it. Why are we automating the build? Why are we writing a tool that generates code? Why are we sending half the team to the Software Craftsmanship conference? (Some companies send entire teams.) And the answer should always be something of value to the customer, even if that value might not be realised for months or years.
In practice, we have planning meetings - especially in the early stages of a project - that are for technical stakeholders only. Lock the doors. Close the blinds. Don't tell the boss. (I have literally experienced running around offices looking for rooms where the developers can have these discussions in private, chased by the project manager who insists on sitting in. "Don't mind me. I won't interfere." Two seconds later...)
Such meetings give teams a chance to explicitly discuss code quality and to thrash out what they mean by "good code" and "bad code" and establish a shared set of priorities over code quality. It's far better to have these meetings - and all the inevitable disagreements - at the start, when we can take steps to prevent issues, than to have them later when we can only ask "what went wrong?"
Executing Continuous Inspection
On new software, the effort in Continuous Inspection tends to be front-loaded, and with good reason.
As I've mentioned a few times already, it tends to be far cheaper to tackle code quality "bugs" early - the earlier the better. This means that adding new code quality requirements later in development tends to catch problems when they're much more expensive to fix, so it makes sense to set the quality bar as high as we can at the start.
There's good news and there's bad news. First, the bad news: on a new project, from a standing start, it's going to take considerable effort to get automated code inspections in place. It will vary greatly, depending on the technology stack, availability of tools, experience levels in the team, and so on. But it's not going to take an afternoon. So you may be faced with having to hide a big chunk of effort from non-technical stakeholders if you attempt to start development (from their perspective, when they're actively involved) at the same time as putting CInsp in place. (Same goes for builds, CI, and a raft of other stuff that we need to get up and running early on.)
Another very strong recommendation from me: have at least one iteration before you involve the customer. Get the development engine running smoothly before you wind down the window and shout "Where to, guv'nor?" They may be less than impressed to discover that you just need to build the engine before you can set off. Delighting customers is as much about expectations as it is about actual delivery.
Going back to the restaurant analogy, consider why restaurants distinguish between "service" and "preparation". Service may start at 6pm, but the chefs have probably been there since 9am getting things ready for that. If they didn't, then those first orders might take hours to reach the table. Too many development teams attempt the equivalent of starting service as the ingredients are being delivered to the kitchen. We need to do prep, too, before we can start taking orders.
Now, for the good news: the kinds of code quality requirements we might have on one, say, JEE project are likely to be similar on another JEE project. CInsp practitioners tend to find that they can get a lot of reuse out of code quality gates they've already developed for previous projects. So, over months and years, the overall cost of getting CInsp up and running tends to decrease quite significantly. If your technology stack remains fairly stable over the years, you may well find that getting things up and running can eventually become an almost push-button process. It takes a lot of investment to get there, though.
Code Quality stories work the same way as user stories in their execution. We plan what stories we're going to tackle in the current timebox in the same way. We tackle them in pairs, if possible. We treat them purely as placeholders to have a conversation with the person asking for each story. And, most importantly, we agree...
Continuous Inspection Acceptance Tests
Going back to our Feature Envy code quality story, what does the developer who write that story mean by "Feature Envy"?
Here's the definition from Martin Fowler's Refactoring book:
"A classic [code] smell is a method that seems more interested in a class other than the one it is in. The most common focus of the envy is the data."
It's all a bit handwavy, as is usually the case with software design wisdom. A human being using their intelligence, experience and judgement might be able to read this, look at some code and point to things that seem to them to fit the description.
Programming a computer to do it, on the other hand...
This is where we can inhabit our customer's world for a little while. When we ask our customer to precisely decribe a business rule, we're putting them on the spot every bit as much as a computable definition of Feature Envy might put me and you on the spot. In cold, hard, computable terms: we don't quite know what we mean.
When the business problem we're solving is about, say, mortgages or video rentals or friend requests, we ask the customer for examples that illustrate the rule. Using examples, we can establish a shared vocabulary - a language for expressing the rule - explore the boundaries, and pin down a precise computable understanding of it (if there is one.)
We shouldn't be at all surprised that this technique also works very well for rules about our code. Ask the owner of a code quality story to track down some classic examples of code that breaks the rule, as well as code that doesn't (even if it looks at first glance like it might).
This is where the real skill in CInsp comes into play. To win at Continuous Inspection, development teams need to be skilled as reasoning about code. This is not a bad skill for a developer to have generally. It helps us communicate better, it helps us visualise better, it makes us better at design, at refactoring, at writing tools that work with code. Code is our domain model - the business objects of programming.
Using our code reasoning skills, applied to examples that will form the basis of acceptance tests, we can drive out the design of the simplest tool possible that will sound the alarm when the "bad" examples are considered, while silently allowing the "good" examples to pass through the quality gate.
As with functional user stories, we're not done until we have a working automated quality gate that satisfies our acceptance tests and can be applied to new code straight away.
In the next blog post, we'll be rolling up our sleeves with an example Continuous Inspection quality gate, implementing it using a variety of tools to demonstrate that there's often more than one way to skin the code quality cat.
November 22, 2014
Continuous Inspection I - Why Do We Need It?This is the first of a series of posts about Continuous Inspection. My goals here is to give you something to think about, rather than to present a complete hands-on guide. The range (and maturity) of tools and techniques we can apply to Continuous Inspection (I'll call it CInsp from now on to save a few keystrokes) is such that I could write 1,000 blog posts and still not cover it all. So here I'll just focus on general CInsp principles and illustrate with cherrypicked examples.
In this first post, I want to summarise what I mean by "Continuous Inspection" and argue that there's a real need for it on most software development teams.
Contininuous Inspection is the practice of - and stop me if I'm getting too technical here - continuously inspecting your code to detect non-functional issues in the software.
CInsp is just another kind of Continuous Testing, which is a cornerstone of Continuous Delivery. To have our software always in a shippable state, we must take steps to assure ourselves that the software is always working.
If we follow the thinking behind continuous testing (and re-testing) of our software to check that it still works, the benefit is that we never stray more than a few minutes from having something we could ship if the business wanted us to.
To date, the only practical way we've found to achieve Continuous Testing is to automate those tests as much as possible, so they can be run quickly and economically. If it takes you 2 weeks to re-test your software, then after each change you make to the code, you are at least 2 weeks away from knowing if the software still works. Manual testing makes Continuous Delivery impractical.
In recent years, automated testing - and especially automated unit testing - has grown in popularity, and the effects can be seen in teams delivering more reliably and more sustainably as a result.
But only to a point.
What I've observed across hundreds of teams over the last decade or more is that, even with high levels of automated testing, the pace of delivery still slows to unacceptable levels.
In order to sustain the pace of change, the code itself needs to remain open to change. Being able to quickly regression test our software is a boon in this respect, no doubt. But it doesn't address the whole picture.
There are other things that can hamper change in our code. If the code's complicated, for example, it will be more likely to break when we change it. If there's duplication in our code - if we've been a bit trigger-happy with Copy+Paste - then that can multiply the cost of making a change. If we've not paid attention to the dependencies in our code, small changes can cause big ripples through the code and amplify the cost.
As we make progress in delivering functionality we tend also to make a mess inside the software, and that mess can get in our way and impede future progress. To maintain the pace of innovation over months and years and get the most out of our investment over the lifetime of a software product, we need to keep our code clean.
Experienced developers view design issues that impede progress in their code as bugs, and they can be every bit as serious as bugs in the functionality of the software.
And, just like functional bugs, these code quality bugs (often referred to as "code smells", because they're indicatice of your code "rotting" as it grows) have a tendency to get harder and more expensive to fix the longer we leave them.
Duplication has a tendency to grow, as does complexity. We build more dependencies on top of our dependencies. Switch statements get longer. Long parameter lists get longer. Big classes get bigger. And so on.
Here's what I've discovered form examining hundreds of code bases over the years: code smells that get committed into the code are very likely to remain for the lifetime of the software.
There seems to be a line that once we've crossed it, our mistakes are likely to live forever (and impede us forever). From observation, I've found that this line is moving on.
In the Test-driven Development cycle, for example, I've seen that when developers move on to the next failing test, any code smells they leave behind will likely not get addressed later. In programming, "later" is a distant and alien land where all our little TO-DO's never get done. "Later" might as well be "Narnia".
Even more so, when developers commit their code to a shared repository, at that point code smells "petrify", and remain forever trapped in the amber of all the other code that surrounds them. 90% of code smells introduced in committed code never get fixed.
This is partly because most teams have no processes for identifying and addressing code quality problems. But even the ones who do tend to find that their approach, while better than nothing, is not up to the task of keeping the code as clean as it needs to be to maintain the pace of change the customer needs.
Why? Well, let's look at the kinds of techniques teams these days use:
1. Code Reviews
There's a joke that goes something like this: "Ask a developer what's wrong with a line of code, and she'll give you a list. Ask her what's wrong with 500 lines of code, and she'll tell you it's fine."
Code reviews have a tendency to store up large amounts of code - potentially containing large numbers of issues - for consideration. The problem here is seeing the wood for the trees. A lot of issues get overlooked in the confusion.
But even if code reviews identified all of the code quality issues, the economics of fixing those issues is working against us. Fixing bugs - functional or non-functional - tends to get exponentially more expensive the longer we leave them in the code, and for precisely the same reasons (longer feedback cycles).
In practice, while rigorous code reviews would be a step forward for many teams who don't do them at all, they are still very much shutting the stable door after the horse has bolted.
2. Pair Programming
In theory, pair programming is a continuous code review where the "navigator" is being especially vigilent to code quality issues and points them out as soon as they spot them. In some cases, this is pretty much how it works. But, sad to say, in the majority of pairs, code quality issues are not high on anyone's agenda.
This is for two good reasons: firstly, most developers are not all that aware of code smells. They don't figure high in our list of priorities. Code quality isn't sexy, and doesn't get you hired at IronicBeards.com.
Secondly, with the best will in the world, people have limitations. When Codemanship does pairing to assess a developer's skill level in certain practices, the level of focus required on what the other person's doing is really quite intense. You don't take your eye off the screen in case you miss something. But there are dozens of code smells we need to be vigilant for, and even with all my experience and know-how, I can't catch them all. My mind will have to skip between lots of competing concerns, and when my remaining brain cells are tied up trying to remember how to do something with Swing, I'm likely to take my eye off the code quality ball. It's also very difficult to maintain that level of focus hour after hour, day-in and day-out. It hurts my brain.
Pair programming, as an approach to guarding against code smells, is good when it's done well. But it's not that good that we can be assured code written in this way will be maintainable enough.
3. Design Authorities
By far the least effective route to ensuring code quality is to make it someone else's job.
Hiring architects or "technical design authorities" suffers from all the shortcomings of code reviews and pair programming, and then adds a big bunch of new shortcomings.
Putting aside the fact that almost every architect or TDA I've ever met has been mostly focused on "the big picture", and that I've seen 1,000-line switch statements waved through the quality gate by people obsessing over whether classes implement certain interfaces they've prescribed, turning design authorities into design quality testers never seems to end well. Who wants to spend their day scouring other people's code for examples of Feature Envy?
I'll say no more, except to summarise by observing that the code I've seen produced by teams with dedicated design authorities counts amongs the worst for code quality.
4. Coding Standards
In theory, a team's coding standards are a codification of what we all agree we mean by "good code".
Typically, these are written down in documents that nobody ever reads, and suffer from the same practical drawbacks as architecture documents and company mission statements. They're aspirational affirmations at best. But, in practice, everybody just ignores them.
Even on those more disciplined teams that try to adhere to coding standards, they still have major drawbacks, all relating back to things we've already discussed.
Firstly, coding standards are a list of "stuff" we need to be thinking about along with all the other "stuff" we have to think about. So they tend to take a lower priority and often get overlooked.
Secondly, as someone who's studied a lot of coding standards documents (and what joy they bring!), they have a tendency to be both arbitrary and by no means universally agreed upon. Often they've been written by some kind of design or development authority, usually with little or no input from the team they're being imposed on. It's rare for issues that affect maintainability to be addressed in a coding standards document. Programmers are a funny bunch: we care deeply about some weird stuff while Elephants In The Room creep in without being questioned and sit on us. Naming conventions, therefore, have little relation to how easy the code will be to read and understand. And it's rare to see duplication, dependencies, complexity and so on even being hinted at. As long as all your instances have names beginning with obj and all your private member variables beging with "m_", the gods of code goodness will be appeased.
And then there's the question of how and when we enforce coding standards. And we're back to the hard physics of software development - time, money and cost. Knowing what we should be looking for is only the tip of the code quality iceberg.
What's needed is the ability to do code reviews so freqently, and do them in a way that's so effective, that we never stray more than a few minutes from clean code. For this, we need code reviewers who miss very little, who are constantly looking at the code, and who never get tired or distracted.
For that, thankfully, we have computers.
Program code is like any other domain model; we can write programs to reason about the design of other programs, expressed in terms of the structure of code itself.
Code quality rules are just like any other computable business rules. If the rule is that a block of code in one class should not make copious references to features in another class ("Feature Envy"), it's possible to write an automated test that reads code and looks at those references to determine if that block of code is in the right place.
Let's illustrate with a technology example. Imagine we're working in Java in, say, Eclipse. We could write code for a plug-in that, whenever we make a change to the code document we're working on, reads the code's Abstract Syntax Tree (basically, a code DOM) and does a calculation for the ratio of internal and external dependencies in that Java method we just changed. If the ratio is too low, it could flag it up as a warning while we're writing the code.
The computational power of computers is such today that this sort of continuous background code reviewing is practically possible, and there have already been some early attempts to create just such plug-ins.
In the article I wrote a few years ago for Visual Studio Journal, Ever-decreasing Cycles, I speculate about the impact such short code quality feedback loops might have on the economics of development.
It's my belief that, just as continuous automated unit testing has had a profound effect on the "bottom line" of software development for many teams and businesses, so too would Continuous Inspection.
In the next blog post, I'll talk about the CInsp process and look at practical ways of managing CInsp requirements, test automation and how we action the code quality problems it can throw up.
October 16, 2014
Dear Aunty Jason, I'm An Architect Who Wants To be Relevant Again..."Dear Aunty Jason,
I am a software architect. You know, like in the 90's.
For years, my life was great. I was the CTO's champion in the boardroom, fighting great battles against the firebreathing dragons of Ad Hoc Design and the evil wizards of Commercial Off-The-Shelf Software. The money was great, and all the ordinary people on the development teams would bow down to me and call me 'Sir'.
Then, about a decade ago, everything changed.
A man called Sir Kent of Beck wrote a book telling the ordinary people that dragons and wizards don't actually exist, and that the songs I'd been singing of my bravery in battle had no basis in reality. 'You', Sir Kent told them, 'are the ones who fight the real battles.'
And so it was that the ordinary people starting coming up with their own songs of bravery, and doing their own software designs. If a damsel in distress needed saving, they would just save her, and not even wait for me - their brave knight - to at the very least sit astride my white horse and look handsome while they did it.
I felt dejected and rejected; they didn't need me any more. Ever since then, I have wandered the land, sobbing quietly to myself in my increasingly rusty armour, looking for a kingdom that needs a brave knight who can sings songs about fighting dragons and who looks good on a horse.
Do you know of such a place?
Sir Rational of Rose."
I can sympathise with your story, Sir Rational. I, too, was once a celebrated knight of the UML Realm about whom many songs were sung of battles to the death with the Devils of Enterprise Resource Planning. And, I, too, found myself marginalised and ignored when the ordinary folk started praying at the altar of Agile.
And, I'm sorry to say, there are lots of kingdoms these days where little thought is given to design or to architecture. You can usually spot them from a distance by their higgledy-piggledy, ramshackle rooftops, and the fact that they dug the moat inside the city walls.
But, take heart; there are still some kingdoms where design matters and where architecture is a still a "thing". Be warned, though, that many of them, while they may look impressive from the outside, are in fact uninhabited. Such cities can often be distinguished by gleaming spires and high white walls, beautiful piazzas and glistening fountains, and the fact that nobody wants to live there because they built it on the wrong place.
You don't want to go to either of those kinds of kingdom.
Though very rare, there are a handful of kingdoms where design matters and it matters that people want to live in that design. And in these places, there is a role for you. You can be relevant once again. Once again, people will sing songs of your bravery. But you won't be fighting dragons or wizards in the boardroom. You'll be a different kind of knight.
You'll be fighting real battles, embedded in the ranks of real soldiers. Indeed, you'll be a soldier yourself. And you'll be singing songs about them.
September 17, 2014
The 4 C's of Continuous DeliveryContinuous Delivery has become a fashionable idea in software development, and it's not hard to see why.
When the software we write is always in a fit state to be released or deployed, we give our customers a level of control that is very attractive.
The decision when to deploy becomes entirely a business decision; they can do it as often as they like. They can deploy as soon as a new feature or a change to an existing feature is ready, instead of having to wait weeks or even months for a Big Bang release. They can deploy one change at a time, seeing what effect that one change has and easily rolling it back if it's not successful without losing 1,001 other changes in the same release.
Small, frequent releases can have a profound effect on a business' ability to learn what works and what doesn't from real end users using the software in the real world. It's for this reason that many, including myself, see Continuous Delivery as a primary goal of software development teams - something we should all be striving for.
Regrettably, though, many software organisations don't appreciate the implications of Continuous Delivery on the technical discipline teams need to apply. It's not simply a matter of decreeing from above "from now on, we shall deliver continuously". I've watched many attempts to make an overnight transition fall flat on their faces. Continuous Delivery is something teams need to work up to, over months and years, and keep working at even after they've achieved it. You can always be better at Continuous Delivery, and for the majority of teams, it would pay dividends to improve their technical discipline.
So let's enumerate these disciplines; what are the 4 C's of Continuous Delivery?
1. Continuous Testing
Before we can release our software, we need confidence that it works. If our aim is to make the software available for release at a moment's notice, then we need to be continuously reassuring ourselves - through testing - that it still works after we've made even a small change. The secret sauce here is being able to test and re-test the software to a sufficiently high level of assurance quickly and cheaply, and for that we know of only one technical practice that seems to work: automate our tests. It's for this reason that a practice like Test-driven Development, which leaves behind a suite of fast-running automated tests (if you're doing TDD well) is a cornerstone of the advice I give for transitioning to Continuous Delivery.
2. Continuous Integration
As well as helping us to flag up problems in integrating our changes into a wider system, CI is also fundamental to Continuous Delivery. If it's not in source control, it's going to be difficult to include it in a release. CI is the metabolism of software development teams, and a foundation for Continuous Delivery. Again, automation is our friend here. Teams that have to manually trigger compilation of code, or do manual testing of the built software, will not be able to integrate very often. (Or, more likely, they will integrate, but the code in their VCS will likely as not be broken at any point in time.)
3. Continuous Inspection
With the best will in the world, if our code is hard to change, changing it will be hard. Code tends to deteriorate over time; it gets more complicated, it fills up with duplication, it becomes like spaghetti, and it gets harder and harder to understand. We need to be constantly vigilant to the kind of code smells that impede our progress. Pair Programming can help in this respect, but we find it insufficient to achieve the quality of code that's often needed. We need help in guarding against code smells and the ravages of entropy. Here, too, automation can help. More advanced teams use tools that analyse the code and detect and report code smells. This may be done as part of a build, or the pre-check-in process. The most rigorous teams will fail a build when a code smell is detected. Experience teaches us that when we let code quality problems through the gate, they tend to never get addressed. Implicit in ContInsp is Continuous Refactoring. Refactoring is a skill that many - let's be honest, most - developers are still lacking in, sadly.
Continuous Inspection doesn't only apply to the code; smart teams are very frequently showing the software to customers and getting feedback, for example. You may think that the software's ready to be released, because it passes some automated tests. But if the customer hasn't actually seen it yet, there's a significant risk that we end up releasing something that we've fundamentally misunderstood. Only the customer can tell us when we're really "done". This is a kind of inspection. Essentially, any quality of the software that we care about needs to be continuously inspected on.
4. Continuous Improvement
No matter how good we are at the first 3 C's, there's almost always value in being better. Developers will ask me "How will we know if we're over-doing TDD, or refactoring?", for example. The answer's simple: hell will have frozen over. I've never seen code that was too good, never seen tests that gave too much assurance. In theory, of course, there is a danger of investing more time and effort into these things than the pay-offs warrant, but I've never seen it in all my years as a professional developer. Sure, I've seen developers do these things badly. And I've seen teams waste a lot of time because of that. But that's not the same thing as over-doing it. In those cases, Continuous Improvement - continually working on getting better - helped.
DevOps in particular is one area where teams tend to be weak. Automating builds, setting up CI servers, configuring machines and dealing with issues like networking and security is low down on the average programmer's list of must-have skills. We even have a derogatory term for it: "shaving yaks". And yet, DevOps is pretty fundamental to Continuous Delivery. The smart teams work on getting better at that stuff. Some get so good at it they can offer it to other businesses as a service. This, folks, is essentially what cloud hosting is - outsourced DevOps.
Sadly, software organisations who make room for improvement are in a small minority. Many will argue "We don't have the time to work on improving". I would argue that's why they don't have the time.
September 16, 2014
Why We IterateSo, in case you were wondering, here's my rigorous and highly scientific process for buying guitars...
It starts with a general idea of what I think I need. For example, for a couple of years now I've been thinking I need an 8-string electric guitar, to get those low notes for the metalz.
I then shop around. I read the magazines. I listen to records and find out what guitars those players used. I visit the manufacturers websites and read the specifications of the models that might fit. I scout the discussion forums for honest, uncensored feedback from real users. And gradually I build up a precise picture of exactly what I think I need, down to the wood, the pickups, the hardware, the finish etc.
And then I go to the guitar shop and buy a different guitar.
Why? Because I played it, and it was good.
Life's full of expectations: what would it be like to play one of Steve Vai's signature guitars? What would it be like to be a famous movie star? What would it be like to be married to Uma Thurman?
In the end, though, there's only one sure-fire way to know what it would be like. It's the most important test of all. Sure, an experience may tick all of the boxes on paper, but reality is messy and complicated, and very few experiences can be completely summed up by ticks in boxes.
And so it goes with software. We may work with the customer to build a detailed and precise requirements specification, setting out explicitly what boxes the software will need to tick for them. But there's no substitute for trying the software for themselves. From that experience, they will learn more than weeks or months or years of designing boxes to tick.
We're on a hiding to nothing sitting in rooms trying to force our customers to tell us what they really want. And the more precise and detailed the spec, the more suspicious I am of it. bottom line is they just don't know. But if you ask them, they will tell you. Something. Anything.
Now let me tell you how guitar custom shops - the good ones - operate.
They have a conversation with you about what guitar you want them to create for you. And then they build a prototype of what you asked for. And then - and this is where most of the design magic happens - they get you to play it, and they watch and they listen and they take notes, and they learn a little about what kind of guitar you really want.
Then they iterate the design, and get you to try that. And then rinse and repeat until your money runs out.
With every iteration, the guitar's design gets a little bit less wrong for you, until it's almost right - as right as they can get it with the time and money available.
Custom guitars can deviate quite significantly from what the customer initially asked for. But that is not a bad thing, because the goal here is to make them a guitar they really need; one that really suits them and their playing style.
In fact, I can think of all sorts of areas of life where what I originally asked for is just a jumping-off point for finding out what I really needed.
This is why I believe that testing - and then iterating - is most importantly a requirements discipline. It needs to be as much, if not more, about figuring out what the customer really needs as it is about finding out if we delivered what they asked for.
The alternative is that we force our customers to live with their first answers, refusing to allow them - and us - to learn what really works for them.
And anyone who tries to tell you that it's possible to get it right - or even almost right - first time, is a ninny. And you can tell them I said that.
September 8, 2014
Iterating Is FundamentalJust like it boggles my mind that, in this day and age of electric telephones and Teh Internets, we still debate whether an invisible man in the sky created the entire universe in 6 days, so too is my mind boggled that - in 2014 - we still seem to be having this debate about whether or not we should iterate our software designs.
To me, it seems pretty fundamental. I struggle to recall a piece of software I've worked on - of any appreciable complexity or sophistication - where getting it right first time was realistic. On my training courses, I see the need to take multiple passes on "trivial" problems that take maybe an hour to solve. Usually this is because, while the design of a solution may be a no-brainer, it's often the case that the first solution solves the wrong problem.
Try as I might to spell out the requirements for a problem in clear, plain English, there's still a need for me to hover over developers' shoulders and occasionally prod them to let them know that was not what I meant.
That's an example of early feedback. I would estimate that at least half the pairs in the average course would fail to solve the problem if I didn't clear up these little misunderstandings.
It's in no way an indictment of those developers. Put me in the exact same situation, and I'm just as likely to get it wrong. It's just the lossy, buggy nature of human communication.
That's why we agree tests; to narrow down interpretations until there's no room for misunderstandings.
In a true "waterfall" development process - bearing in mind that, as I've said many times, in reality there's no such thing - all that narrowing down would happen at the start, for the entire release. This is a lot of work, and requires formalisms and rigour that most teams are unfamiliar with and unwilling to attempt.
Part of the issue is that, when we bite off the whole thing, it beecomes much harder to chew and much harder to digest. Small, frequent releases allow us to focus on manageable bitesized chunks.
But the main issue with Big Design Up-Front is that, even if we pin down the requirements precisely and deliver a bug-free implementation of exactly what was required, those requirements themselves are open to question. Is that what the customer really needs? Does it, in reality, solve their problem?
With the best will in the world, validating a system's requirements to remove all doubt about whether or not it will work in the real world, when the system is still on the drawing board, is extremely difficult. At some point, users need something that's at the very least a realistic approximation of the real system to try out in what is, at the very least, a realistic approximation of the real world.
And here's the the thing; it's in the nature of software that a realistic approximation of a program is, in effect, the program. Software's all virtual, all simulation. The code is is the blueprint.
So, in practice, what this means is that we must eventually validate our software's design - which is the software itself - by trying out a working version in the kinds of environments it's intended to be used in to try and solve the kinds of problems the software's designed to solve.
And the sooner we do that, the sooner we learn what needs to be changed to make the software more fit for purpose.
Put "agility" and "business change" to the back of your mind. Even if the underlying problem we want to solve stays completely static throughout, our understanding of it will not.
I've seen it time and again; teams agonise over features and whether or not that's what the customer really needs, and then the software's released and all that debate becomes academic, as we bump heads with the reality of what actually works in the real world and what they actually really need.
Much - maybe most - of the value in a software product comes as a result of user feedback. Twitter is a classic example. Look how many features were actually invented by the users themselves. We invented the Retweet (RT). We invented addressing tweets to users (using @). We invented hastags (#) to follow conversations and topics. All of the things that make tweets go viral, we invented. Remember that the founders of Twitter envisioned a micro-blogging service in the beginning. Not a global, open messaging service.
Twitter saw what users were doing with their 140 characters, and assimilated it into the design, making it part of the software.
How much up-front design do you think it would have taken them to get it right in the first release? Was their any way of knowing what users would do with their software without giving them a working version and watching what they actually did? I suspect not.
That's why I believe iterating is fundamental to good software design, even for what many of us might consider trivial problems like posting 140-character updates on a website.
There are, of course, degrees of iterativeness (if that's a word). At one extreme, we might plan to do only one release, to get all the feedback once we think the software is "done". But, of course, it's never done. Which is why I say that "waterfall" is a myth. What typically happens is that teams do one very looooong iteration, which they might genuinely believe is the only pass they're going to take at solving the problem, but inevitably when the rubbers meets the road and working software is put in front of end users, changes become necessary. LOTS OF CHANGES.
Many teams disguise these changes by re-classifying them as bugs. Antony Marcano has written about the secret backlogs lurking in many a bug tracking system.
Ambiguity in the original spec helps with this disguise: is it what we asked for? Who can tell?
Test-driven design processes re-focus testers on figuring our the requirements. So too does the secret backlog, turning testers into requirements analysts in all but name only, who devote much of their time to figuring out in what ways the design needs to change to make it more useful.
But the fact remains that producing useful working software requires us to iterate, even if we save those iterations for last.
It's for these reasons that, regardless of the nature of the problem, I include iterating as one of my basics of software development. People may accuse me of being dogmatic in always recommending that teeams iterate their designs, but I really do struggle to think of a single instance in my 30+ years of programming when that wouldn't have been a better idea than trying to get it absolutely right in one pass. And, since we always end up iterating anyway, we might as well start as we will inevitably go on, and get some of that feedback sooner.
There may be those in the Formal Methods community, or working on safety-critical systems, who argue that - perhaps for compliance purposes - they are required to follow a waterfall process. But I've worked on projects using Formal Methods, and consulted with teams doing safety-critical systems development, and what i see the good ones doing is faking it to tick all the right boxes. The chassis may look like a waterfall, but under the hood, it's highly iterative, with small internal releases and frequent testing of all kinds. Because that's how we deliver valuable working software.
August 12, 2014
TDD is TDD (And Far From Dead)Now, it would take enormous hubris for me to even suggest that this blog post that follows is going to settle the "What is TDD?" "Is TDD dead?" "Did weazels rip my TDD?" debates that have inexplicably sprung up around and about the countryside of late.
But it will. (In my head, at any road.)
First of all, what is TDD? I'm a bit dismayed that this debate is still going on, all these years later. TDD is what it always was, right from the time the phrase appeared:
Test-driven Development = Test-driven Design + Refactoring
Test-driven Design is the practice of designing our software to pass tests. They can be any kind of tests that software can pass: unit tests, integration tests, customer acceptance tests, performance tests, usability tests, code quality tests, donkey jazz hands tests... Any kind of tests at all.
The tests provide us with examples of how the software must be - at runtime, at design time, at tea time, at any time we say - which we generalise with each new test case to evolve a design for software that does a whole bunch of stuff, encompassed by the set of examples (the suite of tests) for that software.
We make no distinction in the name of the practice as to what kind of tests we're aiming to pass. We do not call it something else just because the tests we're driving our design with happen not to be unit tests.
Refactoring is the practice of improving the internal design of our software to make it easier to change. This may mean making the code easier for programmers to understand, or generalising duplicate code into some kind of reusable abstraction like a parameterised method or a new module, or unpicking a mess of dependencies to help localise the impact of making changes.
As we're test-driving our designs, it's vitally important to keep our code clean and maintainable enough to allow us to evolve it going forward to pass new tests. Without refactoring, Test-driven Design quickly becomes hard going, and we lose the ability to adapt to changes and therefore to be agile for our customer.
The benefits of TDD are well understood, and backed up by some hard data. Software that is test-driven tends to be more reliable. It tends to be simpler in its design. Teams that practice TDD tend to find it easier to achieve continuous delivery. From a business perspective, this can be very valuable indeed.
Developers who are experienced in TDD know this to be true. Few would wish to go back to the Bad Old Days before they used it.
That's not say that TDD is the be-all and end-all of software design, or that the benefits it can bring are always sufficient for any kind of software application.
But it is very widely applicable in a wide range of applications, and as such has become the default approach - a sort of "start for ten" - for many teams who use it.
It is by no means dead. There are more teams using it today than ever before. And, as a trainer, I know there are many more that aspire to try it. It's a skill that's highly in demand.
Of course, there are teams who don't succeed at learning TDD. Just like there are people who don't succeed at learning to play the trombone. The fact that not everybody succeeds at learning it does not invalidate the practice.
I've trained and coached thousands of developers in TDD, so I feel I have a good overview of how folk get on with it. Many - most, let's be honest - seriously underestimate the learning curve. Like the trombone, it may take quite a while to get a tune out of it. Some teams give up too easily, and then blame the practice. Many thousands of developers are doing it and succeeding with it. I guess TDD just wasn't for you.
So there you have it, in a nutshell: TDD is what it always was. It goes by many names, but they're all pseudonyms for TDD. It's bigger today than it ever was, and it's still growing - even if some teams are now calling it something else.
There. That's settled, then.
July 31, 2014
My Top 5 Most Under-used Dev PracticesSo, due to a last-minute change of plans, I have some time today to fill. I thought I'd spend it writing about those software development practices that come highly recommended by some, but - for whatever reason - almost no teams do.
Let's count down.
5. Mutation Testing - TDD advocates like me always extol the benefits of having a comprehensive suiite of tests we can run quickly, so we can get discover if we've broken our code almost immediately.
Mutation testing is a technique that enables us to ask the critical question: if our code was broken, would our tests show it?
We deliberately introduce a programming error - a "mutation" - into a line of code (e.g., turn a + into a -, or > into a <) and then run our tests. If a test fails, we say our test suite has "killed the mutant". We can be more assured that if that particular line of code had an error, our tests would show it. If no tests fail, that potentially highlights an gap in our test suite that we need to fill.
Mutation testing, done well, can bring us to test suites that offer very high assurance - considerably higher than I've seen most teams achieve. And that extra assurance tends to bring us economic benefits in terms of catching more bugs sooner, saving us valuable time later.
So why do so few teams do it? Well, tool support is one issue. The mutation testing tools available today tend to have a significant learning curve. They can be fiddly, and they can throw up false positives, so teams can spend a lot of time chasing ghosts in their test coverage. It takes some getting used to.
In my own experience, though, it's worth working past the pain. The pay-off is often big enough to warrant a learning curve.
So, in summary, reason why nobody does it : LEARNING CURVE.
4. Visualisation - pictures were big in the 90's. Maybe a bit too big. After the excesses of the UML days, when architects roamed the Earth feeding of smaller prey and taking massive steaming dumps on our code, visual modeling has - quite understandably - fallen out of favour. So much so that many teams do almost none at all. "Baby" and "bathwater" spring to mind.
You don't have to use UML, but we find that in collaborative design, which is what we do when we work with customers and work in teams, a picture really does speak a thousand words. I still hold out hope that one day it will be commonplace to see visualisations of software designs, problem domains, user interfaces and all that jazz prominently displayed in the places where development teams work. Today, I mainly just see boards crammed with teeny-weeny itty-bitty index cards and post-it notes, and the occasional wireframe from the UX guy, who more often than not came up with that design without any input at all from the team.
The effect of lack of visualisation on teams can be profound, and is usually manifested in the chaos and confusion of a code base that comprises several architectures and a domain model that duplicates concepts and makes little to no sense. If you say you're doing Domain-driven Design - and many teams do - then where are your shared models?
There's still a lot of mileage in Scott Ambler's "Agile Modeling" book. Building a shared understanding of a complex problem or solution design by sitting around a table and talking, or by staring at a page of code, has proven to be uneffective. Pictures help.
In summary, reason why so few do it: MISPLACED AGILE HUBRIS
3. Model Office - I will often tell people about this mystical practice of creating simulated testing environments for our software that enable us to see how it would perform in real-world scenarios.
NASA's Apollo team definittely understood the benefits of a Model Office. Their lunar module simulator enabled engineers to try out solutions to systemm failures on the ground before recommending them to the imperilled astronauts on Apollo 13. Tom Hanks was especially grateful, but Bill Paxton went on to star in the Thunderbirds movie, so it wasn't all good.
I first heard about them doing a summer stint in my local W H Smith in the book department. Upstairs, they had a couple of fake checkouts and baskets of fake goods with barcodes.
Not only did we train in their simulated checkout, but they also used them to analyse system issues and to plan IT changes, as well as to test those changes in a range of "this could actually happen" scenarios.
A Model Office is a potentially very powerful tool for understanding problems, for planning solutions and for testing them - way more meaningful than acceptance tests that were agreed among a bunch of people sitting in a room, many of whom have never even seen the working environment in which the software's going to be used, let alone experienced it for themselves.
There really is no substitute for the real thing; but the real thing comes at a cost, and often the real thing is quite busy, actually, thank you very much. I mean, dontcha just hate it when you're at the supermarket and the checkout person is just learning how it all works while you stand in line? And mistakes that get made get made with real customers and real money.
We can buy ourselves time, control and flexibility by recreating the real thing as faithfully as possible, so we can explore it at our leisure.
Time, because we're under no pressure to return the environment to business use, like we would be if it was a real supermarket checkout, or a real lunar module.
Control, because we can deliberately recreate scenarios - even quite rare and outlandish ones - as often as we like, and make it exactly the same, or vary it, as we wish. One of the key reasons I believe many business systems are not very robust is because they haven't been tested in a wide-enough range of possible circumstances. In real life, we might have to wait weeks for a particular scenario to arise.
Flexibility, because in a simulated environment, we can do stuff that might be difficult or dangerous in the real world. We can try out the most extraordinary situations, we can experiment with solutions when the cost of failure is low, and we can explore the problem and possible solutions in ways we just couldn't or wouldn't dare to if real money, or real lives, or real ponies were at stake.
For this reason,, from me, Model Offices come very highly recommended. Which is very probably why nobody uses them.
Reason why nobody does it - NEVER OCCURRED TO THEM
2. Testing by Inspection - This is another of those blind spots teams seem to have about testing. Years of good research have identified reading the code to look for errors as one of the most - if not the most - effective and efficient ways of finding bugs.
Now, a lot of teams do code reviews. It's a ritual humiliation many of us have to go through. But commonly these reviews are about things like coding style, naming conventions, design rules and so forth. It's vanishingly rare to meet a team who get around a computer, check out some code and ask "okay, will this work?"
Testing by inspection is actually quite a straightforward skill, if we want it to be. A practice like guided inspection, for example, simply requires us to pick some interesting test cases, and step through the code, effectively executing it in our heads, asking questions like "what should be true at this point?" and "when might this line of code not work?"
If we want to, we can formalise that process to a very high degree of rigour. But the general pattern is the same; we make assertions about what should be true at key points during the execution of our code, we read the code and dream up interesting test cases that will cause those parts of the code to be executed and ask those questions at the appropriate times. When an inspection throws up interesting test cases that our code doesn't handle, we can codify this knowledge as, say, automated unit tests to ensure that the door is closed to that particular bug permanently.
Do not underestimate the power of testing by inspection. It's very rare to find teams producing high-integrity software who don't do it. (And, yes, I'm saying it's very rare to find teams producing high-integrity software.)
But, possibly because of associations with the likes of NASA, and safety-critical software engineering in general, it has a reputatioon for being "rocket science". It can be, if we choose to go that far. But in most cases, it can be straightforward, utilising things we already know about computer programming. Inspections can be very economical, and can reap considerable rewards. And pretty much anyone who can program can do them. Which is why, of course, almost nobody does.
Reason why nobody does it - NASA-PHOBIA
1. Business Goals - Okay, take a deep breath now. Imminent Rant Alert.
Why do we build software?
There seems to be a disconnect between the motivations of developers and their customers. Customers give us money to build software that hopefully solves their problems. But, let's be honest now, a lot of developers simply could not give two hoots about solving the customer's problems.
Which is why, on the vast majority of software teams, when I ask them what the ultimate business goals of what they're doing are, they just don't know.
Software for the sake of software is where our heads are mostly at. We buiild software to build software.
Given a free reign, what kind of software do developers like to build? Look on Github. What are most personal software projects about?
We don't build software to improve care co-ordination for cancer sufferers. We don't build software to reduce delivery times for bakeries. We don't build software to make it easier to find a hotel room with fast Wi-Fi at 1am in a strange city.
With our own time and resources, when we work on stuff that interests us, we won't solve a problem in the real world. We'll write another Content Management System. Or an MVC framework. Or another testing tool. Or another refactoring plug-in. Or another VCS.
The problems of patients and bakers and weary travelers are of little interest to us, even though - in real life - we can be all of these things ourselves.
So, while we rail at how crappy and poorly thought-out the software we have to use on a daily basis tends to be ("I mean, have they never stayed in a hotel?!"), our lack of interest in understanding and then solving these problems is very much at the root of that.
We can be so busy dreaming up solutions that we fail to see the real problems. The whole way we do development is often a testament to that, when understanding the business problem is an early phase in a project that, really, shouldn't exist until someone's identified the problem and knows at least enough to know it's worth writing some software to address it.
Software projects and products that don't have clearly articulated, testable and realistic goals - beyond the creation of software for its own sake - are almost guaranteed to fail; for the exact same reason that blindly firing arrows in random directions with your eyes closed is almost certainly not going to hit a valuable target. But this is what, in reality, most teams are doing.
We're a solution looking for a problem. Which ultimately makes us a problem. Pretty much anyone worth listening to very, very strongly recommends that software development should have clear and testable business goals. So it goes without saying that almost no teams bother.
Reason why so few teams do it - APATHY
July 16, 2014
What Level Should We Automate Most Of Our Tests At?So this blog post has been a long time in the making. Well, a long time in the procrastinating, at any rate.
I have several clients who have hit what I call the "front-end automated test wall". This is when teams place greatest emphasis on automating acceptance tests, preferring to verify the logic of their applications at the system level - often exercised through the user interface using tools like Selenium - and rely less (or not at all, in some cases) on unit tests that exercise the code at a more fine-grained level.
What tends to happen when we do this is that we end up with large test suites that require much set-up - authentication, database stuff, stopping and starting servers to reset user sessions and application state, and all the fun stuff that comes with system testing - and run very slowly.
So cumbersome can these test suites become that they slow development down, sometimes to a crawl. If it takes half an hour to regression test your software, that's going to make the going tough for Clean Coders.
The other problem with these high-level tests is that, when they fail, it can take a while to pin down what went wrong and where it went wrong. As a general rule of thumb, it's better to have tests that only have one reason to fail, so when something breaks it's alreay pretty well pinpointed. Teams who've hit the wall tend to spend a lot time debugging.
And then there's the modularity/reuse issue: when the test for a component is captured at a much higher level, it can be tricky to take that chunk and turn it into a reusable chunk. Maybe the risk calculation component of you web application could also be a risk calculation component of a desktop app, or a smartwatch app. Who knows? But when its contracts are defined through layers of other stuff like web pages and wotnot, it can be difficult to spin it out into a product in its own right.
For all these reasons, I follow the rule of thumb: Test closest to the responsibility.
One: it's faster. Every layer of unnecessary wotsisname the tests have to go through to get an answer adds execution time and other overheads.
Two: it's easier to debug. Searching for lost car keys gets mighty complicated when your car is parked three blocks away. If it's right outside the front door, and you keep the keys in a bowl in the hallway, you should find them more easily.
Three: it's better for componentising your software. You may call them "microservices" these days, but the general principles is the same. We build our applications by wiring together discrete components that each have a distinct responsibility. The tests that check if a component fulfils its reponsibility need to travel with that components, if at all possible. If only because it can get horrendously difficult to figure out what's being tested where when we scatter rules willy nilly. The risk calculation test wants to talk to the Risk Calculator component. Don't make it play Chinese Whsipers through several layers of enterprise architecture.
Sometimes, when I suggest this, developers will argue that unit tests are not acceptance tests, because unit tests are not written from the user's perspective. I believe - and find from experience - that this is founded on an artificial distinction.
In practice, an automated acceptance test is just another program written by a programmer, just like a unit test. The programmer interprets the user's requirements in both cases. One gives us the illusion of it being the customer's test, if we want it to be. But it's all smoke and mirrors and given-when-then flim-flam in reality.
The pattern, known of old, of sucking test data provided by the users into parameterised automated tests is essentially what our acceptance test automation tools do. Take Fitnesse, for example. Customer enters their Risk Calculation inputs and expected outputs into a table on a Wiki. We write a test fixture that inserts data form the table into program code that we write to test our risk calculation logic.
We could ask the users to jot those numbers down onto a napkin, and hardcode them into our test fixture. Is it still the same test? It it still an automated acceptance test? I believe it is, to all intents and purposes.
And it's not the job of the user interface or our MVC implementation or our backend database to do the risk calculation. There's a distinct component - maybe even one class - that has that responsibility. The rest of the architecture's job is to get the inputs to that component, and marshall the results back to the user. If the Risk Calculator gets the calculation wrong, the UI will just display the wrong answer. Which is correct behaviour for the UI. It should display whatever output the Risk Calculator gives it, and display it correctly. But whether or not it's the correct output is not the UI's problem.
So I would test the risk calculation where the risk is calculated, and use the customer's data from the acceptance test to do it. And I would test that the UI displays whatever result it's given correctly, as a separate test for the UI. That's what we mean by "separation of concerns"; works for testing, too. And let's not also forget that UI-level tests are not the same thing as system or end-to-end tests. I can quite merrily unit test that a web template is rendered correctly using test data injected into it, or that an HTML button is disabled running inside a fake web browser. UI logic is UI logic.
And I know some people cry "foul" and say "but that's not acceptance testing", and "automated acceptance tests written at the UI level tend to be nearer to the user and therefore more likely to accurately reflect their requirements."
I say "not so fast".
First of all, you cannot automate user acceptance testing. The clue is in the name. The purpose of user acceptance testing is to give the user confidence that we delivered what they asked for. Since our automated tests are interpretations of those requirements - eevery bit as much as the implementations they're testing - then, if it were my money, I wouldn't settle for "well, the acceptance tests passed". I'd want to see those tests being executed with my own eyes. Indeed, I'd wanted to execute them myself, with my own hands.
So we don't automate acceptance tests to get user acceptance. We automate acceptance tests so we can cheaply and effectively re-test the software in case a change we've made has broken something that was previously working. They're automated regression tests.
The worry that the sum total of our unit tests might deviate from what the users really expected is mitigated by having them manually execute the acceptance tests themselves. If the software passes all of their acceptance tests AND passes all of the unit tests, and that's backed up by high unit test assurance - i.e., it is very unlikely that the software could be broken from the user's perspsctive without any unit tests failing - then I'm okay with that.
So I still have user acceptance test scripts - "executable specifications" - but I rely much more on unit tests for ongoing regression testing, because they're faster, cheaper and more useful in pinpointing failures.
I still happily rely on tools like Fitnesses to capture users' test data and specific examples, but the fixtures I write underneath very rarely operate at a system level.
And I still write end-to-end tests to check that the whole thing is wired together correctly and to flush out configuration and other issues. But they don't check logic. They just the engine runs when you turn the key in the ignition.
But typically I end up with a peppering of these heavyweight end-to-end tests, a feathering of tests that are specifically about display and user interaction logic, and the rest of the automated testing iceberg is under the water in the form of fast-running unit tests, many of which use example data and ask questions gleaned from the acceptance tests. Because that is how I do design. I design objects directly to do the work to pass the acceptance tests. It's not by sheer happenstance that they pass.
And if you simply cannot let go of the notion that you must start by writing an automated acceptance test and drive downwards from there, might I suggest that as new objects emerge in your design, you refactor the test assertions downwards also and push them into new tests that sit close to those new objects, so that eventually you end up with tests that only have one reason to fail?
Refactorings are supposed to be behaviour-preserving, so - if you're a disciplined refactorer - you should end up with a cluster of unit tests that are logically directly equivalent to the original high-level acceptance test.
There. I've said it.