September 13, 2016

Learn TDD with Codemanship

4 Things You SHOULDN'T Do When The Schedule's Slipping

It takes real nerve to do the right thing when your delivery date's looming and you're behind on your plan.

Here are four things you should really probably avoid when the schedule's slipping:

1. Hire more developers

It's been over 40 years since the publication of Fred L. Brooks' 'The Mythical Man-Month'. This means that our industry has known for almost my entire life that adding developers to a late project makes it later.

Not only is this born out by data on team size vs. productivity, but we also have a pretty good idea what the causal mechanism is.

Like climate change, people who reject this advice should not be called "skeptics" any more. In the face of the overwhelming evidence, they're Small Team Deniers.

Hiring more devs when the schedule's slipping is like prescribing cigarettes, boxed sets and bacon for a patient with high blood pressure.

2. Cut corners

Still counterintuitively, for most software managers, the relationship between software quality and the time and cost of delivery is not what most of us think it is.

Common sense might lead us to believe that more reliable software takes longer, but the mountain of industry data on this clearly shows the opposite in the vast majority of cases.

To a point - and it's a point 99% of teams are in no danger of crossing - it actually takes less effort to deliver more reliable software.

Again, the causal mechanism for this is well understood. And, again, anyone who rejects the evidence is not a "skeptic"; they're a Defect Prevention Denier.

The way to go faster on 99% of projects is to slow down, and take more care.

3. Work longer hours

Another management myth that's been roundly debunked by the evidence is that, when a software delivery schedule's slipping significantly, teams can get back on track by working longer hours.

The data very clearly shows that - for most kinds of work - longer hours is a false economy. But it's especially true for writing software, which requires a level of concentration and focus that most jobs don't.

Short spurts of extra effort - maybe the odd weekend or late night - can make a small difference in the short term, but day after day, week after week overtime will burn your developers out faster than you can say "get a life". They'll make stupid, easily avoidable mistakes. And, as we've seen, mistakes cost exponentially more to fix than to avoid. This is why teams who routinely work overtime tend to have lower overall productivity: they're too busy fighting their own self-inflicted fires.

You can't "cram" software development. Like your physics final exams, if you're nowhere near ready a week before, then you're not gong to be ready, and no amount of midnight oil and caffeine is going to fix that.

You'll get more done with teams who are rested, energised, feeling positive, and focused.

4. Bribe the team to hit the deadline

Given the first three points we've covered here, promising to shower the team with money and other rewards to hit a deadline is just going to encourage them to make those mistakes for you.

Rewarding teams for hitting deadlines fosters a very 1-dimensional view of software development success. It places extra pressure on developers to do the wrong things: to grow the size of their teams, to cut corners, and to work silly hours. It therefore has a tendency to make things worse.

The standard wheeze, of course, is for teams to pretend that they hit the deadline by delivering something that looks like finished software. The rot under the bonnet quickly becomes apparent when the business then expects a second release. Now the team are bogged down in all the technical debt they took on for the first release, often to the extent that new features and change requests become out of the question.

Yes, we hit the deadline. No, we can't make it any better. You want changes? Then you'll have to pay us to do it all over again.

Granted, it takes real nerve, when the schedule's slipping and the customer is baying for blood, to keep the team small, to slow down and take more care, and to leave the office at 5pm.

Ultimately, the fate of teams rests with the company cultures that encourage and reward doing the wrong thing. Managers get rewarded for managing bigger teams. Developers get rewarded for being at their desk after everyone else has gone home, and appearing to hit deadlines. Perversely, as an industry, it's easier to rise to the top by doing the wrong thing in these situations. Until we stop rewarding that behaviour, little will change.

August 25, 2016

Learn TDD with Codemanship

Employers: Don't Use GitHub Stats To Judge Developers

I'm seeing a growing trend among employers to use GitHub statistics as a measure of a developer's capability (or, at least, one of them).

I advise employers very strongly against going down this road. There's absolutely no statistical correlation between those stats and the quality of a developer's work, or their ability to "get shit done" under commercial constraints of time and resources.

GitHub - an extremely useful tool for developers to share and collaborate on projects - is also a social network, and like all social networks it can be gamed. In many ways, it's little different to online services like Soundcloud. Would we measure a musician's worth by how many tracks they upload, or how long those tracks are? Indeed, given the tricks that can be deployed to boost a track's apparent popularity, can we even trust the number of track plays and likes as a metric?

Developers who are very active on GitHub are very active on GitHub. That's all we can reasonably conclude. I know many really spiffy devs who don't even have GitHub accounts. And I've stumbled across more than a few who have amazing-looking GitHub profiles who actually suck as developers.

The only thing employers can usefully get from looking at a developer's GitHub profile is the actual code itself. Does it have automated tests? Is it readable? Is it simple? Can you see lots of obvious duplication? Is it basically a C program, but written in Ruby? You can tell a lot about a developer by what their code's like.

GitHub commits are social interactions. They carry information about the committer, just as track Likes on Soundcloud do. And any interaction on a social network that carries information about the sender can, and eventually will, become just another channel for self-promotion, every bit as much as Facebook "likes", LinkedIn endorsements, and Twitter "follows".

So, yes, if you're a developer, stick some of your code up on a site like GitHub so we can take a peek and see what it's like. But if you're an employer, take those stats with a big pinch of salt. If you want to know how good a developer really is, there are much better ways.

July 16, 2016

Learn TDD with Codemanship

Taking Agile To The Next Level

As with all of my Codemanship training workshops, there's a little twist in the tail of the Agile Software Development course.

Teams learn all about the Agile principles, and the Agile manifesto, Extreme Programming, and Scrum, as you'd expect from an Agile Software Development course.

But they also learn why all of that ain't worth a hill of beans in reality. The problem with Agile, in its most popular incarnations, is that teams iterate towards the wrong thing.

Software doesn't exist in a vacuum, but XP, Scrum, Lean and so forth barely pay lip-service to that fact. What's missing from the manifesto, and from the implementations of the manifesto, is end goals.

In his 1989 book, Principles of Software Engineering Management, Tom Gilb introduced us to the notion of an evolutionary approach to development that iterates towards testable goals.

On the course, I ask teams to define their goals last, after they've designed and started building a solution. Invariably, more than 50% of them discover they're building the wrong thing.

It had a big influence on me, and I devoted a lot of time in the late 90s and early 00s to exploring and refining these ideas.

Going beyond the essential idea that software should have testable goals - based on my own experiences trying to do that - I soon learned that not all goals are created equal. It became very clear that, when it comes to designing goals and ways of testing them (measures), we need to be careful what we wish for.

Today, the state of the art in this area - still relatively unexplored in our industry - is a rather naïve and one-dimensional view of defining goals and associated tests.

Typically, goals are just financial, and a wider set of perspectives isn't taken into account (e.g., we can reduce the cost of manufacture, but will that impact product quality or customer satisfaction?)

Typically, goals are not caveated by obligations on the stakeholder that benefits (e.g., the solution should reduce the cost of sales, but only if every sales person gets adequate training in the software).

Typically, the tests ask the wrong questions (e.g., the airline who measured speed of baggage handling without noticing the increase in lost of damaged property and insurance claims, and then mandated that every baggage handling team at every airport copy how the original team hit their targets.)

Now, don't get me wrong: a development team with testable goals is a big improvement on the vast majority of teams who still work without any goals other than "build this".

But that's just a foundation on which we have to build. Setting the wrong goals, implemented unrealistically and tested misleadingly, can do just as much damage as having no goals at all. Ask any developer whose worked under a regime of management metrics.

Going beyond Gilb's books, I explored the current thinking from business management on goals and measures.

Balancing Goals

First, we need to identify goals from multiple stakeholder perspectives. It's not just what the bean counters care about. How often have we seen companies ruined by an exclusive focus on financial numbers, at the expense of retaining the best employees, keeping customers happy, being kind to the environment, and so on? We're really bad at considering wider perspectives. The law of unintended consequences can be greatly magnified by the unparalleled scalability of software, and there may always be side-effects. But we could at least try to envisage some of the most obvious ones.

Conceptual tools like the Balanced Scorecard and the Performance Prism can help us to do this.

Back in the early 00s, I worked with people like Mike Bourne, Professor of Business Performance Innovation, to explore how these ideas could be applied to software development. The results were highly compatible, but still - more than a decade later - before their time, evidently.


If business goals are post-conditions, then we - above all others - should recognise that many of them will have pre-conditions that constrain the situations in which our strategy or solution will work. A distributed patient record solution for hospitals cannot reduce treatment errors (e.g., giving penicillin to an unconscious patient who is allergic) if the computers they're using can't run our software.

For every goal, we must consider "when would this not be possible?" and clearly caveat for that. Otherwise we can easily end up with unworkable solutions.

Designing Tests

Just as with software or system acceptance tests, to completely clarify what is meant by a goal (e.g., improve customer satisfaction) we need to use examples, which can be worked into executable performance tests. Precise English (or French or Chinese or etc) just isn't precise enough.

Let's run with my example of "improve customer satisfaction"; what does that mean, exactly? How can we know that customer satisfaction has improved?

Imagine we're running a chain of restaurants. Perhaps we could ask customers to leave reviews, and grade their dining experience out of 10, with 1 being "very poor" and 10 being "perfect".

Such things exist, of course. Diners can go online and leave reviews for restaurants they've eaten at. As can the people who own the restaurant. As can online "reputation management" firms who employ armies of paid reviewers to make sure you get a great average rating. So you could be a very highly rated restaurant with very low customer satisfaction, and the illusion of meeting your goal is thus created.

Relying solely on online reviews could actively hurt your business if they invited a false sense of achievement. Why would service improve if it's already "great"?

If diners were really genuinely satisfied, what would the real signs be? They'd come back. Often. They'd recommend you to friends and family. They'd leave good tips. They'd eat all the food you served them.

What has all this got to do with software? Let's imagine a tech example: an online new music discovery platform. The goal is to amplify good bands posting good music, giving them more exposure. Let's call it "Soundclown", just for jolly.

On Soundclown, listeners can give tracks a thumbs-up if they like them, and thumb's down if they really don't. Tracks with more Likes get promoted higher in the site's "billboards" for each genre of music.

But here's the question: just because a track gets more Likes, does that mean more listeners really liked it? Not necessarily. As a user of many such sites, I see how mechanisms for users interacting with music and musicians get "gamed" for various purposes.

First and foremost, most sites identify to the musician who the listener that Liked their track is. This becomes a conduit for unsolicited advertising. Your track may have a lot of Likes, but that could just be because a lot of users would like to sell you promotional services. In many cases, it's evident that they haven't even listened to the track that they're Liking (when you notice it has more Likes than it's had plays.)

If I were designing Soundclown, I'd want to be sure that the music being promoted was genuinely liked. So I might measure how many times a listener plays the track all the way through, for example. The musical equivalent of "but did they eat it all? and "did they come back for more?"

We might also ask for a bit more proof than clicking a thumbs-up icon. One website I use keeps a list of my "fans", but are they really fanatical about my music? Judging by the tumbleweed when I alert my "fans" to new music, the answer is evidently "nope". Again, we could learn from the restaurant, and allow listeners to "tip" artists, conveying some kind of reward that costs the listener something somehow.

Finally, we might consider removing all unintended backwards marketing channels. Liking my music shouldn't be an opportunity for you to try and sell me something. That very much lies at the heart of the corruption of most social networks. "I'm really interested in you, Now buy my stuff!"

This is Design. So, ITERATE!

The lesson I learned early is that, no matter how smart we think we've been about setting goals and defining tests, we always need to revisit them - probably many times. This is a design process, and should be approached the same we design software.

We can make good headway using a workshop format I designed many years ago

Maintaining The Essence

Finally, but most important of all, our goals need to be expressed in a way that doesn't commit us to any solution. If our goal is to promote the best musicians, then that's our goal. We must always keep our eyes on that prize. It takes a lot of hard work and diligence not to lose sight of our end goals when we're bogged down in technical solution details. Most teams fail in that respect, and let the technical details become the goal.

May 25, 2016

Learn TDD with Codemanship

How Many Bugs In Your Code *Really*?

A late night thought, courtesy of the very noisy foxes outside my window.

How many bugs are lurking in your code that you don't know about?

Your bug tracking database may suggest you have 1 defect per thousand lines of code (KLOC), but maybe that's because your tests aren't very thorough. Or maybe it's because you deter users from reporting bugs. I've seen it all, over the years.

But if you want to get a rough idea of how many bugs there are really, you can use a kind of mutation testing.

Create a branch of your code and deliberately introduce 10 bugs. Do your usual testing (manual, automated, whatever it entails), and keep an eye on bugs that get reported. Stop the clock at the point you'd normally be ready to ship it. (But if shipping it *is* your usual way of testing, then *start* the clock there and wait a while for users to report bugs.)

How many of those deliberate bugs get reported? If all 10 do, then the bug count in your database is probably an accurate reflection of the actual number of bugs in the code.

If 5 get reported, then double the bug count in your database. If your tracking says 1 bug/KLOC, you probably have about 2/KLOC.

If none get reported, then your code is probably riddled with bugs you don't know about (or have chosen to ignore.)

April 25, 2016

Learn TDD with Codemanship

Mutation Testing & "Debuggability"

More and more teams are waking up to the benefit of checking the levels of assurance their automated tests give them.

Assurance, as opposed to coverage, answers a more meaningful question about our regression tests: if the code was broken, how likely is it that our tests would catch that?

To answer that question, you need to test your tests. Think of bugs as crimes in your code, and your tests as police officers. How good are your code police at detecting code crimes? One way to check would be to deliberately commit code crimes - deliberately break the code - and see if any tests fail.

This is a practice called mutation testing. We can do it manually, while we pair - I'm a big fan of that - and we can do it using one of the increasingly diverse (and rapidly improving) mutation testing tools available.

For Java, for example, there are tools like Jester and PIT. What they do is take a copy of your code (with unit tests), and "mutate" it - that is, make a single change to a line of code that (theoretically) should break it. Examples of automated mutations include turning a + into a -, or a < into <=, or ++ into --, and so on.

After it's created a "mutant" version of the code, it runs the tests. If one or more tests fail, then they are said to have "killed the mutant". If no test fails, then the mutant survives, and we may need to have a think about whether that line of code that was mutated is being properly tested. (Of course, it's complicated, and there will be some false positives where the mutation tool changed something we don't really care about. But the results tend to be about 90% useful, which is a boon, IMHO.)

Here's a mutation testing report generated by PIT for my Combiner spike:

Now, a lot of this may not be news for many of you. And this isn't really what this blog post is about.

What I wanted to draw your attention to is that - once I've identified the false positives in the report - the actual level of assurance looks pretty high (about 95% of mutations I cared about got killed.) Code coverage is also pretty high (97%).

While my tests appear to be giving me quite high assurance, I'm worried that may be misleading. When I write spikes - intended to as proof of concept and not to be used in anger - I tend to write a handful of tests that work at a high level.

This means that when a test fails, it may take me some time to pinpoint the cause of the problem, as it may be buried deep in the call stack, far removed from the test that failed.

For a variety of good reasons, I believe that tests should stick close to the behaviour being tested, and have only one reason to fail. So when they do fail, it's immediately obvious where and what the problem might be.

Along with a picture of the level of assurance my tests give me, I'd also find it useful to know how far removed from the problem they are. Mutation testing could give me an answer.

When tests "kill" a mutant version of the code, we know:

1. which tests failed, and
2. where the bug was introduced

Using that information, we can calculate the depth of the call stack between the two. If multiple tests catch the bug, then we take the shallowest depth out of those tests.

This would give me an idea of - for want of a real word - the debuggability of my tests (or rather, the lack of it). The shallower the depth between bugs and failing tests, the higher the debuggability.

I also note a relationship between debuggability and assurance. In examining mutation testing reports, I often find that the problem is that my tests are too high-level, and if I wrote more focused tests closer to the code doing that work, they would catch edge cases I didn't think about at that higher level.

April 18, 2016

Learn TDD with Codemanship

SC2016 Mini-Project: Code Risk Heat Map

Software Craftsmanship 2016 - Mini-Project

Code Risk Heat Map

Estimated Duration: 2-4 hours

Author: Jason Gorman, Codemanship

Language(s)/stacks: Any


Create a tool that produces a "heat map" of your code, highlighting the parts (methods, classes, packages) that present the highest risk of failure.

Risk should be classified along 3 variables:

1. Potential cost of failure

2. Potential risk of failure

3. Potential system impact of failure (based on dependencies)

Use colour coding/gradation to visually highlight the hotspots (e.g., red = highest risk, green = lowest risk)

Also, produce a prioritised list of individual methods/functions in the code with the same information.


April 25, 2015

Learn TDD with Codemanship

Non-Functional Tests Can Help Avoid Over-Engineering (And Under-Engineering)

Building on the topic of how we tackle non-functional requirements like code quality, I'm reminded of those times where my team has evolved an architecture that developers taking over from us didn't understand the reasons or rationale for.

More than once, I've seen software and systems scrapped and new teams start again from scratch because they felt the existing solution was "over-engineered".

Then, months later, someone on the new team reports back to me that, over time, their design has had to necessarily evolve into something similar to what they scrapped.

In these situations it can be tricky: a lot of software really is over-engineered and a simpler solution would be possible (and desirable in the long term).

But how do we tell? How can we know that the design is the simplest thing that a team could have done?

For that, I think, we need to look at how we'd know that software was functionally over-complicated and see if we can project any lessons we leearn on to non-functional complexity.

A good indicator of whether code is really needed is to remove it and see if any acceptannce tests fail. You'd be surprised how many features and branches in code find their way in there without the customer asking for them. This is especially true when teams don't practice test-driven development. Developers make stuff up.

Surely the same goes for the non-functional stuff? If I could simplify the design, and my non-functional tests still pass, then it's probable that the current design is over-engineered. But in order to do that, we'd need a set of explicit non-functional tests. And most teams don't have those. Which is why designs can so easily get over-engineered.

Just a thought.

March 1, 2015

Learn TDD with Codemanship

Continuous Inspection at NorDevCon

On Friday, I spent a very enjoyable day at the Norfolk developer's conference NorDevCon (do you see what they did there?) It was my second time at the conference, having given the opening keynote last year, and it's great to see it going from strength to strength (attendance up 50% on 2014), and to see Norwich and Norfolk being recognised as an emerging tech hub that's worthy of inward investment.

I was there to run a workshop on Continuous Inspection, and it was a good lark. You can check out the slides, which probably won't make a lot of sense without me there to explain them - but come along to CraftConf in Budapest this April or SwanseaCon 2015 in September and I'll answer your questions.

You can also take a squint at (or have a play with) some code I knocked up in C# to illustrate a custom FxCop code rule (Feature Envy) to see how I implemented the example from the slides in a test-driven way.

I'm new to automating FxCop (and an infrequent visitor to .NET Land), so please forgive any naivity. Hopefully you get the idea. The key things to take away are: you need a model of the code (thanks Microsoft.Cci.dll), you need a language to express rules against that model (thanks C#), and you need a way to drive the implementation of rules by writing executable tests that fail (thanks NUnit). The fun part is turning the rule implementation on its own code - eating your own dog food, so to speak. Throws up all sorts of test cases you didn't think of. It's a work in progress!

I now plan, before CraftConf, to flesh the project out a bit with 2-3 more example custom rules.

Having enjoyed a catch-up with someone who just happens to be managing the group at Microsoft who are working on code analysis tools, I think 2015-2016 is going to see some considerable ramp-up in interest as the tools improve and integration across the dev lifecycle gets tighter. If Continuous Inspection isn't on your radar today, you may want to put it on your radar for tomorrow. It's going to be a thing.

Right now, though, Continuous Inspection is very much a niche pastime. An unscientific straw poll on social media, plus a trawl of a couple of UK job sites, suggests that less than 1% of teams might even be doing automated code analysis at all.

I predicted a few years ago that, as computers get faster and code gets more complex, frequent testing of code quality using automated tools is likely to become more desirable and more do-able. I think we're just on the cusp of that new era today. Today, code quality is an ad hoc concern relying on hit-and-miss practices like pair programming, where many code quality issues often get overlooked by pair who have 101 other things to think about, and code reviews, where issues - if they get spotted at all in the to-and-fro - are flagged up long after anybody is likely to do anything about them.

In related news, after much discussion and braincell-wrangling, I've chosen the name for the conference that will be superceding Software Craftsmanship 20xx later this year (because craftsmanship is kind of done now as a meme). Watch this space.

November 25, 2014

Learn TDD with Codemanship

Continuous Inspection II - Planning & Executing CInsp

In this second blog post about Continuous Inspection (CInsp, for short), I want to look at how we might manage the CInsp process to get the most value from it.

While some develoment teams are now using CInsp tools to analyse their code to get early warnings about code quality problems when they're easier and cheaper to fix, it's fair to say that this area of the develoment discipline has to date evaded the principles that we apply to other kinds of requirements.

Typically, as a kind of work, CInsp is ad hoc, unplanned, untracked and most teams who do it have only a very vague idea of what kind of cost it has and what kind of benefits they're reaping from it.

CInsp is rarely prioritised, leaving the field wide open to waste a lot of time and effort on activities that add little or no value.

Non-functional requirements obey the same laws as functional ones, which is why we need to attack them using the same principles and techniques.

In this post, I want to examine how we plan and execute CInsp on projects starting from scratch. (In a future post, I'll talk about applying CInsp to existing code bases with a build-up of code quality issues.)

Continuous Inspection Requirements.

There are an infinite number of properties we could look for in our code, but some have value in finding and most don't. Rather than waste our time arbitrarily searching our code for "stuff", it's important we have a clear idea of what it is we're looking for and why.

Extreme Programming, for example, has a perfectly usable mechanism for describing the things we want to inspect for, and the benefits of catching those kinds of code quality problems early.

A Code Quality Story is a non-functional user story that briefly summarises a code quality "bug" we wish to avoid and the pay-off we might expect if we can avoid introducing it into our code.

Note first of all that I've chosen here to use a blue index card. This might be in a system where we write functional user stories on green cards, report bugs on red cards, and record other outcomes - "miscellaneous tasks", like setting up the build and implementing code quality gates - on blue cards.

Why do this? Well, I've found it very useful to know roughly how much of a team's time is split between delivering working features, fixing bugs (ideally, zero time), and "shaving yaks" when the yaks being shaved are sufficiently large and not part of the work of delivering specific features.

The importance of the effort split becomes apparent as time goes by and the software evolves. A healthy project is one where the proportion of effort devoted to delivering working features remains relatively constant. What typically happens on teams who set out at an unsustainable pace is that they begin development with their time devoted mostly to the green cards, and after a few months most of their time is spent tackling red cards and making a lot less progress on new features. This is a good indicator of the rising cost of change we're seeking to avoid, so we can sustain the pace of development and deliver value for longer. This information will help us better judge how well-spent the time devoted to things like CInsp is.

So we have a placeholder for our code quality requirement in the form of a blue index card. What next?

Planning Continuous Inspection

This is where I, and a lot of teams, have gone wrong in the past. What we should never, ever do is allow the customer to choose when and whether we tackle non-functional requirements. And in "customer" I include proxy customers like business analysts and project managers. The overwhelmingly common experience of development teams is that purely technical issues, like code quality, get sidelined by non-technical stakeholders.

We must not give them the chance to drop our Feature Envy story in favour of a story about, say, sorting columns in an HTML table if we strongly believe, as professionals, that avoiding Feature Envy is important. If, as the evidence suggests, care taken over code quality helps to maintain productivity and deliver greater value over time, then we risk presenting customers with a confusing false dichotomy between work that enhances quality and work that directly delivers working features.

The analogy I use is to pretend we're running a restaurant using the planning practices of Extreme Programming.

Every job that needs doing gets written on a card, and placed into a backlog of outstanding work. There will be user stories like "Take table 3's order" and "Serve french fries and beer to table 7" and "Get the bill for table 12". These are stories about work that will make the restaurant money.

There will also be stories like "Wash the dishes in the sink" and "Clean out pizza oven" and "Repaint sign over door". These are about tasks that cost money, but don't directly bring in revenue by themselves.

If we allowed our restaurant's shareholders - who themselves have never worked in a restaurant, but they have a stake in it as a business - to prioritise what stories get done at the expense of others in a world where backlogs always outweigh the available time and resources, then there's a very real danger that the kitchen will rarely get cleaned, the sign above the door will fade until nobody can see it, and we'll run out of clean plates halfway through service.

The temptation for teams who are driven solely by the priorities of non-technical stakeholders is that non-functional issues like code quality will only get tackled when a crisis emerges that blocks progress on functional requirements. i.e., we don't wash up until we run out of plates, or we don't clean the kitchen until the inspector shuts us down, or we don't repaint the sign until the customers have stopped coming in.

One thing we've learned about writing software is that it's cheaper and easier to tackle problems proactively and catch them earlier. Sadly, too many teams are left lurching from one urgent crisis to the next, never getting the chance to get ahead of the issues.

For this reason, I strongly advise against involving non-technical stakeholders in planning CInsp. (As well as other technical work.)

Now put yourself in the diner's shoes: you pick up the menu, and every dish lists all of the tasks restaurant staff have to do in order to deliver it. Let's say we charge £11 for fish and chips, with a clean grill, mopping the floors, cashing up that evening, doing the accounts, getting up early to take delivery of fresh fish, and so on.

Two questions:

1. If we hadn't told them, would the diner even care?

2. If we make it the diner's business, are we inviting them to negotiate the price of the fish and chips down by itemising what goes in to running the restaurant? ("I'll have the fish & chips, but I'm not paying for your trainee chef's college course" etc)

The world is full of work that needs doing, but nobody thinks they should pay for. In order for the world to keep turning, for fish & chips to appear on our dining tables, this work has to get done one way or another, and it has to be paid for.

The way a restaurant squares this circle is to build it into the cost of the meal and to not present diners with a choice. Their choice is simple: don't like the price, don't order the dish.

Likewise in software development, there's a universe of tasks that need doing that do not directly end with a working feature being delivered to the customer's table. We must build this work into the price ("feature X will take 3 days to deliver") and avoid presenting the customer with bewildering choices that, in reality, aren't choices at all.

So planning Continuous Inspection is something that happens within the team among technical stakeholders who understand the issues and will be doing the work. This is good advice for any non-functional requirements, be they about build automation, internal training or hiring developers. This is just "stuff that has to happen" so we can deliver working software reliably, economically and sustainably.

The key thing, to avoid teams disappearing up their own backsides with the technical stuff, is to make sure we're all absolutely clear about why we're doing it. Why are we automating the build? Why are we writing a tool that generates code? Why are we sending half the team to the Software Craftsmanship conference? (Some companies send entire teams.) And the answer should always be something of value to the customer, even if that value might not be realised for months or years.

In practice, we have planning meetings - especially in the early stages of a project - that are for technical stakeholders only. Lock the doors. Close the blinds. Don't tell the boss. (I have literally experienced running around offices looking for rooms where the developers can have these discussions in private, chased by the project manager who insists on sitting in. "Don't mind me. I won't interfere." Two seconds later...)

Such meetings give teams a chance to explicitly discuss code quality and to thrash out what they mean by "good code" and "bad code" and establish a shared set of priorities over code quality. It's far better to have these meetings - and all the inevitable disagreements - at the start, when we can take steps to prevent issues, than to have them later when we can only ask "what went wrong?"

Executing Continuous Inspection

On new software, the effort in Continuous Inspection tends to be front-loaded, and with good reason.

As I've mentioned a few times already, it tends to be far cheaper to tackle code quality "bugs" early - the earlier the better. This means that adding new code quality requirements later in development tends to catch problems when they're much more expensive to fix, so it makes sense to set the quality bar as high as we can at the start.

There's good news and there's bad news. First, the bad news: on a new project, from a standing start, it's going to take considerable effort to get automated code inspections in place. It will vary greatly, depending on the technology stack, availability of tools, experience levels in the team, and so on. But it's not going to take an afternoon. So you may be faced with having to hide a big chunk of effort from non-technical stakeholders if you attempt to start development (from their perspective, when they're actively involved) at the same time as putting CInsp in place. (Same goes for builds, CI, and a raft of other stuff that we need to get up and running early on.)

Another very strong recommendation from me: have at least one iteration before you involve the customer. Get the development engine running smoothly before you wind down the window and shout "Where to, guv'nor?" They may be less than impressed to discover that you just need to build the engine before you can set off. Delighting customers is as much about expectations as it is about actual delivery.

Going back to the restaurant analogy, consider why restaurants distinguish between "service" and "preparation". Service may start at 6pm, but the chefs have probably been there since 9am getting things ready for that. If they didn't, then those first orders might take hours to reach the table. Too many development teams attempt the equivalent of starting service as the ingredients are being delivered to the kitchen. We need to do prep, too, before we can start taking orders.

Now, for the good news: the kinds of code quality requirements we might have on one, say, JEE project are likely to be similar on another JEE project. CInsp practitioners tend to find that they can get a lot of reuse out of code quality gates they've already developed for previous projects. So, over months and years, the overall cost of getting CInsp up and running tends to decrease quite significantly. If your technology stack remains fairly stable over the years, you may well find that getting things up and running can eventually become an almost push-button process. It takes a lot of investment to get there, though.

Code Quality stories work the same way as user stories in their execution. We plan what stories we're going to tackle in the current timebox in the same way. We tackle them in pairs, if possible. We treat them purely as placeholders to have a conversation with the person asking for each story. And, most importantly, we agree...

Continuous Inspection Acceptance Tests

Going back to our Feature Envy code quality story, what does the developer who write that story mean by "Feature Envy"?

Here's the definition from Martin Fowler's Refactoring book:

"A classic [code] smell is a method that seems more interested in a class other than the one it is in. The most common focus of the envy is the data."

It's all a bit handwavy, as is usually the case with software design wisdom. A human being using their intelligence, experience and judgement might be able to read this, look at some code and point to things that seem to them to fit the description.

Programming a computer to do it, on the other hand...

This is where we can inhabit our customer's world for a little while. When we ask our customer to precisely decribe a business rule, we're putting them on the spot every bit as much as a computable definition of Feature Envy might put me and you on the spot. In cold, hard, computable terms: we don't quite know what we mean.

When the business problem we're solving is about, say, mortgages or video rentals or friend requests, we ask the customer for examples that illustrate the rule. Using examples, we can establish a shared vocabulary - a language for expressing the rule - explore the boundaries, and pin down a precise computable understanding of it (if there is one.)

We shouldn't be at all surprised that this technique also works very well for rules about our code. Ask the owner of a code quality story to track down some classic examples of code that breaks the rule, as well as code that doesn't (even if it looks at first glance like it might).

This is where the real skill in CInsp comes into play. To win at Continuous Inspection, development teams need to be skilled as reasoning about code. This is not a bad skill for a developer to have generally. It helps us communicate better, it helps us visualise better, it makes us better at design, at refactoring, at writing tools that work with code. Code is our domain model - the business objects of programming.

Using our code reasoning skills, applied to examples that will form the basis of acceptance tests, we can drive out the design of the simplest tool possible that will sound the alarm when the "bad" examples are considered, while silently allowing the "good" examples to pass through the quality gate.

As with functional user stories, we're not done until we have a working automated quality gate that satisfies our acceptance tests and can be applied to new code straight away.

In the next blog post, we'll be rolling up our sleeves with an example Continuous Inspection quality gate, implementing it using a variety of tools to demonstrate that there's often more than one way to skin the code quality cat.

November 22, 2014

Learn TDD with Codemanship

Continuous Inspection I - Why Do We Need It?

This is the first of a series of posts about Continuous Inspection. My goals here is to give you something to think about, rather than to present a complete hands-on guide. The range (and maturity) of tools and techniques we can apply to Continuous Inspection (I'll call it CInsp from now on to save a few keystrokes) is such that I could write 1,000 blog posts and still not cover it all. So here I'll just focus on general CInsp principles and illustrate with cherrypicked examples.

In this first post, I want to summarise what I mean by "Continuous Inspection" and argue that there's a real need for it on most software development teams.

Contininuous Inspection is the practice of - and stop me if I'm getting too technical here - continuously inspecting your code to detect non-functional issues in the software.

CInsp is just another kind of Continuous Testing, which is a cornerstone of Continuous Delivery. To have our software always in a shippable state, we must take steps to assure ourselves that the software is always working.

If we follow the thinking behind continuous testing (and re-testing) of our software to check that it still works, the benefit is that we never stray more than a few minutes from having something we could ship if the business wanted us to.

To date, the only practical way we've found to achieve Continuous Testing is to automate those tests as much as possible, so they can be run quickly and economically. If it takes you 2 weeks to re-test your software, then after each change you make to the code, you are at least 2 weeks away from knowing if the software still works. Manual testing makes Continuous Delivery impractical.

In recent years, automated testing - and especially automated unit testing - has grown in popularity, and the effects can be seen in teams delivering more reliably and more sustainably as a result.

But only to a point.

What I've observed across hundreds of teams over the last decade or more is that, even with high levels of automated testing, the pace of delivery still slows to unacceptable levels.

In order to sustain the pace of change, the code itself needs to remain open to change. Being able to quickly regression test our software is a boon in this respect, no doubt. But it doesn't address the whole picture.

There are other things that can hamper change in our code. If the code's complicated, for example, it will be more likely to break when we change it. If there's duplication in our code - if we've been a bit trigger-happy with Copy+Paste - then that can multiply the cost of making a change. If we've not paid attention to the dependencies in our code, small changes can cause big ripples through the code and amplify the cost.

As we make progress in delivering functionality we tend also to make a mess inside the software, and that mess can get in our way and impede future progress. To maintain the pace of innovation over months and years and get the most out of our investment over the lifetime of a software product, we need to keep our code clean.

Experienced developers view design issues that impede progress in their code as bugs, and they can be every bit as serious as bugs in the functionality of the software.

And, just like functional bugs, these code quality bugs (often referred to as "code smells", because they're indicatice of your code "rotting" as it grows) have a tendency to get harder and more expensive to fix the longer we leave them.

Duplication has a tendency to grow, as does complexity. We build more dependencies on top of our dependencies. Switch statements get longer. Long parameter lists get longer. Big classes get bigger. And so on.

Here's what I've discovered form examining hundreds of code bases over the years: code smells that get committed into the code are very likely to remain for the lifetime of the software.

There seems to be a line that once we've crossed it, our mistakes are likely to live forever (and impede us forever). From observation, I've found that this line is moving on.

In the Test-driven Development cycle, for example, I've seen that when developers move on to the next failing test, any code smells they leave behind will likely not get addressed later. In programming, "later" is a distant and alien land where all our little TO-DO's never get done. "Later" might as well be "Narnia".

Even more so, when developers commit their code to a shared repository, at that point code smells "petrify", and remain forever trapped in the amber of all the other code that surrounds them. 90% of code smells introduced in committed code never get fixed.

This is partly because most teams have no processes for identifying and addressing code quality problems. But even the ones who do tend to find that their approach, while better than nothing, is not up to the task of keeping the code as clean as it needs to be to maintain the pace of change the customer needs.

Why? Well, let's look at the kinds of techniques teams these days use:

1. Code Reviews

There's a joke that goes something like this: "Ask a developer what's wrong with a line of code, and she'll give you a list. Ask her what's wrong with 500 lines of code, and she'll tell you it's fine."

Code reviews have a tendency to store up large amounts of code - potentially containing large numbers of issues - for consideration. The problem here is seeing the wood for the trees. A lot of issues get overlooked in the confusion.

But even if code reviews identified all of the code quality issues, the economics of fixing those issues is working against us. Fixing bugs - functional or non-functional - tends to get exponentially more expensive the longer we leave them in the code, and for precisely the same reasons (longer feedback cycles).

In practice, while rigorous code reviews would be a step forward for many teams who don't do them at all, they are still very much shutting the stable door after the horse has bolted.

2. Pair Programming

In theory, pair programming is a continuous code review where the "navigator" is being especially vigilent to code quality issues and points them out as soon as they spot them. In some cases, this is pretty much how it works. But, sad to say, in the majority of pairs, code quality issues are not high on anyone's agenda.

This is for two good reasons: firstly, most developers are not all that aware of code smells. They don't figure high in our list of priorities. Code quality isn't sexy, and doesn't get you hired at

Secondly, with the best will in the world, people have limitations. When Codemanship does pairing to assess a developer's skill level in certain practices, the level of focus required on what the other person's doing is really quite intense. You don't take your eye off the screen in case you miss something. But there are dozens of code smells we need to be vigilant for, and even with all my experience and know-how, I can't catch them all. My mind will have to skip between lots of competing concerns, and when my remaining brain cells are tied up trying to remember how to do something with Swing, I'm likely to take my eye off the code quality ball. It's also very difficult to maintain that level of focus hour after hour, day-in and day-out. It hurts my brain.

Pair programming, as an approach to guarding against code smells, is good when it's done well. But it's not that good that we can be assured code written in this way will be maintainable enough.

3. Design Authorities

By far the least effective route to ensuring code quality is to make it someone else's job.

Hiring architects or "technical design authorities" suffers from all the shortcomings of code reviews and pair programming, and then adds a big bunch of new shortcomings.

Putting aside the fact that almost every architect or TDA I've ever met has been mostly focused on "the big picture", and that I've seen 1,000-line switch statements waved through the quality gate by people obsessing over whether classes implement certain interfaces they've prescribed, turning design authorities into design quality testers never seems to end well. Who wants to spend their day scouring other people's code for examples of Feature Envy?

I'll say no more, except to summarise by observing that the code I've seen produced by teams with dedicated design authorities counts amongs the worst for code quality.

4. Coding Standards

In theory, a team's coding standards are a codification of what we all agree we mean by "good code".

Typically, these are written down in documents that nobody ever reads, and suffer from the same practical drawbacks as architecture documents and company mission statements. They're aspirational affirmations at best. But, in practice, everybody just ignores them.

Even on those more disciplined teams that try to adhere to coding standards, they still have major drawbacks, all relating back to things we've already discussed.

Firstly, coding standards are a list of "stuff" we need to be thinking about along with all the other "stuff" we have to think about. So they tend to take a lower priority and often get overlooked.

Secondly, as someone who's studied a lot of coding standards documents (and what joy they bring!), they have a tendency to be both arbitrary and by no means universally agreed upon. Often they've been written by some kind of design or development authority, usually with little or no input from the team they're being imposed on. It's rare for issues that affect maintainability to be addressed in a coding standards document. Programmers are a funny bunch: we care deeply about some weird stuff while Elephants In The Room creep in without being questioned and sit on us. Naming conventions, therefore, have little relation to how easy the code will be to read and understand. And it's rare to see duplication, dependencies, complexity and so on even being hinted at. As long as all your instances have names beginning with obj and all your private member variables beging with "m_", the gods of code goodness will be appeased.

And then there's the question of how and when we enforce coding standards. And we're back to the hard physics of software development - time, money and cost. Knowing what we should be looking for is only the tip of the code quality iceberg.

What's needed is the ability to do code reviews so freqently, and do them in a way that's so effective, that we never stray more than a few minutes from clean code. For this, we need code reviewers who miss very little, who are constantly looking at the code, and who never get tired or distracted.

For that, thankfully, we have computers.

Program code is like any other domain model; we can write programs to reason about the design of other programs, expressed in terms of the structure of code itself.

Code quality rules are just like any other computable business rules. If the rule is that a block of code in one class should not make copious references to features in another class ("Feature Envy"), it's possible to write an automated test that reads code and looks at those references to determine if that block of code is in the right place.

Let's illustrate with a technology example. Imagine we're working in Java in, say, Eclipse. We could write code for a plug-in that, whenever we make a change to the code document we're working on, reads the code's Abstract Syntax Tree (basically, a code DOM) and does a calculation for the ratio of internal and external dependencies in that Java method we just changed. If the ratio is too low, it could flag it up as a warning while we're writing the code.

The computational power of computers is such today that this sort of continuous background code reviewing is practically possible, and there have already been some early attempts to create just such plug-ins.

In the article I wrote a few years ago for Visual Studio Journal, Ever-decreasing Cycles, I speculate about the impact such short code quality feedback loops might have on the economics of development.

It's my belief that, just as continuous automated unit testing has had a profound effect on the "bottom line" of software development for many teams and businesses, so too would Continuous Inspection.

In the next blog post, I'll talk about the CInsp process and look at practical ways of managing CInsp requirements, test automation and how we action the code quality problems it can throw up.