April 19, 2013

Dark Life & New Ways Of Seeing

This article in the Guardian about how some astrobiologists theorise that there could be a "hidden biosphere" that has evolved on Earth in parallel with the tree of life from which we sprang reminded me of that age-old problem of how we can expect to find things we're not looking for.

We similarly overlooked a big chunk of the mass of the universe because we looked at electromagnetic radiation, seeing only that which emits or reflects electromagnetic waves.

In software development, we to can naively interpret our inability to see something as non-existence of the thing we can't see. Typically, we can't see it because we're not looking for it.

These could be, for example, the bugs nobody tested for. Like dark matter, and dark life, the bugs are still there, and they can still bite. But I've seen too many teams apply the strategy of not looking, as if that somehow means those bugs don't exist. This is like covering our faces and assuming that, because we can't see other people, they can't see us.

New ways of seeing are therefore vitally important. We can "see" dark matter by measuring its gravitational effects. And we could see dark life by appying tests for a wider set of biological possibilities. Then a whole new world (or universe) emerges out of the shadows, and our understanding is expanded.

Developers may believe their multithreaded code has few bugs, but that may because they haven't tested it in multithreaded scenarios. They may believe their software is easy to use, but that may be because they haven't tested it users who weren't involved in the design. They may believe their software is performant, but that may be because they haven't tested it under a high load. They may believe their classes are loosely coupled, but that may be because they haven't looked at a graph of class dependencies.

New ways of seeing offer up new possible understandings. And I can't help feeling we, as an industry, invest far too little in expanded our senses so we can expand our understanding of software. Too much of it is about "looking at text files", and I find that limits our vision and restricts our understanding.






November 12, 2012

Scorecards for Business Angels?

Watching Dragons' Den, it gave me a thought about investors.

Now, I'm not a business expert - although I do run a business - but it occurs to me that when an investment opportunity comes up that more thasn one Dragon seems interested in, it can come down as much to who that Dragon is and what they could do for that business beyond giving them money as it does to the money itself. Indeed, in some cases, the money seems secondary.

What a lot of businesses seem to be seeking is the experience, the expertise and the contacts of the Dragons. The money they're asking for is often in amounts they could probably borrow without giving away any of their business.

In the spirit of a more scientific approach, it struck me that it could be quite easy to gauge the ptential value-add of an investor by building a picture of their track record, built on the subsequent performance of businesses they invest in, as well as the performance of businesses they turn down.

For example, in Dragons' Den, someone could back through the years and build a financial picture of businesses Dragon's made offers to, businesses they invested in, and businesses they didn't think they'd get a return from.

This could be broken down by Dragon, so we could see, for example, what the average effect of an investment from Deborah Meaden is vs. an investment from Peter Jones. They could also break it down by sector, so we could see what the effect of Peter Jones investing in your ruggedized touchpad computer might be vs. what it might be if Deborah invested.

It might also be very interesting to see what effect just giving businesses useful contacts might be, without any investment or guidance from a Dragon.

Maybe it's all about "who you know". Who can say?

The benefit to investors - at least, too good investors who tend to make good calls and have a more positive impact on their investments - would be that, with a proven track record, you might be able to ask for a bigger slice of a business based on expectations of higher performance.

Perhaps something like this already exists. But I suspect the world of business angels is so secretive that there probably isn't. I've never come across one.

The good news is anybody who wanted to create something like this could use publicly available data for most of it (apart from knowing what businesses they turned down, of course), so it wouldn't require the permission of investors or the businesses they invest in.






September 25, 2012

Revisiting Unified Principles of Dependency Management (In Lieu Of 100 Tweets)

Some years ago, I published a slide deck on OO design principles that's proven to be quite popular (about 50,000 downloads) on the parlezuml.com web sites.

I'm ashamed to say, due to forgetfulness on my part, the metrics suggested for each principles have long fallen out favour on my own work.

SOLID has formed the basis of how we explain OO design principles for probably 15 or more years, and it's easy to forget that there's nothing scientific about SOLID. The principles are not a theoretically complete explanation, nor are they scientifically tested.

We also inherited (no pun intended) different design principles to think about dependencies at different levels of code organisation.

I went into the wilderness for a couple of years and really dug deep to try and get OO design principles straight in my own mind. I wanted to examine the mechanics of it - the "physics" of dependency management, if you like.

Network models have become popular in physics to explain certain kinds of phenomena, ranging from earthquakes to the runs on the financial markets. It occurred to me that any sound principles of OO design ought to be based on models of propagation through networks.

I built simulations to explore propagation scenarios in simplified models of code dependency networks, and from that formed a set of unified dependency management principles that, I believe, apply at any level of code organisation, and not just in OO programming.

My Four Principles of Dependency Management have an order of precedence.

1. Minimise Dependencies - the simpler our code, the less "things" we have referring to other "things"

2. Localise Dependencies - for the code we have to write, as much as possible, "things" should be packaged - in units of code organisation - together with the "things" they depend on

3. Stabilise Dependencies - of course, we can't put our entire dependency network in the same function (that would be silly). For starters, it's at odds with minimising our dependencies, since modularity is the mechanism for removing duplication, and modularisation inevitably requires some dependencies to cross the boundaries between modules (using the most general meaning of "module" to mean a unit of code reuse - which could be a function or could be an entire system in a network of systems). When dependencies have to cross those boundaries, they should point towards things that are less likely - e.g., harder - to change. This can help to localise the spread of changes across our network of dependencies, in much the same way that a run on the banks is less likely if banks only lend to other banks that are less likely to default.

4. Abstract Dependencies - when we have to depend on something, but still need to accomodate change into system somehow, the easiest way to that is to make things that are depended upon easier to substitute. It's for much the same reason that we favour modular computer hardware. We can evolve and improve our computer by swapping out components with newer ones. To make this possible, computer components need to communicate through standard interfaces. These industry abstractions make make it possible for me to swap out my memory with larger or faster memory, or my hard drive, or my graphics card. If ATI graphics cards had an ATI-specific interface, and NVidia cards had NVidia-specific interfaces, this would not be possible.

I've found it easier to apply these 4 principles at method, class, package and system level, and much easier to explain them. At each level of code organisation, we just need to substitute the right "things" into the formula.

Measuring how well our code follows these principles is easier, too.

1. Measuring the size or complexity of code at various levels of organisation is a doddle. Most tools will do that for you. e.g., method length, method cyclomatic complexity, class size (number of methods), package size (number of classes), and so on.

2. Let's take classes as an example: if classes have, on average, high internal cohesion - that is, the features of that class reference each other a lot - and low external coupling with features of other classes, it could be said that we have localised dependencies. It's the ratio between cohesion and coupling that paints that picture.

3. & 4. Are interrelated. Robert Martin's metrics for Abstractness, Instability and Distance From The Main Sequence are a good fit, once we've generalised them to make it possible to calculate A, I and D for methods, classes, packages and systems.

But what about Interface Segregation and Single Responsibility? The research I did for myself strongly suggested that if your code is simple, cohesive and loosely coupled and your dependencies tend to point in the right direction, these things are of little consequence. They are all sort of covered by the underlying mechanics of code dependencies and therefore these four principles. An interface that only includes methods used by a specific client is, in my opinion, more abstract than an interface that includes methods the client doesn't use. And we tend to find that when we scatter responsibilities across classes, or have classes that do too much, that's covered by 1. and 2.




September 16, 2012

Are Woolly Definitions Of "Success" At The Heart Of Software Development's Thrall To Untested Ideas?

In the ongoing debate about what works and what doesn't in software development, we need to be especially careful to define what we mean by "it worked".

In my Back To Basics paper, I made the point that teams need to have a clear, shared and testable understanding of what is to be achieved.

Without this, we're a ship on a course to who-knows-where, and I've observed all manner of ills stemming from this.

Firstly, when we don't know where we're supposed to be headed, steering becomes a fruitless exercise.

It also becomes nigh-on impossible to gauge progress in any meaningful way. It's like trying to score an archery contest with an invisible target.

To add to our worries, teams that lack clear goals have a tendency to eat themselves from the inside. We programmers will happily invent our own goals and persue our own agendas in the absence of a clear vision of what we're all meant to be aiming for.

This can lead to excess internal conflict as team members vie to stamp their own vision on a product or project. Hence an HR system can turn into a project to implement an "Enterprise Service Bus" or to "adopt Agile".

Since nobody can articulate what the real goals are, any goal becomes more justifiable, and success becomes much easier to claim. I've met a lot of teams who rated their product or project as a "big success", much to the bemusement of the end users, project sponsors and other stakeholders, who can take a very different view.

There are times when we can display all the misplaced confidence and self-delusion of an X Factor contestant who genuinely seems to have no idea that they're singing out of tune and dancing like their Dad at a wedding.

Much of the wisdom we find on software development comes from people, and teams, who are basing their insights on a self-endowed sense of success. "We did X and we succeeded, therefore it is good to X" sort of thing.

Here's my beef with that: first off, it's bad science.

It's bad science for three reasons: one is that one data point doesn't make a trend, two is that perhaps you have incorrectly attributed your success to X rather than one of the miriad other factors in software development, and three is that can we really be sure that you genuinely succeeded?

If I claim that rubbing frogspawn into your eyes cures blindness, we can test that by rubbing frogspawn into the eyes of blind people and then measuring the accuity of their eyesight afterwards.

If, on ther hand, I claim that rubbing frogspawn into your eyes is "a good thing to do", and that after I rubbed frogspawn into my eyes, I got "better" - well, how can we test that? What is "better"? Maybe I rubbed frogspawn into my eyes and my vocabulary improved.

My sense is that a worrying proportion of what we read and hear about "things that are good to do" in software development is based on little more than "how good (or how right) it felt" to do them. Who knows; maybe rubbing fresh frogspawn in your eyes feels great. But that has little bearing on its efficacy as a treatment.

Without clear goals, it's not easy to objectively determine if what we're doing is working, and this - I suspect - is the underlying reason why so much of what we know, or we think we know, about software development is so darned subjective.

Teams who've claimed to me that they're "winning" (perhaps because of all the tiger blood) have turned out to be so wide of the mark that, in reality, the exact oppsosite was true. These days, when I hear proclamations of great success, it's usually a precursor to the whole project getting canned.

The irony is that those few teams who knew exactly what they were aiming for often measure themselves more brutally against their goals, and are more pessimistic, despite in real terms being more "winning" than teams who were prematurely doing their victory lap.

This, I suspect, has also contributed to the dominance of subjective ideas in software development. Ideas backed up by objective successes seem to be expressed more tentatively and with more caveats than ideas backed up by little more than feelgood and tiger blood, which are expressed more confidently and in more absolute terms.

The naked ape in all of us seems to respond more favourably to people who present their ideas with confidence and a greater sense of authority. In reality, many of these ideas have never really been put to the test.

Once an idea's gained traction, there can be benefits within the software development community to being its originator or a perceived expert in it. Quickly, vested interests build up and the prospect of having their ideas thoroughly tested and potentially debunked becomes very unattractive. The more popular the idea, and the deeper the vested interests, the more resistance to testing it. We do not question whether a burning bush really could talk when we're in the middle of a fundraising drive for the church roof...

It's saddening to see, then, that in the typical lifecycle of an idea, publicising it often preceds testing it. More fools us, though. We probably need to be much more skeptical and demanding of hard evidence to back these ideas up.

Will that happen? I'd like to think it could, but the pessimist in me wonders if we'll always opt for the shiny-and-new and leave our skeptical hats at home when sexy new ideas - with sexy new acronyms - come along.

But a good start would be to make the edges of our definition of "success" crisper and less forgiving.





April 22, 2012

Towards A More Empirical Understanding of The Effects of TDD (That Really Matter)

There's been a handful of empirical studies done into the effects of Test-driven Development over the last decade.

Looking at the state of the art in this area of research, as a long-time TDD practitioner and coach I'm left feeling less than satisfied at the way these studies were conducted, and therefore the quality of the results.

Firstly, I'm not entirely satisfied that all of the studies were conducted by people who really understood what TDD is. They tend not to mention refactoring, for example. As refactoring is roughly half of the effort in TDD, this seems like a major oversight.

Secondly, studies seem to be largely conducted on groups who are introduced to the disciplines of TDD at the start. I know from years of experience training and coaching teams in TDD that the learning curve can be formidable, and that TDD can take hundreds of hours of good practice to really get the hang of. The idea of giving a group a crash course in it, and then setting them a small exercise from which my study will derive, seems unrealistic. A team that's been doing for it for more than a year is likely to produce measurably different results in both software quality (internal and external) and productivity.

Thirdly, studies do not account for the specific way in which TDD is being practiced within the teams. There are different schools of TDD, and a whole spectrum of possible ways it can be practiced. Team A might be writing tests at a high level, for example, and these tests may make many assertions. Team B might be rigorously applying the "tests should test one thing" rule. Both teams could be following the golden rule of TDD, namely that they don't write production code until a failing test requires it.

And finally, these studies aren't asking the important question; namely, what effect does TDD have on things that would matter to a business? What effect does it have on feature request cycle times? What effect does it have on sustainability of innovation?

Tell your CTO that TDD will reduce bug counts, or that it doesn't cost more to do, and he or she is likely to shrug their shoulders and say "so what?" Tell them that TDD can help reduce cycle times to less than a week, or that teams that do it well are able to sustain a reasonable pace of change for years on the same code base, and they may sit up and take notice.

IT managers are so used to telling businesses they'll have to wait six months to get that feature that marketing needed yesterday, or telling them that they can't have it because it's just too expensive to make changes to a legacy system, then you may be carried aloft like conquering heros if you can offer them a way out of that.

Even though the studies are flawed, though, they still tend to conclude that TDD has benefits. Code that is test-driven tends to be simpler, and have lower bug counts. And there's a real mix of results regarding productivity - so much so that it's reasonable to conclude that TDD has little impact on schedules or development costs in the short-to-medium term. And that's with sample groups who are usually just beginning with TDD.

I consider the study conducted at the BBC by Kerry Jones (now at social TV start-up Zeebox) and myself to be one of the better ones. It's using data comparisons from real-world projects and over a long term (1 year), and the developers participating went through not just a crash course in TDD but a fairly rigorous 6 month peer-learning exercise, with regular weekly practice and a practical TDD skills assessment which they all passed. They were all demonstrably capable of practicing TDD in roughly the same way.

Where we suffered was lack of useful data beyond the code itself. Like most organisations who do software development, teams at the BBC do not know how many person-hours go into different activities, or what the cycle time of feature requests is, or even how many bugs are reported with each release.

Anecdotally, they reported that on one project where the team practiced TDD fairly rigorously right from the start, the frequency of live releases was greater than at any time on any previous project. So frequent, in fact, that it was edging towards what we might recognise as "continuous deployment". Again, anecdotally, we heard reports that if the code passed all of the automated tests, the business was satisfied that it was fit for a release, and that lengthy acceptance testing phases were not considered necessary.

These are just anecdotes, though. We have hard evidence to support our claim that TDD improved code quality, but only the usual ghost stories to support any claims beyond that.

What was frustrating at the time, and this is usually the case, is that all the raw data we needed was probably there somewhere. Project management must surely know how many people put in how many days on each release. They must surely know when a feature was first added to the backlog, and when the working code went live.

Couple this with a bug-tracking database, a source code repository and the usual Scrum/Kanban data and I would have everything I need to tie it all together. The hardest piece of the jigsaw to find is how the code is being written. For that, you really need to see it being written. Just as it is with history, there's only so much you can learn from examining ancient artifacts. There's no substitute for a high-fidelity account from somewhere who was there.

If I was conducting an academic study on this now, I'd ask for several sources of useful data:

1. The source code repository containing a complete version history
2. The defect tracking database associated with that code
3. The complete project history (release/iteration plans & actuals, use case/user story estimates & actuals, backlogs, burndowns, staffing etc)
4. Something that would allow me to see code being written (e.g., screencasts made of developers working on the code, IDE session recordings)

Using this data, I could visualise the arc of a software product over its lifetime up to now and look for any correlations between TDD and other coding practices and the shape of the arc.

Software has a tendency to plateau, sooner or later. At some point, the cost of changing it outweighs the benefit of doing so. At this stage, we have a legacy system: namely one that is critical to the continued operation of a business while simultaneously being a significant impediment to the evolution of that business. Like old age, every software system has this coming. And it's arguably the default state of the majority of systems in use today. Which means that it's the default state for the majority of businesses that rely on legacy systems.

But some software reaches this plateau long before others, just as some people age faster than others. If we can postpone the inevitable for longer, our software can live a more active and fulfilling life for longer, and our businesses can stay adaptive using those systems for longer.

It's my theory that business evolution exhibits a sort of punctuated equilibrium. They tend to spend most of their time in prolonged phases of equilibrium, when things don't change much, and then suddenly - due to a new opportunity or threat or some other sudden change in the conditions that surround them - they frantically reinvent themselves to adapt and to stay alive through another prolonged phase of equilibrium.

Quite often, it's these short phases of organised panic that tend to give rise to the ambitious new IT projects, as businesses discover the prohibitive cost of teaching their legacy systems new tricks. It's often accompanied by major structural changes within the organisation and massive upheaval.

This isn't the best way to build a "learning organisation". The same principle applies to major "big bang" software releases and major organisational change programmes - when we change 1,001 things at once, we lose the ability to learn one lesson at a time. Maybe 499 of those changes were the wrong changes, but rolling back 499 changes without throwing the baby out with the bathwater is fiendishly difficult. Like software, businesses succeed or fail as a whole.

It falls on us to develop software in a way that supports continuous, sustainable business evolution and to help build real learning organisations - organisations that can learn one lesson at a time, and sustain the pace of learning indefinitely.

It is my belief that programming practices like TDD, refactoring and continuous integration can help to achieve this. But it's just a belief, based on wishy-washy personal experience. I have seen a ghost. Now I need to get some instruments in there, collect some hard data, and prove it to the rest of the world.





January 28, 2012

Non-functional Test-Driven Development

It's the question that comes up everytime I introduce someone to Test-driven Development: "But what about performance?"

The thing about TDD is that adage "be careful what you wish for" applies. The solution we end up with is constrained by tests. There may be a million and one ways of achieving a goal, and some will perform better than others. The trick with TDD is to ask the right questions.

What I like about TDD - and similar precise approaches to defining requirements - is that it forces us to be explicit and unambiguous about what we want from our software.

So, my stock in trade reply to the question "But what about performance?" is "yes, what about performance?"

Software performance has different dimensions, and if it's important then we need to define exactly what performance we require in specific scenarios. A great way to do this is using non-functional tests.

There's the dimension of time, for example. How long should it take for the code to run?

Imagine a search algorithm that looks for a customer name in a sorted list. We could just loop through the list, and if there are only 1,000 customers and the occasional search, that might be fine. But if there are 10,000,000 customers and users are frequently searching, then a simple loop probably isn't going to cut the mustard.

We can constrain our search algorithm with a basic timing test, like the one below, that makes it explicit that our worst case search - the customer we're looking for isn't in the list - should take a maximum of 1 millisecond to complete.



Execution time is only one dimension, of course. What if we need to constrain the memory footprint of our code while it's running? In Java, we can use the JVM to get information about memory usage, and we can create a multithreaded test to monitor how much more memory is being eaten up as our code executes. Let's imagine we need to constrain the memory footprint when sorting our list of 10 million customers by name, forcing us to use an in-place sorting algorithm that uses up a maximum of another 10KB of memory:



And here, with massive caveats for my less-than-amazing knowledge of the Java Runtime (I make no warranties, the value of shares can go down as well as up, etc etc):



Leaving aside the fact that my brute-force method for calculating memory footprint is a bit hokey (and on running the tests several times, quite variable, it seems), the basic idea is hopefully useful. No doubt some fine fellow will point out a much better way.

You may be able to envisage now how we could use tests to explicitly constrain other non-functional runtime qualities of our code. But we can also often find ways to constrain code at design time, too.

We might have a requirement that our methods should be short and simple. Static code analysis tools like XDepend and Checkstyle can give us hooks into the structure of our code and enable us to create tests that, when code fails to live up to our quality standards, alert us to that fact early enough to do something meaningful about it.

Using executable tests, we can steer our software between acceptable limits of performance, scalability, portability, maintainability, and a whole heap of other -ilities we might care about.

But what about the more, how shall we put this, etheric -ilities, like usability, accessibility and so on? These things tend to be pretty ill-defined and qualitative. Can we make them explicit and testable, just like execution time or memory footprint?

I believe that we can, and to without reason, because I've done it and seen it done. We could, say, define a test that fails if a carefully selected group of target users (e.g., legal secretaries with more than 2 years Windows and web browsing experience), when presented with our application for the first time, fail to get their heads around it fast enough to complete certain tasks we set them within a specified time, without any help or documentation.

With a bit of imagination and lateral thinking, it's possible to meaningfully test many more software qualities than we usually do. And my experience of non-functional TDD is that we tend to get what we have tests for, and we tend not to get what don't have tests for. So agreeing executable non-functional tests tends to lead to better non-functional software quality, if it's done well.

As I warned before, though, be very careful what you wish for.


September 20, 2011

Don't Sell Your Customer Short With Code Quackery

There was a time when nobody had heard of "germs".

Back in the days before microbiology, life was - for most people - short and brutal. And folk believed all sorts of nonsense about what was causing their illnesses and how to cure themselves, most of which was about effective as eating scraps of paper with the word "cure" scribbled on them.

Then, one day, someone took a look at some "clean" water under a new invention - a sort of backwards telescope - and saw millions of tiny little animals swimming about in it (and, no doubt, peeing and pooing in it, too; the dirty little monkeys!)

We now understand that these single-celled organisms are able to invade our bodies via various conveniently-placed orifices and do unspeakable things to the cells inside us, producing symptoms ranging from a mild case of the lergy all the way up to severe bouts of chronic death.

We've learned that we can "bug-proof" our bodies against the effects of these nasty little critters by exposing our immune systems to milder, more polite versions of them. And, apart from one or two slightly ill-informed swimsuit models, most right-minded people accept that immunisation is the way to go if we want to live long, happy, death-free lives.

If you were to fall through a time warp and end up in the days before microscopes and microbiology and try to explain to someone with, say, a bad head cold that their symptoms are caused by tiny invisible creatures attacking the insides of their bodies, they wouldn't have believed you.

Head colds are obviously caused by demonic possession, they'd say, before promptly having you shipped off to the funny farm to have holes drilled into your mind by part-time hairdressers.

As software developers, we live in pre-enlightenment times where our customers and our managers are concerned.

Most non-technical stakeholders in software applications are well aware of the symptoms of poor code quality, but our attempts to convince them that schedule delays, cost overruns, high bug counts, spiralling support and maintenance costs and, ultimately, product death can be caused by tiny, invisible code problems like missing unit tests, switch statements and copy-and-paste inheritance have largely failed.

Programmers like me have observed these software germs under our microscopes and witnessed them attacking the cells of our application, turning young healthy modules into wheezy old decrepit blobs. And we've seen how, as the infection spreads, symptoms start to manifest at the system level and beyond. We have seen entire businesses killed by code quality problems.

But the average customer or senior manager is often unwilling to take it on trust that the problems they're suffering could have any connection to whether or not programmers chose meaningful variable names, or broke down complex functions to make them simpler and more testable.

They don't understand that code needs to be immunised against the worst of these problems with high automated test assurance and continuous monitoring of code quality. Nor do they understand the need to maintain a healthy immune system through practices like refactoring - the programming equivalent of your 5 a day.

They're currently happy to continue with their superstitions and software quackery - things they can see and believe in, like hiring more programmers or creating more documentation, even if there's no hard evidence that they work, and plenty of evidence that they make the symptoms worse.

We'd do well to remember that we are the doctors and they are the patients. We must stop humouring them and asking them what treatment they want us to give them, and instead focus on proper and thorough diagnosis, and offering them treatments that we know from evidence tend to work.

They may reject our advice and seek the help of witch doctors and voodoo, but the solution is not to throw up our hands in defeat and become witch doctors ourselves. Every time we do that, we reinforce their superstitions and our hypocrisy is paid for with the blood of a million dead code bases and hundreds of billions of pounds a year in lost business opportunities.



September 7, 2011

The Right Way To Do Code Reviews Is Not To Do Code Reviews

Code reviews are up there with major surgery and moving house among life's stressful experiences.

The typical code review, based on my own experience and war stories from friends, seems to be that the team gets together in a windowless, airless room with a copy of the code, and expresses opinions about what other people have written. And, as we all know, anything that involves programmers expressing opinions is unlikely to end well.

Adding insult to injury, code reviews tend not to lead to actual improvements in code or design quality. It's rather like being judged on X Factor: the judges have their say, but since music is actually not discussed in any meaningful or constructive way, you leave non the wiser as to what to change.

It doesn't have to be this way, of course. here are my tips for more constructive and productive code reviews:

1. DON'T HOLD CODE REVIEWS!

What? Well, the fact that you're having meetings about code quality is an indication that you're already doing code quality wrong. Do you have Unit Test Execution Meetings, too, where the team gets together and decides which tests are failing? Code review is a continual, ongoing activity. Not a meeting.

2. Set clear quality goals for code and design

It all gets a little less arbitrary and opinion-based when the team have agreed what it is they're actually aiming for. Are you aiming for simplicity? Then when reviewing code, you should be looking for lack of simplicity. Aiming for readability? Seek out code that is hard to understand. Aiming for scalability? Look for processes that can't be easily run in parallel, or load-balanced or clustered. Need a small memory footprint? Need fast response times? Etc etc.

3. Write tests for your quality goals

Need a small memory footprint? How small? Write a test that will fail if the footprint exceeds tolerances. Need code to be as simple as possible? Write a test that will fail when a method or class or package or system (or database schema, or config file) gets too big or too fiddly.

4. Run the tests for your quality goals at last every time you check your code in. The build is broken if any of them fail.

5. Review your code quality tests regularly. If the goals change, change the tests. But remember, raising the quality bar as the code evolves is much, much harder than lowering it. Which is why you'd best start with bar set as high as you can realistically manage, and try and keep it there.

6. Program in pairs. If code's not worth continuous review, it's not worth writing and maintaining.



June 27, 2011

Continuous Delivery is a Platform for Excellence, Not Excellence Itself

In case anyone was wondering, I tend to experience a sort of "heirarchy of needs" in software development. When I meet teams, I usually find out where they are on this ladder and ask them to climb up to the next rung.

It goes a little like this:

0. Are you using a version control system for your code? No? Okay, things really are bad. Sort this out first. You'd be surprised how much relies on that later. Without the ability to go back to previous versions of your code, everything you do will carry a much higher risk. This is your seatbelt.

1. Do you produce working software on a regular basis (e.g., weekly) that you can get customer feedback on? No? Okay, start here. Do small releases and short iterations.

2. How closely do you collaborate with the customer and the end users? If the answer is "infrequently", "not at all", or "oh, we pay a BA to do that", then I urge them to get regular direct collaboration with the customer - this means programmers talking to customers. Anything else is a fudge.

3. Do you agree acceptance tests with the customer so you know if you've delivered what they wanted? No? Okay, then you should start doing this. "Customer collaboration" can be massively more effective when we make things explicit. Teams need a testable definition of "done": it makes things much more focused and predictable and can save an enormous amount of time. Writing working code is a great way to figure out what the customer really needed, but it's a very expensive way to find out what they wanted.

4. Do you automate your tests? No? Well, the effect of test automation can be profound. I've watched teams go round and round in circles trying to stabilise their code for a release, wasting hundreds of thousands of pounds. The problem with manual testing (or little or noe testing at all), is that you get very long feedback cycles between a programmer making a mistake and that mistake being discovered. It becomes very easy to break the code without finding out until weeks or even months later, and the cost of fixing those problems escalates dramatically the later they're discovered. Start automating your acceptance tests at the very least. The extra effort will more than pay for itself. i've never seen an instance when it didn't.

5. Do your programmers integrate their code frequently, and is there any kind of automated process for building and deploying the software? No? Software development has a sort of metabolism. Automated builds and continuous integration are like high fibre diets. You'd be surprised how many symptoms of dysfunctional software development miraculously vanish when programmers start checking inevery hour or three. It will also be the foundation for that Holy Grail of software development, which will come to later.

6. Do your programmers write the tests first, and do they only write code to pass failing tests? No? Okay, this is where it gets more serious. Adopting Test-driven Design is a none-trivial undertaking, but the benefits are becoming well-understood. Teams that do TDD tend to produce mucyh more reliable code. They tend to deliver more predictably, and, in many cases, a bit sooner and with less hassle. They also often produce code that's a bit simpler and cleaner. Most importantly, the feedback we get from developer tests (unit tests) is often the most useful of all. When an acceptance test fails, we have to debug an entire call stack to figure out what went wrong and pinpoint the bug. Well-written unit tests can significantly narrow it down. We also get feedback far sooner from small unit tests than we do from big end-to-end tests, because we write far less code to pass each test. Getting this feedback sooner has a big effect on our ability to safely change our code, and is a cornerstone in sustaining the pace of development long enough for us to learn valuable lessons from it.

Now, before we continue, notice that I called it "Test-driven Design", and not "Test-driven Development". Test-driven Development is defined as "Test-driven Design + Refactoring", which brings us neatly on to...

7. Do you refactor your code to keep it clean? The thing about Agile that too many teams overlook is that being responsive to change is in no small way dependent on our ability to change the code. As code grows and evolves, there's a tendency for what we call "code smells" to creep in. A "code smell" is a design flaw in the code that indicates the onset of entropy - growing disorder in the code. Examples of code smells include things like long and complex methods, big classes or classes that do too many things, classes that depend too much on other classes, and so on. All these things have a tendency to make the code harder to change. By aggressively eliminating code smells, we can keep our code simple and malleable enough to allow us to keep on delivering those valuable changes.

8. Do you collect hard data to help objectively measure how well you're doing 1-7? If you come to me and ask me to help you diet (though God knows why you would), the first thing I'm going to do is recommend you buy a set of bathroom scales and a tape measure. Too many teams rely on highly subjective personal feelings and instincts when assessing how well they do stuff. Conversely, some teams - a much smaller number - rely too heavily on metrics and reject their own experience and judgement when the numbers disagree with their perceptions. Strike a balance here: don't rely entirely on voodoo, but don't treat statistics as gospel either. Use the data to inform your judgement. At best, it will help you ask the right questions, which is a good start towards 9.

9. Do you look at how you're doing - in particular at the quality of the end product - and ask yourselves "how could we do this better?" And do you actually follow up on those ideas for improving? Yes, yes, I know. Most Agile coaches would probably introduce retrospectives at stage 0 in their heirarchy of needs. I find, though, that until we have climbed a few rungs up that ladder, discussion is moot. Teams may well need them for clearing the air and for personal validation and ego-massaging and having a good old moan, but I've seen far too many teams abuse retrospectives by slagging everything off left, right and centre and then doing absolutely nothing about it afterwards. I find retrospectives far more productive when they're introduced to teams who are actually not doing too badly, actually, thanks very much. and I always temper 9 with 8 - too many retrospectives are guided by healing crystals and necromancy, and not enough benefit from the revealing light of empiricism. Joe may well think that Jim's code is crap, but a dig around with NDepend may reveal a different picture. You'd be amazed how many truly awful programmers genuinely believe it's everybody elses' code that sucks.

10. Can your customer deploy the latest working version of the software at the click of a mouse whenever they choose to, and as often as they choose to? You see, when the code is always working, and when what's in source control is never more than maybe an hour or two away from what's on the programmer's desktops, and when making changes to the code is relatively straightfoward, and when rolling back to previous versions - any previous version - is a safe and simple process, then deployment becomes a business decision. They're not waiting for you to debug it enough for it to be usable. They're not waiting for smal changes that should have taken hours but for some reason seem to take weeks or months. They can ask for feature X in the morning, and if the team says X is ready at 5pm then they can be sure that it is indeed ready and, if they choose to, they can release feature X to the end users straight away. This is the Holy Grail - continuous, sustained delivery. Short cycle times with little or no latency. The ability to learn your way to the most valuable solutions, one lesson at a time. The ability to keep on learning and keep on evolving the solution indefinitely. To get to this rung on my ladder, you cannot skip 1-9. There's little point in even trying continuous delivery if you're not 99.99% confident that the software works and that it will be easy to change, or that it can be deployed and rolled back if necessary at the touch of a button.

Now at this point you're probably wondering what happened to user experience, scalability, security, or what about safety-critical systems, or what about blah blah blah etc etc. I do not deny that these things can be very important. But I've learned from experience that these are things that come after 1-10 in my heirarchy of needs for programmers. That's not to say they can't be more important to customers and end users - indeed, user experience is often 1 on their list. But to achieve a great user experience, software that works and that can evolve is essential, since it's user feedback that will help us find the optimal user experience.

To put it another way, on my list, 10 is actually still at the bottom of the ladder. Continuous delivery and ongoing optmisation of our working practices is a platform for true excellence, not excellence itself. 10 is where your journey starts. Everything before that is just packing and booking your flights.




June 15, 2011

Slow & Dirty - A Rant (slides from SPA2011)

Okay, so here's a PDF version of my slides from today's invited rant at SPA2011. With thanks to the SPA team for allowing me to vent, and giving me booze.

Before you read it, a disclaimer: the value of metrics can go down as well as up. 'nuff said.