August 11, 2017
Update: Code Craft "Driving Test" FxCop Rules
I've been continuing work on a tool to automatically analyse .NET code submitted for the Code Craft "Driving Test".
Despite tying myself in knots for the first week trying to build a whole code analysis and reporting framework - when will I ever learn?! - I'm making good progress with Plan B this week.
Plan B is to go with the Visual Studio code analysis infrastructure (basically, FxCop). It's been more than a decade since I wrote FxCop rules, and it's been an uphill battle wrapping my head around it again (along with all the changes they've made since 2006).
But I now have 8 code quality rules that are kind of sort of working. Some work well. Some need more thought.
1. Methods can't be longer than 10 LOC
2. Methods can't have > 2 branches
3. Identifiers must be <= 20 characters. (Plan is to exempt test fixture/method names. TO-DO.)
4. Classes can't have > 8 methods (so max class size is 80 LOC)
5. Methods can't use more than one feature of another class. (My very basic interpretation of "Feature Envy". Again, TO-DO to improve that. Again, test code may be exempt.)
6. Boolean parameters are not allowed
7. Methods can't have > 3 parameters
8. Methods can't instantiate project types, unless they are factory or builder methods that return an abstract type. (The beginning of my Dependency Inversion "pincer movement". 2 more rules to come preventing invocation of static project methods, and methods that aren't virtual or abstract. Again, factories and builders will be exempt, as well as test code.)
What's been really fun about the last couple of weeks has been eating my own dog food. As each new rule emerges, I've been applying it frequently to my own code. I'm a great believer in the power of Continuous Inspection, and this has been a timely reminder of just how powerful it can be.
After passing every test, and performing every refactoring, I run a code analysis that will eventually systematically check all my code for 15 or so issues. I fix any problems it raises there and then. I don't commit or push code that fails code analysis.
In Continuous Inspection, this is the equivalent of all my tests being green. Of course, as with functional tests, the resulting code quality may only be as good as the code quality tests. And I'm refining them with more and more examples, and applying them to real code to see what designs they encourage. So far, not so shabby.
And for those inevitable occasions when blindly obeying the design rules would make our code worse, the tool will have a mechanism for opting out of a rule. (Probably a custom attribute that you can apply to classes and fields and methods etc, specifying which rule you're breaking and - most importantly - documenting why. Again, a TO-DO.) In the Driving Test, I'm thinking candidates will get 3 such "hall passes".
If you want to see the work so far, and try it out for yourself, the source code's at https://github.com/jasongorman/CodeCraft.FxCop
And I've made a handful more tickets available for the trial Code Craft "Driving Test" for C# developers on Sept 16th. It's free this time, in exchange for your adventerous and forgiving participation in this business experiment :)
August 9, 2017
Clean Code isn't a Programming Luxury. It's a Business NecessityI'm not going to call out the tweeter, to spare their blushes, but today I saw one of those regular tweets denigrating the business value of "clean code".
This is an all-too-common sentiment I see being expressed at all levels in our industry. Clean Code is a luxury. A nice-to-have. It's just prima donna programmers making their code pretty. Etc etc.
Nothing could be further from the truth. Just from direct personal experience, I've seen several major businesses brought to their knees by their inability to adapt and evolve their software.
There was the financial services company who struggled for nearly a year to produce a stable release of the new features their sales people desperately needed.
There was the mobile software giant who was burning $100 million a year just fixing bugs while the competition was racing ahead.
There was the online recruitment behemoth who rewrote their entire platform multiple times from scratch because the cost of changing it became too great. Every. Single. Time.
I see time and time again businesses being held back by delays to software releases. In an information economy - which is what we now find ourselves in (up to our necks) - the ability to change software and systems is a fundamental factor in a business's ability to compete.
Far from being a luxury, Clean Code - code that's easy to change - is a basic necessity. It is to the Information Age what coal was to the Industrial Age.
No FTSE500 CEO these days can hatch a plan to change their business that doesn't involve someone somewhere changing software. And the lack of responsiveness from IT is routinely cited as a major limiting factor on how their business performs.
Code craft is not about making our code pretty. It's about delivering value today without blocking more value tomorrow. It's that simple.
August 1, 2017
Codemanship Code Craft Driving Test - Trial Run
In my last blog post, I talked about the code quality criteria for our planned "driving test" for code crafters.
If you're interested in taking the test, I'm planning a trial run on Saturday Sept 16th. Initially, it will be just for C# programmers.
Entry is free (this time). You'll need a decent Internet connection, Visual Studio (2013 or later, Community is fine), and a GitHub account. Resharper is highly recommended (you can download a trial version from https://www.jetbrains.com/resharper/download/ - but don't do it now, because it will time out before the trial!) You'll be writing tests with NUnit 2.6.x. You can use whichever .NET mocking framework you desire. No other third-party libraries should be required.
The project we'll set you should take 4-8 hours, and you'll have 24 hours to submit your solution and your screencast link after we begin.
If you've been on Codemanship training courses, the quality criteria should make sense to you, and you'll know what we're looking for in your screencast.
If you haven't, then I recommend twisting the boss's arm and sending them a link to http://www.codemanship.com :)
You can register using the form below.
July 31, 2017
Codemanship Code Craft "Driving Test" - Code Quality CriteraMuch pondering today about the code quality standards that should be applied in the Codemanship Code Craft Driving Test we'll be trialling at the end of the summer.
Since it's a test, I think it's fair to apply more rigorous and unyielding standards, provided that these are laid out unambiguously in advanced.
We can divide it up into 7 key areas, with some slightly different criteria for test code to allow for more verbose method names and a bit more code duplication:
1. It Works
* Your solution passes all of the customer tests we'll give you
* Your solution also survives a more exhaustive suite of tests to hunt for any lurking bugs
2. It's Readable
* The Conceptual Correlation between your code and the requirements is > 80%
* Non of your identifiers contain > 20 characters (except for test method names)
* No line of your code contains > 100 characters
3. It's Low in Duplication
* A check using Simian will reveal no more than 15% code duplication (We'll give you the precise Simian options so you can check for yourself)
4. It's Made of Simple Parts
* No method will contain > 10 LOC
* No method will have > 2 branches
* No method will have > 3 parameters
* No method will have Boolean parameters
* No class will have > 6 methods
5. It's Made of Swappable Parts
* Excluding in your tests, all dependencies on other classes in your solution will be swappable by dependency injection. Use of DI frameworks will also be forbidden. (Dependency Inversion)
* No non-test class will invoke any method on an instance of another class in the solution that can't be easily extended or swapped (e.g., in C#, only methods on interfaces or virtual methods can be invoked)
6. The Parts Are Loosely Coupled
* No class will depend on > 3 other classes in your solution
* No method will exhibit Feature Envy (when a method of one class uses more than one method of another) for other classes in your solution (Tell, Don't Ask)
* No class or interface will expose features to another class that it doesn't use (Interface Segregation)
* No class will invoke methods on solution classes which are not direct collaborators (i.e., fields or parameters) (Law of Demeter)
7. Test Code Quality
* No unit test will make more than one assertion (or mock object equivalent)
* There will be exactly one unit test method per requirements rule, the name of the test will clearly describe the rule
* All of the unit tests will pass without any external dependencies
* There will be a maximum of 10% integration test code, packaged separately
* The tests will run in < 10 seconds
* Tests will contain < 25% code duplication
Now, even though there's quite a lot of meat on these bones, these criteria may change, of course. But probably not much.
In the trial, I'll be verifying many of them by hand. This will give ma chance to validate them and iron out any conceptual kinks.
The long-term intention is that most - if not all - of these checks will be automated. Initially, I'm working on doing that in C# for the .NET developer community.
The code quality criteria will form half the score for the driving test. To pass it, you'll also need to demonstrate your practices and habits, and explain why you're doing them, so we can evaluate how much insight you have into code craft and the reasons for it. This will be done by recording a 30-minute screencast at some point during the test that we can assess.
More news soon.
July 15, 2017
Finding Load-Bearing Code - Thoughts On ImplementationI've been unable to shake this idea about identifying the load-bearing code in our software.
My very rough idea was to instrument the code and then run all our system or customer tests and record how many times methods are executed. The more times a method gets used (reused), the more critical it may be, and therefore may need more of our attention to make sure it isn't wrong.
This could be weighted by estimates for each test scenario of how big the impact of failure could be. But in my first pass at a tool, I'm thinking method call counts would be a simple start.
So, the plan is to inject this code into the beginning of the body of every method in the code under test (C# example), using something like Roslyn or Reflection.Emit:
The MethodCallCounter could be something as simple as a wrapper to a dictionary:
And this code, too, could be injected into the assembly we're instrumenting, or a reference added to a teeny tiny Codemanship.LoadBearing DLL.
Then a smidgen of code to write the results to a file (e.g., a spreadsheet) for further analysis.
The next step would be to create a test context that knows how critical the scenario is, using the customer's estimate of potential impact of failure, and instead of just incrementing the method call count, actually adds this number. So methods that get called in high-risk scenarios are shown as bearing a bigger load.
External to this would be a specific kind of runner (e.g., NUnit runner, FitNesse, SpecFlow etc) that executes the tests while changing the FailureImpact value using information tagged in each customer test somehow.
(PS. This is also kind of how I'd add logging to a system, in case you were wondering.)
July 13, 2017
Do You Know Where Your Load-Bearing Code Is?Do you know where your load-bearing code is?
In any system, there's some code that - if it were to fail - would be a big deal. Identifying that code helps us target our testing effort to where it's really needed.
But how do we find our load-bearing code? I'm going to propose a technique for measuring the "load-beariness" of individual methods. Let's call it criticality.
Working with your customer, identify the potential impact of failure of specific usage scenarios. It's about like estimating the relative value of features, only this time we're not asking "what's it worth?". We're asking "what's the potential cost of failure?" e.g., applying the brakes in an ABS system would have a relatively very high cost of system failure. Changing the font on a business report would have a relatively low cost of failure. Maybe it's a low-risk feature by itself, but will be used millions of times every day, greatly amplifying the risk.
Execute a system test case. See which methods were invoked end-to-end to pass the test. For each of those methods, assign the estimated cost of failure.
Now rinse and repeat with other key system test cases, adding the cost of failure to every method each scenario hits.
A method that's heavily reused in many low-risk scenarios could turn out to be cumulatively very critical. A method that's only executed once in a single very high-risk scenario could also be very critical.
As you play through each test case, you'll build a "heat map" of criticality in your code. Some areas will be safe and blue, some areas will be risky and red, and a few little patches of code may be white hot.
That is your load-bearing code. Target more exhaustive testing at it: random, data-driven, combinatorial, whatever makes sense. Test it more frequently. Inspect it carefully, many times with many pairs of eyes. Mathematically prove it's correct if you really need to. And, of course, do whatever you can to simplify it. Because simpler code is less likely to fail.
And you don't need code to make a start. You could calculate method criticality from, say, sequence diagrams, or from CRC cards, to give you a heads-up on how rigorous you may need to be implementing the design.
May 30, 2017
Do You Write Automated Tests When You Spike?So, I've been running this little poll on Twitter asking devs if they write automated tests when they're knocking up a prototype (or a "spike", as Extreme Programmers call it).
Do you write any automated tests when you do a proof of concept (a "spike", in XP)?— Codemanship (@codemanship) May 29, 2017
The responses so far have been interesting, if not entirely unexpected. About two thirds of rarely or never write automated tests for a spike.
Behind this is the ongoing debate about the limits of usefulness of such tests (and of TDD, if we take that a step further). Some devs believe that when a problem is small, or when they expect to throw away the code afterwards, automated tests add no value and just slow us down.
My own experience has been a slow but sure transition from not bothering with unit tests for spikes 15 years ago, to almost always writing some unit tests even on small experiments. Why? Because I've found - and I've measured myself doing it, so it's not just a feeling - I get my spike done faster when I have a bit of test scaffolding holding it up.
For sure, I'm not as rigorous about it as when I'm working on production code. The tests tends to be at a higher level, and there are fewer of them. I may break a few of my own TDD rules and have tests that ask more than one question, or I may not refactor the test code quite as fastidiously. But the tests are there, nevertheless. And I'm usually really grateful that I wrote some, as the experiment grows and maybe makes some unexpected twists and turns.
And if - as can happen - the experiment becomes part of the production code, I'm confident that what I've produced is just about good enough to be released and maintained. I'm not in the business of producing legacy code... not even by accident.
An example of one of my spikes, for a utility that combines arrays of test data for use with parameterised tests, gives you an idea of the level of discipline I might usually apply. Not quite production quality, but not that far off.
The spike took - in total - maybe a couple of days, and I was really grateful for the tests by the second day. In timed experiments, I've seen me tackle much smaller problems faster when I wrote automated tests for them as I went along. Which is why, for me, that seems to be the way to go. I get done sooner, with something that could potentially be released. It leaves the door open.
Other developers may find that they get done sooner without writing automated tests. With TDD, I'm very much in my comfort zone. They may be outside it. In those instances, they probably need to be especially disciplined about throwing that code away to remove the temptation of releasing unreliable, unmaintainable code.
They could rehabilitate it, writing tests after the fact and refactoring the code to give it a production sparkle. Some people refer to this process as "spike & stabilise". But, to me, it does rather sound like "code and fix". Because, technically, that's exactly what it is. And experience - not just mine, but a mountain of hard data going back decades - strongly suggests that code and fix is the slow route to delivery.
So I'm a little skeptical, to say the least.
May 16, 2017
My Obligatory Annual Rant About People Who Warn That You Can Take Quality Too Far Like It's An Actual Thing That Happens To Dev TeamsIf you teach developers TDD, you can guarantee to bump into people who'll warn you of the dangers of taking quality too far (dun-dun-duuuuuuun!)
"We don't write the tests first because it protects us from over-testing our code", said one person recently. Ah, yes. Over-testing. A common problem in software.
"You need to be careful not to refactor your code too much", said another. And many's the time I've looked at code and thought "This program is just too easy to understand!"
I can't help recalling the time a UK software company, whose main product had literally thousands of open bugs, hired a VP of Quality and sent him around the dev teams warning them that "perfection is the enemy of good enough". Because that was their problem; the software was just too good.
It seems to still pervade our industry's culture, this idea that quality is the enemy of getting things done, despite mountains of very credible evidence that - in the vast majority of cases - the reverse is true. Most dev teams would deliver sooner if they delivered better software. Not aiming for perfection is the enemy of getting shit done more accurately sums up the relationship between quality and productivity in our line of work.
That's not to say that there aren't any teams who have ever taken it too far. In safety-critical software, the costs ramp up very quickly for very modest improvements in reliability. But the fact is that 99.9% of teams are so far away from this asymptote that, from where they're standing, good enough and too good are essentially the same destination.
Worry about wasting time on silly misunderstandings about the requirements. Worry about wasting time fixing avoidable coding errors. Worry about wasting time trying to hack your way through incomprehensible spaghetti code to make changes. Worry about wasting your time doing the same repeatable tasks manually over and over again.
But you very probably needn't worry about over-testing your code. Or about doing too much refactoring. Or about making the software too good. You're almost certainly not in any immediate danger of that.
April 27, 2017
20 Dev Metrics - The Story So FarSo, we're half way into my series 20 Dev Metrics, and this would seem like a good point to pause and reflect on the picture we're building so far.
I should probably stress - before we get into a heated debate about all of the 1,000,001 other things we could be measuring while we do software development - that my focus here is primarily on two things: sustaining the pace of innovation, and delivering reliable software.
For me, sustaining innovation is the ultimate requirements discipline. Whatever the customer asks for initially, we know from decades of experience that most of the value in what we deliver will be added as a result of feedback from using it. So the ability to change the code is critical to achieving higher value.
Increasing lead times is a symptom, and our metrics help us to pinpoint the cause. As code gets harder and harder to change, delivery slows down. We can identify key factors that increase the cost of change, and teams I've coached have reduced lead times dramatically by addressing them.
Also, when we deliver buggy code, more and more time is spent fixing bugs and less time spent adding value. It's not uncommon to find products that have multiple teams doing nothing but fixing bugs on multiple versions, with little or no new features being added. At that point, when the cost of changing a line of code is in the $hundreds, and the majority of dev effort is spent on bug fixes, your product is effectively dead.
And maybe I'm in a minority on this (sadly, I do seem to be), but I think - as software becomes more and more a part of our daily lives, and we rely on it more and more - what we deliver needs to be reliable enough to depend on. This, of course, is relative to the risks using the software presents. But I think it's fair to say that most software we use is still far from being reliable enough. So we need to up our game.
And there's considerable overlap between factors that make code reliable, and factors that impede change. In particular, code complexity increases the risks of errors as well as making the code harder to understand and change safely. And frequency and effectiveness of testing - e.g., automating manual tests - can have a dramatic impact, too.
My first 10 metrics will either reveal a virtuous circle of high reliability and a sustainable pace, or a vicious cycle of low quality and an exponentially increasing cost of change.
The interplay between the metrics is more complex than my simplistic diagrams can really communicate. In future posts, I'll be highlighting some other little networks of interacting metrics that can tell us more useful stuff about our code and the way we're creating it.
April 25, 2017
20 Dev Metrics - 8. Test AssuranceNumber 8 in my Twitter series 20 Dev Metrics is a very important one, when it comes to minimising the cost of changing code - Test Assurance.
If there's one established fact about software development, it's that the sooner we catch bugs the cheaper they are to fix. When we're changing code, the sooner we find out if we broke it, the better. This can have such a profound effect on the cost of changing software that Michael Feathers, in his book 'Working Effectively with Legacy Code' defines 'legacy code' as code for which we have no automated tests.
The previous metric, Changes per Test Run give us a feel for how frequently we're running our tests - an often-forgotten factor in test assurance - but it can't tell us how effective our tests would be at catching bugs.
We want an answer to the question: "If this code was broken, would we know?" Would one or more tests fail, alerting us to the problem. The probability that our tests will catch bugs is known as "test assurance".
Opinions differ on how best to measure test assurance. Probably because it's easier to measure, a lot of teams track code coverage of tests.
For sure, code that isn't covered by tests isn't being tested at all. If only half your code's covered, then the probability of bugs being caught in the half that isn't is definitely zero.
But just because code is executed in a test, that doesn't necessarily mean it's being tested. Arguably a more meaningful measure of assurance can be calculated by doing mutation testing - deliberately introducing errors into lines of code and then seeing if any tests fail.
Mutation testing's a bit like committing burglaries to measure how effective the local police are at detecting crimes. Be careful with automated mutation testing tools, though; they can throw up false positives (e.g., when a convergent iterative loop just takes a bit longer to fund the right answer after changing its seed value). But most of these tools are configurable, so you can learn what mutations work and what mutations tend to throw up false positives and adapt your configuration.