July 15, 2017

Learn TDD with Codemanship

Finding Load-Bearing Code - Thoughts On Implementation

I've been unable to shake this idea about identifying the load-bearing code in our software.

My very rough idea was to instrument the code and then run all our system or customer tests and record how many times methods are executed. The more times a method gets used (reused), the more critical it may be, and therefore may need more of our attention to make sure it isn't wrong.

This could be weighted by estimates for each test scenario of how big the impact of failure could be. But in my first pass at a tool, I'm thinking method call counts would be a simple start.

So, the plan is to inject this code into the beginning of the body of every method in the code under test (C# example), using something like Roslyn or Reflection.Emit:

The MethodCallCounter could be something as simple as a wrapper to a dictionary:

And this code, too, could be injected into the assembly we're instrumenting, or a reference added to a teeny tiny Codemanship.LoadBearing DLL.

Then a smidgen of code to write the results to a file (e.g., a spreadsheet) for further analysis.
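As a rough illustration of the counting idea, here is a minimal sketch in Python rather than C#. A decorator stands in for the line that would be injected at the start of each method body, and the names `instrumented`, `calculate_fare` and `book_seat` are hypothetical, chosen only for the example. It is not the Roslyn or Reflection.Emit approach itself, just the same bookkeeping in miniature.

```python
import csv
import functools
import io
from collections import Counter


class MethodCallCounter:
    """A thin wrapper around a dictionary of method name -> call count."""

    def __init__(self):
        self._counts = Counter()

    def increment(self, method_name, amount=1):
        self._counts[method_name] += amount

    def counts(self):
        return dict(self._counts)

    def write_csv(self, stream):
        # One row per method, most-called first, ready for a spreadsheet.
        writer = csv.writer(stream)
        writer.writerow(["method", "calls"])
        for name, calls in self._counts.most_common():
            writer.writerow([name, calls])


counter = MethodCallCounter()


def instrumented(func):
    """Stand-in for the injected line at the start of each method body."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        counter.increment(func.__name__)
        return func(*args, **kwargs)
    return wrapper


# Hypothetical code under test.
@instrumented
def calculate_fare(distance):
    return distance * 2


@instrumented
def book_seat(distance):
    return calculate_fare(distance) + calculate_fare(1)


# Stand-in for running the system or customer tests.
book_seat(10)
book_seat(20)

report = io.StringIO()  # a real run would open a .csv file instead
counter.write_csv(report)
```

Writing to an in-memory stream keeps the sketch self-contained; swapping in `open("load_bearing.csv", "w", newline="")` (a hypothetical file name) would produce a file that opens in a spreadsheet.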

The next step would be to create a test context that knows how critical the scenario is, using the customer's estimate of potential impact of failure, and instead of just incrementing the method call count, actually adds this number. So methods that get called in high-risk scenarios are shown as bearing a bigger load.

External to this would be a specific kind of runner (e.g., NUnit runner, FitNesse, SpecFlow etc) that executes the tests while changing the FailureImpact value using information tagged in each customer test somehow.
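To make the weighting step concrete, here is a small Python sketch under stated assumptions: `ScenarioContext`, `scenario` and `load_bearing` are hypothetical names, the impact numbers are invented, and a `with` block plays the part of the test runner switching the FailureImpact value between customer tests.

```python
import functools
from collections import defaultdict
from contextlib import contextmanager


class ScenarioContext:
    """Holds the failure impact of whichever scenario is currently running."""
    failure_impact = 1


load = defaultdict(int)  # method name -> accumulated load


@contextmanager
def scenario(failure_impact):
    """Set the impact for one test scenario, then restore the previous value."""
    previous = ScenarioContext.failure_impact
    ScenarioContext.failure_impact = failure_impact
    try:
        yield
    finally:
        ScenarioContext.failure_impact = previous


def load_bearing(func):
    """Add the current scenario's impact instead of just counting the call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        load[func.__name__] += ScenarioContext.failure_impact
        return func(*args, **kwargs)
    return wrapper


# Hypothetical code under test: both features share validate_input.
@load_bearing
def validate_input(value):
    return value


@load_bearing
def apply_brakes(pressure):
    return validate_input(pressure)


@load_bearing
def change_font(name):
    return validate_input(name)


# The runner's job: execute each test with its tagged impact (made-up values).
with scenario(failure_impact=100):
    apply_brakes(0.8)

with scenario(failure_impact=1):
    for _ in range(3):
        change_font("Arial")
```

Here `validate_input` ends up bearing the biggest load (103) because it sits under both the high-risk and the low-risk scenario, which is the effect the weighting is meant to expose.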


(PS. This is also kind of how I'd add logging to a system, in case you were wondering.)

July 13, 2017

Do You Know Where Your Load-Bearing Code Is?

Do you know where your load-bearing code is?

In any system, there's some code that - if it were to fail - would be a big deal. Identifying that code helps us target our testing effort to where it's really needed.

But how do we find our load-bearing code? I'm going to propose a technique for measuring the "load-beariness" of individual methods. Let's call it criticality.

Working with your customer, identify the potential impact of failure of specific usage scenarios. It's a bit like estimating the relative value of features, only this time we're not asking "what's it worth?". We're asking "what's the potential cost of failure?" For example, applying the brakes in an ABS system would have a relatively very high cost of system failure. Changing the font on a business report would have a relatively low cost of failure. Or maybe a feature is low-risk by itself, but will be used millions of times every day, greatly amplifying the risk.

Execute a system test case. See which methods were invoked end-to-end to pass the test. For each of those methods, assign the estimated cost of failure.

Now rinse and repeat with other key system test cases, adding the cost of failure to every method each scenario hits.

A method that's heavily reused in many low-risk scenarios could turn out to be cumulatively very critical. A method that's only executed once in a single very high-risk scenario could also be very critical.
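The accumulation described above needs nothing more than a table of scenarios. The following Python sketch assumes each scenario is recorded as an estimated cost of failure plus the methods it was seen to invoke; every method name and cost figure here is invented for illustration.

```python
from collections import Counter


def method_criticality(scenarios):
    """Add each scenario's cost of failure to every method that scenario hits."""
    totals = Counter()
    for cost_of_failure, methods_hit in scenarios:
        for method in set(methods_hit):  # count a method once per scenario
            totals[method] += cost_of_failure
    return totals


# (estimated cost of failure, methods invoked end-to-end) - made-up data.
scenarios = [
    (500, ["Brakes.apply", "Sensor.read_wheel_speed"]),
    (5, ["Report.set_font", "Settings.save"]),
    (5, ["Report.print", "Settings.save"]),
    (20, ["Dashboard.refresh", "Sensor.read_wheel_speed", "Settings.save"]),
]

criticality = method_criticality(scenarios)
ranked = criticality.most_common()  # hottest methods first
```

In this made-up data, `Sensor.read_wheel_speed` comes out on top at 520 because it is hit by both the very high-risk scenario and a lower-risk one, while `Settings.save` reaches 30 purely through reuse across three low-risk scenarios.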

As you play through each test case, you'll build a "heat map" of criticality in your code. Some areas will be safe and blue, some areas will be risky and red, and a few little patches of code may be white hot.

That is your load-bearing code. Target more exhaustive testing at it: random, data-driven, combinatorial, whatever makes sense. Test it more frequently. Inspect it carefully, many times with many pairs of eyes. Mathematically prove it's correct if you really need to. And, of course, do whatever you can to simplify it. Because simpler code is less likely to fail.

And you don't need code to make a start. You could calculate method criticality from, say, sequence diagrams, or from CRC cards, to give you a heads-up on how rigorous you may need to be implementing the design.

July 10, 2017

Codemanship Bite-Sized - 2-Hour Training Workshops for Busy Teams

One thing that clients mention often is just how difficult it is to make time for team training. A 2 or 3-day course takes your team out of action for a big chunk of time, during which nothing's getting delivered.

For those teams that struggle to find time for training, I've created a spiffing menu of action-packed 2-hour code craft workshops that can be delivered any time from 8am to 8pm.

Choose from:

  • Test-Driven Development workshops

    • Introduction to TDD

    • Specification By Example/BDD

    • Stubs, Mocks & Dummies

    • Outside-In TDD

  • Refactoring workshops

    • Refactoring 101

    • Refactoring To Patterns

  • Design Principles workshops

    • Simple Design & Tell, Don’t Ask

    • S.O.L.I.D.

    • Clean Code Metrics

To find out more, visit http://www.codemanship.co.uk/bitesized.html

July 9, 2017

Why You Should Put Learning Opportunities Front-And-Centre In Dev Recruitment

A little Twitter poll I ran under the Codemanship account seems to confirm something that many of us have been saying for years. 41% of those polled said they could be lured from their current job by greater opportunities to learn.

Software development is a career that involves lifelong learning, and lots of it. That's how we progress. So it doesn't come as a surprise that it was ranked significantly higher than "More money".

It's surprising, then, that learning opportunities don't figure higher in dev recruitment campaigns. I've banged this drum with clients many, many times. Want to attract and retain great developers? Make learning - mentoring, conferences, training, time to read, time to share - a greater part of the job. No, scratch that. Accept that learning is the job, and build your team culture around that inescapable fact.

When you're hiring, don't just look for what they know now. Look for their potential to learn. And their potential to teach (e.g. by example) the stuff they know so others can learn from them. And clear the way for that to happen. A lot.

Sadly, such employers are few and far between. The unreasonable and unrealistic attitude that developers should arrive knowing everything they need to know, and that no learning should happen on company time, is a leading cause of developer attrition.

It's also the reason why you've been searching in vain these last 6 months for a fluent Mandarin and Dutch-speaking full-stack JS/Node/Java/Clojure/Ruby/NoSQL/SQL/Docker/COBOL/Eiffel/Vim/Linux/Windows developer who has an HGV license and is licensed to practice medicine (salary: market rate).

Software developers tend to be highly educated, but the most important thing we learn is how to learn and it's one of the most important skills your money can buy. In return, one of the most valuable perks you can offer them is more opportunities to learn.

As a professional trainer and mentor, I am of course biased. I tell devs what I do, and they say "Wow, you must be really busy!" and I say "You'd think so, wouldn't you?" But the reality is that the majority of employers don't offer their devs any training at all, let alone time to, say, read a book.

The kind of bosses I run training for are unfortunately very much in the minority. Although, interestingly, they seem to have a lot less trouble hiring good developers.

Funny, that...

July 6, 2017

Conceptual Correlation - Source Code + How To Build Your Own

Although it's only rough and ready, I've published the source code for my Conceptual Correlation calculator so you can get a feel for how it works and how you might implement your own in whichever language you're interested in.

It's actually only about 100 lines of code (not including tests), and if I put my brain in gear, it could well be significantly less. It's a pretty simple process:

1. Parse the code (or the IL code, in this case) using a parser, compiler, decompiler - whatever will get you the names used in the code

2. Tokenize those code names into individual words (e.g., thisMethodName becomes "this" "method" "name")

3. Tokenize the contents of a requirements text file

4. Filter stop words (basically, noise - "the", "at", "we", "I" etc) from these sets of words. You can find freely available lists of stop words online for many languages

5. Lemmatize the word sets - meaning to boil down different inflections of the same word ("report", "reports", "reporting") to a single dictionary root

6. Optionally - just for jolly - count the occurrences of each word

7. Calculate what % of the set of code words are also contained in the requirements words

8. Output the results in a usable format (e.g., console)
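The steps above can be sketched in a few lines of Python. This is not the published .NET source: step 1 is skipped by passing in a hand-written list of code names, the stop-word list is a tiny illustrative one, and `crude_lemma` is a deliberately naive suffix-stripper standing in for a real lemmatizer. All function names are assumptions made for the example.

```python
import re
from collections import Counter

# Step 4: a tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "at", "we", "i", "it", "on", "to", "of",
              "in", "by", "for", "that", "they", "this"}


def split_name(name):
    """Step 2: 'thisMethodName' or 'this_method_name' -> ['this', 'method', 'name']."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", name)
    return [part.lower() for part in parts]


def crude_lemma(word):
    """Step 5: naive suffix stripping, a stand-in for a proper lemmatizer."""
    for suffix in ("ing", "ed", "s"):
        if suffix == "s" and word.endswith("ss"):
            continue  # leave words like 'class' alone
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalise(words):
    """Steps 4-6: drop stop words, lemmatize, and count occurrences."""
    return Counter(crude_lemma(w) for w in words if w not in STOP_WORDS)


def conceptual_correlation(code_names, requirements_text):
    """Step 7: % of distinct code words that also appear in the requirements."""
    code_words = normalise(w for name in code_names for w in split_name(name))
    # Step 3: tokenize the requirements text into lower-case words.
    requirement_words = normalise(re.findall(r"[a-z]+", requirements_text.lower()))
    if not code_words:
        return 0.0, set()
    matched = set(code_words) & set(requirement_words)
    return 100.0 * len(matched) / len(code_words), matched


# Hypothetical inputs: names as they might come out of a parser, plus a use case.
names = ["Flight", "reserveSeat", "Passenger", "seatNumber", "ReservationFactory"]
use_case = ("The passenger selects the flight they want to reserve a seat on. "
            "They choose the seat by row and seat number and reserve it. "
            "We create a reservation for that passenger in that seat.")

percent, matched = conceptual_correlation(names, use_case)
print(f"Conceptual correlation: {percent:.1f}%")  # Step 8: console output
```

With these inputs, six of the seven distinct code words appear in the use case and only "factory" does not. A crude lemmatizer like this one will miss pairs such as "reserved" and "reserve", which is a good reason to reach for a real one.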

No doubt someone will show us how it can be done in a single line of F#... ;)

July 5, 2017

A Little Test for My Conceptual Correlation Metric

Here's a little test for my prototype .NET command line tool for calculating Conceptual Correlation. Imagine we have a use case for booking seats on flights for passengers.

The passenger selects the flight they want to reserve a seat on. They choose the seat by row and seat number (e.g., row A, seat 1) and reserve it. We create a reservation for that passenger in that seat.

We write two implementations: one very domain-driven...

And one... not so much.

We run Conceptual.exe over our first project's binary to compare against the use case text, and get a good correlation.

Then we run it over the second project's output and get zero correlation.

QED :)

You can download the prototype here. What will it say about your code?

Conceptual Correlation - Prototype Tool for .NET

With a few hours' spare time over the last couple of days, I've had a chance to throw together a simple rough prototype of a tool that calculates the Conceptual Correlation between a .NET assembly (with a .pdb file in the same directory, very important that!) and a .txt file containing requirements descriptions (e.g., text copied and pasted from your acceptance tests, or use case documents).

You can download it as a ZIP file, and to use it, just unzip the contents to a folder, and run the command-line Conceptual.exe with exactly 2 arguments: the first is the file name of the .NET assembly, the second is the file name of the requirements .txt.


Conceptual.exe "C:\MyProjects\FlightBooking\bin\debug\FlightBooking.dll" "C:\MyProjects\FlightBooking\usecases.txt"

I've been using it as an external tool in Visual Studio, with a convention-over-configuration argument of $(BinDir)\$(TargetName)$(TargetExt) $(ProjectDir)\requirements.txt

I've tried it on some fair-sized assemblies (e.g., Mono.Cecil.dll), and on some humungous text files (the entire text of my 200-page TDD book - all 30,000 words), and it's been pretty speedy on my laptop and the results have been interesting and look plausible.

Assumes code names are in PascalCase and/or CamelCase.

Sure, it's no Mercedes. At this stage, I just want to see what kind of results folk are getting from their own code and their own requirements. Provided with no warranty with no technical support, use at own risk, your home is at risk if you do not keep up repayments, mind the gap, etc etc. You know the drill :)

Conceptual.exe uses Mono.Cecil to pull out code names, and LemmaSharp to lemmatize words (e.g., "reporting", "reports" become "report"). Both are available via NuGet.

Have fun!

July 4, 2017

Are We Only Pretending To Care About Cost of Change?

Wise folk have occasionally told me - when I've claimed that "I really wanted X" - that, if X was within my control, then I couldn't have wanted it badly enough or I'd have X.

You know, like when someone says "I really wish I knew Spanish"? Obviously, they really don't. Or they'd know Spanish.

Likewise when development teams say "I really wish we understood our end users better". Evidently not. Or we'd understand our end users better.

And, talking about it today with colleagues, there's a nice little list of things development teams are only pretending to care about. If they did, they'd have done something about it.

Take the cost of changing code. Is your team tracking that? Do you know how much it cost to add, change or delete a line of code for your last release? Do you know how the cost of changing is, well, changing?

The vast majority of teams don't keep those kinds of records, even though the information is almost always available to figure it out. Got version control? You can get a graph of code churn. Got project management or accounts? Then you know how much money was spent during those same periods. Just divide the latter by the former, and - bazinga! - cost of changing a line of code.
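The arithmetic really is that small. As a sketch in Python, assuming churn is measured as lines added plus lines deleted in a period (one reasonable definition, not the only one) and using entirely made-up figures:

```python
def cost_per_changed_line(spend, lines_added, lines_deleted):
    """Money spent in a period divided by the code churn in that period."""
    churn = lines_added + lines_deleted
    if churn == 0:
        raise ValueError("no code churn recorded for this period")
    return spend / churn


# (period, money spent, lines added, lines deleted) - invented numbers.
releases = [
    ("release 1", 120_000, 9_000, 3_000),
    ("release 2", 150_000, 6_000, 4_000),
]

trend = [
    (name, cost_per_changed_line(spend, added, deleted))
    for name, spend, added, deleted in releases
]

for name, cost in trend:
    print(f"{name}: {cost:.2f} per changed line")
```

In this invented data the cost per changed line rises from 10.00 to 15.00 between releases, which is exactly the kind of trend the post is asking teams to watch.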

The fact that most of us don't have that number to hand strongly suggests that, despite our loudest protestations, we don't really care about it.

And what's very interesting is that it's no different within the software craftsmanship community. We talk about cost of change a great deal, but I've yet to meet a self-identifying software craftsperson who knows the cost of changing their own code.

This seems, to me, to be like a club for really serious golf enthusiasts in which nobody knows what their handicap is. At the very least, should we not be building a good-sized body of data to back up our claims that code craft really does reduce the cost of change? It's been nearly a decade since the software craftsmanship manifesto. What have we been doing with our time that's more important than verifying its central premise?

July 2, 2017

Conceptual Correlation - A Working Definition

During an enjoyable four days in Warsaw, Poland, I put some more thought into the idea of Conceptual Correlation as a code metric. (Hours of sitting in airports, planes, buses, taxis, trains and hotel bars give plenty of time for the mind to wander.)

I've come up with a working definition to base a prototype tool on, and it goes something like this:

Conceptual Correlation - the % of non-noise words that appear in names of things in our code (class names, method names, field names, variable names, constants, enums etc) that also appear in the customer's description of the problem domain in which the software is intended for use.

That is, if we were to pull out all the names from our code, parse them into their individual words (e.g., submit_mortgage_application would become "submit" "mortgage" "application"), and build a set of them, then Conceptual Correlation would be the % of that set that appeared in a similar set created by parsing, say, a FitNesse Wiki test page about submitting mortgage applications.

So, for example, a class name like MortgageApplicationFactory might have a conceptual correlation of 67% (unless, of course, the customer actually processes mortgage applications in a factory).
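The 67% figure can be checked with a few lines of Python. This is only a sketch of the working definition: the `domain_words` set is a hypothetical stand-in for words pulled from the customer's description, and noise-word filtering is left out because none of these names contain any.

```python
import re


def words_in(name):
    """Split PascalCase, camelCase or snake_case names into lower-case words."""
    return [w.lower() for w in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", name)]


def correlation(code_names, domain_words):
    """% of the set of code words that also appear in the domain word set."""
    code_words = {w for name in code_names for w in words_in(name)}
    if not code_words:
        return 0.0
    return 100.0 * len(code_words & domain_words) / len(code_words)


# Hypothetical words taken from the customer's description of the domain.
domain_words = {"customer", "submit", "mortgage", "application"}

factory_score = correlation(["MortgageApplicationFactory"], domain_words)
method_score = correlation(["submit_mortgage_application"], domain_words)
```

Two of the three words in MortgageApplicationFactory appear in the domain set, giving roughly 67%, while submit_mortgage_application scores 100% against the same set.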

I might predict that a team following the practices of Domain-Driven Design might write code with a higher conceptual correlation, perhaps with just the hidden integration code (database access, etc) bringing the % down. Whereas a team that are much more solution-driven or technology-driven might write code with a relatively lower conceptual correlation.

For a tool to be useful, it would not only report the conceptual correlation (e.g., between a .NET assembly and a text file containing its original use cases), but also provide a way to visualise and access the "word cloud" to make it easier to improve the correlation.

So, if we wrote code like this for booking seats on flights, the tool would bring up a selection of candidate words from the requirements text to replace the non-correlated names in our code with.

I currently envisage this popping up as an aid when we use a Rename refactoring, perhaps accentuating words that haven't been used yet.

A refactored version of the code would show a much higher conceptual correlation. E.g.,

The devil's in the detail, as always. Would the tool need to make non-exact correlations, for example? Would "seat" and "seating" be a match? Or a partial match? Also, would the strength of the correlation matter? Maybe "seat" appears in the requirements text many times, but only once in the code. Should that be treated as a weaker correlation? And what about words that appear together? Or would that be making it too complicated? Methinks a simple spike might answer some of these questions.

June 25, 2017

Conceptual Correlation

I've long recommended running requirements documents (e.g., acceptance tests) through tag cloud generators to create a cheap-and-cheerful domain glossary for developers to refer to when we need inspiration for a name in our code.

But, considering today how we might assess the readability of code automatically, I imagined what we could learn by doing this for both the requirements and our code, and then comparing the resulting lexicons to see how much conceptual overlap there is.

I'm calling this overlap Conceptual Correlation, and I'm thinking this wouldn't be too difficult to automate in a basic form.

The devil's in the detail, of course. "Noise" words like "the", "a", "and" and so on would need to be stripped out. And would we look for exact word matches? Would we wish to know the incidence of each word and include that in our comparison? (e.g., if "flight" came up often in requirements for a travel booking website, but was only mentioned once in the code, would that be a weaker correlation?)

I'm thinking that something like this, coupled with a readability metric similar to the Flesch-Kincaid index, could automatically highlight code that might be harder to understand.

Lots to think about... But it also strikes me as very telling that tools like this don't currently exist for most programming languages. I could only find one experimental tool for analysing Java code readability. Bizarre, when you consider just what a big deal we all say readability is.