June 25, 2017


Conceptual Correlation

I've long recommended running requirements documents (e.g., acceptance tests) through tag cloud generators to create a cheap-and-cheerful domain glossary for developers to refer to when we need inspiration for a name in our code.
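At its core, a tag cloud is just a word-frequency count with the common filler words removed. Here is a minimal sketch of that idea, assuming plain-text acceptance tests as input. The function name `glossary` and the tiny `NOISE_WORDS` list are hypothetical. Real tag cloud tools ship much larger stop-word lists.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list. The Gherkin keywords
# "given", "when" and "then" are included because they appear in every scenario.
NOISE_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
               "for", "on", "with", "given", "when", "then"}

def glossary(text: str, top: int = 10) -> list[tuple[str, int]]:
    """Return the most frequent non-noise words in a requirements document."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in NOISE_WORDS)
    return counts.most_common(top)

acceptance_test = """
Given a passenger with a confirmed booking
When the passenger cancels the booking
Then the seat on the flight is released
"""
print(glossary(acceptance_test, top=3))
```

The highest-ranked words, such as "passenger" and "booking", are the candidate names for classes, methods and variables.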

But today, while thinking about how we might assess the readability of code automatically, I wondered what we could learn by doing this for both the requirements and our code, and then comparing the resulting lexicons to see how much conceptual overlap there is.
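Building a lexicon for code is a little different from building one for prose, because the words are buried inside identifiers like `cancelBooking` or `release_seat`. Here is a naive sketch, assuming Python source as input. The function name `code_lexicon` is hypothetical. It splits snake_case and camelCase identifiers into words and skips Python keywords. It does not distinguish comments or string literals from code.

```python
import keyword
import re
from collections import Counter

def code_lexicon(source: str) -> Counter:
    """Count the words found in identifiers, ignoring Python keywords."""
    counts = Counter()
    for identifier in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        if keyword.iskeyword(identifier):
            continue
        # snake_case: split on underscores; camelCase/PascalCase: split on case changes.
        for part in identifier.split("_"):
            for word in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", part):
                counts[word.lower()] += 1
    return counts

snippet = '''
def cancel_booking(booking, flight):
    flight.release_seat(booking.seat)
    booking.status = CANCELLED
'''
print(code_lexicon(snippet).most_common(3))  # [('booking', 4), ('flight', 2), ('seat', 2)]
```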

I'm calling this overlap Conceptual Correlation, and I'm thinking this wouldn't be too difficult to automate in a basic form.
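One possible basic form treats each lexicon as a set of words and uses the Jaccard index. That is the number of shared words divided by the number of distinct words across both sets. The choice of Jaccard and the name `conceptual_correlation` are assumptions made for illustration, not a settled definition.

```python
def conceptual_correlation(requirement_terms: set[str], code_terms: set[str]) -> float:
    """Jaccard overlap: shared terms divided by all distinct terms (0.0 to 1.0)."""
    if not requirement_terms and not code_terms:
        return 0.0
    shared = requirement_terms & code_terms
    return len(shared) / len(requirement_terms | code_terms)

requirements = {"passenger", "booking", "flight", "seat", "cancel"}
code = {"booking", "flight", "seat", "cancel", "manager", "dto"}
print(conceptual_correlation(requirements, code))  # 4 shared / 7 distinct, about 0.571
```

In this example, "passenger" never appears in the code, while "manager" and "dto" never appear in the requirements, and both gaps pull the score down.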

The devil's in the detail, of course. "Noise" words like "the", "a", "and" and so on would need to be stripped out. And would we look for exact word matches? Would we wish to know the incidence of each word and include that in our comparison? (e.g., if "flight" came up often in the requirements for a travel booking website, but was mentioned only once in the code, would that be a weaker correlation?)
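If incidence matters, one option is to treat each lexicon as a vector of word counts and compare the vectors with cosine similarity. This is a swapped-in technique from information retrieval, not something the post prescribes. The name `weighted_correlation` and the noise-word list are hypothetical. Both code lexicons below differ only in how often "flight" appears, which mirrors the question above.

```python
import math
from collections import Counter

NOISE_WORDS = {"the", "a", "an", "and", "of", "to"}  # hypothetical, deliberately tiny

def weighted_correlation(req_counts: Counter, code_counts: Counter) -> float:
    """Cosine similarity of two word-frequency vectors, ignoring noise words."""
    req = {w: n for w, n in req_counts.items() if w not in NOISE_WORDS}
    code = {w: n for w, n in code_counts.items() if w not in NOISE_WORDS}
    dot = sum(n * code.get(w, 0) for w, n in req.items())
    norm = (math.sqrt(sum(n * n for n in req.values()))
            * math.sqrt(sum(n * n for n in code.values())))
    return dot / norm if norm else 0.0

requirements = Counter({"flight": 12, "booking": 8, "passenger": 5, "the": 40})
code_often = Counter({"flight": 10, "booking": 9, "passenger": 4})
code_rarely = Counter({"flight": 1, "booking": 9, "passenger": 4})

print(weighted_correlation(requirements, code_often))   # close to 1.0
print(weighted_correlation(requirements, code_rarely))  # noticeably lower
```

Under this measure, the answer to the "flight" question is yes: mentioning a central requirements concept only once in the code does weaken the correlation.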

I'm thinking that something like this, coupled with a readability metric similar to the Flesch-Kincaid index, could automatically highlight code that might be harder to understand.
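For reference, this is the standard Flesch-Kincaid grade-level formula for English prose. The post does not say how words, sentences and syllables would map onto code, so any adaptation to source code is an open assumption.

```latex
\text{FKGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right)
            + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right)
            - 15.59
```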

Lots to think about... But it also strikes me as very telling that tools like this don't currently exist for most programming languages. I could only find one experimental tool for analysing Java code readability. Bizarre, when you consider just what a big deal we all say readability is.
