July 6, 2017

...Learn TDD with Codemanship

Conceptual Correlation - Source Code + How To Build Your Own

Although it's only rough and ready, I've published the source code for my Conceptual Correlation calculator so you can get a feel for how it works and how you might implement your own in whichever language you're interested in.

It's actually only about 100 lines of code (not including tests), and if I put my brain in gear, it could well be signiicantly less. It's a pretty simple process:

1. Parse the code (or the IL code, in this case) using a parse, compiler, decompiler - whatever will get you the names used in the code

2. Tokenize those code names into individual words (e.g., thisMethodName becomes "this" "method" "name"

3. Tokenize the contents of a requirements text file

4. Filter stop words (basically, noise - "the", "at", "we", "I" etc) from these sets of words. You can find freely available lists of stop words online for many languages

5. Lemmatize the word sets - meaning to boil down different inflections of the same word ("report", "reports", "reporting" to a single dictionary root)

6. Optionally - just for jolly - count the occurances of each word

7. Calculate what % of the set of code words are also contained in the requirements words

8. Output the results in a usable format (e.g., console)

No doubt someone will show us how it can be done in a single line of F#... ;)

Posted 3 years, 10 months ago on July 6, 2017