July 21, 2017


Calculating Reading Ease for Code

More deliberating this week about program readability.

This week I've thought a lot about how to make code easy to read.

According to a popular measure of reading ease, one of these two sentences is far easier to read than the other.

The Flesch Reading Ease Score (FRES) was developed by author and Plain English advocate Rudolf Flesch in 1948.

The formula uses the average length of sentences in the text and the average number of syllables in the words to estimate the required educational level (in US education, the "grade level") of the audience.

FRES = 206.835 - (1.015 x Average Words Per Sentence) - (84.6 x Average Syllables Per Word)

This produces a number typically between 0 and 100, with higher values meaning better reading ease. The text of an advert aimed at a wide consumer audience might have a FRES of about 80-89 (meaning it could be understood by your average 6th grader). The text of a peer-reviewed scientific paper might have a FRES less than 20, meaning that only college graduates might be able to understand it.
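To make the arithmetic concrete, here's the formula as a tiny Python function. This is just a sketch; the function and parameter names are mine, not part of any library.

```python
def flesch_reading_ease(words_per_sentence: float, syllables_per_word: float) -> float:
    """Flesch Reading Ease Score from the two averages used in the formula."""
    return 206.835 - (1.015 * words_per_sentence) - (84.6 * syllables_per_word)


# A made-up text averaging 20 words per sentence and 1.5 syllables per word.
print(round(flesch_reading_ease(20, 1.5), 1))  # 59.6
```

Notice how much more one extra syllable per word costs (84.6 points) than one extra word per sentence (about 1 point).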

As you can see from the formula, the greatest emphasis is placed on the number of syllables in words. So, although "More deliberating this week about program readability" is a shorter sentence than "This week I've thought a lot about how to make code easy to read", its reading ease score is much lower.

This week I've thought a lot about how to make code easy to read.

- words per sentence = 14, syllables per word = 1.14

=> FRES = 95.9

Very easy to read

More deliberating this week about program readability

- words per sentence = 7, syllables per word = 2.43

=> FRES = -5.7

Very difficult to read

I've been wondering for some time if the FRES formula could be applied to code. For sure, in code we have words. And those words have syllables. It would be straightforward to find all the words in a block of code or a source file and count them.

A very simple regex can find all the alphanumeric strings in a block of code, and I'd already figured out how to parse identifiers written in PascalCase and camelCase. (Sorry, underscore lovers. I'll add that soon.)
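Here's roughly what that word extraction could look like in Python. It's a sketch under my own assumptions, not the code from the .NET tool: the function name and both regexes are illustrative, and I've assumed that purely numeric tokens don't count as words.

```python
import re


def extract_words(code: str) -> list:
    """Find alphanumeric tokens, then split camelCase/PascalCase into words."""
    words = []
    # Step 1: every run of letters and digits is a candidate token.
    # (Underscores aren't in this character class, so they act as separators.)
    for token in re.findall(r"[A-Za-z0-9]+", code):
        # Step 2: split on case boundaries. The first alternative keeps
        # acronyms together (HTTP in HTTPServer); the second matches an
        # optionally capitalised lowercase word. Digits match neither.
        for part in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", token):
            words.append(part.lower())
    return words


print(extract_words("public int calculateReadingEase(String sourceCode)"))
# ['public', 'int', 'calculate', 'reading', 'ease', 'string', 'source', 'code']
```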

Counting syllables is less straightforward, but if you're prepared to accept a compromise, it's only 3 lines of code.

First we count the occurrences of a, e, i, o, u and y in the word. For example, "orbit" has 2, and an estimate of 2 syllables would be right.

But what about diphthongs, where two vowels appear consecutively (e.g., boot, meet)? Each pair makes only one sound, so we eliminate the diphthongs from our count.

And then there are words that end with "e", or "es" or "ed": like "more", "shoves" and "bowed". We eliminate those vowels from our count.

It's not a perfect system, but it's way simpler - and more performant - than more accurate alternatives that require Natural Language Processing. And it's good enough for my purposes.
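The three steps above can be sketched in Python like this. It's my approximation of the approach, not the tool's actual source. The function name is made up, and the floor of one syllable per word is my own assumption (without it, "the" would score zero).

```python
import re


def estimate_syllables(word: str) -> int:
    """Rough syllable estimate using only vowel counting."""
    word = word.lower()
    # Step 1: count every a, e, i, o, u and y.
    count = len(re.findall(r"[aeiouy]", word))
    # Step 2: each pair of consecutive vowels is counted as one sound.
    count -= len(re.findall(r"[aeiouy]{2}", word))
    # Step 3: discount a trailing "e", "es" or "ed".
    if re.search(r"(e|es|ed)$", word):
        count -= 1
    # Assumption: every word has at least one syllable.
    return max(count, 1)


for word in ["orbit", "boot", "more", "shoves", "bowed", "readability"]:
    print(word, estimate_syllables(word))
# orbit 2, boot 1, more 1, shoves 1, bowed 1, readability 5
```

It gets words like "beautiful" wrong (4 instead of 3), which is the kind of compromise I mean.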

So, we know how many words there are, and we know (roughly) how many syllables per word. But what's the code equivalent of sentences? In my first pass at a prototype tool, I interpreted statements as sentences, counting how often ";" appeared in the code. Yes, this would only work in C-like languages... And in OO code especially, we can write a lot of code that contains few actual executable statements.

In my second pass, I interpreted sentences as lines of code that contain actual text. The results I got from this were a bit more convincing, and it also had the distinct advantage of working on pretty much all programming languages.
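Both ways of counting "sentences" are short enough to sketch in Python. Again, this isn't the tool's code. The function names are mine, and I'm assuming "contains actual text" means "contains at least one letter".

```python
import re


def count_statements(code: str) -> int:
    """First pass: one 'sentence' per semicolon (C-like languages only)."""
    return code.count(";")


def count_text_lines(code: str) -> int:
    """Second pass: one 'sentence' per line that contains at least one letter."""
    return sum(1 for line in code.splitlines() if re.search(r"[A-Za-z]", line))


sample = """public class Account {

    private int balance;

    public int getBalance() {
        return balance;
    }
}
"""

print(count_statements(sample))  # 2
print(count_text_lines(sample))  # 4
```

The class and method declarations have no semicolons, so the first pass sees only 2 sentences in 8 lines. The second pass skips the blank lines and lone braces and finds 4.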

And so, it's done: a rough and ready .NET console app*, which you can download from http://www.codemanship.co.uk/files/Codemanship.Readability.zip (requires .NET 4.5 framework).

It's very simple: it takes only one command-line argument, which is the text you want it to calculate the FRES for.


> Codemanship.Readability.exe "public int foo() { return 100 * 100; }"

I've rigged mine up as an external tool in Visual Studio that processes any selected text.

Just select the text you want it to analyse, and then run the tool.

Easy as peas.

You can find the source code at https://github.com/jasongorman/Readability

* Provided with no warranty or support

Posted on July 21, 2017