December 15, 2017


Code Reviews Are Just Another Kind Of Testing


One of my focuses in 2018 is going to be code reviews. As with any kind of testing (and, yes, code review is testing), the quality of the end product tends to reflect the effectiveness of our approach. More simply, teams who are better at code reviews tend to produce better code.

I think it really helps to see code review as testing. Framed in that context, we naturally consider the question of test assurance.

Test assurance has four dimensions:

Scope - How much of our code is tested?

Frequency - How often is our code tested?

Range - How many qualities are we testing for?

Efficacy - How good are our tests at catching bugs?

Code that isn't reviewed - unsurprisingly - tends to be of lower quality. So scope is our starter for ten. On many web projects, for example, I see a tendency to exclude JavaScript, HTML and CSS from regular review. This is probably for cultural reasons. Front end development is still playing catch-up with the server side when it comes to engineering practices.

The batch size of code reviews also seems to have a big impact on their effectiveness. When we're testing the logic of our code, we've learned that it's most economical to run our tests very frequently. Infrequent testing means large batch sizes of new or changed code. The old programmer joke - "show me a line of code and I'll tell you what's wrong with it; show me 500 lines of code and I'll say 'looks okay to me'" - applies here. Code review is best applied one design decision at a time. Infrequent reviews can miss a tonne of stuff.

Also, with lower frequency comes later feedback. We know that logic errors cost exponentially more to fix the longer they go undiscovered. The same is true of maintainability bugs: the cost of change increases as the code grows, so the cheapest time to refactor out a code smell is as soon as it's introduced.

Range is a no-brainer. We probably won't address code qualities we're not testing for. Simples. If we aren't checking for, say, Feature Envy (illustrated below), we'll likely end up with high coupling. When teams say "We do code reviews many times a day", it's worth asking "reviewing for what, exactly?" In the majority of cases, code reviews are a highly subjective and ad hoc affair. What Jane may look for in the code, Jim might not.
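To make that concrete, here's a tiny, hypothetical illustration of Feature Envy in Java (the class and method names are invented for the example):

class Order {
    double unitPrice;
    int quantity;
    double discountRate;
}

class InvoiceGenerator {
    // Feature Envy: every term in this calculation reaches into
    // Order's data rather than using InvoiceGenerator's own.
    double calculateTotal(Order order) {
        return order.unitPrice * order.quantity * (1 - order.discountRate);
    }
}

The usual remedy is to move the behaviour to where the data lives - give Order its own total() method - and the coupling between the two classes falls away. A review that isn't looking for this kind of smell will wave it straight through.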

Finally, the question arises: "How good are our code reviews, anyway?" If Phil writes a method that most of the team would agree is too long or too complex, does our review actually flag it? The majority of code reviews are leaky buckets, letting through many, many issues because of their informal and ad hoc nature. Code reviews tend to be opportunistic, relying heavily on who is reviewing the code, which code they happen to look at, and what happens to be going on in their brains at that moment. This has always been the weakness of relying on pair programming to ensure code quality.
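One way to plug some of those leaks is to automate the checks that can be automated, exactly as we do with logic tests. What follows is a crude, illustrative sketch - simple brace-counting over conventionally formatted Java, nothing more - rather than a production tool; a real team would reach for an established static analysis tool such as Checkstyle or SonarQube:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;

// Flags methods whose bodies look longer than a threshold.
// Assumes the opening brace sits on the signature line; string
// literals containing braces will confuse it. A sketch of the
// idea, not a real checker.
public class LongMethodCheck {

    private static final int MAX_LINES = 10;
    private static final Pattern METHOD_SIG = Pattern.compile(
            "\\b(public|private|protected)\\b[^;{]*\\([^;{]*\\)[^;{]*\\{");

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        String signature = null;
        int depth = 0, start = 0;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (signature == null && METHOD_SIG.matcher(line).find()) {
                signature = line.trim();   // remember where the method began
                start = i;
                depth = 0;
            }
            if (signature != null) {
                for (char c : line.toCharArray()) {
                    if (c == '{') depth++;
                    if (c == '}') depth--;
                }
                if (depth == 0) {          // method body has closed
                    int length = i - start + 1;
                    if (length > MAX_LINES) {
                        System.out.printf("Possible long method (%d lines): %s%n",
                                length, signature);
                    }
                    signature = null;
                }
            }
        }
    }
}

Point it at a source file (java LongMethodCheck src/Foo.java) and every suspiciously long method gets flagged, every time, regardless of who's reviewing or what mood they're in. That consistency is precisely what informal, opportunistic reviews lack.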

Which brings me to a possible fifth dimension of test assurance: improvement. Are we learning and improving our tests? Inevitably, when we test for the stuff we thought of, we miss the stuff we didn't. Teams need to be regularly exploring the code for problems their code reviews might be missing, and adapt their inspections when necessary. And, over time, a developer's understanding of "good code" and "bad code" evolves. 25 years ago, I believed that a 100-line function was just fine. Today, I believe that 10 LOC is pushing it. Our code reviews need reviews, too.

Those of us who firmly believe that fast-running automated tests are the key to maintaining the integrity of our software as it evolves might ask ourselves how code review is any different. It's a kind of testing. Should it not, therefore, be subject to the same assurance factors - scope, frequency, range and efficacy - as logic testing? It's not a huge leap to conclude that maybe it should.




