December 9, 2018
Big Dependency Problems Lie In The Small DetailsJust a quick thought about dependencies. Quite often, when we talk about dependencies in software, we mean dependencies between modules, or between components or services. But I think perhaps that can blinker us to a wider and more useful understanding.
A dependency is a relationship between two pieces of code where a change to one could break the other.
If we consider these two lines of code, deleting the first line would break the second line. The expression x + 2 depends on the declaration of x.
Dependencies increase the cost of changing code by requiring us, when we change one thing in our code, to then change all the code that depends on it. (Which, in turn can force us to have to change all the code that depends on that. And so on.)
If our goal is to keep the cost of changing code low, one of the ways we can achieve that is to try to localise these ripples so that - as much as possible - they're contained within the same module. We do this by packaging dependent code in the same module (cohesion), and minimising the dependencies between code in different modules (coupling). The general term for this is encapsulating.
If I move x and y into different classes, we necessarily have to live with a dependency between them.
Now, deleting x in Foo will break y in Bar. Our ripple spreads across the class boundary.
Of course, in order to organise our code into manageable modules, we can't avoid having some dependencies that cross the boundaries. This is often what people mean when they say that splitting up a class "introduces dependencies". It doesn't, though. It redistributes them. The dependency was always there. It just crosses a boundary now.
And this is important to remember. We've got to write the code we've got to write. And that code must have dependencies - unless you're smart enough to write lines of code that in no way refer to other lines of code, of course.
Remember: dependencies between classes are actually dependencies between the code in those classes.
As we scale up from modules to components or services, the same applies. Dependencies beween components are actually dependencies beteween modules in those components, which are actually dependencies between code inside those modules. If I package Foo in one microservice and Bar in another: hey presto! Microservice dependencies!
I say all of this because I want to encourage developers, when faced with dependency issues in large-scale architecture, to consider looking at the code itself to see if the solution might actually lie at that level. You'd be surprised how often it does.
December 2, 2018
Architecture: The Belated Return of Big Picture ThinkingA question that's been stalking me is "When does architecture happen in TDD?"
I see a lot of code (a LOT of code) and if there's a trend I've noticed in recent years it's an increasing lack of - what's the word I'm looking for? - rationality in software designs as they grow.
When I watch dev teams produce working software (well, the ones who do produce software that works, at least), I find myself focusing more and more on when the design decisions get made.
In TDD, we can make design decisions during four distinct phases of the red-green-refactor cycle:
1. Planning - decisions we make before we write any code (e.g., a rough sequence diagram that realises a customer test scenario)
2. Specifying- decisions we make while we're writing a failing test (e.g., calling a function to do what you need done for the test, and then declaring it in the solution code)
3. Implementing - decisions we make when we're writing the code to pass the test (e.g., using a loop to search through a list)
4. Refactoring - decisions we make after we've passed the test according to our set of organising principles (e.g., consolidating duplicate code into a reusable method)
If you're a fan of Continuous Delivery like me, then a central goal of the way you write software is that it should be (almost) always shippable. Since 2 and 3 imply not-working code, that suggests we'd spend as little time as possible thinking about design while we're specifying and implementing. While the tests are green (1 and 4), we can consider design at our leisure.
I can break down refactoring even further, into:
4a. Thinking about refactoring
4b. Performing refactorings
Again, if your goal is always-shippable code, you'd spend as little time as possible executing each refactoring.
Put more bluntly, we should be applying the least thought into design while we're editing code.
(In my training workshops, I talk about Little Red Riding Hood and the advice her mother gave her to stay on the path and not wander off into the deep dark forest, where dangers like Big Bad Wolves lurk. Think of working code as the path, and not-working code as the deep dark forest. I encourage developers to always keep at least one foot on the path. When they step off to edit code, they need to step straight back on as quickly as possible.)
Personally - and I've roughly measured this - I make about two-thirds of design decisions during refactoring. That is, roughly 60-70% of the "things" in my code - classes, methods, fields, variables, interfaces etc - appear during refactoring:
* Extracting methods, constants and such to more clearly document what code does
* Extracting methods and classes to consolidate duplicate code
* Extracting classes to eliminate Primitive Obsession (e.g., IF statements that hinge on what is obviously an object identity represented by a literal vaue)
* Extracting and moving methods to eliminate Feature Envy in blocks of code and expressions
* Extracting methods and classes to split up units of code that have > 1 reason to change
* Exctracting methods to decompose complex conditionals
* Extracting client-specific interfaces
* Introducing parameters to make dependencies swappable
And so on and so on.
By this process, my code tends to grow and divide like cells with each new test. A complex order emerges from simple organising principles about readabililty, complexity, duplication and dependencies being applied iteratively over and over again. (This is perfectly illustrated in Joshua Kerievky's Refactoring to Patterns.)
I think of red-green-refactor as the inner loop of software architecture. And lots of developers do this. (Although, let's be honest, too many devs skimp on the refactoring.)
But there's architecture at higher levels of code organisation, too: components, services, systems, systems of systems. And they, too, have their organising principles and patterns, and need their outer feedback loops.
This is where I see a lot of teams falling short. Too little attention is paid to the emerging bigger picture. Few teams, for example, routinely visualise their components and the dependencies between them. Few teams regularly collaborate with other teams on managing the overall architecture. Few devs have a clear perspective on where their work fits in the grand scheme of things.
Buildings need carpentry and plumbing. Roads need tarmaccing. Sewers need digging. Power lines need routing.
But towns need planning. Someone needs to keep an eye on how the buildings and the roads and the sewers and the power lines fit together into a coherent whole that serves the people who live and work there.
Now, I come from a Big ArchitectureTM background. And, for all the badness that we wrought in the pre-XP days, one upside is that I'm a bit more Big Picture-aware than a lot of younger developers seem to be these days.
After focusing almost exclusively on the inner loop of software architecture for the last decade, starting in 2019 I'm going to be trying to help teams build a bit of Big Picture awareness and bring more emphasis on the outer feedback loops and associated principles, patterns and techniques.
The goal here is not to bring back the bad old days, or to ressurect the role of the Big Architect. And it's definitely not to try to reanimate the corpse of Big Design Up-Front.
This is simply about nurturing some Big Picture awareness among developers and hopefully reincorporating the outer feedback loops into today's methodologies, which we misguidedly threw out with the bathwater during the Agile Purges.
And, yes, there may even be a bit of UML. But just enough, mind you.
October 19, 2018
How Not To Use An ORM?An anti-pattern I see often is applications - often referred to as "enterprise" applications - that have database transactions baked into their core logic via a "data access layer".
It typically goes something like this:
"When the order page loads, we fetch the order via an Order repository. Then we take the ID of that order and use that to fetch the list of order items via an Order Item repository. Then we load the order item product descriptions via a Product repository. We load the customer information for the order, using the customer ID field of the order, via a Customer repository. And then the customer's address via an Address repository.
"It's all nicely abstracted. We have proper separation of concerns between business logic and data access because we're using repositories, so we can stub out all the data access for testing.
"Yes, it does run a little slow, now that you ask. I wonder why that is?"
Then, behind the repositories, there's usually a query that's constructed using the object key or foreign keys - to retrieve the result of what ought to be a simple object navigation: order.items is implemented as orderItemRepository.items(orderId). You may believe that that you've abstracted the database because you're going through a repository interface, and possibly/probably using an object-relational mapping tool to fetch the entities, but if you're writing code that stitches object graphs together using keys and foreign keys, then you are writing the ORM tool. You're just using the off-the-shelf ORM as an xDBC substitute. It's the old "we used an X tool to build an X tool" problem. (See also "MVC frameworks built using MVC frameworks".)
The goal of an ORM is to make the mapping from tables and joins to object graphs Somebody Else's ProblemTM. That's a simpler way of defining true separation of concerns. As such, we should aim to write our core logic in the simplest object-oriented way we can, so that - ideally - the whole thing could run in memory with no database at all. Saving and fetching stored objects just happens. Not a foreign key or object repository in sight. It can vastly simplify the code (including test code).
The most powerful and flexible ORMs - like Hibernate - make this possible. I've written entire "enterprise" applications that could be run in memory, with the mapping and persistence happening entirely outside the core logic. In terms of hexagonal architecture, I treat data access as an external dependency and try to minimise it as much as possible. I don't write data access "layers".
Teams that go down the "layered" route tend to end up with heaps of code that depends directly on the ORM they're using (to write an ORM). It's a similar - well, these days, identical - problem to Java teams who do dependency injection using Spring and end up massively dependent on Spring - to the extent that their code can only be run in a Spring context.
At best, they end up with thousands of tests that have to stub and mock the data access layer so they can test ther core logic. At worst, they end up only being able to test their core logic with a database attached.
The ORM's magic doesn't come for free, of course. Yes, there's a heap of tweaking you need to do to make a completely seperated persistence/mapping component work. Many decisions have to be made (e.g., lazy loading vs. pre-emptive vs. SQL views vs. second level caching etc etc) to make it performant, but you were making those decisions anyway. You just weren't using the ORM to handle them, because you were too busy writing your own.
August 3, 2018
Keyhole APIs - Good for Microservices, But Not for Unit TestingI've been thinking a lot lately about what I call keyhole APIs.
A keyhole API is the simplest API possible, that presents the smallest "surface area" to clients for its complete use. This means there's a single function exposed, which has the smallest number of primitive input parameters - ideally one - and a single, simple output.
To illustrate, I had a crack at TDD-ing a solution to the Mars Rover kata, writing tests that only called a single method on a single public class to manipulate the rover and query the results.
You can read the code on my Github account.
This produces test code that's very loosely coupled to the rover implementation. I could have written test code that invokes multiple methods on multiple implementation classes. This would have made it easier to debug, for sure, because tests would pinpoint the source of errors more closely.
If we're writing microservices, keyhole APIs are - I believe - essential. We have to hide as much of the implementation as possible. Clients need to be as loosely coupled to the microservices they use as possible, including microservices that use other microservices.
I encourage developers to create these keyhole APIs for their components and services more and more these days. Even if they're not going to go down the microservice route, its helpful to partition our code into components that could be turned into microservices easily, shoud the need arise.
Having said all that, I don't recommend unit testing entirely through such an API. I draw a distinction there: unit tests are an internal thing, a sort of grey-box testing. Especially important is the ability to isolate units under test from their external dependencies - e.g., by using mocks or stubs - and this requires the test code to know a little about those dependencies. I deliberately avoided that in my Mars Rover tests, and so ended up with a design where dependencies weren't easily swappable in ths way.
So, in summary: keyhole APIs can be a good thing for our architectures, but keyhole developer tests... not so much.
June 29, 2018
.NET Code Analysis using NDependIt's been a while since I used it in anger, but I've been having fun this week reaquainting myself with NDepend, the .NET code analysis tool.
Those of us who are interested on automating code reviews for Continuous Inspection have a range of options for .NET - ranging from tools built on the .NET Cecil decompiler - e.g., FxCop - to compiler-integrated tools on the Roslyn platform.
Ou of all them, I find NDepend to be by far the most mature. It's code model is much more expressive and intuitive (oh, the hours I've spent trying to map IL op codes on to source code!), it integrates out of the box with a range of popular build and reporting tools like VSTS, TeamCity, Excel and SonarQube. And in general, I find I'm up and running with a suite of usable quality gates much, much faster.
Under the covers, I believe we're still in Cecil/IL territory, but all the hard work's been done for us.
Creating analysis projects in NDepend is pretty straightforward. You can either select a set of .NET assemblies to be analysed, or a Visual Studio project or solution. It's very backwards-compatible, working with solutions as far back as VS 2005 (which, for training purposes, I still use occasionally).
I let it have a crack at the files for the Codemanship refactoring workshop, which are deliberately riddled with tasty code smells. My goal was to see how easy it would be to use NDepend to automatically detect the smells.
It found all the solution's assemblies, and crunched through them - building a code model and generating a report - in about a minute. When it's done, it opens a dashboard view which summarises the results of the analysis.
There's a lot going on in NDepend's UI, and this would be a very long blog post if I explored it all. But my goal was to use NDepend to detect the code smells in these projects, so I've focused on the features I'd use to do that.
First of all, out of the box with the code rules that come with NDepend, it has not detected any of the smells I'm interested in. This is tyicaly of any code analysis tool: the rules are not your rules. They're someone else's interpretation of code quality. FxCop's developers, for example, evidently have a way higher tolerance for complexity than I do.
The value in these tools is not in what they do out of the box, but in what you can make them do with a bit of thought. And for .NET, NDepend excels at this.
In the dialog at the bottom of the NDepend window, we can explore the code rules that it comes with and see how they've been implemented using NDepend's code model and some LINQ.
I'm interested in methods with too many parameters, so I clicked on that rule to bring up its implementation.
I happen to think that 5 parameters is too many, so could easily change the threshold where this rule is triggered in the LINQ. When I did, the results list immediately updated, showing the methods in my solution that have too many parameters.
This matches my expectation, and the instant feedback is very useful when creating custom quality gates - really speeds up the learning process.
To view the offending code, I just had to double click on that method in the results list, and NDepend opened it in Visual Studio. (You can use NDepend from within Visual Studio, too, if you want a more seamless experience.)
The interactive and integrated nature of NDepend makes it a useful tool to have in code reviews. I've always found going through the code inspecting source files by eye looking for issues hard work and really rather time-consuming. Being able to search for them interatively like this can help a lot.
Of course, we don't just want to look for code smells in code reviews - that's closing the stable door after the horse has bolted a lot of the time. It's quite fashionable now for dev teams to include code reviews as part of their check-in process - the dreaded Pull Request. It makes sense, as a last line of defence, to try to prevent issus being checked into the code respository. What I'm seeing more and more, though, is that pull requests can become a bottleneck for the team. Like any manual testing, it slows us down and hampers Continuous Delivery.
The command-line version of NDepend can easily be integrated into your build pipeline, allowing for some pretty comprehensive code reviews that can be performed automatically (and therefore quickly, alleviating the bottleneck).
I decided to turn this code rule into a quality gate that coud be used in a build, and set a policy that it should fail the build if more than 5 examples of long parameter lists are found.
So, up and running with a simple quality gate in no time. But what about more complex code smells, like message chains and feature envy? In the next blog post I'll go deeper into NDepend's Code Query Language and explore the kinds of queries we can create with more thought.
June 20, 2018
Design Principles Are The Key To A Testing PyramidOn the 3-day Codemanship TDD workshop, we discuss the testing pyramid and why optimising your test suites for fast execution is critical to achieving continuous delivery.
The goal with the pyramid is to be able to test as much of our software as possible as quickly as possible, so we can re-test and reassure ourselves that our code is shippable very frequently (i.e., continuously).
If our tests take hours to run, then we can only run them every few hours. Those are hours during which we don't know if the software's shippable.
So the bulk of our automated tests - the base of testing pyramid - should be fast-running "unit" tests. This typically means tests that have no external dependencies. (That's my working definition of "unit" test, for the purposes of making the argument for excluding file systems, databases, web services and the like from the majority of our tests.)
The purpose of our automated tests is to detect when code is broken. Every time we change a line of code, it can break the software. Therefore we need a test to catch every potential broken LOC.
The key to a good testing pyramid is to minimise the tests that have external dependencies, and the key to that is minimising the amount of code that has external dependencies.
I explain in the workshop how our design principles help us achieve this - and three in particular:
* Single Responsibility
* Don't Repeat Yourself
* Dependency Inversion
Take the example of a module that has a method which:
1. Formats a SQL string using data from a business object
2. Connects to a database to execute that query
3. Unpacks the response (recordset or array) into a business object
To test any part of this logic, we must include a trip to the database. If we break it up into 3 methods, each with a distinct responsibility, then it becomes possible to test 1. and 3. without including 2. That's a third as many "integration" tests.
In a similar vein, imagine we have data access objects, each like our module above. Each can format a SQL string using an object's data - e.g., CustomerDAO, InvoiceDAO, OrderDAO. Each connects to the database for fetch and save that object type's data. Each knows how to unpack the database response into the corresponding object type.
There's repetition in this design: connecting to the database. If we consolidate that code into a single module, we again reduce the number of integration tests we need.
Finally, we have to consider the call stack in which database connections are being made. Consider this poor design for a video rental system:
When we examine the code, we see that the methods that have direct external dependencies are not swappable within the overall call stack.
We cannot test pricing a video rental without paying a visit to the external video ratins service. We cannot test rentals without trips to the database, either.
To exclude these external dependencies from a set of tests for Rental, we have to turn those dependencies upside-down (make them swappable by dependency injection, basically).
This is often what people mean when they talk about "testable" code. In effect, it means there's enough "swappability" in our design to allow us to test the bulk of the logic by mocking or stubbing external dependencies. The win-win here is that we not only get a better-proportioned testing pyramid, we also get a more flexible design that can more readily accommodate change. e.g., getting video ratings from Rotten Tomatoes instead.)
March 9, 2018
S.O.L.I.D. C# - Online Training Event, Sat Apr 14thDetails of another upcoming 1-day live interactive training workshop for C# developers looking to take their design skills to the next level.
I'll be helping you get to grips with S.O.L.I.D. and much more besides with practical hands-on tutorials and exercises.
Places are limited. You can find out more and grab your place at https://www.eventbrite.co.uk/e/solid-c-tickets-44018827498
December 1, 2017
Don't Succumb To "Facebook Envy". Solve The Problem In Front Of YouA trend that's been growing for some time now is what I call "Facebook envy". Dev teams working on bread-and-butter problems seem almost embarrassed not to be solving problems on the scale Facebook have to.
99.9% of developers are not working at this scale, and are never likely to. And yet I see a strange obsession with scale that too often distracts teams from more pressing problems.
I use the analogy of a rock band obsessing over how their songs can be arranged for a 90-piece orchestra for a performance at the massive O2 Arena in London, and failing to prepare for their upcoming gig in the bowling alley at the back of the local pub.
Of course, we hear the stories about tech start-ups who didn't prepare for greatness and discovered that their architecture didn't scale. We hear those stories precisely because of the pervasiveness of those businesses in our lives after the fact. Just like all the stories we read about how bands became mega-successful, because who wants to read about bands that didn't? History is written by the winners.
What we don't hear about is the other 999/1000 tech start-ups who did prepare for greatness and wasted their time and their money doing it.
Before a tech start-up needs to scale, it needs to survive. Surviving means solving the problems that are in front of you. By all means, keep one eye on the future - a sense of direction's important. But not both eyes.
It's a similar thing to cash flow. Sure, your product may be super-profitable eventually. But if you can't pay your staff and keep the lights on in the meantime, you won't be there to collect.
The best way to scale-proof your start-up is to solve today's problems in a way that doesn't block you from adapting to tomorrow's. This is why I work so hard to persuade teams to focus on the factors that make software and systems hard to change, instead of on trying to anticipate those changes at the start. It tends to make the end product far more complicated than it needed to be, and things rarely turn out the way you planned.
Some common technologies are inherently scalable, too. Indeed, these days, most technology stacks are, even if it takes a bit more imagination to achieve it. Facebook are the best example of this. Who'd have thought, 12 years ago, that PHP and MySQL would scale to a billion users? Facebook solved the problems that were in front of them. They didn't adapt those technologies speculatively... just in case they ended up with a billion users.
If you use scalable technologies, design your architectures in such a way that they would be easy to partition if needed (i.e., separate concerns cleanly from the get-go), and - most importantly - deliver code that can be changed when new times require it, then you'll be able to solve today's tangible problems and keep the door open to tomorrow's intangible possibilities.
July 22, 2017
Code Analysis for Dependency Inversion
As work continues on the next book and training course, I'm thinking about how we could analyse our code for adherence to the Dependency Inversion Principle (the "D" in S.O.L.I.D.)
The DIP states that "High-level modules should not depend upon low-level modules. Both should depend upon abstractions. Abstractions should not depend upon details, details should depend upon abstractions."
This is a roundabout way of saying dependencies should be swappable. The means by which we make them swappable is dependency injection (often confused with Dependency Inversion, and the two are very closely related.)
Dependency injection is simply passing an objects collaborators in (e.g., through a constructor) instead of that object instantianting them itself. When we directly instantiate an object, we bind ourselves to its exact type. This makes it impossible to swap that collaborator with a different implementation without modifying the client code, making our design inflexible and difficult to adapt or extend.
In practice, what this means is that most of our objects are composed from the outside.
For example, in my Reading Ease calculator, the Program class - the entry point for this console app - creates all of the objects involved in doing the calculation and "plugs" them together via constructors.
I've used the analogy of Russian dolls to describe how we compose simpler collaborations into more complex collaborations (collaborations within collaborations). This means that the lowest-level objects in the call stack typically get created first.
Inside those lower-level classes, there's no direct instantiation of collaborators.
So, when we analyse the dependencies, we should find that classes that have clients in our code - classes that are further down the call stack - don't directly instantiate their collaborators.
More simply, if things depend on you, then don't use new.
There are, of course, exceptions. Factories and Builders are designed to instantiate and hide the details. Integration code - e.g., opening database connections - is also designed to hide details. We can't very well pass our database connections into those, or we'd be spreading that knowledge. Typically what we're talking about here is dependencies on our own classes. And what a kerfuffle it would be to try to apply DIP to strings and ints and collections and other core library types all the time. Though, again, there are situations where that may be called for.
If I was measuring adherence to the Dependency Inversion Principle, then, I'd look at a class and ask "Do any other of my classes depend on this?" If the answer is "yes", then I'd check to see if it creates instances of any other of my classes. I might also check - and this would be language-dependent - if those dependencies are on abstract types (abstract classes, interfaces).
July 10, 2017
Codemanship Bite-Sized - 2-Hour Trainng Workshops for Busy Teams
One thing that clients mention often is just how difficult it is to make time for team training. A 2 or 3-day course takes your team out of action for a big chunk of time, during which nothing's getting delivered.
For those teams that struggle to find time for training, I've created a spiffing menu of action-packed 2-hour code craft workshops that can be delivered any time from 8am to 8pm.
- Test-Driven Development workshops
- Introduction to TDD
- Specification By Example/BDD
- Stubs, Mocks & Dummies
- Outside-In TDD
- Refactoring workshops
- Refactoring 101
- Refactoring To Patterns
- Design Principles workshops
- Simple Design & Tell, Don’t Ask
- Clean Code Metrics
To find out more, visit http://www.codemanship.co.uk/bitesized.html