November 21, 2017

...Learn TDD with Codemanship

What Can We Learn About Dev Team Performance from Distributed System Design?

There are all sorts of analogies for software development teams ("teams are like a box of chocolates" etc), and one I find very useful is to picture them as distributed information processing systems.

Each worker process (person) has a job to do. Each job has information inputs and outputs. Each job requires data (knowledge). And the biggest overhead is typically not the time or effort required to process the information in each process, but the communication overhead between processes. This is why, as we add more people (worker processes), performance starts to degrade dramatically.

But, ironically, most of the available thinking about dev team performance focuses on optimising the processes, and not the communication between the processes.

Following this analogy, if we apply performance patterns for distributed computing to dev teams, we can arrive at some basic principles. In particular, if we seek to minimise the communication overhead without harming outcomes, we can significantly improve team performance.

Processes communicate to share data. The less data they need to share, the lower the communication overhead. And this is where we make our classic mistake; the original sin of software development, if you like.

Imagine our processes can act on data at a low level, but all the conditional logic is executed by external management processes that coordinate the workflow. So every time a worker process needs a decision to be made, it must communicate with a management process and wait for a response. Yes, this would be a terrible design for a distributed system. And yet, this is exactly how most dev teams operate. Teamwork is coordinated at the task level - the level of details. A more performant design would be to give individual worker processes a goal, and then let them make any decisions required to achieve that goal. Tell them the what and then let them figure out the how for themselves.

And I can attest from personal experience that dev teams that empower their developers to make the technical decisions perform much better.

But, as any developer who's worked on a team will tell you, there still needs to be coordination beween developers to reach consensus on the how. A classic example is how many teams fail to reach a consensus on how they implement model-view-controller, each one coming up with their own architecture.

Often, the amount of coordination and consensus needed can be front-loaded. Most of the key technical decisions will need to be made in the first few weeks of development. So maybe just take the hit and have a single worker process (the whole team) work together to establish baseline data, a skeleton of technical and logical architecture, of technical standards and common protocols (e.g., check-in etiquette) on which everyone can build mostly autonomously later. I've been doing this with teams since the days affordable portable data projectors became available. These days they call it "mob programming".

And, of course, there's unavoidably one shared piece of mutable data all processes have no choice but to act on in parallel: the code.

Much has been said on the subject of distributed version control of source code, most of focusing on entirely the wrong problem. Feature Branching, for example, tries to achieve more autonomy between developers by isolating their code changes from the rest of the team for longer. If every check-in is a database transaction (which it is - don't say it isn't), then this is entirely the wrong lever to be pulling on to speed things up. When we have many processes committing transactions to shared database, making the transactions bigger and longer won't speed the system up, usually. We're aiming not to break the data. The only way to be sure of that is to lock the data while the transaction's being written to the database. (Or to partition the data so that it's effectively no longer shared - more on that in a moment.)

To avoid blocking the rest of the worker processes, we need transactions to be over as soon as possible. So our check-ins need to be smaller and more frequent. In software development, we call this Continuous Integration.

It also helps if we split the shared data up, so each blob of data is accessed by fewer worker processes. More simply, the smaller the shared codebase, the less of a CI overhead. Partition systems into smaller work products.

But, just as partitioning software systems into - say - microservices - can increase the communication overhead (what were once method calls are now remote procedure calls), partitioning shared codebases creates a much greater overhead of communication between teams. So it's also vitally important that the various codebases are as decoupled as possible.

I rail against developers who add third-party dependencies to their software for very simple pieces of work. I call it "buying the Mercedes to use the cigarette lighter". In the world of microservices, system component needs to be largely responsible for doing their own work. Only add a dependency when the development cost of writing the code to do that bit of work is significantly greater than the potential ongoing communication overhead. You have to be merciless about minimising external dependencies. Right now, developers tend to add dependencies far too lightly, giving the additional costs little or no thought. And our tools make it far too easy to add dependencies. I'm looking at you, Maven, NuGet, Docker etc.

So, to summarise, here are my tips for optimising the performance of development teams:

1. Give them clear goals, not detailed tasks

2. Make developers as autonomous as possible. They have the technical data, let them make the technical decisions.

3. Accept that, initially, parallelism of work will be very difficult and risky. Start with mob programming to establish the technical approach going forward.

4. Small and frequent merging of code speeds up team performance. Long-lived code branches tend to have the reverse effect to that intended.

5. Partition your architectures so you can partition the code.

6. Manage dependencies between codebases ruthlessly. Duplicated logic can be cheaper to live with than inter-team communication.





Posted 5 months, 6 days ago on November 21, 2017