December 19, 2015
The Multi-threaded Santa TDD ChallengeIf you've hunted on the Interwebs for TDD katas, you will no doubt have noticed that they tend to be based on relatively straightforward problems. In particular, there are no - as far as I'm aware - TDD katas that set more challenging high-volume, multi-threaded, multi-process problems to solve.
In the spirit of Christmas - if you celebrate such a thing (and this year, I won't be) - I set some students just such a problem, which I'm calling the Multi-threaded Santa TDD Challenge.
Now, read this very carefully. It goes something like this:
Multi-threaded Santa TDD Challenge
The goal is to make, wrap, load and deliver 1 million presents on Xmas Eve. Play this challenge in pairs, with multiple pairs competing to do it for the lowest cost.
Santa and his elves have to make, wrap, load and deliver 1 million presents. They work in 4 teams:
Each team must be implemented in its own process (e.g, a web service for each activity, or a daemon, etc etc).
Each team either sends to or takes from (or both) one of 4 queues, and each queue has a maximum size, beyond which no more presents can be added:
1. Made (max size = 1000)
2. Wrapped (max size = 2000)
3. Loaded (max size = 5000 - these are all the presents on Santa's sleigh for delivery)
4. Delivered (no size limit)
How does Santa deliver all those presents in just one night, though? Simple. Santa Time works differently to our time. An hour in Santa Time is just 1 second (1,000ms) of human time. So 3600 hours of Santa Time elapse in 1 hour of human time.
You must implement these time delays in a Present class, with 4 distinct lifecycle event methods:
* Present.make() - wait 50ms
* Present.wrap() - wait 10ms
* Present.load() - wait 5ms
* Present.deliver() - well, we'll get to that...
To perform work on a present, a worker thread must invoke the appropriate method, incurring a time cost for that work within that thread. Only after that work has been performed can the present be sent to the queue for the next process to take from. e.g. one thread in the Making process called present.make(), then sends that present to the Made queue, where the Wrapping process can pick it up.
In each process, there are one or more elves (worker threads). Elves are managed by a fifth process: the Elf Pool. To add an elf (worker thread) to a process (e.g., wrapping), you must request an Elf ID from the Elf Pool.
The Elf Pool records every elf assigned to every process for the duration of the work.
Because of the very strict rules of the Elf Union, elves - once assigned to the work - cannot be laid off. So the total number of elves employed can only go up, not down, until all presents have been delivered.
Elves can, however, be swapped between the 4 work processes: so an elf assigned to wrapping can be re-assigned to loading, for example. This, too, must be managed through the Elf Pool.
One worker process (e.g. wrapping) signals an intention to reassign one of its elves to the Elf Pool. The elf remains in that process until another worker process (e.g., Loading) requests another elf, at which point - instead of assigning a new elf from the pool - that elf is re-assigned instead. So the total number of elves employed remains the same. If the Elf Pool has no elves awaiting re-assignment, it assigns a new elf when one is requested.
There is only one Santa, and only one sleigh!
Okay, here's your bottleneck. There can only be one worker thread in the Delivering process, and the delivery round takes a fixed amount of time each time the sleigh goes out, regardless of how many presents or elves are assigned to Delivery. It takes Santa 500ms to make one round of deliveries. i.e. delivering all the presents on Santa's sleigh - no matter how many there are - incurs a time delay of 500ms. At the end of that time, all the presents in the sleigh (i.e., on the Loaded queue) are delivered. This could, for example, be achieved with a Sleigh class that implements the time delay in a deliver() method.
Be advised that no presents can be loaded for delivery while Santa's sleigh is out on a round. This is your bottleneck! For 500ms (at least), the elves doing the loading will be idle. So you may want to reassign them to making or wrapping. But... hmmm... how long will that take?
How to score?
The Elf Pool has another job in this challenge: elves don't work for nothing. They work for cookies. Specifically, one elf assigned for one hour of Santa Time (1s of human time) gets paid 1 cookie. So 100 elves assigned for 1000s of human time would cost 100,000 cookies.
The Elf Pool calculates the cookie payroll, multiplying the number of assigned elves by the time it takes to deliver all the presents in human seconds.
To win, your team needs to deliver all 1 million presents for less cookies than your competitors.
Never Give In. Never Surrender.
Now, here's the kicker: once the start whistle blows, there is NO STOPPING. If any of your worker processes falls over, the work becomes blocked, the clock keeps on ticking, and you keep burning cookies. Restarting any worker process must re-attach all of the elves that were assigned to it via the Elf Pool before it can begin working on presents again.
If your Elf Pool falls over: you are disqualified, and Santa gets to kick you in the backside for all eternity for ruining Xmas!!!
So, there you have it: 1 million presents, 4 worker processes, as many worker threads as you like (within the rules), 4 work queues, and one manager process keeping track of the elves and the cookies and the time.
Easy as cheese. Average time for students to complete was 4 hours. My advice: keep it as simple as you can. I've already piled on plenty-much complexity for you to deal with!
February 4, 2015
Why Distribution & Concurrency Can Be A Lethal Cocktail For The Unwitting Dev TeamPicture the scene: it's Dec 31st 1990, a small town in upper state New York. I'm at a New Year's Eve party, young, stupid and eager to impress. The host mixes me my first ever Long Island Iced Tea. It tastes nice. I drink three large ones, sitting at their kitchen table, waxing eloquent about life, the universe and everything in my adorable English accent, and feeling absolutely fine. Better than fine.
And then, after about an hour, I get up to go to the bathroom. I'm not fine. Not fine at all. I appear to have lost the use of my legs, and developed an inner-ear problem that's affecting my normally balletically graceful poise and balance.
I proceed to be not-fine-at-all into the bathroom sink and several other receptacles, arguably none of which were designed for the purpose I'm now putting them to.
Long Island Iced Tea is a pretty lethal cocktail. Mixed properly, it tastes like a mildy alcoholic punch with a taste not dissimilar to real iced tea (hence the name), but one look at the ingredients puts pay to that misunderstanding: rum, gin, vodka, tequila, triple sec - ingredients that have no business being in the same glass together. It is a very alcoholic drink. Variants on the name, like "Three Mile Island" and "Adios Motherf***er", provide further clues that this is not something you serve at a child's birthday party.
I end the evening comatose on a water bed in a very hot room. This completes the effect, and Jan 1st 1991 is a day I have no memory of.
Vowing never to be suckered into a false sense of security by something that tastes nice and makes me feel better-than-fine for a small while, I should have known better than to get drawn like a lamb to the slaughter into the distributed components craze that swept software development in the late 1990's.
It went something like this:
Back in the late 1990's, aside from the let's-make-everything-a-web-site gold rush that was reaching a peak, there was also the let's-carve-up-applications-that-we-can't-even-get-working-properly-when-everything's-in-one-memory-address-space-and-there's-no-concurrency-and-distribute-the-bits-willy-nilly-adding-network-deficiencies-distributed-transactions-and-message-queues fad.
This was enabled by friendly technology that allowed us to componentise our software without the need to understand how all the undrelying plumbing worked. Nice in theory. You carve it, apply the right interfaces, deploy to your application server and everything's taken care of.
Except that it wasn't. It's very easy to get and up and running with these technologies, but we found ourselves continually having to dig down into the underlying detail to figure out why stuff wasn't working the way it was supposed to. "It just works" was a myth easily dispelled by looking at how many books on how this invisible glue worked were lying open on people's desktops.
To me, with the benefit of hindsight, object request brokers, remote procedure calls, message queues, application servers, distributed transactions, web services... these are the hard liquor of software development. The exponential increase in complexity - the software equivalent of alcohol units - can easily put unwitting development teams under the table.
I've watched so many teams merrily downing pints of lethal-but-nice-tasting cocktails of distribution and concurrency, feeling absolutely fine - better than fine - and then when it's time for the software to get up and walk any kind of distance... oh dear.
It turns out, this stuff is hard to get right, and the tools don't help much in that respect. They make it easy to mix these cocktails and easy to drink as much as you think you want, but they don't hold your hand when you need to go to the bathroom.
These tools are not your friends. They are the host mixing super-strength Long Island Iced Teas and ruining your New Year with a hangover that will never go away.
Know what's in your drink, and drink in moderation.
September 25, 2014
Functional Programming Is Great. But It Ain't Magic.An increasing annoyance in my day-to-day job as a coach and trainer is what I call FPF, or "Functional Programming Fanaticism". Typically, it emanates from people who've recently discovered FP in the last few years, and have yet to realise that - like all programming innovations since the 1940's - it doesn't actually solve all the problems for us.
Putting aside the widely-held perception that functional programs can be considerably less easy to understand, even for experienced FP-ers, (and this is no small consideration when you realise that trying to understand program code is where programmers spend at least half our time), there is the question of side effects.
More specifically, people keep telling me that functional programs don't have any. This is patently not true: a program with no side effects is a program which does nothing of any use to us. Somewhere, somehow, data's got to change. Try writing a word processor that doesn't have side effects.
FP helps us write more reliable code - in particular, more reliable concurrent code - by limiting and localising side effects. But only if you do it right.
It's entirely possible to write functional programs that are riddled with concurrency errors, and, indeed, that's what many teams are doing as we speak.
How can this be so, though, if functions are said to be "clean" - side-effect free? Well, that bank account balance that gets passed from one function to next may indeed be a copy (of a copy of a copy) of the original balance, but from the external user's perspective, whatever the current balance is, that is the balance (and it has changed.)
The moment we persist that change (e.g., by writing it to the database, or through transactional memory, or however we're handling shared data), the deed is done. Ipso facto: side effect.
Languages like Haskell, Clojure and that other one that sounds like "Camel" don't do our concurrent thinking for us. If joint account holder A checks their balance before trying to use the debit card, but joint account holder B uses their debit card before A does, then - you may be surprised to learn - these languages have no built-in feature for reconciling joint account transaction paradoxes like this. You have to THINK ABOUT HOW YOUR SOFTWARE SHOULD HANDLE CONCURRENT SCENARIOS from the user's perspective.
In non-FP work, we seek to make concurrent systems more reliable and more, well, concurrent, by strictly limiting and localising concurrent access to shared data. FP just embeds this concept within the languages themselves, making that easier and more reliable to do.
Just as workflow frameworks don't decide what should happen in your workflows, functional programs don't decide how your application should handle side-effects. The best they can do is give you the tools to realise the decisions you make.
What I'm seeing, though, (and this was case when we were all prostrating before the Great Workflow Ju Ju In The Sky a decade or so ago), is that teams mistakenly lean on the technology, believing through some kind of magic that it will handle these scenarios for them. But, like all computer programs, they will only do exactly what we tell them to.
It's not magic, folks. It's just code.
June 23, 2014
What's My Problem With Node.js?So you may have guessed by now, if you follow me on The Twitters, that I'm not the biggest fan of Node.js.
If programming is hard to get right, then distributed concurrent programming is - relatively speaking - impossible to get right. You will almost certainly get it wrong. And the more you do of it, the more wronger what it do be.
The secret to getting concurrency right is to do as little of it as you can get away with. Well-designed applications that achieve this tend to have small, isolated and very heavily tested islands of concurrency. Often they have signs on the shore warning travellers to "turn back, dangerous waters!", "beware of rabid dogs!", "danger: radiation!" and "Look out! Skeletor!" You know; stuff that tends to send right-minded folk who value their lives running in the opposite direction.
Node.js is a great big friendly sign that says "Come on in. Hot soup. Free Wi-Fi.", and it's left to salvage specialists like me to retrieve the broken wrecks.
So, yes, Node.js does make it easier to do distributed concurrency, in much the same way that a hammer makes it easier to drive nails into your head. And both are liable to leave you with a hell of a headache in the morning.
March 6, 2009
Software Craftsmanship 2009 Follow-UpThis went out as an email to delegates, but is reproduced here for completeness
First of all, if you made it to SC2009 on Feb 26th then a big hearty thanks for helping to make the day the success that your generous feedback suggests it was.
An especially big thanks goes to everyone who ran the sessions. Your hard work is much appreciated.
Also, many, many thanks to Robin Doran from BBC Backstage, Peter Camfield from BBC Worldwide and Kerry Jones from BBC Future Media & Technology for all your invaluable assistance in putting the event together. Thanks, too, to all the BBC delegates who helped out on the day. You did a sterling job.
The event was very generously sponsored by BBC Worldwide and BBC Backstage. You should check them out, 'cause they're doing some pretty cool stuff these days:
A few announcements:
* Were you at Keith Braithwaite's excellent TDD As If You Meant It? session? Do you still have a copy of the code you worked on? If the answer is Yes, then you might want to consider donating it to science! Please send zipped-up copies of your code to Dr Sue Black from University of Westminster so she and Steve Counsell from Brunel University can perform deranged experiments on it that hopefully will produce further insights into software maintainability.
* Don't forget to pay a visit to Sue's website dedicated to Saving Bletchley Park and see if you can help.
* A reminder about the upcoming Software Practice Advancement conference, which is being held in London this year at the BCS offices in Covent Garden starting on April 5th. I'm running a session on scaling up design reviews using automated code analysis. Apart from that, the rest of the programme looks pretty good, though ;-)
* Another reminder about Rachel Davies' Agile coaches gathering event which is being held at Bletchley Park on 22-23rd May
Immo Huneke has posted the outputs from his excellent and intimate session on My Favourite Keyboard Shortcuts on the - now publicly accessible - conference Wiki:
Gojko Adzic has posted slides and related material from his well-received session on Specification Workshops:
Nat Pryce's notes for his fascinating session on Testing Asynchronous Systems can be find here:
Ivan Sanchez posts his session materials for 5 Reasons To Have Coding Dojo here:
Feedback from the Blogosphere
Kerry Buckley very nicely summarises some of the sessions:
Gojko Adzic's great write-up of TDD As If You Meant It:
Markus Gaertner's summary of the conference:
Diego Pino shares his thoughts here:
Richard Fennell gets his thoughts down on virtual paper:
BBC Worldwide's Rob Bowley blogs about SC2009 and posts a great pic that illustrates how busy lunch was!
.NET journeyman Tim Ross posts:
SC2009 Vox Pops video
Will there be a Software Craftsmanship 2010 conference? Very probably, yes. Watch my blog for developments, and get your thinking caps on for session ideas.
Where do we continue this dialogue? A good start would be to sign up for the Software Craftsmanship Google Group. Many of the folk you might have met on feb 26th are known to frequent the Extreme Tuesday Club in London. Also, you'll find us at Software Practice Advancement, XPDay and many other Agile-leaning events.
What about a specific regular meet-up on Software Craftsmanship? Bingo! Great idea. What a clever fellow you are for suggesting it :-) Yes, we shouldn't let the energy and momentum we gathered at SC2009 fixxle out, so I'm going to propose a regular meeting that will be a sort of mini-SC2009 where we specifically meet-up with our laptops for hands-on learning, sharing, practice and alcohol. (The four major food groups, I think you'll find.) The XTC venue is too noisy, really. So I'll be seeking out a friendly venue with a decent-sized function room and maybe even a projector and screen. I'm also leaning very heavily towards a weekend timeslot, because my brain is usually pretty frazzled by 7pm on a weekday! Keep an eye on my blog, and on the Google group, for announcements very soon.
I'm sure I've forgotten something, but isn't that always the way. Drop me a line with any thoughts, complaints or offers of easy money.
October 28, 2008
Software Craftsmanship 2009 - Conference In DevelopmentFirst the good news.
I'm in the process of launching a new conference here in sunny old London Town (or "Larndarn Tarn", if you happen to have been born here).
I can't give away too much just yet, because:
a. There's not that much to give away, and
2. There's many a slip twixt cup and lip, and there's always the danger of these things falling through
But I can tell you that the working title for the conference is Software Craftsmanship 2009.
And I can tell you that the focus is going to be on the "hard skills" that take years to master. You know, the actual craft of writing good software. OO design, test-driven development, refactoring, build automation, architecture, patterns, code generation, modeling, concurrent and distributed programming. That sort of thing. Certainly there won't be any sessions about yet more things you can do with coloured bits of card and lego. Well, not unless anyone's discovered a way to generate working code from them.
I can also tell you that we have a provisional date and a provisionally booked venue. The provisional date is February 26th 2009. I'm not going to reveal the venue just yet, though. But it will be in London, rest assured.
Finally, I can tell you that the program selection committe is already starting to shape up very nicely indeed. And the invites are still going out, so we're looking forward to a very healthy pool of world-class expertise to help pick the final schedule.
Keep your eyes peeled for more information posted on this blog, or join my Yahoo! group for announcements.
An informal request for session proposals will be going out in about a week's time. Email me if you'd like to be included in this mailing.
October 23, 2008
Writing Thread-safe Code - What Can Go Wrong With Multithreaded Logic?I've been getting a good deal of encouragement to talk about multithreaded programming some more, as it seems to be an area of great technical interest to many of you out there in developerland.
It's also quite obviously an area of some considerable pain...
I want to talk about two specific kinds of things that can go wrong when two or more threads of execution start interacting with the same data and objects:
1. The pre-condition paradox
Jill and John have a joint bank account with a balance of $100. (They're not big earners, I should stress. Probably teachers. Or nurses.)
Jill sees a pair of shoes she really likes in a shop window during her lunch hour. She calls her bank's 24-hour telephone banking service to check the balance of the account to see if she has the $75 needed to pay for the shoes.
Meanwhile, John spies a boxed set of Battlestar Galactica Series 4 (though, as it turns out, it's only the first 10 episodes of series 4, the cheating buggers!) He stops at an ATM to check the account balance to see if there's enough in there to cover the $40 he needs to buy the DVDs.
Both Jill and John see that the account balance is $100. But in the time it takes John to get from the ATM back to the DVD store, Jill has already used her debit card to pay for the shoes. So when John hands his card over to the cashier, the account now only has $25 in it.
His card is rejected, in front of a huge queue of shoppers who are all in the kind of hurry that only lunchtime shoppers can be in. He gets egg on his face, and leaves with the shame of knowing that everyone in that store thinks he's a no-good bum. Which, of course, he is - I mean, what kind of person buys Battlestar Galactica box sets when he's only got $100 left in his account?
But anyway, I digress. The point is that because Jill and John both interacted with the account concurrently, it was possible for Jill's transaction to break the pre-condition of John's transaction after John had checked that the pre-condition for his transaction was satisfied.
One way for Jill and John to avoid this kind of scenario would be if one of them could obtain exclusive access to the account until their transaction - the whole transaction, starting with checking the account balance - is complete. This would mean, in effect, obtaining a temporary lock on the account until the transaction ended. Jill gets the lock first, and so John has to wait until her payment for the shoes has cleared before he can access the account, by which time the balance has changed and his pre-condition will fail when he checks it.
In reality, Jill and John probably wouldn't need to access the joint account at the same except on rare occasions. So locking the account and forcing the other account holder to wait would be a viable option from a performance perspective.
But if many people were trying to access the same account throughout the day - perhaps a large organisation with multiple branches all sharing a single bank account - then the time spent waiting for the account to become unlocked might be very noticeable.
Jill, having blown most of their balance on a pair of shoes, calls her Mom and asks if she can wire her some money. A funds transfer between two bank accounts requires a lock on both of them.
Mom's bank locks her account and then asks Jill's bank to lock Jill's account. Safe as houses, yes?
In the meantime, though, John has been speaking to Dad (who shares a joint account with Mom), and cooked up a similar plan (he's got his eye on a remote-controlled Dalek now, you see). Dad's bank also needs a lock on both accounts. But this time his bank asks John's bank to lock John's account first, and then to lock Dad's.
It could happen that while Mom gets a lock on Mom and dad's joint account, Dad gets a lock on John and Jill's joint account. Now each has to wait for the other person to unlock the other account. And they'll be waiting indefinitely, because Mom can't proceed until John and Jill's account is unlocked, and Dad can't proceed until his and Mom's account is unlocked.
A few simple strategies could have prevented this deadlock:
a. Lock shared objects/data in the same order
If Mom and Dad's bank had both asked to lock their joint account first, the deadlock could not have happened because Dad would have had to wait until Mom was done with both accounts before proceeding.
b. Lock shared objects/data in a single atomic step
If you need both acconts to execute a transfer, then lock both accounts at the same time and make all other threads wait before they can try to get a lock on any of them.
If a thread has been waiting to long to get a lock on an object or piece of data, have it release all of its locked objects/data in case that's what another deadlocked thread has been waiting for.
One thing I'm putting though into now is how one might use automated tests (or other lightweight QA techniques) to describe multithreaded correctness and detect mutli-threading defects.