October 19, 2018

...Learn TDD with Codemanship

How Not To Use An ORM?

An anti-pattern I see often is applications - often referred to as "enterprise" applications - that have database transactions baked into their core logic via a "data access layer".

It typically goes something like this:

"When the order page loads, we fetch the order via an Order repository. Then we take the ID of that order and use that to fetch the list of order items via an Order Item repository. Then we load the order item product descriptions via a Product repository. We load the customer information for the order, using the customer ID field of the order, via a Customer repository. And then the customer's address via an Address repository.

"It's all nicely abstracted. We have proper separation of concerns between business logic and data access because we're using repositories, so we can stub out all the data access for testing.

"Yes, it does run a little slow, now that you ask. I wonder why that is?"

Then, behind the repositories, there's usually a query that's constructed using the object key or foreign keys - to retrieve the result of what ought to be a simple object navigation: order.items is implemented as orderItemRepository.items(orderId). You may believe that that you've abstracted the database because you're going through a repository interface, and possibly/probably using an object-relational mapping tool to fetch the entities, but if you're writing code that stitches object graphs together using keys and foreign keys, then you are writing the ORM tool. You're just using the off-the-shelf ORM as an xDBC substitute. It's the old "we used an X tool to build an X tool" problem. (See also "MVC frameworks built using MVC frameworks".)

The goal of an ORM is to make the mapping from tables and joins to object graphs Somebody Else's ProblemTM. That's a simpler way of defining true separation of concerns. As such, we should aim to write our core logic in the simplest object-oriented way we can, so that - ideally - the whole thing could run in memory with no database at all. Saving and fetching stored objects just happens. Not a foreign key or object repository in sight. It can vastly simplify the code (including test code).

The most powerful and flexible ORMs - like Hibernate - make this possible. I've written entire "enterprise" applications that could be run in memory, with the mapping and persistence happening entirely outside the core logic. In terms of hexagonal architecture, I treat data access as an external dependency and try to minimise it as much as possible. I don't write data access "layers".

Teams that go down the "layered" route tend to end up with heaps of code that depends directly on the ORM they're using (to write an ORM). It's a similar - well, these days, identical - problem to Java teams who do dependency injection using Spring and end up massively dependent on Spring - to the extent that their code can only be run in a Spring context.

At best, they end up with thousands of tests that have to stub and mock the data access layer so they can test ther core logic. At worst, they end up only being able to test their core logic with a database attached.

The ORM's magic doesn't come for free, of course. Yes, there's a heap of tweaking you need to do to make a completely seperated persistence/mapping component work. Many decisions have to be made (e.g., lazy loading vs. pre-emptive vs. SQL views vs. second level caching etc etc) to make it performant, but you were making those decisions anyway. You just weren't using the ORM to handle them, because you were too busy writing your own.

Posted 1 month, 2 days ago on October 19, 2018