July 27, 2014


Object-Relational Mapping Should Be Felt & Not Seen

Here's a hand-wavy, general post about object-relational persistence anti-patterns that I keep seeing pop up in many people's code.

First, let me set out what the true goal of ORMs should be: an ORM is designed to allow us to build good old-fashioned object oriented applications in which the data in our objects can outlive the processes they run in, by storing that persistent data in a relational database.

Back in the bad old days, we did this by writing what we called "Data Access Objects" (DAOs) for each type of persistent object - often referred to as entities, or domain objects, or even "Entity Beans" (if your Java code happened to have disappeared up that particular arse in the late 1990s).

This was very laborious, and often took up half the effort of development.

Many development teams working on web and "enterprise" applications were coming from a 2-tier, database-driven background, and were most familiar and comfortable with the notion that the "model" in Model-View-Controller was the database itself. Hence, their applications tended to treat the SQL Server, Oracle or wotnot back-end as main memory, transacting every state change of their objects against it pretty much immediately. "Middle tier" objects existed purely as gateways to this on-disk relational memory. Transactions and isolation of changes were handled by the database server itself.

Not only did this lead to applications that could only be run meaningfully - including for testing - with that database server in place, but it also coupled the application's code very tightly to the database, making it rigid and difficult to evolve. If every third line of code involves a trip to the database, and if objects themselves aren't where the data is to be found most of the time - except to display it on the user's screen - then you still have what is essentially a database-driven application, albeit with a fancy, highfalutin "middle tier" to create the illusion that it isn't.

Developers coming from an object oriented background suffered exactly the opposite problem. We knew how to build an application using objects whose data lived in memory, but struggled with persisting that data to a relational database. Quite naturally, we just wanted that to sort of happen by magic, without having to sully our pristine object oriented code with DAOs and Units of Work and repositories and SQL mappers and transaction handling and blah blah blah.

And, believe it or not, frameworks like Hibernate allow us to do pretty much exactly that - but only if we choose to use them that way.

Sadly, just as a FORTRAN programmer can write FORTRAN code in any programming language you give them, 2-tier database-driven programmers can write 2-tier database-driven code with even the most sophisticated ORMs.

Typically, what I see - and this is possibly built on a common misinterpretation of the advice given in Domain-Driven Design about persistence architectures - is developers writing DAOs using ORMs. They'll take a powerful framework like Hibernate - which enables us to write our persistent objects as POJOs that hold application state in memory (so that the logic will work even if there's no database there), just like in the good old days - and convert their SQL mappers into HQL mappers that use the Hibernate Query Language to access data in the same chatty, database-as-main-memory way they were doing it before.

Sure, they may disguise it using domain root object "repositories", and that earns them some protection; for example, it allows us to mock the repositories so we can unit test the application code. But when they navigate relationships by taking the ID of one object and using HQL to find its collaborator in the database, it all starts to get a bit dicey. A single web page request can involve multiple trips to the database. And if we take the database away, then - depending on how complicated the object graph is - we can end up having to weave a complex web of mock objects to recreate that object graph, because the database is the default source of application state. After all, it's main memory.
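For illustration, here's a minimal sketch of that anti-pattern in Hibernate (the OrderRepository and Order classes, and their details, are hypothetical):

import java.util.List;

import org.hibernate.Session;
import org.hibernate.SessionFactory;

// A hypothetical "repository" that uses Hibernate as little more than a
// SQL mapper: every navigation is a separate, chatty trip to the database.
public class OrderRepository {

    private final SessionFactory sessionFactory;

    public OrderRepository(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    // The anti-pattern: take the ID of one object and run an HQL query to
    // find its collaborators, instead of letting a mapped relationship
    // do the work.
    @SuppressWarnings("unchecked")
    public List<Order> findOrdersForCustomer(long customerId) {
        Session session = sessionFactory.getCurrentSession();
        return session
                .createQuery("from Order o where o.customer.id = :id")
                .setParameter("id", customerId)
                .list();
    }
}

Every caller that wants a customer's orders goes through this method - and therefore to the database - even when the object graph is already sitting in memory.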

Smarter developers rely on an external mapping file and the in-built capabilities of Hibernate to take care of as much of that as possible.
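For example, a Customer with a collection of Orders might be mapped once, in one place, along these lines (a minimal sketch - the class, table and column names are assumptions):

<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC
    "-//Hibernate/Hibernate Mapping DTD 3.0//EN"
    "http://www.hibernate.org/dtd/hibernate-mapping-3.0.dtd">

<!-- Customer.hbm.xml: map the relationship here, once, and Hibernate
     takes care of fetching, inserting and deleting orders for us. -->
<hibernate-mapping package="com.example.orders">
    <class name="Customer" table="CUSTOMER">
        <id name="id" column="CUSTOMER_ID">
            <generator class="native"/>
        </id>
        <property name="name"/>
        <set name="orders" inverse="true" lazy="true"
             cascade="all-delete-orphan">
            <key column="CUSTOMER_ID"/>
            <one-to-many class="Order"/>
        </set>
    </class>
</hibernate-mapping>

(A corresponding Order.hbm.xml, not shown, would map the many-to-one back to Customer.)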

They also apply patterns that allow them to cleanly separate the core logic of their applications from persistence and transactions. For example, the session-per-request pattern can be implemented for a web application by attaching persistent objects to a standard collection in session state, and managing the scope of transactions outside of the main page request processing (e.g., using HTTP modules in ASP.NET to re-attach persistent objects in session state before page processing begins, and to commit changes after processing has finished).
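In Java, the same idea is commonly sketched as a servlet filter wrapped around request processing. HibernateUtil here is an assumed bootstrap helper, and the session factory is assumed to be configured with hibernate.current_session_context_class=thread:

import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

// Session-per-request: a transaction is opened before page processing
// begins and committed after it finishes, so the page code itself never
// has to touch persistence or transaction handling.
public class SessionPerRequestFilter implements Filter {

    public void doFilter(ServletRequest request, ServletResponse response,
                         FilterChain chain)
            throws IOException, ServletException {
        SessionFactory factory = HibernateUtil.getSessionFactory();
        Transaction tx = factory.getCurrentSession().beginTransaction();
        try {
            chain.doFilter(request, response); // page processing runs here
            tx.commit(); // one commit synchronises all in-memory changes
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        }
    }

    public void init(FilterConfig config) {}

    public void destroy() {}
}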

If we allow it, the navigation from a customer to her orders can be as simple as customer.orders. The ORM should take care of the fetching for us, provided we've mapped that relationship correctly in the configuration. If we add a new order, it should know how to take care of that, too. Or if we delete an order. It should all be taken care of, and ideally as a single transaction that effectively synchronises all the changes we made to our objects in memory with the data stored in the DB.
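Put together, the application code can look something like this - a minimal sketch, assuming the hypothetical Customer and Order classes mapped earlier, running inside the per-request transaction opened by the filter above:

import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class OrderBrowsing {

    public void showAndAddOrders(SessionFactory sessionFactory,
                                 long customerId) {
        Session session = sessionFactory.getCurrentSession();
        Customer customer = (Customer) session.get(Customer.class, customerId);

        // The lazy collection is fetched behind the scenes on first
        // access - no repository call, no hand-written HQL.
        for (Order order : customer.getOrders()) {
            System.out.println(order.getTotal());
        }

        // Adding an order is just object manipulation (hypothetical Order
        // constructor); the cascade in the mapping writes the new row when
        // the transaction commits, and removing an order from the
        // collection deletes its row likewise.
        customer.getOrders().add(new Order(customer, 42.00));
    }
}

No DAO, no hand-written SQL or HQL: the mapping and the framework do the fetching and the writing.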

The whole point of an ORM is to generate all of that stuff for us. To take something like Hibernate and use it to write a "data access layer" is to miss that point.

We should not need a "CustomerRepository" class, nor a "CustomerDAO". With a decent ORM, used as intended, we should need none of that.

As much as possible in our code, Object-Relational Mapping should be felt, and not seen.

Posted on July 27, 2014