Archive for August, 2009

How NHibernate’s dirty checking violates encapsulation boundaries

NHibernate’s dirty checking caught me by surprise when I first started using NHibernate. Since I was new to the tool, I was still open to the idea that I wasn’t using it correctly and perhaps needed to embrace dirty checking rather than fight it. I still don’t much like dirty checking, but now I better understand why I felt uneasy about it in the first place: it breaks encapsulation!

To explain why, let’s start with the Encapsulate Collection example from pg 208 of Refactoring (Martin Fowler). Before refactoring, Person has a collection of courses exposed through public get/set properties. If we add a business rule that a person cannot have more than 5 courses at any one time, public access to the collection becomes problematic: every piece of code that sets the collection needs to be modified to enforce the rule. To avoid duplicating this business logic, we can validate the rule in the setCourses method. But when you just want to add or remove a course from the existing list, it seems inefficient to copy the entire collection, add or remove the course, and reassign the collection. So with collections it often makes sense to also expose addCourse and removeCourse methods, as Fowler does. It doesn’t matter whether the internal implementation replaces the set or modifies the existing instance. What is most important about this example is not the add/remove methods but that getCourses returns an immutable collection or a copy of the collection. If getCourses returns a mutable collection, a call to person.getCourses().Add(course) would bypass the maximum courses rule and could leave the system in an invalid state.
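As a minimal sketch of the encapsulated collection described above, written in Java because Fowler’s example is in Java: the Person and Course classes here are my own illustration rather than Fowler’s listing, and MAX_COURSES is an assumed name for the 5 course rule.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical classes modeled on the Encapsulate Collection idea.
final class Course {
    private final String name;

    Course(String name) { this.name = name; }

    String getName() { return name; }
}

class Person {
    static final int MAX_COURSES = 5; // the business rule from the text

    private final Set<Course> courses = new HashSet<>();

    // Read-only view: callers can inspect the courses, but add/remove
    // on the returned set throw UnsupportedOperationException.
    Set<Course> getCourses() {
        return Collections.unmodifiableSet(courses);
    }

    // The only way to grow the collection, so the rule lives in one place.
    void addCourse(Course course) {
        if (!courses.contains(course) && courses.size() >= MAX_COURSES) {
            throw new IllegalStateException(
                "A person may not have more than " + MAX_COURSES + " courses");
        }
        courses.add(course);
    }

    void removeCourse(Course course) {
        courses.remove(course);
    }
}
```

With this shape, person.getCourses().add(course) fails immediately instead of silently bypassing the rule.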

Now let’s make the business rule slightly more complex: instead of a maximum of 5 courses per person, we say a maximum of 20 credit hours. Provided that Course is immutable, we just need to update addCourse to ensure that the sum of the existing credit hours plus course.getCreditHours() does not exceed 20. But if, after looking at the Course interface, we find a setCreditHours() in addition to getCreditHours(), we could have a problem. At first we were just counting instances, so it didn’t matter whether course instances were immutable. But if the credit hours can change after the course has been added to the person, the rule can be broken without addCourse ever being called. The solution depends on whether Course is considered inside or outside of the aggregate boundary.
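The following sketch shows how a mutable Course undermines the credit hour rule. The classes are hypothetical and MAX_CREDIT_HOURS is an assumed name for the 20 hour limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable Course: the setter is what causes the trouble.
class Course {
    private int creditHours;

    Course(int creditHours) { this.creditHours = creditHours; }

    int getCreditHours() { return creditHours; }

    void setCreditHours(int creditHours) { this.creditHours = creditHours; }
}

class Person {
    static final int MAX_CREDIT_HOURS = 20; // the business rule from the text

    private final List<Course> courses = new ArrayList<>();

    int totalCreditHours() {
        int total = 0;
        for (Course c : courses) {
            total += c.getCreditHours();
        }
        return total;
    }

    // The rule is checked only at the moment a course is added.
    void addCourse(Course course) {
        if (totalCreditHours() + course.getCreditHours() > MAX_CREDIT_HOURS) {
            throw new IllegalStateException("More than " + MAX_CREDIT_HOURS + " credit hours");
        }
        courses.add(course); // stores the caller's reference, not a copy
    }
}
```

If a caller adds two 10 hour courses and later calls setCreditHours(15) on one of them, totalCreditHours() reports 25 even though addCourse never accepted an invalid state.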

To further explore this issue, let’s change the example to the Purchase Order, Line Item, Part relationship used in Domain Driven Design (Eric Evans, pg 130). A purchase order contains many line items, each of which is associated with a part that has a price. A business rule says the purchase order cannot exceed an approved limit. Adding line items, modifying the quantity of a line item, or changing the price of a part could invalidate the business rule if we aren’t careful. To enforce this rule, Evans introduces the concept of an aggregate boundary and defines the purchase order as the aggregate root responsible for enforcing the business rules.

This boundary says that any changes made to objects within the boundary must be made through the aggregate root object. Part is outside of the boundary, and the price of the part may change in the future, but only the price at the time of the order is relevant. We are not going to send the customer a bill if the price of the part increases later, so to ensure that the line item price does not change unexpectedly when the part is updated, it is necessary to copy the relevant information from the part to the purchase order line item, isolating the price from future changes to the part. Line items, however, are within the boundary: changes to line items must be made through the PurchaseOrder object so that it can reject any change that would place the purchase order in an invalid state. Adding line items can easily be controlled by encapsulating the collection as described above, but it is not only the collection that must be encapsulated but also any modifications to the line items.
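Here is a small sketch of copying the price across the boundary. The class shapes, the use of cents, and names such as approvedLimitCents are my assumptions, not code from Evans’s book.

```java
import java.util.ArrayList;
import java.util.List;

// Outside the aggregate boundary: its price may change at any time.
class Part {
    private long priceCents;

    Part(long priceCents) { this.priceCents = priceCents; }

    long getPriceCents() { return priceCents; }

    void setPriceCents(long priceCents) { this.priceCents = priceCents; }
}

// Inside the boundary: copies the price instead of holding on to the Part.
final class LineItem {
    private final long unitPriceCents; // snapshot taken when the item is created
    private final int quantity;

    LineItem(Part part, int quantity) {
        this.unitPriceCents = part.getPriceCents();
        this.quantity = quantity;
    }

    long totalCents() { return unitPriceCents * quantity; }
}

// Aggregate root: every change to the order goes through it.
class PurchaseOrder {
    private final long approvedLimitCents;
    private final List<LineItem> items = new ArrayList<>();

    PurchaseOrder(long approvedLimitCents) { this.approvedLimitCents = approvedLimitCents; }

    long totalCents() {
        long total = 0;
        for (LineItem item : items) {
            total += item.totalCents();
        }
        return total;
    }

    void addLineItem(LineItem item) {
        if (totalCents() + item.totalCents() > approvedLimitCents) {
            throw new IllegalStateException("Order would exceed the approved limit");
        }
        items.add(item);
    }
}
```

Because LineItem keeps its own copy of the price, a later part.setPriceCents(...) leaves the order total unchanged.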

For starters, we could make LineItem an immutable type. If you want to update the quantity, you must create a new line item, remove the existing line item, then add the new one. This probably works fine initially, but as business objects and their internal relationships get more complex, this add/remove style can become inconvenient. For example, perhaps there are footnotes which reference the line item number, and removing the line item would also remove the associated notes. Just as the setCourses method may not be ideal if you want to add or remove a course from the existing list, removing and re-adding an entire LineItem may not be ideal if you only want to update the quantity. So in addition to the add/remove methods it is often convenient to have an updateLineItem(index, lineItem) method that lets you replace a line item while preserving the existing identity.
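What I call encapsulation via immutability might look something like the sketch below. The field names and the withQuantity helper are hypothetical; the point is that updateLineItem validates the replacement before swapping it in at the same index.

```java
import java.util.ArrayList;
import java.util.List;

// Immutable line item: changing the quantity means creating a new instance.
final class LineItem {
    private final String partNumber;
    private final int quantity;
    private final long unitPriceCents;

    LineItem(String partNumber, int quantity, long unitPriceCents) {
        this.partNumber = partNumber;
        this.quantity = quantity;
        this.unitPriceCents = unitPriceCents;
    }

    int getQuantity() { return quantity; }

    long totalCents() { return quantity * unitPriceCents; }

    LineItem withQuantity(int newQuantity) {
        return new LineItem(partNumber, newQuantity, unitPriceCents);
    }
}

class PurchaseOrder {
    private final long approvedLimitCents;
    private final List<LineItem> items = new ArrayList<>();

    PurchaseOrder(long approvedLimitCents) { this.approvedLimitCents = approvedLimitCents; }

    long totalCents() {
        long total = 0;
        for (LineItem item : items) {
            total += item.totalCents();
        }
        return total;
    }

    // Safe to hand out the internal instance because it cannot be modified.
    LineItem getLineItem(int index) { return items.get(index); }

    void addLineItem(LineItem item) {
        ensureWithinLimit(totalCents() + item.totalCents());
        items.add(item);
    }

    // Replaces the item in place, so its index (its identity here) is preserved.
    void updateLineItem(int index, LineItem replacement) {
        long newTotal = totalCents() - items.get(index).totalCents() + replacement.totalCents();
        ensureWithinLimit(newTotal);
        items.set(index, replacement);
    }

    private void ensureWithinLimit(long total) {
        if (total > approvedLimitCents) {
            throw new IllegalStateException("Order would exceed the approved limit");
        }
    }
}
```

A caller would write order.updateLineItem(0, order.getLineItem(0).withQuantity(4)), and a rejected update leaves the existing line item untouched.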

In our example, LineItem is still an immutable type. However, in enterprise applications line items have dozens of fields, such as cost codes and other accounting instructions. Large constructors are difficult to work with, so instead of immutable types it is often easier to use a mutable type with a simple constructor, assign the desired properties, and then add the object to the order. To preserve encapsulation, the order copies the line item before adding it to its internal collection, and every reference returned by getLineItem is also a copy. Copying the item is an essential part of preserving the encapsulation boundary and preventing unexpected changes such as addLineItem(lineItem); lineItem.Quantity = 5 or getLineItem(0).Quantity = 5. So although we have made line items mutable, the update method is still required to apply those changes to the internal line item, giving the purchase order the opportunity to reject changes that would invalidate the business rule. The line items within the purchase order are isolated from changes to the copies held by the user. I call this encapsulation via isolation, as opposed to the former, which I call encapsulation via immutability.
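Encapsulation via isolation could be sketched as follows. LineItem is now mutable with a copy constructor, and the order copies on the way in and on the way out. The names are again hypothetical, and a real line item would carry many more fields.

```java
import java.util.ArrayList;
import java.util.List;

// Mutable line item with a simple constructor and a copy constructor.
class LineItem {
    private int quantity;
    private long unitPriceCents;

    LineItem() { }

    LineItem(LineItem other) {
        this.quantity = other.quantity;
        this.unitPriceCents = other.unitPriceCents;
    }

    int getQuantity() { return quantity; }
    void setQuantity(int quantity) { this.quantity = quantity; }
    long getUnitPriceCents() { return unitPriceCents; }
    void setUnitPriceCents(long unitPriceCents) { this.unitPriceCents = unitPriceCents; }

    long totalCents() { return quantity * unitPriceCents; }
}

class PurchaseOrder {
    private final long approvedLimitCents;
    private final List<LineItem> items = new ArrayList<>();

    PurchaseOrder(long approvedLimitCents) { this.approvedLimitCents = approvedLimitCents; }

    long totalCents() {
        long total = 0;
        for (LineItem item : items) {
            total += item.totalCents();
        }
        return total;
    }

    // Copy in: later changes to the caller's instance do not reach the order.
    void addLineItem(LineItem item) {
        ensureWithinLimit(totalCents() + item.totalCents());
        items.add(new LineItem(item));
    }

    // Copy out: changes to the returned instance do not reach the order either.
    LineItem getLineItem(int index) {
        return new LineItem(items.get(index));
    }

    // The only way to change an existing item, so the order can still say no.
    void updateLineItem(int index, LineItem changed) {
        long newTotal = totalCents() - items.get(index).totalCents() + changed.totalCents();
        ensureWithinLimit(newTotal);
        items.set(index, new LineItem(changed));
    }

    private void ensureWithinLimit(long total) {
        if (total > approvedLimitCents) {
            throw new IllegalStateException("Order would exceed the approved limit");
        }
    }
}
```

Setting the quantity on the original instance after addLineItem, or on the result of getLineItem(0), changes only the caller’s copy; the order changes only through updateLineItem.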

Encapsulation boundaries such as this exist at many layers within the application. For example, at the service boundary it is common to copy the domain object to and from a DTO. The DTO might be binary, XML, or even HTML. The service does not care what happens to the DTO after it has crossed the service boundary; if you want to modify the object, you must ask the service to perform the update.

So even though PurchaseOrder is an aggregate root, it too lives within the context of a larger encapsulation boundary. We might, for example, want to bind the purchase order to a form and allow the user to arbitrarily add, remove, and change line items. The only condition is that they cannot save the purchase order in the repository while it is in an invalid state. This is encapsulation via isolation again: all objects within the repository must be valid, and we don’t care what happens to objects while they are outside of the repository.
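To make the repository boundary concrete, here is a sketch of an in-memory repository that stores copies and refuses invalid orders. InMemoryOrderRepository and the simplified PurchaseOrder (line items reduced to their totals) are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified order that may be edited freely, for example while bound to a form.
class PurchaseOrder {
    private final int id;
    private final long approvedLimitCents;
    private final List<Long> lineItemTotals = new ArrayList<>();

    PurchaseOrder(int id, long approvedLimitCents) {
        this.id = id;
        this.approvedLimitCents = approvedLimitCents;
    }

    PurchaseOrder(PurchaseOrder other) {
        this.id = other.id;
        this.approvedLimitCents = other.approvedLimitCents;
        this.lineItemTotals.addAll(other.lineItemTotals);
    }

    int getId() { return id; }

    // No check here: the order may be temporarily invalid outside the repository.
    void addLineItem(long totalCents) { lineItemTotals.add(totalCents); }

    long totalCents() {
        long total = 0;
        for (long t : lineItemTotals) {
            total += t;
        }
        return total;
    }

    boolean isValid() { return totalCents() <= approvedLimitCents; }
}

class InMemoryOrderRepository {
    private final Map<Integer, PurchaseOrder> stored = new HashMap<>();

    // Validates at the boundary and keeps its own copy.
    void save(PurchaseOrder order) {
        if (!order.isValid()) {
            throw new IllegalStateException("Order exceeds the approved limit");
        }
        stored.put(order.getId(), new PurchaseOrder(order));
    }

    // Hands out a copy, so edits stay outside until save is called.
    PurchaseOrder getById(int id) {
        PurchaseOrder order = stored.get(id);
        return order == null ? null : new PurchaseOrder(order);
    }
}
```

Whatever happens to the working copy, the stored order only changes when save accepts it.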

So what does this have to do with NHibernate dirty checking?

The purpose of the repository is to encapsulate the persistence technology behind a collection-like interface. NHibernate is supposed to be transparent and provide the illusion that objects are stored in an in-memory collection. We see that ISession has all the expected methods (GetById, Find, Save, Update, Delete), so we add the ISession to the HttpContext and replace the in-memory list with calls to ISession. Since ISession is hidden behind the repository, the user must call the repository to save, update, or delete database state, and there we can continue to check order.IsValid() before persistence occurs. See the problem yet? If you’re new to NHibernate, probably not, but the title of this post will probably give you a clue: update does not do what you might think it does.

Let’s say a user adds two line items to a purchase order and clicks save. The page refreshes with an error message that the order exceeds the approval limit, but when the order is reloaded you see that one line item was saved. What happened? You would have expected that either both line items were added or neither was. So you start to debug the issue, perhaps placing a breakpoint in repository.update. But this method is never called. Eventually you find a try/catch block that modifies the purchase order: adding the first line item succeeds, but adding the second throws an exception. The error is caught and handled by setting the error message; update is not called and the purchase order reference is discarded. See the problem yet?
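The scenario can be reproduced without NHibernate using a toy session. To be clear, ToySession below is a hand-rolled stand-in written in Java, not NHibernate’s API: it assumes only that the session hands out live instances and, on flush, writes back any instance whose state differs from what was loaded. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified order: line items reduced to their totals.
class PurchaseOrder {
    private final long approvedLimitCents;
    private final List<Long> lineItemTotals = new ArrayList<>();

    PurchaseOrder(long approvedLimitCents) { this.approvedLimitCents = approvedLimitCents; }

    PurchaseOrder(PurchaseOrder other) {
        this.approvedLimitCents = other.approvedLimitCents;
        this.lineItemTotals.addAll(other.lineItemTotals);
    }

    void addLineItem(long totalCents) {
        long current = 0;
        for (long t : lineItemTotals) {
            current += t;
        }
        if (current + totalCents > approvedLimitCents) {
            throw new IllegalStateException("Order exceeds approval limit");
        }
        lineItemTotals.add(totalCents);
    }

    int lineItemCount() { return lineItemTotals.size(); }

    boolean sameStateAs(PurchaseOrder other) {
        return lineItemTotals.equals(other.lineItemTotals);
    }
}

// Toy stand-in for a session with automatic dirty checking. NOT NHibernate's API.
class ToySession {
    private final Map<Integer, PurchaseOrder> database;                  // the "rows"
    private final Map<Integer, PurchaseOrder> loaded = new HashMap<>();  // live instances

    ToySession(Map<Integer, PurchaseOrder> database) { this.database = database; }

    // Returns a live, tracked instance rather than an isolated copy.
    PurchaseOrder get(int id) {
        return loaded.computeIfAbsent(id, key -> new PurchaseOrder(database.get(key)));
    }

    // Writes every tracked instance that differs from its stored state.
    void flush() {
        for (Map.Entry<Integer, PurchaseOrder> entry : loaded.entrySet()) {
            if (!entry.getValue().sameStateAs(database.get(entry.getKey()))) {
                database.put(entry.getKey(), new PurchaseOrder(entry.getValue()));
            }
        }
    }
}

class DirtyCheckingScenario {
    // Returns how many line items end up stored after the failed edit.
    static int savedLineItemsAfterFailedEdit() {
        Map<Integer, PurchaseOrder> database = new HashMap<>();
        database.put(1, new PurchaseOrder(5000));
        ToySession session = new ToySession(database);

        PurchaseOrder order = session.get(1);
        try {
            order.addLineItem(3000); // fine
            order.addLineItem(3000); // throws: 6000 exceeds the limit
        } catch (IllegalStateException e) {
            // show the error message; no update call, the reference is dropped
        }

        session.flush(); // end of the request
        return database.get(1).lineItemCount();
    }
}
```

No update method is ever called here, yet the first line item is stored as soon as the session flushes.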

The purchase order was loaded through the session, so NHibernate still tracks it: when the session is flushed, dirty checking notices the first line item and writes it to the database whether or not update was ever called. As long as you refresh, evict, or roll back the transaction, things will work as expected, but doing so also couples the application to NHibernate, not physically but logically. What makes this form of coupling particularly insidious is that the code that becomes dependent on dirty checking isn’t obvious. Dirty checking isn’t some method you can easily replace when migrating from one persistence technology to another; you may not even be aware that some part of the code depends on it.

Looking back at the original Person.getCourses: if a user complains that person objects suddenly start showing up with more than 5 courses, is this a problem with the Person class or with the code that uses it? Some might say there is nothing wrong with the Person class, you are just using it incorrectly, and to some degree they are correct. Somewhere in the code you’ll find person.getCourses().add(course) rather than person.setCourses as it should be. But does this mean there is nothing wrong with the Person class?

The problem is not just dirty checking but the mutable references. Although you may have a CustomerService through which all customer-related modifications should occur, NHibernate will not prevent the OrderService from doing something stupid like order.Customer.CreditLimit = 1000000. Sure, there is a problem with this code, but it could have been prevented. NHibernate, like the original person.getCourses() method, returns a mutable object, and this violates the desired encapsulation boundaries.
