
How NHibernate’s dirty checking violates encapsulation boundaries

NHibernate’s dirty checking caught me by surprise when I first started using NHibernate. Being new to the tool, I was still open to the idea that I wasn’t using it correctly and perhaps needed to embrace dirty checking rather than fight it. I still don’t much like dirty checking, but now I better understand why it made me uneasy in the first place: it breaks encapsulation!

To explain why, let’s start with Fowler’s Encapsulate Collection example from page 208 of Refactoring (Martin Fowler). Before refactoring, the Person has a collection of courses with public get/set properties. If we add a business rule that a person cannot have more than 5 courses at any one time, public access to the collection becomes problematic: every method that sets the collection must be modified to enforce that rule. To avoid duplicating this logic, we can validate the rule in the setCourses method. But when you just want to add or remove a course from the existing list, it seems inefficient to copy the entire collection, add or remove the course, and reassign the collection. So with collections it often makes sense to also expose addCourse/removeCourse methods, as Fowler does. It doesn’t matter whether the internal implementation replaces the collection or modifies the existing instance. What is most important about this example is not the add/remove methods but that getCourses returns an immutable collection or a copy of the collection. If getCourses returned a mutable collection, a call to person.getCourses().add(course) would bypass the maximum-courses rule and could leave the system in an invalid state.
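A minimal sketch of that refactoring might look like the following (class and method names are illustrative, not copied from the book). The key detail is that getCourses hands out a read-only view, so the rule in addCourse cannot be bypassed:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Course {
    private final String name;
    Course(String name) { this.name = name; }
    String getName() { return name; }
}

class Person {
    private static final int MAX_COURSES = 5;
    private final List<Course> courses = new ArrayList<>();

    // Callers get an unmodifiable view: person.getCourses().add(...) throws,
    // so the business rule below cannot be bypassed from outside.
    List<Course> getCourses() {
        return Collections.unmodifiableList(courses);
    }

    void addCourse(Course course) {
        if (courses.size() >= MAX_COURSES) {
            throw new IllegalStateException("A person may have at most 5 courses");
        }
        courses.add(course);
    }

    void removeCourse(Course course) {
        courses.remove(course);
    }
}
```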

Now let’s make the business rule just slightly more complex: instead of a maximum of 5 courses per person, say a maximum of 20 credit hours. Provided that Course is immutable, we just need to update addCourse to ensure the sum of the existing credit hours plus course.getCreditHours() does not exceed 20. But if, looking at the Course interface, we find a setCreditHours() alongside getCreditHours(), then we could have a problem. At first we were just counting instances, so it didn’t matter whether course instances were immutable; but if the credit hours can change after the course has been added to the person, the rule can be violated behind our back. The solution depends on whether Course is considered inside or outside of the aggregate boundary.
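Assuming Course is immutable and exposes a getCreditHours() accessor, the revised check could be sketched like this (again, hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;

// Immutable Course: its hours cannot change after being added to a Person,
// so the sum computed in addCourse stays trustworthy.
class Course {
    private final int creditHours;
    Course(int creditHours) { this.creditHours = creditHours; }
    int getCreditHours() { return creditHours; }
}

class Person {
    private static final int MAX_CREDIT_HOURS = 20;
    private final List<Course> courses = new ArrayList<>();

    void addCourse(Course course) {
        int current = courses.stream().mapToInt(Course::getCreditHours).sum();
        if (current + course.getCreditHours() > MAX_CREDIT_HOURS) {
            throw new IllegalStateException("Credit hours may not exceed 20");
        }
        courses.add(course);
    }
}
```

Note that the check is only sound because Course is immutable; with a setCreditHours() present, the invariant could be broken after the fact, which is exactly the problem the paragraph above describes.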

To further explore this issue, let’s change the example to the Purchase Order, Line Item, Part relationship used in Domain-Driven Design (Eric Evans, page 130). A purchase order contains many line items, each of which is associated with a part having a price. A business rule says the purchase order cannot exceed an approved limit. Adding line items, modifying the quantity of a line item, or changing the price of a part could invalidate the business rule if we aren’t careful. To enforce this rule, Evans introduces the concept of an aggregate boundary and defines the purchase order as the aggregate root responsible for enforcing the business rules.

This boundary says that any changes to objects within it must be made through the aggregate root. Part is outside of the boundary: the price of a part may change in the future, but only its current price is relevant. We are not going to send the customer a new bill if the price of the part increases later, so to ensure that the line item price does not change unexpectedly when the part is updated, the relevant information is copied from the part to the purchase order line item, isolating the price from future changes to the part. Line items, however, are within the boundary; changes to line items must be made through the PurchaseOrder object so that it can reject any change that would place the purchase order in an invalid state. Adding line items can easily be controlled by encapsulating the collection as described above, but it is not only the collection that must be encapsulated: so must any modification to the line items themselves.

For starters, we could make LineItem an immutable type. If you want to update the quantity, you must create a new line item, remove the existing one, then add the new one. This probably works fine initially, but as business objects and their internal relationships get more complex, this add/remove style can become inconvenient. For example, perhaps there are footnotes that reference the line item number, and removing the line item would also remove the associated notes. Just as the setCourses method may not be ideal when you only want to add or remove a single course, adding and removing entire LineItems may not be ideal when you only want to update a quantity. So in addition to add/remove methods, it is often convenient to have an updateLineItem(index, lineItem) method that replaces a line item while preserving its existing identity.
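With an immutable LineItem, such an update method might look like the sketch below (all names are illustrative). Replacing the item at its index preserves its position, so anything that references the line number keeps working, and the root still validates every change:

```java
import java.util.ArrayList;
import java.util.List;

// Immutable line item: any change means constructing a replacement.
final class LineItem {
    final String partNumber;
    final int quantity;
    final int unitPrice;
    LineItem(String partNumber, int quantity, int unitPrice) {
        this.partNumber = partNumber;
        this.quantity = quantity;
        this.unitPrice = unitPrice;
    }
    int total() { return quantity * unitPrice; }
}

class PurchaseOrder {
    private final int approvedLimit;
    private final List<LineItem> items = new ArrayList<>();
    PurchaseOrder(int approvedLimit) { this.approvedLimit = approvedLimit; }

    // Order total if the item at excludeIndex were swapped for candidate
    // (pass -1 to exclude nothing, i.e. a plain addition).
    private int totalWith(int excludeIndex, LineItem candidate) {
        int sum = candidate.total();
        for (int i = 0; i < items.size(); i++)
            if (i != excludeIndex) sum += items.get(i).total();
        return sum;
    }

    void addLineItem(LineItem item) {
        if (totalWith(-1, item) > approvedLimit)
            throw new IllegalStateException("Order would exceed approved limit");
        items.add(item);
    }

    // Replace in place: the line keeps its index (its identity).
    void updateLineItem(int index, LineItem item) {
        if (totalWith(index, item) > approvedLimit)
            throw new IllegalStateException("Order would exceed approved limit");
        items.set(index, item);
    }
}
```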

In the example so far, LineItem is still an immutable type. In enterprise applications, however, line items have dozens of fields, such as cost codes and other accounting instructions. Large constructors are difficult to work with, so instead of an immutable type it is often easier to use a mutable type with a simple constructor: assign the desired properties, then add the object to the order. To preserve encapsulation, the order must copy the line item before adding it to its internal collection, and every reference returned by getLineItem must also be a copy. Copying the item is an essential part of preserving the encapsulation boundary: it prevents unexpected changes such as addLineItem(lineItem); lineItem.Quantity = 5 or getLineItem(0).Quantity = 5. So although we have made line items mutable, the update method is still required to apply those changes to the internal line item, giving the purchase order the opportunity to reject changes that would violate the business rule. The line items within the purchase order are isolated from changes to the copies held by the user. I call this encapsulation via isolation, as opposed to the former approach, which I call encapsulation via immutability.
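A sketch of encapsulation via isolation, under the same illustrative names: LineItem is mutable for convenient construction, but PurchaseOrder copies items on the way in and on the way out, so callers can never reach its internal state:

```java
import java.util.ArrayList;
import java.util.List;

// Mutable line item with a simple constructor plus a copy constructor.
class LineItem {
    String partNumber;
    int quantity;
    int unitPrice;
    LineItem() {}
    LineItem(LineItem other) {
        this.partNumber = other.partNumber;
        this.quantity = other.quantity;
        this.unitPrice = other.unitPrice;
    }
    int total() { return quantity * unitPrice; }
}

class PurchaseOrder {
    private final int approvedLimit;
    private final List<LineItem> items = new ArrayList<>();
    PurchaseOrder(int approvedLimit) { this.approvedLimit = approvedLimit; }

    private int totalOfItems() {
        return items.stream().mapToInt(LineItem::total).sum();
    }

    void addLineItem(LineItem item) {
        LineItem copy = new LineItem(item); // isolate from later mutation
        if (totalOfItems() + copy.total() > approvedLimit)
            throw new IllegalStateException("Order would exceed approved limit");
        items.add(copy);
    }

    // Callers get a copy: getLineItem(0).quantity = 5 has no effect
    // on the order's internal state.
    LineItem getLineItem(int index) {
        return new LineItem(items.get(index));
    }
}
```

Mutating either the reference you passed in or the reference you got back changes only your copy; to change the order itself you must go back through the root.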

Encapsulation boundaries like this exist at many layers of an application. At the service boundary, for example, it is common to copy the domain object to/from a DTO; the DTO might be binary, XML, or even HTML. The service does not care what happens to the DTO after it has crossed the service boundary: if you want to modify the underlying object, you must ask the service to perform the modifications.

So even though PurchaseOrder is an aggregate root, it too lives within the context of a larger encapsulation boundary. We might, for example, want to bind the purchase order to a form and allow the user to arbitrarily add, remove, and change line items. The only condition is that they cannot save the purchase order to the repository while it is in an invalid state: encapsulation via isolation. All objects within the repository must be valid; we don’t care what happens to objects while they are outside of the repository.

So what does this have to do with NHibernate dirty checking?

The purpose of the repository is to encapsulate the persistence technology behind a collection-like interface. NHibernate is supposed to be transparent and provide the illusion that objects are stored in an in-memory collection. We see that ISession has all the expected methods, GetById, Find, Save, Update, Delete, so we add the ISession to the HttpContext and replace the in-memory list with calls to ISession. Since ISession is hidden behind the repository, the user must call the repository to save, update, or delete database state, and there we can continue to check order.IsValid() before persistence occurs. See the problem yet? If you’re new to NHibernate, probably not, but the title of this post gives a clue: Update does not do what you might think it does.
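The expected contract can be sketched with a hypothetical repository (Java here for continuity with the earlier sketches; the names and the in-memory map are illustrative, not NHibernate’s API). The whole point is that nothing reaches the store except through update(), which validates first; NHibernate’s dirty checking breaks exactly this assumption, because a session flush can persist changes without update() ever being called:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal aggregate root for the sketch: valid while within its limit.
class PurchaseOrder {
    final long id;
    int total;
    final int approvedLimit;
    PurchaseOrder(long id, int approvedLimit) {
        this.id = id;
        this.approvedLimit = approvedLimit;
    }
    boolean isValid() { return total <= approvedLimit; }
}

class PurchaseOrderRepository {
    private final Map<Long, PurchaseOrder> store = new HashMap<>();

    // Persistence happens here and only here, after validation.
    void update(PurchaseOrder order) {
        if (!order.isValid())
            throw new IllegalStateException("Order exceeds approved limit");
        store.put(order.id, order); // an NHibernate version would call session.Update here
    }

    PurchaseOrder getById(long id) { return store.get(id); }
}
```

With the in-memory map, an order that fails validation is simply never stored; as the next section shows, swapping the map for an ISession quietly changes that guarantee.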

Let’s say a user adds two line items to a purchase order and clicks save. The page refreshes with an error message that the order exceeds the approval limit, but when the order is reloaded you see that one line item was saved… what happened? You would have expected that both line items were added or neither was. So you start to debug the issue, perhaps placing a breakpoint in repository.Update. But this method is never called. Eventually you find a try/catch block that modifies the purchase order: the first line item is added and all is well, but adding the second line item throws an exception. The error is caught and handled by setting the error message; Update is never called and the purchase order reference is discarded. Yet when the session flushes, dirty checking persists the first line item anyway. See the problem yet?

As long as you refresh, evict, or roll back the transaction, things will work as expected, but doing so couples the application to NHibernate, not physically but logically. What makes this form of coupling particularly insidious is that the code that depends on dirty checking isn’t obvious. Dirty checking isn’t some method you can easily replace when migrating from one persistence technology to another; you may not even be aware that some part of the code depends on it.

Looking back at the original Person.getCourses: if a user complains that person objects suddenly start showing up with more than 5 courses, is this a problem with the Person class or with the code that uses it? Some might say there is nothing wrong with the Person class, you are just using it incorrectly, and to some degree they are correct. Somewhere in the code you’ll find person.getCourses().add(course) rather than person.addCourse(course) as it should be. But does this really mean there is nothing wrong with the Person class?

The problem is not just dirty checking, but mutable references. Although you may have a CustomerService through which all customer-related modifications should occur, NHibernate will not prevent the OrderService from doing something stupid like order.Customer.CreditLimit = 1000000. Sure, there is a problem with this code, but it could have been prevented. NHibernate, like the original person.getCourses() method, returns mutable objects, and this violates the desired encapsulation boundaries.

  1. Stefan Steinegger
    2009/08/24 at 1:51 am | #1

    I disagree. See my points here:

    http://groups.google.com.ar/group/nhusers/browse_thread/thread/11e5ff4c27006c4b/acad1fee5e4be17c

    Excerpt:
    Making changes in memory but not storing them is NOT the solution. How would you reliably validate entities if you have invalid data in memory? How would you calculate the price of an order when you allow any part of the software to change the price of a product in memory? Your memory needs to be as consistent as the database.

    • kurtharriger
      2009/08/24 at 11:28 am | #2

      Sure it is!

      Let us say you’re editing some source code; you step away for some coffee, and when you come back the cat is lying on your keyboard. The text file is now destroyed. If you’re lucky, those changes haven’t been “saved”, so you just close the file and “discard” the changes. But by your reasoning, all changes made in memory should be saved.

      The cat obviously doesn’t know anything about valid code, and the changes don’t even compile, so by your reasoning the text editor should probably have prevented invalid code from being typed in the first place. Such an editor, however, would be useless, because multiple keystrokes are often required to move from one valid state to a new valid state; sometimes the code file must temporarily enter an invalid state to reach a new valid state.

      You could perhaps develop an editor that would prevent invalid code from being saved. Since you are working on a COPY of the document in memory, the editor could refuse to save any file that is not syntactically valid. The copy of the document in memory must be allowed to enter an invalid state, but the same is not necessarily true for the document on disk.

      Even if you were to ensure that files were syntactically valid on save, there is no easy way to guarantee that they are functionally valid. Thus each developer has his/her own COPY of the files from source control on a local machine, where code may fail to compile and unit tests may temporarily break. Only once those files have entered a valid state are they committed to source control, then deployed to a TEST environment for further validation before finally reaching production; at any one of these validation steps the code could be discarded.

      So the answer to your question is simple:
      > How would you reliably validate entities if you have invalid data in
      > memory? How would you calculate the price of an order when you allow
      > any part of the software to change the price of a product in memory?

      Because it is a copy! To replace the original order, you must call update. Update will ensure the order given is valid, and if and only if it is valid will it overwrite the original order with the new order. If you fail to call update, the copy, valid or invalid, is simply discarded.

      • Stefan Steinegger
        2011/08/11 at 5:42 am | #3

        OK, I was probably misunderstood. The “copy” in memory is either stored completely (commit) or thrown away completely (rollback). It’s very dangerous to store only a part. The stored part could be consistent with the (unsaved) rest in memory, but not with the database!

        Example:
        - Order line 1: price 10 in memory, price 11 in the db.
        - Order line 2: price 5 in memory, price 11 in the db.

        Calculate order total (in memory): 15
        Validate order (in memory) total < 20: pass
        Store order because of changed order total, not store order lines for some reason.

        You see the problem here? There isn't a good reason not to store the order lines, since they changed in memory and the business logic used that data for calculations and validations! If the order lines failed the validation, the whole transaction needs to be rolled back.

        I don't know what you are doing when you "see that one line item was saved". To me it looks like wrong transaction handling, for instance the famous session-per-call anti-pattern. I mean: you must have called commit despite the validation problem!

        NHibernate implements so-called persistence ignorance. You write the business logic as if there weren't any persistence layer. This allows plain old object-oriented programming. Whenever you have a problem with automatic dirty checking, think about this: how would you implement it without a persistence layer behind it? E.g., how would your application recognize changes on the business objects? Remember: memory state is as important as database state, so changes in the database are not more relevant. The question "how does your application know when an object changes?" (and other such questions) is a typical programming question and does not have anything to do with databases. And that's the beauty of it.

  2. inca
    2009/10/17 at 4:27 am | #4

    “Your memory needs to be as consistent as the database.” This is not an absolute truth, you know; it is just your own project’s design decision, and nothing more than that.
    You know, it is common practice to use the Open Session in View pattern for web applications: collect user-entered data directly into model objects, validate them, and only if validation passes, persist them. With auto-dirty-checking this pattern is no longer applicable (it does not produce any error, but it silently updates the entity prior to validation, which is much harder to maintain).
    I think I’d finish my post with a bit more of an absolute truth than Stefan’s: the developer needs control over his own application.

  3. Radim K
    2011/12/14 at 11:51 am | #5

    Kurt,
    Thanks for your post. It is really great that you spent your time putting it on paper.
    Even though I found it only now, after years… I would like to say that I see and think about it in a very similar way.
    In fact, I can hardly agree with Stefan’s approach.
    My experience with multi-tier architecture taught me that I cannot count on some feature
    (even an awesome one) like NHibernate’s. I simply have to solve the problem in the right place, and once.

    So we have a Data layer, represented by IDao, providing us with Add(), Update(), GetById()….
    And I also have a Business layer, represented by IFacade, which (at first look) has similar methods: Add(), Update(), GetById()…. And there, in these methods, is an explicit call to the data layer: Dao.Add(), Dao.Update()…

    It does not matter which layers are above (a web application, an importer parsing MS Excel, a routine processing a WS request/message).

    The point is that in, for example, the IFacade.Update() method we do business validations, and these can detect an undesirable state. It could well mean that a previously received object (via the GetById() method) was changed/bound in an upper layer and is dirty.
    At this point, the business tier drives the processing by not calling Dao.Update()
    (and there could be a whole tree of objects being updated, in one transaction or not).

    What I like about this approach is that the Facade, the only place for checking the business rules, drives the process of persistence. Why?

    Because we do not need the “persistence ignorance” provided by NHibernate only; we need IDao implementation ignorance!
    And as proof I would argue that in our applications we implemented IDao in NHibernate (my favorite way) but also in ADO.NET stored procedures (we had to) as well as in LDAP, XML, external web services…

    Because the separation of concerns is driven by tiers, not by NHibernate features, we can use a standard framework in many scenarios.

    Thanks again

