Defying OO convention: Hibernate and private data reflection

It's an oft-recited design principle when building object-oriented software that you should always protect a class's data members by making them private and only accessing them via public "get" and "set" methods. The mechanism by which this is achieved varies according to the language, but the idea is the same: if you access your data via methods rather than directly, then you have a lot more flexibility for refactoring later, without breaking your class interface. A less appreciated fact is that, as with pretty much every simple rule, there is a plethora of quite reasonable exceptions. In this article I'll focus on one such exception: how maintaining object relationships with the Java Hibernate persistency framework is best done by letting Hibernate access the data fields directly, while still keeping them private!

One characteristic of passing from being a novice programmer to being an experienced developer is learning to make decisions based on the logic behind such 95% rules rather than sticking rigidly to the letter of the law just because. It's a bit like in kung fu films: the more styles you know, the better you're equipped to deal with difficult situations! The main class of exceptions in this case is where your object is little more than a glorified data container. If you're really sure that the current variables will forever be the relevant ones or, if not, that method wrappers will do little to protect you from refactoring anyway, then there's little point in typing all those extra lines and parentheses.

Another such situation turned up for me recently when using the Hibernate Java object persistency system. Hibernate is one of the many marvellous high-level libraries for Java, and is exactly the sort of thing whose emergence is making Java such a powerhouse "enterprise development" platform these days. (On the days when I have to write C++ and my worries are all on the level of "how do I make this string lower-case?", I really pine for Java, where the worrying things are so much more interesting.) Hibernate sits between a set of Java objects and a relational database and does the database magic for you, so that the developer really just has to worry about the object semantics. It's very clever, and can now make use of the Java 1.5+ annotations framework, so you barely even need configuration files to describe the object-db mapping.

Naturally, when I started working with Hibernate and my particular set of objects (the model behind HepData), I made Hibernate perform all the persistency operations via the public get and set methods. So far so good. However, I started noticing problems when using the "delete-orphan" relationship, getting an error message like this:

A collection with cascade="all-delete-orphan" was no longer referenced by the owning entity instance

delete-orphan should be a neat way to ensure that when a "child object", as defined by the Hibernate mapping, is removed from its parent's collection, it also gets deleted from the database (the "all" part of the cascade does the same for the children of a deleted parent). Clearly something was going on here that made Hibernate lose track of the objects it's meant to be managing. The answer, as provided by Scott Leberknight in this article, is that if you make Hibernate use the get and set methods to access object contents, then those methods had better not manipulate the data! In fact, it pretty much forces you to have get and set methods which look like

    public Foo getFoo() {
      return _foo;
    }

    public void setFoo(Foo foo) {
      _foo = foo;
    }
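To see why a "clever" set method can upset a framework that tracks collections by reference, here is a toy model in plain Java. It is only a sketch under stated assumptions: no Hibernate code is involved, the names CollectionIdentityDemo, CopyingHolder and PlainHolder are made up for illustration, and the tracked set merely stands in for whatever collection object a persistence layer hands to the setter. The assumption being illustrated is that the framework later asks "is the object still holding the collection I gave it?".

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class CollectionIdentityDemo {

    // A "clever" setter: copies the incoming elements into its own set.
    public static class CopyingHolder {
        private final SortedSet<String> items = new TreeSet<String>();

        public SortedSet<String> getItems() { return items; }

        public void setItems(SortedSet<String> incoming) {
            items.clear();
            items.addAll(incoming);
        }
    }

    // A plain setter: stores exactly the instance it is given.
    public static class PlainHolder {
        private SortedSet<String> items;

        public SortedSet<String> getItems() { return items; }

        public void setItems(SortedSet<String> incoming) { items = incoming; }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a collection tracked by a framework.
        SortedSet<String> tracked = new TreeSet<String>();
        tracked.add("a");

        CopyingHolder copying = new CopyingHolder();
        copying.setItems(tracked);

        PlainHolder plain = new PlainHolder();
        plain.setItems(tracked);

        // Same contents, but the copying holder no longer references 'tracked'.
        System.out.println(copying.getItems() == tracked); // prints "false"
        System.out.println(plain.getItems() == tracked);   // prints "true"
    }
}
```

Both holders end up with equal contents, so nothing looks wrong from the Java side. Only the reference comparison differs, which is the kind of mismatch the error message above complains about.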

This rings a bell --- if we're not allowed to derive any benefit from the get and set methods, then what's the point in using them? Should we just expose all the data members of our Hibernate classes? Hardly nice: the whole idea is that Hibernate is pretty transparent, so if it starts making major impositions on the public interface of our classes then it's doing a pretty piss-poor job. Fortunately, life is nicer than this, and Hibernate is still an excellent tool. But first a momentary diversion on why the prospective loss of these get and set methods (particularly the set methods) might be a real show-stopper rather than just an unpleasant aesthetic constraint.

Anyone who's used Hibernate in anger, or at least had a good read of the manual, won't be surprised to hear that the key issue is bidirectional relations between objects. Hibernate does a damn good job, but it's not magic and relationships between objects still need to be handled in your Java code. For example, if you have an object of class Parent and it contains a collection of several Child objects, then there is a one-to-many relationship defined between the Parent and its children and you would tell Hibernate about this relationship. Obviously, in Java-land the parent can always find its children, because it has a data member (the collection) which contains the references to them. But what about the reverse? In pure Java terms, if you acquired a reference to one of the children there's no way to find out which Parent object "owns" it. For this reason it's a nice idea to add a "back-reference" from the child to its parent, say via a private Parent _parent data member in Child.

This is all very nice, and undoubtedly good practice, but now we have a new issue: if we start adding or removing Child objects from the collections in Parents, we'd better make sure that the back-reference is kept up to date. This won't happen automatically, so some code will be required to ensure that this relationship is kept consistent. Such a consistency condition is sometimes described as an "invariant", and enforcing it is exactly what get and set methods are best employed for. Here's an example. First we'll define the appropriate bit of Parent:

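    import java.util.SortedSet;
    import java.util.TreeSet;

    public class Parent {
      // Initialised up front so the methods below never hit a null set.
      // (A TreeSet needs Child to be Comparable, or a Comparator.)
      private SortedSet<Child> _children = new TreeSet<Child>();

      public SortedSet<Child> getChildren() {
        return _children;
      }

      public Parent setChildren(SortedSet<Child> children) {
        _children.clear();
        for (Child c : children) addChild(c);
        return this;
      }

      // setParent does the real work: it sets the back-reference
      // and adds the child to _children.
      public Parent addChild(Child child) {
        if (child != null) {
          child.setParent(this);
        }
        return this;
      }
    }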

Note here that I'm being a bit careful about testing for nullness (but not as careful as I'd really need to be), I'm delegating the set method to a more "atomic" addChild method, and addChild itself calls an as-yet mysterious setParent method on Child. I've also used the "return self" idiom on the set methods, just because I think it's a nice thing to do :-)

Now for Child, and in particular that setParent method:

    public class Child {
      private Parent _parent;

      public Parent getParent() {
        return _parent;
      }

      public Child setParent(Parent parent) {
        if (parent != null) {
          // I should probably remove myself from the current parent, too... but I won't!
          _parent = parent;
          _parent.getChildren().add(this);
        }
        return this;
      }
    }

You can probably see that this is more complex than you'd expect for a boilerplate operation --- unfortunately that's just life at the moment, although there may be code-generation frameworks which take some of the pain out of this sort of thing. You can also see that this essential consistency operation is exactly the sort of thing that makes Hibernate throw a wobbly if it's trying to access the data using the same get and set methods. Oops.
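For completeness, here is one way the missing "remove myself from the current parent" step could be filled in. Treat it as a minimal sketch rather than the code actually used in the project: the wrapper class ReparentingSketch, the _name sort key, the Comparable implementation and the removeChild method are all assumptions added so that the example compiles and runs on its own with a TreeSet.

```java
import java.util.Collections;
import java.util.SortedSet;
import java.util.TreeSet;

public class ReparentingSketch {

    public static class Parent {
        private final SortedSet<Child> _children = new TreeSet<Child>();

        // Read-only view: all changes must go through addChild/removeChild.
        public SortedSet<Child> getChildren() {
            return Collections.unmodifiableSortedSet(_children);
        }

        public Parent addChild(Child child) {
            if (child != null) child.setParent(this);
            return this;
        }

        public Parent removeChild(Child child) {
            if (child != null && child.getParent() == this) child.setParent(null);
            return this;
        }
    }

    public static class Child implements Comparable<Child> {
        private final String _name; // hypothetical sort key, not in the original
        private Parent _parent;

        public Child(String name) { _name = name; }

        public Parent getParent() { return _parent; }

        // Keeps both sides of the relationship consistent, including re-parenting.
        public Child setParent(Parent parent) {
            if (_parent == parent) return this;                  // nothing to do
            if (_parent != null) _parent._children.remove(this); // leave the old parent
            _parent = parent;
            if (parent != null) parent._children.add(this);      // join the new parent
            return this;
        }

        public int compareTo(Child other) { return _name.compareTo(other._name); }
    }

    public static void main(String[] args) {
        Parent a = new Parent();
        Parent b = new Parent();
        Child kid = new Child("kid");

        a.addChild(kid);
        b.addChild(kid); // re-parent from a to b

        System.out.println(a.getChildren().size()); // prints "0"
        System.out.println(b.getChildren().size()); // prints "1"
        System.out.println(kid.getParent() == b);   // prints "true"
    }
}
```

Returning an unmodifiable view from getChildren is a design choice of this sketch: it forces every change through the methods that maintain the invariant, which is only workable if the persistence layer is not relying on that same get method.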

Fortunately, as alluded to above, there is a neat answer: Hibernate only talks to the objects for the purposes of persisting them to and from the database --- these relationship semantics only exist in the pure Java part of the system. So, if we make Hibernate talk direct to the fields, and only let the objects talk to each other via the consistency-enforcing interface methods, then Hibernate will only ever have to deal with consistent data structures. What's more, and this is the neat bit, Hibernate can even persist private fields! It does this via the magic of reflection, which is an excellent example of how Java's richness and flexibility as an application platform can allow clever applications to do good things in the best possible way. See this article and this one for discussion of this issue.
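The reflection trick itself is easy to try out with nothing but the standard library. The sketch below is not Hibernate's implementation, just the underlying java.lang.reflect mechanism; the class PrivateFieldAccessDemo, the Account class and the readField/writeField helpers are hypothetical names invented for the illustration.

```java
import java.lang.reflect.Field;

public class PrivateFieldAccessDemo {

    public static class Account {
        private String _owner = "initial";

        // The public getter transforms the value; reflection sees the raw field.
        public String getOwner() { return _owner.toUpperCase(); }
    }

    // Read a private field by name, bypassing any get method.
    public static Object readField(Object target, String fieldName) {
        try {
            Field f = target.getClass().getDeclaredField(fieldName);
            f.setAccessible(true); // lift the 'private' restriction for this Field object
            return f.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Write a private field by name, bypassing any set method.
    public static void writeField(Object target, String fieldName, Object value) {
        try {
            Field f = target.getClass().getDeclaredField(fieldName);
            f.setAccessible(true);
            f.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account acct = new Account();
        writeField(acct, "_owner", "alice");

        System.out.println(readField(acct, "_owner")); // prints "alice"
        System.out.println(acct.getOwner());           // prints "ALICE"
    }
}
```

The raw field value and the value seen through the public interface are allowed to differ, which is the separation the article relies on: the tool sees the stored state, everyone else sees the interface.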

Using JPA and Hibernate annotations, we can then add Hibernate mappings to our classes and Hibernate will talk direct to the fields. (Hibernate picks field access when the mapping annotations, in particular @Id, are placed on the fields rather than on the get methods.) If that appalls you (after all, shouldn't data members always be accessed through public wrapper methods?) then think again about what I had to say about rules and their exceptions at the start. We don't care how Hibernate does things, other than to be impressed by its cleverness: it's just a tool, and used this way it allows us to apply those rules to our part of the code more robustly. Here are the annotations:

    @Entity
    public class Parent {
      @OneToMany(mappedBy="_parent")
      // We'd also want Hibernate-specific annotations for cascade and sorting
      private SortedSet<Child> _children;
      ...

and

    @Entity
    public class Child {
      @ManyToOne
      private Parent _parent;
      ...

The only problem is that now your Hibernate HQL queries will have to refer to properties by their raw field name, conventional leading underscore included. It would be nice if there was an annotation for providing an official property name when declaring a property, to solve this aesthetic problem and protect against external susceptibilities to internal variable names, but on the whole it works pretty well. Or at least, it does for us --- we're now working on a different kind of Hibernate problem. One's work is never done, eh?

As a last note, while trying to solve this problem in the first place I found the articles cited above, which really helped. I also ran across Joel Spolsky's article on leaky abstractions, which for some reason was new to me. It's probably nothing new to anyone who might read this, but I like how he expressed the idea, so maybe you'll find it interesting, too. I can only hope that this article has been comparably informative and entertaining :-)
