It's an oft-recited design principle when building object-oriented software that you should always protect a class's
data members by making them private and only accessing them via public "get" and "set" methods. The mechanism by
which this is achieved varies according to the language, but the idea is the same: if you access your data via methods
rather than directly, then you have a lot more flexibility for refactoring later, without breaking your class interface.
A less appreciated fact is that, as with pretty much every simple rule, there is a plethora of quite reasonable
exceptions. In this article I'll focus on one such exception: how maintaining object relationships with the Java
Hibernate persistency framework is best done by letting Hibernate access the data fields directly, while still keeping
them private!
One characteristic of passing from being a novice programmer to being an experienced developer is learning to make
decisions based on the logic behind such 95% rules rather than sticking rigidly to the letter of the law just because.
It's a bit like in kung fu films --- the more styles you know, the better you're equipped to deal with difficult
situations! The main class of exceptions in this case is where your object is little more than a glorified data
container. If you're really sure that the current variables will forever be the relevant ones or, if not, that method
wrappers will do little to protect you from refactoring anyway, then there's little point in typing all those extra
lines and parentheses.
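As a minimal sketch of what a "glorified data container" might look like, here is a made-up two-component vector. The Vec2 and ContainerDemo names are purely illustrative assumptions and not from any real codebase; the point is only that wrapping two immutable numbers in get methods would buy very little.

```java
// A hypothetical data container: two immutable numbers and nothing else.
final class Vec2 {
    public final double x;
    public final double y;

    Vec2(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

class ContainerDemo {
    public static void main(String[] args) {
        Vec2 v = new Vec2(3.0, 4.0);
        // Direct field access; the fields are final, so nothing can be corrupted
        System.out.println(Math.hypot(v.x, v.y)); // prints 5.0
    }
}
```

Because the fields are final, callers can read them but never change them, which removes most of the risk that the "always use accessors" rule is meant to guard against.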
Another such situation turned up for me recently when using the Hibernate Java object
persistency system. Hibernate is one of the many marvellous high-level libraries for Java, and is exactly the sort of
thing whose emergence is making Java such a powerhouse "enterprise development" platform these days. (On the days when I
have to write C++ and my worries are all on the level of "how do I make this string lower-case?", I really pine for
Java, where the worrying things are so much more interesting.) Hibernate sits in between a set of Java objects and a
relational database and does the database magic for you, so that the developer really just has to worry about the object
semantics. It's very clever, and now can make use of the Java 1.5+ annotations framework, so you barely even need
configuration files to describe the object-db mapping.
Naturally, when I started working with Hibernate and my particular set of objects (the model behind
HepData), I made Hibernate perform all the persistency operations via the public get and set methods. So far so good.
However, I started noticing problems when using the "delete-orphan" relationship, getting an error message like this:
A collection with cascade="all-delete-orphan" was no longer referenced by the owning entity instance
delete-orphan should be a neat way to ensure that when you delete an object from the database, its "child objects", as
defined by the Hibernate mapping, also get deleted. Clearly something was going on here that made Hibernate lose track
of the objects it's meant to be managing. The answer, as provided by Scott Leberknight in this
article, is that if you make Hibernate use the get and set methods to access object contents, then those methods had
better not manipulate the data! In fact, it pretty much forces you to have get and set methods which look like:
public Foo getFoo() { return _foo; }
public void setFoo(Foo foo) { _foo = foo; }
This rings alarm bells: if we're not allowed to derive any benefit from the get and set methods, then what's the
point in using them? Should we just expose all the data members of our Hibernate classes? Hardly nice: the whole idea is
that Hibernate is pretty transparent, so if it starts making major impositions on the public interface of our classes
then it's doing a pretty piss-poor job. Fortunately, life is nicer than this, and Hibernate is still an excellent tool.
But first a momentary diversion on why the prospective loss of these get and set methods (particularly the set
methods) might be a real show-stopper rather than just an unpleasant aesthetic constraint.
Anyone who's used Hibernate in anger, or at least had a good read of the manual, won't be surprised to hear that the key
issue is bidirectional relations between objects. Hibernate does a damn good job, but it's not magic and relationships
between objects still need to be handled in your Java code. For example, if you have an object of class type Parent and
it contains a collection of several Child objects, then there is a one-to-many relationship defined between the Parent
and its children and you would tell Hibernate about this relationship. Obviously, in Java-land the parent can
always find its children, because it has a data member (the collection) which contains the references to them. But what
about the reverse? In pure Java terms, if you acquired a reference to one of the children there's no way to find out
which Parent object "owns" it. For this reason it's a nice idea to add a "back-reference" from the child to its
parent, say via a private Parent _parent data member in Child.
This is all very nice, and undoubtedly good practice, but now we have a new issue: if we start adding or removing
Child objects from the collections in Parent objects, we'd better make sure that the back-reference is kept up to date.
This won't happen automatically, so some code will be required to ensure that this relationship is kept consistent. Such
a consistency condition is sometimes described as an "invariant", and enforcing it is exactly what get and set methods
are best employed to do. Here's an example: first we'll define the appropriate bit of Parent:
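import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class Parent {
    // Initialised eagerly so that a freshly constructed Parent never has a null collection.
    // A TreeSet needs Child to be Comparable (or a Comparator to be supplied).
    private SortedSet<Child> _children = new TreeSet<Child>();

    public SortedSet<Child> getChildren() { return _children; }

    public Parent setChildren(SortedSet<Child> children) {
        // Copy first, in case the caller passed in the result of getChildren() itself
        List<Child> incoming = new ArrayList<Child>(children);
        _children.clear();
        for (Child c : incoming) addChild(c);
        return this;
    }

    public Parent addChild(Child child) {
        // Child.setParent does the actual insertion into _children
        if (child != null) { child.setParent(this); }
        return this;
    }
}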
Note here that I'm being a bit careful about testing for nullness (but not as careful as I'd really need to be), I'm
delegating the set method to a more "atomic" addChild method, and addChild itself calls an as-yet mysterious
setParent method on Child. I've also used the "return self" idiom on the set methods, just because I think it's a
nice thing to do :-)
Now for Child, and in particular that setParent method:
public class Child {
    private Parent _parent;

    public Parent getParent() { return _parent; }

    public Child setParent(Parent parent) {
        if (parent != null) {
            // I should probably remove myself from the current parent, too... but I won't!
            _parent = parent;
            _parent.getChildren().add(this);
        }
        return this;
    }
}
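For the curious, here is a rough, self-contained sketch of how the pair might look if setParent did also remove the child from its previous parent. Several things here are hypothetical additions for illustration and not part of the classes above: the attach, detach and removeChild methods, the read-only view returned by getChildren, and the _name field, which exists only so that children can be ordered inside a SortedSet.

```java
import java.util.Collections;
import java.util.SortedSet;
import java.util.TreeSet;

class Parent {
    private final SortedSet<Child> _children = new TreeSet<>();

    // Read-only view: callers have to go through addChild/removeChild
    public SortedSet<Child> getChildren() {
        return Collections.unmodifiableSortedSet(_children);
    }

    public Parent addChild(Child child) {
        if (child != null) child.setParent(this);
        return this;
    }

    public Parent removeChild(Child child) {
        if (child != null && child.getParent() == this) child.setParent(null);
        return this;
    }

    // Package-private hooks (hypothetical), meant to be called only by Child.setParent
    void attach(Child child) { _children.add(child); }
    void detach(Child child) { _children.remove(child); }
}

class Child implements Comparable<Child> {
    private final String _name; // illustrative ordering key; assumed unique per parent
    private Parent _parent;

    Child(String name) { _name = name; }

    public Parent getParent() { return _parent; }

    public Child setParent(Parent parent) {
        if (_parent == parent) return this;        // already consistent
        if (_parent != null) _parent.detach(this); // leave the old parent
        _parent = parent;
        if (parent != null) parent.attach(this);   // join the new one
        return this;
    }

    @Override
    public int compareTo(Child other) { return _name.compareTo(other._name); }
}

class InvariantDemo {
    public static void main(String[] args) {
        Parent alice = new Parent();
        Parent bob = new Parent();
        Child kid = new Child("kid");

        alice.addChild(kid);
        bob.addChild(kid); // re-parenting goes through the same method

        System.out.println(alice.getChildren().contains(kid)); // false
        System.out.println(bob.getChildren().contains(kid));   // true
        System.out.println(kid.getParent() == bob);            // true
    }
}
```

All the bookkeeping lives in setParent, so both sides of the relationship change together or not at all. Whether a read-only getChildren is acceptable depends on the rest of your code; it is one possible design choice, not a requirement.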
You can probably see that this is more complex than you'd expect for a boilerplate operation --- unfortunately that's
just life at the moment, although there may be code-generation frameworks which take some of the pain out of this sort
of thing. You can also see that this essential consistency operation is exactly the sort of thing that makes Hibernate
throw a wobbly if it's trying to access the data using the same get and set methods. Oops.
Fortunately, as alluded to above, there is a neat answer: Hibernate only talks to the objects for the purposes of
persisting them to and from the database --- these relationship semantics only exist in the pure Java part of the
system. So, if we make Hibernate talk direct to the fields, and only let the objects talk to each other via the
consistency-enforcing interface methods, then Hibernate will only ever have to deal with consistent data structures.
What's more, and this is the neat bit, Hibernate can even persist private fields! It does this via the magic of
reflection, which is an excellent example of how Java's richness and flexibility as an application platform can allow
clever applications to do good things in the best possible way. See this article and this one for discussion of this
issue.
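To make the reflection point concrete, here is a small plain-Java sketch of reading and writing a private field from outside its class. This is not Hibernate's actual code, only an illustration of the standard java.lang.reflect mechanism; the Account class and the readField and writeField helpers are made up for the example.

```java
import java.lang.reflect.Field;

// Hypothetical class with a private field and no setter
class Account {
    private String _owner = "nobody";

    public String getOwner() { return _owner; }
}

class ReflectionDemo {
    // Read a declared field by name, regardless of its access modifier
    static Object readField(Object target, String fieldName) throws ReflectiveOperationException {
        Field field = target.getClass().getDeclaredField(fieldName);
        field.setAccessible(true); // bypass the "private" check for this Field object
        return field.get(target);
    }

    // Write a declared field by name, regardless of its access modifier
    static void writeField(Object target, String fieldName, Object value) throws ReflectiveOperationException {
        Field field = target.getClass().getDeclaredField(fieldName);
        field.setAccessible(true);
        field.set(target, value);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Account account = new Account();
        writeField(account, "_owner", "alice");
        System.out.println(account.getOwner());            // prints alice
        System.out.println(readField(account, "_owner"));  // prints alice
    }
}
```

Note that getDeclaredField only looks at the class itself, not its superclasses, so a real persistence layer would need to do rather more work than this.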
Using JPA and Hibernate annotations, we can then add Hibernate mappings to our classes and Hibernate will talk direct to
the fields. If that appalls you --- after all, shouldn't data members always be accessed through public wrapper
methods? --- then think again about what I had to say about rules and their exceptions at the start. We don't care how
Hibernate does things, other than to be impressed by its cleverness: it's just a tool, and used this way it allows us to
apply those rules to our part of the code more robustly. Here are the annotations:
@Entity
public class Parent {
    // We'd also want Hibernate-specific annotations for cascade and sorting
    @OneToMany(mappedBy="_parent")
    private SortedSet<Child> _children;
    ...
and
@Entity
public class Child {
    @ManyToOne
    private Parent _parent;
    ...
The only problem is that now your Hibernate HQL queries will have to refer to properties by their raw field name,
conventional leading underscore included. It would be nice if there was an annotation for providing an official property
name when declaring a property, to solve this aesthetic problem and protect against external susceptibilities to
internal variable names, but on the whole it works pretty well. Or at least, it does for us --- we're now working on a
different kind of Hibernate problem. One's work is never done, eh?
As a last note, while trying to solve this problem in the first place I found the articles cited above, which really
helped. I also ran across Joel Spolsky's article on leaky abstractions, which for some reason was new to me. It's
probably nothing new to anyone who might read this, but I like how he expressed the idea, so maybe you'll find it
interesting, too. I can only hope that this article has been comparably informative and entertaining :-)