Durham staff survey

I filled in the Durham University staff survey yesterday, which got me once again rather miffed about the way that postdoctoral positions work within this university, but also more generally. A good number of the questions were of the "can you do your work within normal working hours?", "do you have enough resources to do your job?", "do you feel stressed by work?" variety. Answers: no, no, and yes. But while this "dialogue" is all very well in principle, there's nothing that can actually be done about these things: research positions are funded to do research in a particular area, and if that's a lot of work for one person, tough. And if you don't manage to do it, also tough: no permanent position for you, matey. So hurrah for the new touchy-feely Durham University, but if they stump up University funds for the extra manpower needed to make my post a full success, I'll eat a hat of anyone's choosing.

The other major annoying aspect of working for the University is the complete lack of institutional incentive to work hard or well. Were it not for the fact that I'm a reasonably intelligent and self-motivated person who doesn't like to turn out slack or shoddy work, my "best strategy" would be to work as little as I could without being fired (or, since I have a fixed-term position, without failing to get my contract renewed). My pay will increment by about £800 a year as long as I'm there, regardless of whether I (continue to) work my arse off. What's more, there's no meaningful system for hopping up that ladder any faster: allegedly such a scheme exists, but I'm told it's not worth applying because it's a lot of paperwork and, in any case, the University has put a moratorium on such accelerated progress. Now, I can understand that being older is likely to be correlated with having kids and big houses and the accompanying expenses, but it still burns to do more or better work than someone who's been around for ages and to take home half their wage: maybe I want that house or kids right now. I don't even regard myself as much of a materialist: I just like to think that what I'm given is proportional to what other people doing the same work get. This is by no means a problem unique to university posts, but Durham's pay and promotion system is remarkably geared towards maintaining a status quo of age-coupled mediocrity. A mediocracy, if you will.

Anyway, I get the distinct impression that this survey is not for the benefit of academics; I can only hope it benefits someone other than the HR section. It's institutionally accepted that academic life is underpaid, overstressed and under-resourced, and getting a few thousand surveys back to confirm it won't change any of those factors. Unfortunately, academic life is also wonderfully flexible, informal and interesting: we're our own worst enemies. So go on, drop out and go work for a bank: the fewer applicants for those stressful, undervalued lectureships the better.

In case you thought I was dead...

...think again. Although I have been curiously quiet for the last month or so, it's all to do with the usual "swamped by work" excuse, rather than either death or lack of things to moan about. Plenty in the pipeline, just all going very slowly: that's the problem with attempting to parallelise your workload.

On the plus side, I went for a big MTB ride around darkest Northumberland on Saturday, which was knackering and awesome, then we got out for our first climbing in ages today at Brimham Rocks. At this rate, maybe the incipient belly can be staved off for a few more years!

Ooh, pretty things

Last week we received our wedding gifts delivery and put up a few pictures as mementos of the wedding and honeymoon (sorry Tom, your gift isn't up yet...). Particularly since I put up new kitchen shelves to accommodate the New Stuff and am basking in the associated masculinity of drilling and spirit-levelling, here are a few photos of no interest to anyone other than gift donors and the irrepressibly curious:

[Photos: living room; wedding photo and Lake Louise painting; new kitchen shelves and STUFF!; bread maker and Athabasca Falls painting]

Python indentation considered boneheaded

I've been using Python for maybe 4 or 5 years now. On the whole, the experience has been very positive: big pluses include the excellent (although rather stylistically disjoint) standard library; built-in collection types and list comprehensions; the experience, at least, of finding that duck typing actually "sort of works"; and the clean syntax. However, the "elegant" indentation-based scoping for which it's so famous is, all told, a very bad idea, regardless of what die-hard Pythonistas may tell you.

Let's start with syntax --- the most visceral, immediate feature of a language: you know right from the start whether or not you like the feel of it. User interfaces appear secretly everywhere, from the symbols chosen to represent particular quantities in algebra to the syntactic-sugar aspects of a programming language. Some experienced coders, especially those who know several languages well, may dismiss syntax-worrying as pointless and superficial, but I'm not so sure: a good syntax not only makes it fast to code up common tasks, but also emphasises the structure of an algorithm at a glance and is based on a few consistently applied core concepts. This is a lot more than lily-gilding. Strictly, I can do anything in any Turing-complete language, so the whole point of a good language is that it makes code for its target tasks readable, elegant and extensible: syntax plays a major role here. Python does well from this point of view: you can see right away that this is a language which doesn't render your own code unreadable when you go a whole week without reading it. Contrast Lisp, Perl or PHP: all serious languages suffering from serious syntactic defects (okay, so the most serious is the least heavily used... but that's got a lot to do with it having the worst syntax).

The one feature that everyone notices about Python is the indentation thing: scoping is denoted by indentation rather than braces or other explicit constructs. It's a feature that has dissuaded many potential users, who just think it's a bit too weird, despite the reassurances of the official tutorial. Well, I bit the bullet a while back and bought into the indentation thing for a few years. It was okay... actually, it was a non-issue: the indentation scoping seemed to work, provided your editor gave you a bit of help. However, recent experiences have convinced me that my gut reaction was right and that some sort of explicit block closure is required.

Here's my conclusion:

  • invisible markup is an accident waiting to happen;
  • scope structure should be unambiguous.

The first point is obvious in retrospect: one of the longest-running and most pointless debates in programming is the "spaces vs. tabs" indentation war. It shouldn't matter, yet everyone has an opinion on it and despises the alternative. Personally, I'm a spaces-only kind of guy and, yes, slightly militant about it, as all good religious fanatics should be. The point is not really that one or the other is right, but that providing the opportunity to confuse two kinds of invisible and barely distinguishable tokens is going to cause trouble some day. In most languages this is merely a matter of aesthetics, and there are code formatters that will happily turn someone else's convention into yours and back if you're really that bothered.

Amazingly, Python does nothing in its syntax definition to sidestep the tabs vs. spaces war. It explicitly says that you can have both, mix them however you like, even change your definition of how much indentation each scope region needs --- as long as you follow the ensuing mess of rules. It's very cunning and, like most cunning things in computing, we'd be much better off without it. If a language designer can say things like "here is an example of a correctly (though confusingly) indented piece of Python code", alarm bells should be going off in their head. I'm told that this also makes the definition of Python's EBNF grammar pretty grim, and I can believe it (though personally I can't even find reference to tabs, newlines and spaces in the official grammar document). If more than one person, each with different editor settings for tab/space indentation, edits the same Python code, a mix of spaces and tabs is pretty much guaranteed. In any other language this doesn't affect the behaviour of the code: in Python the invisible markup can completely change the logic. Oops.
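You can watch the tokenizer squirm for yourself. Here's a little sketch (the strings and names are mine, purely for illustration): two sources that look identical on screen, one indented with spaces and one with a tab followed by spaces. Newer Python interpreters at least refuse to guess and reject the ambiguous mix outright:

```python
# Two visually identical sources: the second mixes a tab with spaces.
space_src = "if True:\n        x = 1\n        y = 2\n"
tab_src = "if True:\n\tx = 1\n        y = 2\n"  # a tab, then 8 spaces

compile(space_src, "<demo>", "exec")  # parses fine
try:
    compile(tab_src, "<demo>", "exec")
    print("parsed: the tokenizer guessed a tab width for us")
except TabError as err:
    print("rejected:", err)
```

Whether that second snippet is accepted, rejected, or silently reinterpreted has depended on interpreter version and flags --- which is rather the point.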

Second, since your blocks don't explicitly end, it's generally impossible to apply automatic reformatting of Python source to fix indentation screw-ups.

As a demonstration, how would you correct the indentation here? (Assume I'm just using spaces, so this is a relatively simple problem --- I'll improve on this later!)

    def myfunction(foo, bar):
        foo.boing()
        for i in bar.fizzle(foo):
            baz = i**2
          foo.wibble(baz)
        return foo, baz

If you imagine being a simple state machine walking through this code, when you get to the foo.wibble(baz) line, have you or haven't you dropped out of the scope of the for loop? The difference is obviously significant, but you can't tell what was intended. Now, this is a pretty piss-poor contrived example, but I experienced this sort of thing for real recently, entirely because of working collaboratively via a version control system with someone whose editor liked to use tabs and 3-space indents together --- what the code would actually do was anyone's guess. In such a situation it doesn't matter what Python's cunning rules say: the correct answer could only be derived by reconsidering the semantics of the function and "doing what makes it give the right answer". Essentially, you have to recode the function --- or your whole application --- line by line, just by adjusting the indentation.
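To make the stakes concrete, here's a contrived pair of functions of my own (names and numbers invented for illustration) which differ only in the indentation of a single line --- and happily compute different things:

```python
def inside(n):
    # the accumulation is part of the loop body
    total = 0
    for i in range(n):
        sq = i**2
        total += sq
    return total

def outside(n):
    # one dedent later, the same line runs exactly once, after the loop
    total = 0
    for i in range(n):
        sq = i**2
    total += sq
    return total

print(inside(4), outside(4))  # → 14 9
```

Both are perfectly legal Python, so no automatic tool can possibly tell you which one the original author meant.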

This is pretty idiotic, so why is it such a stubbornly established language feature? And it is stubborn:

    andy@parity:~$ python
    Python 2.5.1 (r251:54863, May  2 2007, 16:56:35)
    [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from __future__ import braces
      File "<stdin>", line 1
    SyntaxError: not a chance
    >>>

Hmm. What's so wrong with encouraging good indentation, but actually delimiting block scopes explicitly? Maybe just that it's Python's "thing" --- but the cleanness of the rest of the syntax is Python's "thing" for me: the indentation is Python's "boneheaded, annoying thing". Consider this:

    def myfunction(foo):
        for i in range(10):
            foo.process(i)
        endfor
        return foo.result()
    enddef

Is that really so bad, Pythonistas? Really, this makes me want to find something else. Unfortunately Ruby, which has lots of lovely features, does explicitly close blocks and is very like Python in lots of ways, but seems to belong to the Perl school of @cryptic :${modifie#rs}. Sigh.

Keyboard envy

Ooh mama - I was just looking up some more info on Colemak, especially to see if my GB layout hack is useful, and found this amazing keyboard linked from this handy forum post. Wow! Every key face is dynamic, with little colour LCD screens under each one, so if/when you update your software keyboard layout, your hardware keyboard changes to match! At over $1500, I don't think anyone will be getting me one for Christmas, though... but maybe one of these would be almost as good :-)

Oh yes, in other news: I got married, spent a lovely 3 weeks in western Canada with my new wife, and met a bear while out mountain biking near Jasper. But you didn't need to know that, right? Photos will be available once I've torn my hair out over EXIF orientation headers a bit more... I'll get on to that again when I get back from the parallel programming course that's taking up this week.

My new keyboard layout: Colemak

In the interests of RSI-avoidance and just interest itself, I've finally made a move to ditch my QWERTY (i.e. "normal") keyboard layout. As Jared Diamond and others like to point out, QWERTY is an anti-engineered layout which became popular and obsolete at the same time. QWERTY typists have to work hard for no reason, risking RSI. Since I sometimes get typing-related wrist pain, a switch seems sensible.

I initially planned to switch to Dvorak, the most popular alternative layout, but a recent discovery of the Colemak layout, which is more computer shortcut-friendly, means that that has become my target layout. So far, so good: I've installed the X server keymap, adapted it a bit for my GB keyboard and moved the keys on my laptop. While it's painfully slow at the moment, I can already feel how much more efficient than QWERTY it is. It's bound to be a bit of a journey, so I'll post any interesting developments or experiences here. When my typing is back at more than 5 w.p.m., that is!

OS X, extern and autotools

Apparently, building C++ code against C or Fortran on OS X introduces an unexpected error of duplicate symbol declarations. "Apparently", because I don't have a Mac of my own anymore --- I'm relying on others' reports here. What seems to happen is that forward-declaring a non-C++ symbol using the extern keyword actually creates a fully-fledged symbol, and then the linker goes mental when it finds the "duplicate" definition --- i.e. the real one.

It seems that the MACOSX_DEPLOYMENT_TARGET environment variable can fix this if set to the appropriate OS X version number: 10.3, 10.4, etc. Here's a configure.ac snippet which apparently solves the problem for projects using GNU autotools:

    ## OS X
    AC_CHECK_TOOL(SWVERS, sw_vers)
    if test x$SWVERS != x; then
      MACOSX_DEPLOYMENT_TARGET=`$SWVERS -productVersion | cut -f 1,2 -d.`
      AC_MSG_NOTICE([MACOSX_DEPLOYMENT_TARGET = $MACOSX_DEPLOYMENT_TARGET])
    fi

Oh good, more airport hassle

My sincerest apologies to the bunch of idiots driving gas canisters and burning cars around the UK at the moment --- if this is meant to either inflict any real damage or have me quaking in my boots it really isn't working. Grow up, please.

On the other hand, if the idea was to give airport security guards more reasons to confiscate my nail scissors, bottled water and armpit spray as suspected WMDs, it's been a great success. Chances are that if all Brits abroad from this point on have no access to deodorant, we'll be an international pariah before too long.

I'm so glad this has happened just in time for our honeymoon... travelling to and from Canada via a succession of foam-mouthed airport staff is bound to be a delight.

Defying OO convention: Hibernate and private data reflection

It's an oft-recited design principle when building object-oriented software that you should always protect a class's data members by making them private and only accessing them via public "get" and "set" methods. The mechanism by which this is achieved varies according to the language, but the idea is the same: if you access your data via methods rather than directly, then you have a lot more flexibility for refactoring later without breaking your class interface. A less appreciated fact is that, as with pretty much every simple rule, there is a plethora of quite reasonable exceptions. In this article I'll focus on one such exception --- how maintaining object relationships with the Java Hibernate persistency framework is best done by directly accessing data fields and keeping them private!

One characteristic of passing from being a novice programmer to being an experienced developer is learning to make decisions based on the logic behind such 95% rules rather than sticking rigidly to the letter of the law just because. It's a bit like in kung fu films --- the more styles you know, the better equipped you are to deal with difficult situations! The main class of exceptions in this case is where your object is little more than a glorified data container: if you're really sure that the current variables will forever be the relevant ones or, if not, that method wrappers will do little to protect you from refactoring anyway, then there's little point in typing all those extra lines and parentheses.

Another such situation turned up for me recently when using the Hibernate Java object persistency system. Hibernate is one of the many marvellous high-level libraries for Java, and is exactly the sort of thing whose emergence is making Java such a powerhouse "enterprise development" platform these days. (On the days when I have to write C++ and my worries are all on the level of "how do I make this string lower-case?", I really pine for Java, where the worrying things are so much more interesting.) Hibernate sits between a set of Java objects and a relational database and does the database magic for you, so that the developer really just has to worry about the object semantics. It's very clever, and can now make use of the Java 1.5+ annotations framework, so you barely even need configuration files to describe the object-db mapping.

Naturally, when I started working with Hibernate and my particular set of objects (the model behind HepData), I made Hibernate perform all the persistency operations via the public get and set methods. So far so good. However, I started noticing problems when using the "delete-orphan" relationship, getting an error message like this:

A collection with cascade="all-delete-orphan" was no longer referenced by the owning entity instance

delete-orphan should be a neat way to ensure that when you delete an object from the database, its "child objects", as defined by the Hibernate mapping, also get deleted. Clearly something was going on here that made Hibernate lose track of the objects it's meant to be managing. The answer, as provided by Scott Leberknight in this article, is that if you make Hibernate use the get and set methods to access object contents, then those methods had better not manipulate the data! In fact, it pretty much forces you to have get and set methods which look like

    public Foo getFoo() {
        return _foo;
    }

    public void setFoo(Foo foo) {
        _foo = foo;
    }

This rings a bell --- if we're not allowed to derive any benefit from the get and set methods, then what's the point in using them? Should we just expose all the data members of our Hibernate classes? Hardly nice: the whole idea is that Hibernate is pretty transparent, so if it starts making major impositions on the public interface of our classes then it's doing a pretty piss-poor job. Fortunately, life is nicer than this, and Hibernate is still an excellent tool. But first a momentary diversion on why the prospective loss of these get and set methods (particularly the set methods) might be a real show-stopper rather than just an unpleasant aesthetic constraint.

Anyone who's used Hibernate in anger, or at least had a good read of the manual, won't be surprised to hear that the key issue is bidirectional relations between objects. Hibernate does a damn good job, but it's not magic, and relationships between objects still need to be handled in your Java code. For example, if you have an object of class Parent and it contains a collection of several Child objects, then there is a one-to-many relationship between the Parent and its children, and you would tell Hibernate about this relationship. Obviously, in Java-land the parent can always find its children, because it has a data member (the collection) which contains the references to them. But what about the reverse? In pure Java terms, if you acquired a reference to one of the children there's no way to find out which Parent object "owns" it. For this reason it's a nice idea to add a "back-reference" from the child to its parent, say via a private Parent _parent data member in Child.

This is all very nice, and undoubtedly good practice, but now we have a new issue: if we start adding or removing Childs from the collections in Parents, we'd better make sure that the back-reference is kept up to date. This won't happen automatically, so some code will be required to ensure that this relationship is kept consistent. Such a consistency operation is sometimes described as an "invariant", and is exactly what get and set methods are best employed to enforce. Here's an example --- first we'll define the appropriate bit of Parent:

    public class Parent {
        private SortedSet<Child> _children;

        public SortedSet<Child> getChildren() {
            return _children;
        }

        public Parent setChildren(SortedSet<Child> children) {
            _children.clear();
            for (Child c : children) addChild(c);
            return this;
        }

        public Parent addChild(Child child) {
            if (child != null) {
                child.setParent(this);
            }
            return this;
        }
    }

Note here that I'm being a bit careful about testing for nullness (but not as careful as I'd really need to be), I'm delegating the set method to a more "atomic" addChild method, and addChild itself calls an as-yet mysterious setParent method on Child. I've also used the "return self" idiom on the set methods, just because I think it's a nice thing to do :-)

Now for Child, and in particular that setParent method:

    public class Child {
        private Parent _parent;

        public Parent getParent() {
            return _parent;
        }

        public Child setParent(Parent parent) {
            if (parent != null) {
                // I should probably remove myself from the current parent, too... but I won't!
                _parent = parent;
                _parent.getChildren().add(this);
            }
            return this;
        }
    }

You can probably see that this is more complex than you'd expect for a boilerplate operation --- unfortunately that's just life at the moment, although there may be code-generation frameworks which take some of the pain out of this sort of thing. You can also see that this essential consistency operation is exactly the sort of thing that makes Hibernate throw a wobbly if it's trying to access the data using the same get and set methods. Oops.

Fortunately, as alluded to above, there is a neat answer: Hibernate only talks to the objects for the purposes of persisting them to and from the database --- these relationship semantics only exist in the pure Java part of the system. So, if we make Hibernate talk direct to the fields, and only let the objects talk to each other via the consistency-enforcing interface methods, then Hibernate will only ever have to deal with consistent data structures. What's more, and this is the neat bit, Hibernate can even persist private fields! It does this via the magic of reflection, which is an excellent example of how Java's richness and flexibility as an application platform can allow clever applications to do good things in the best possible way. See this article and this one for discussion of this issue.
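The Java mechanics boil down to java.lang.reflect, typically making each private field accessible before reading or writing it. For a language-neutral flavour of the same principle, here's a toy sketch in Python (the hydrate function and all the names are mine --- nothing to do with Hibernate's actual internals): a "persistence layer" writing an object's private state directly, bypassing any accessor logic:

```python
class Child:
    def __init__(self):
        self._parent = None  # "private" by convention

    @property
    def parent(self):
        return self._parent

def hydrate(obj, row):
    # Write straight to the fields, as in Hibernate's field-access mode:
    # no setter logic (and no relationship side effects) gets triggered.
    for field, value in row.items():
        object.__setattr__(obj, field, value)

c = Child()
hydrate(c, {"_parent": "parent-42"})
print(c.parent)  # → parent-42 (normal code still goes through the accessor)
```

The point is the division of labour: the framework touches raw fields and only ever sees consistent state, while ordinary code is forced through the invariant-enforcing methods.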

Using JPA and Hibernate annotations, we can then add Hibernate mappings to our classes and Hibernate will talk direct to the fields. If that appalls you --- after all, shouldn't data members always be accessed through public wrapper methods? --- then think again about what I had to say about rules and their exceptions at the start. We don't care how Hibernate does things, other than to be impressed by its cleverness: it's just a tool, and used this way it allows us to apply those rules to our part of the code more robustly. Here are the annotations:

    @Entity
    public class Parent {
        @OneToMany(mappedBy="_parent")
        // We'd also want Hibernate-specific annotations for cascade and sorting
        private SortedSet<Child> _children;
        ...

and

    @Entity
    public class Child {
        @ManyToOne
        private Parent _parent;
        ...

The only problem is that now your Hibernate HQL queries will have to refer to properties by their raw field name, conventional leading underscore included. It would be nice if there was an annotation for providing an official property name when declaring a property, to solve this aesthetic problem and protect against external susceptibilities to internal variable names, but on the whole it works pretty well. Or at least, it does for us --- we're now working on a different kind of Hibernate problem. One's work is never done, eh?

As a last note, while trying to solve this problem in the first place I found the articles cited above, which really helped. I also ran across Joel Spolsky's article on leaky abstractions, which for some reason was new to me. It's probably nothing new to anyone who might read this, but I like how he expressed the idea, so maybe you'll find it interesting, too. I can only hope that this article has been comparably informative and entertaining :-)

Covariant returns and method chaining in Java

As of version 1.5, or "5" as the marketing people have it, Java has been equipped with covariant method return values. Not a particularly obvious name, is it? What this means is that if a method of class Base returns an instance of class RetBase, then a derived class of Base (let's call it Derived) can implement a method with the same signature which returns a derived class of RetBase, rather than a RetBase itself (let's call it RetDerived).

All very well, but does this solve any problems?

Well, undoubtedly there are cases where a derived class would like to be able to specialise the return type of a method specified in its superclass/interface. One of these is the "self return", which I'm particularly fond of. This is essentially a trick to allow several lightweight method calls to be conveniently chained into one statement, which can be nice when all you're doing is using a bunch of property setters and it's arguably more readable to have them all on one line for once. Here's a demo:

public class MyClass {
    private int _a;
    private String _b;

    public MyClass setFoo(int a) { _a = a; return this; }

    public MyClass setBar(String b) { _b = b; return this; }

    public static void main(String[] args) {
        MyClass baz = new MyClass();
        baz.setFoo(42).setBar("Special number");
    }
}

You get the idea. I feel like I'm missing out on something if I return void from set methods!

There is an issue with the self-return in an inheritance setting, though, which is this: if I call a method of the base class in a method chain, the returned object will be a reference to the base class, and the character of the chain will change. Once a base class method has been called in the chain, only methods declared in the base class can be called afterwards.

public class MyBase {
    private int _a;
    public MyBase setFoo(int a) { _a = a; return this;}
}

public class MyDerived extends MyBase {
    private String _b;
    public MyDerived setBar(String b) { _b = b; return this; }

    public static void main(String[] args) {
        MyDerived baz = new MyDerived();
        baz.setFoo(42).setBar("Special number"); //< Error: setFoo() returns a MyBase
    }
}

This can be pretty surprising when you forget to leave calls to base class methods to the end of the chain, so if you're serious about chaining you'd better make use of the covariant return and trivially override base class methods in each derived class. For example, in MyDerived:

public MyDerived setFoo(int a) { super.setFoo(a); return this; }

Of course, you could always stop chaining methods, but that wouldn't be half as much fun, would it?