ROOT, eight years later

It's now nearly eight years since my infamous flame war with ROOT developers and their hangers-on on the now-defunct ROOTtalk mailing list, which was itself a response to this web page (http://www.insectnation.org/howto/problems-with-root). The main discussion is archived here, and I found it interesting to read again for the first time in that eight-year interval. What I'm glad to see, in retrospect, is that I was relatively calm and balanced by comparison with the ROOT fanboys lining up to be snarky. And I still disagree pretty much 100% with all of Rene Brun's responses, which sadly don't cover the full range of points and go a bit straw man and ad hominem in places.

Anyway, with the recent release of ROOT6, the first major version since that discussion, and a few interesting user/developer posts on the ROOT blogs which I recently encountered, I thought I might wade ill-advisedly back into the debate and see what has changed in that time.

In the intervening years I have used ROOT, but not that much. It's remarkable how, in a field almost totally dominated by ROOT data, you can live to a large extent without using it directly. Partly this is because some fraction of my data-wrangling work has been on Rivet and friends, but it's really because every time I have to engage with ROOT (which I usually do via PyROOT these days, unless writing histogramming code within ATLAS software) it's such a frustrating fight that I walk away swearing "never again". So at least that headline experience has not changed for me in the last decade, which of course is a shame: after the reception it got, I never imagined that my (I thought reasonable and relatively sober) critique would change the world, but there was always the glimmer of hope that it might change something. I've also kept trying, occasionally, to improve ROOT by reporting bugs in histogram behaviours, interface designs incompatible with ROOT's own TString class, rendering defects, PyROOT installation issues, etc. Not one has ever been acted on, so you'll excuse my scepticism re. that legendary level of user support.

In fact, for Rivet we couldn't convince ourselves that any of the ROOT histogram limitations would be fixed (and I think that was a good call), so we wrote our own histogramming package, YODA, which, if you ask me, is at least an order of magnitude nicer to use and a lot more powerful. I'm biased, of course, and the meaning of "more powerful" needs clarification: we don't have more tools for doing things with histograms -- ROOT leads the world in monolithic bloat for that purpose -- but the histograms themselves are far more powerful data objects. Key to this was a lot of design iteration over several years: exactly the degree of reinvention and renewal that ROOT never had. In the world of "design one to throw away; you will anyway", ROOT's histogram classes are the abortive first version that never got chucked. Evolution would have been possible, e.g. by making new THist1 etc. types with converters to/from the old ones, and gradually encouraging everyone to use the new versions. This isn't super-hard, so I have to conclude that one of ROOT's two main use-cases has not been a major developer priority in the last 20 years. That's quite something!
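To make the "new types plus converters" route concrete, here is a minimal Python sketch. It is not ROOT's or YODA's actual API: the `Hist1D` and `Bin` classes, their method names, and the choice of per-bin statistics (sums of weights, squared weights and weighted x-moments) are all assumptions for illustration. The "legacy" side is modelled as plain lists of bin contents and sum(w^2), with underflow first and overflow last.

```python
import bisect
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Bin:
    """Per-bin fill statistics (hypothetical layout): enough to recover the
    in-bin weighted mean and spread, not just the bin height."""
    sumw: float = 0.0
    sumw2: float = 0.0
    sumwx: float = 0.0
    sumwx2: float = 0.0
    numentries: int = 0

    def fill(self, x: float, w: float) -> None:
        self.sumw += w
        self.sumw2 += w * w
        self.sumwx += w * x
        self.sumwx2 += w * x * x
        self.numentries += 1


class Hist1D:
    """Hypothetical 'new-generation' 1D histogram with arbitrary bin edges."""

    def __init__(self, edges: List[float]):
        if len(edges) < 2 or any(a >= b for a, b in zip(edges, edges[1:])):
            raise ValueError("need at least two strictly increasing bin edges")
        self.edges = list(edges)
        self.bins = [Bin() for _ in range(len(self.edges) - 1)]
        self.underflow = Bin()
        self.overflow = Bin()

    def fill(self, x: float, w: float = 1.0) -> None:
        if x < self.edges[0]:
            self.underflow.fill(x, w)
        elif x >= self.edges[-1]:
            self.overflow.fill(x, w)
        else:
            # bisect_right returns the index of the first edge > x,
            # so the containing bin [lo, hi) is the one just to its left.
            i = bisect.bisect_right(self.edges, x) - 1
            self.bins[i].fill(x, w)

    def mean(self) -> float:
        """Weighted mean of in-range fills, from the stored x-moments."""
        sumw = sum(b.sumw for b in self.bins)
        if sumw == 0:
            raise ZeroDivisionError("no in-range fills")
        return sum(b.sumwx for b in self.bins) / sumw

    def to_legacy(self) -> Tuple[List[float], List[float]]:
        """Lossy export to an old-style layout: heights and sum(w^2) per bin,
        underflow first and overflow last. The x-moments are dropped."""
        allbins = [self.underflow] + self.bins + [self.overflow]
        return [b.sumw for b in allbins], [b.sumw2 for b in allbins]

    @classmethod
    def from_legacy(cls, edges: List[float], contents: List[float],
                    sumw2: List[float]) -> "Hist1D":
        """Import from the old-style layout. The x-moments (and entry counts)
        were never stored, so moments are approximated at bin centres."""
        h = cls(edges)
        for b, c, s2, lo, hi in zip(h.bins, contents[1:-1], sumw2[1:-1],
                                    h.edges[:-1], h.edges[1:]):
            xc = 0.5 * (lo + hi)
            b.sumw, b.sumw2 = c, s2
            b.sumwx, b.sumwx2 = c * xc, c * xc * xc
        h.underflow.sumw, h.underflow.sumw2 = contents[0], sumw2[0]
        h.overflow.sumw, h.overflow.sumw2 = contents[-1], sumw2[-1]
        return h
```

With edges [0, 1, 2, 4] and fills at 0.1, 0.2, 1.5, 3.9 and 5.0, a round trip through `to_legacy` and `from_legacy` preserves every bin height, but the in-range mean drifts from the exact 1.425 to the bin-centre estimate 1.375. That is the point of the design: old code keeps working through the converter, while the new type carries information the old one never could.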

Why the obsession with histograms? Well, what is ROOT for, fundamentally? Two (or three) things:

it's a data format for ntupling and histograms;
a binning/aggregation (histogramming) tool for that data;
and a data-plotting interface.

Only the first of these is really a success, and even that doesn't look so impressive compared to more modern persistency systems. The jury is still out on which performs best on HEP data, but the usability remains... very 1991.

On the other aspects, ROOT fails as a library because it can't make up its mind about whether it's an application or an API, and cleanly layering those two things (making the root program a public client of its own API) was evidently not part of the design process. Despite claims to the contrary, ROOT is also not really a modular system at all, but one of the most monolithic I've ever seen: it doesn't play well with others, hence having "needed" to assimilate the CERN Minuit and MathCore packages, making them less useful into the bargain. A whacking great dependency on the control-freaky ROOT system is a very different business from a dependency on a focused, sub-MB utility library.

ROOT's plotting output quality, and particularly its typography, also remains terrible, and obviously so by contrast with other tools such as matplotlib, tioga, pgfplots, and even Rivet's make-plots tool (though that is intentionally restricted in capabilities). The really remarkable thing is that no-one seems at all exercised by these usability and quality issues, with much of the effort in recent years focused on updating the C++ interpreter... which is neat work, I guess, but a better implementation of a bad idea is still a bad idea! Who in their right mind wants to use C++ as their main interface in an interactive session?!

Anyway, it's a pity that there's no apparent hunger for change and improvement: this is one of those situations where I'd be more than happy to be proven wrong! FYI, here are a few interesting discussions from the ROOT forum archives, showing that I'm certainly not the only one pining for something better implemented and more user-focused:

http://root.cern.ch/drupal/content/do-we-need-yet-another-custom-c-interpreter#comment-1028
http://root.cern.ch/drupal/comment/reply/882/1038
http://root.cern.ch/drupal/content/root6-and-backward-compatibility#comment-1063

UPDATE: ROOT's Drupal comment threads seem to have been purged. For reference, here's the reply I'd posted to the last one:

Hi Axel,

As you might guess from past interactions, my sympathies are firmly with Code Monkey. When I have to deal with ROOT
directly rather than in a way that's been heavily wrapped by my experiment, I typically spend a day or two banging my
head against the screen trying to figure out what workaround I need to make to do something that should be simple.
That's not code purism, but frustration at it being harder to get a science result out of ROOT than it should be. If you
think that "novice users" aren't driven nuts by ROOT's idiosyncrasies, you need to talk to more users!

Typically it's the histogram class hierarchy, incompleteness of histogram info, or object ownership that creates the
problems and I agree that these aren't things whose behaviour can be changed without breaking a lot of existing code.
But I think they do need to change -- well, that's obvious: I wouldn't have written YODA (http://yoda.hepforge.org) if ROOT
could do what I needed. But the same re-invention can also take place in ROOT itself: the TH1 <- TH2 etc. hierarchy for
example could be profitably re-thought with 25 years of feedback in the form of entirely independent new THist1 etc.
classes. New users could be encouraged to use those, while the reams of old TH1 code will continue to work until you
finally withdraw support 10 years from now.

I have always thought that ROOT would serve scientific needs far more flexibly as a less monolithic system: as soon
as you interface your code to ROOT, it starts demanding the right to handle the command line; you have to either
fight hard or capitulate re. object ownership, threading becomes super-hard, etc. For me, certainly, this has been too
high a cost for using what ROOT provides in our projects -- and we really agonised over the decision, because writing
your own histogramming library is not an easy route to take! If I could link YODA/Rivet/etc. against a single component
lib of ROOT to get e.g. minimiser or FFT functionality without any of the "system" stuff, I would be very happy indeed.
Heck, maybe I would even find a use for cling, although I still reckon interpreted C++ is a bit of a red herring -- if
you're going to write interpreted code and use JIT compilation, then why not use a friendly and expressive language like
Python, via PyPy?!

Maybe the rise of other tools for data handling (e.g. ProtoBuf, a4, YODA, Pandas, Julia, ...) and presentation
(matplotlib, tioga, pgfplots, ...) means that the role of ROOT will change. I think the statements about ROOT being
faster than ProtoBuf are quite fragile: a couple of years ago when studies first came out showing significant
improvements over ROOT I/O speed, the authors got all the optimisation tricks they could from the ROOT authors to make
the comparison as fair as possible. ROOT was still whupped. But due to the difficulty of replacing the data types
through the whole LHC chain, of course ROOT has the incumbent defence of being the only one used on so large a scale:
there's a Zeno-type argument for never changing! I suspect there is a reason that Google and Facebook do not use ROOT
I/O internally. One aspect is of course the monolithic nature again, the useful standalone RIO (ROOT IO) library having
been hounded out of existence many years ago for daring to be genuinely modular. Looking forward to seeing how all this
pans out in the next few years, and hoping that PyROOT will improve as one of the impacts of cling.

http://pandas.pydata.org/  http://julialang.org/