ROOT, eight years later

It's now nearly eight years since my infamous flame war with ROOT developers and their hangers-on on the now-defunct ROOTtalk mailing list. The main discussion is archived here and I found it interesting to read again, for the first time in those eight years. What I'm glad to see, in retrospect, is that I was relatively calm and balanced by comparison with the ROOT fanboys lining up to be snarky. And I still pretty much 100% disagree with all of Rene Brun's responses, which sadly don't cover the full range of the critique and go a bit straw man and ad hom in places.

Anyway, with the recent release of ROOT6, the first new major version since that discussion, and a few interesting user/developer posts on the ROOT blogs which I recently encountered, I thought I might wade ill-advisedly back into the debate and see what has changed in that time.

In the intervening years I have used ROOT, but not that much. It's remarkable how, in a field which is almost totally dominated by ROOT data, you can live to a large extent without using it directly. Partly this is because some fraction of my data wrangling work has been on Rivet and friends, but it's really because every time I have to engage with ROOT (which I usually do via PyROOT these days, unless writing histogramming code within ATLAS software) it's such a frustrating fight that I walk away swearing "never again". So at least that headline experience has not changed for me in the last decade, which of course is a shame; after the reception that it got, I never imagined that my (I thought reasonable and relatively sober) critique would change the world, but there was always the glimmer of hope that it might change something. In fact, I've kept occasionally trying to improve ROOT by reporting bugs in histogram behaviours, interface designs incompatible with ROOT's own TString class, rendering defects, PyROOT installation issues, etc. Not one has ever been acted on, so you'll excuse me for some scepticism re. that legendary level of user support.

In fact, for Rivet we couldn't convince ourselves that any of the ROOT histogram limitations would be fixed (and I think that was a good call), so we wrote our own histogramming package which, if you ask me, is at least an order of magnitude nicer to use, and a lot more powerful. I'm biased, of course, and the meaning of "more powerful" needs clarification: we don't have more tools for doing things with histograms -- ROOT leads the world in monolithic bloat for that purpose -- but the histograms themselves are far more powerful data objects. Key to this was a lot of design iteration over several years: exactly the degree of reinvention and renewal that ROOT never had. In the world of "design one to throw away; you will anyway", ROOT's histogram classes are the abortive first version that never got chucked. Evolution would have been possible, e.g. by making new THist1 etc. types with converters to/from the old ones, and gradually encouraging everyone to use the new versions. This isn't super-hard, so I have to conclude that one of ROOT's two main use-cases has not been a major developer priority in the last 20 years. That's quite something!
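As an illustration of what a "more powerful data object" can mean, here is a minimal sketch in plain Python. It is not YODA's actual API: the class, attribute and method names are invented for this example. The idea is that each bin keeps running sums of weights and weighted moments rather than just a height, so that the mean and statistical error are available per bin and adjacent bins can be merged without losing information.

```python
from dataclasses import dataclass
import math


@dataclass
class Bin:
    """One histogram bin that stores weighted moments, not just a height.

    All names here are hypothetical, chosen for illustration only.
    """
    low: float
    high: float
    n: int = 0            # number of fill calls
    sumw: float = 0.0     # sum of weights
    sumw2: float = 0.0    # sum of squared weights (for the statistical error)
    sumwx: float = 0.0    # sum of weight * x
    sumwx2: float = 0.0   # sum of weight * x^2

    def fill(self, x: float, w: float = 1.0) -> None:
        """Accumulate one weighted entry into the running sums."""
        self.n += 1
        self.sumw += w
        self.sumw2 += w * w
        self.sumwx += w * x
        self.sumwx2 += w * x * x

    @property
    def mean(self) -> float:
        """Weighted mean of the x values filled into this bin."""
        return self.sumwx / self.sumw

    @property
    def error(self) -> float:
        """Statistical uncertainty on the sum of weights."""
        return math.sqrt(self.sumw2)

    def merged(self, other: "Bin") -> "Bin":
        """Merge with the adjacent bin to the right by adding the sums."""
        if self.high != other.low:
            raise ValueError("bins are not adjacent")
        return Bin(self.low, other.high,
                   self.n + other.n,
                   self.sumw + other.sumw,
                   self.sumw2 + other.sumw2,
                   self.sumwx + other.sumwx,
                   self.sumwx2 + other.sumwx2)
```

Because every stored quantity is a plain sum, rebinning is just addition, and the mean of a merged bin is computed from the fills themselves rather than estimated from bin centres.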

Why the obsession with histograms? Well, what is ROOT for, fundamentally?

Fundamentally, two (or three) things:

data format for ntupling and histograms

data histogramming

data plotting

Only the first of these is really a success, and it doesn't look so impressive compared to modern persistency systems. The jury is out on optimal performance on HEP data, but...

New plotting?

Modularity

Red herrings: a C++ interpreter? Why?!? The amount of manpower wasted on this white elephant is just awesome. The new version will JIT interpret and compile C++... technically cool, but it's still an awful language for HEP scientific use, and who in their right mind wants to use it as their main interface in an interactive session? Consider the difficulty in just getting a list of keys in a file:
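A minimal sketch of that task, written for PyROOT rather than the C++ prompt. The helper names `list_keys` and `print_keys` are made up for this example; the ROOT calls themselves (`TFile.Open`, `IsZombie`, `GetListOfKeys`, `GetName`, `GetClassName`, `Close`) are the standard ones.

```python
def list_keys(directory):
    """Return (name, class name) pairs for every key in a ROOT directory.

    `directory` is anything with a GetListOfKeys() method, e.g. a TFile.
    """
    return [(key.GetName(), key.GetClassName())
            for key in directory.GetListOfKeys()]


def print_keys(path):
    """Open a ROOT file read-only and print its top-level keys."""
    import ROOT  # imported lazily, so this module loads even without ROOT

    f = ROOT.TFile.Open(path)
    if not f or f.IsZombie():
        raise IOError("could not open " + path)
    try:
        for name, classname in list_keys(f):
            print(name, classname)
    finally:
        f.Close()
```

Even for this simplest of queries you need to know about keys as distinct from the objects they point to, and a sub-directory needs the same walk again.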

command line grabbing

excessive verbosity

browsing

sampling

http://www.insectnation.org/howto/problems-with-root

http://root.cern.ch/drupal/content/do-we-need-yet-another-custom-c-interpreter#comment-1028 http://root.cern.ch/drupal/comment/reply/882/1038 http://root.cern.ch/drupal/content/root6-and-backward-compatibility#comment-1036

Hi Axel,

As you might guess from past interactions, my sympathies are firmly with Code Monkey. When I have to deal with ROOT directly rather than in a way that's been heavily wrapped by my experiment, I typically spend a day or two banging my head against the screen trying to figure out what workaround I need to make to do something that should be simple. That's not code purism, but frustration at it being harder to get a science result out of ROOT than it should be. If you think that "novice users" aren't driven nuts by ROOT's idiosyncrasies, you need to talk to more users!

Typically it's the histogram class hierarchy, incompleteness of histogram info, or object ownership that creates the problems and I agree that these aren't things whose behaviour can be changed without breaking a lot of existing code. But I think they do need to change -- well that's obvious, I wouldn't have written YODA http://yoda.hepforge.org if ROOT could do what I needed. But the same re-invention can also take place in ROOT itself: the TH1 <- TH2 etc. hierarchy for example could be profitably re-thought with 25 years of feedback in the form of entirely independent new THist1 etc. classes. New users could be encouraged to use those, while the reams of old TH1 code will continue to work until you finally withdraw support 10 years from now.
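To make the migration idea concrete, here is a rough sketch in plain Python of an independent new type with converters to and from an old one. Both classes are stand-ins invented for this example: `LegacyHist` is not ROOT's TH1 and `NewHist1` is not a real or proposed ROOT class. The only assumption carried over is the old-style layout where underflow and overflow live in the first and last slots of the contents array.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LegacyHist:
    """Stand-in for an old-style histogram (hypothetical, not ROOT's TH1).

    contents[0] is underflow and contents[-1] is overflow, so contents
    has len(edges) + 1 entries in total.
    """
    edges: List[float]
    contents: List[float]


@dataclass
class NewHist1:
    """Stand-in for an independent new-style type (hypothetical)."""
    edges: List[float]
    values: List[float]
    underflow: float = 0.0
    overflow: float = 0.0

    @classmethod
    def from_legacy(cls, old: LegacyHist) -> "NewHist1":
        """Build a new-style histogram from an old-style one."""
        if len(old.contents) != len(old.edges) + 1:
            raise ValueError("contents must include underflow and overflow")
        return cls(list(old.edges), list(old.contents[1:-1]),
                   old.contents[0], old.contents[-1])

    def to_legacy(self) -> LegacyHist:
        """Convert back, so that existing old-style code keeps working."""
        return LegacyHist(list(self.edges),
                          [self.underflow, *self.values, self.overflow])
```

Since the two types share no base class, the new one is free to drop the old hierarchy entirely, and old code only ever sees the result of `to_legacy()`.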

I have always thought that ROOT would serve scientific needs far more flexibly as a less monolithic system: at the moment you interface your code to ROOT, it starts demanding the right to handle the command line; you have to either fight hard or capitulate re. object ownership, threading becomes super-hard, etc. For me, certainly, this has been too high a cost for using what ROOT provides in our projects -- and we really agonised over the decision, because writing your own histogramming library is not an easy route to take! If I could link YODA/Rivet/etc. against a single component lib of ROOT to get e.g. minimiser or FFT functionality without any of the "system" stuff, I would be very happy indeed. Heck, maybe I would even find a use for cling, although I still reckon interpreted C++ is a bit of a red herring -- if you're going to write interpreted code and use JIT compilation, then why not use a friendly and expressive language like Python, via PyPy?!

Maybe the rise of other tools for data handling (e.g. ProtoBuf, a4, YODA, Pandas, Julia, ...) and presentation (matplotlib, tioga, pgfplots, ...) means that the role of ROOT will change. I think the statements about ROOT being faster than ProtoBuf are quite fragile: a couple of years ago when studies first came out showing significant improvements over ROOT I/O speed, the authors got all the optimisation tricks they could from the ROOT authors to make the comparison as fair as possible. ROOT was still whupped. But due to the difficulty of replacing the data types through the whole LHC chain, of course ROOT has the incumbent defence of being the only one used on so large a scale: there's a Zeno-type argument for never changing! I suspect there is a reason that Google and Facebook do not use ROOT I/O internally. One aspect is of course the monolithic nature again, the useful standalone RIO (ROOT IO) library having been hounded out of existence many years ago for daring to be genuinely modular. Looking forward to seeing how all this pans out in the next few years, and hoping that PyROOT will improve as one of the impacts of cling.

http://pandas.pydata.org/ http://julialang.org/