.. title: The problem with ROOT (a.k.a. The ROOT of all Evil)
.. slug: problems-with-root
.. date: 2007-08-27 19:13:46
.. type: text
.. category: blog


**This page was written in about 2003, and has been little updated since
then. If I write another ROOT critique, I'll do it in a separate location,
since this one is interesting as an historical artifact.**

-----------------------

**Update (03/08/2006):** I thought I should update this page after some
substantial discussions on the ROOT mailing list about this page and, more
particularly, the `ROOT page on Wikipedia
<http://en.wikipedia.org/wiki/ROOT>`_, to which I had added a criticism
section. Here are some Web links to the mailing list archive:

- `ROOTtalk 06 mailing list <http://root.cern.ch/root/roottalk/roottalk06/index.html>`_: search for "Wikipedia criticism" to find the threads.
- `First post in thread <http://root.cern.ch/root/roottalk/roottalk06/0763.html>`_: later becomes mainly a discussion about supporting the FITS format in ROOT
- `The main discussion thread <http://root.cern.ch/root/roottalk/roottalk06/0782.html>`_: separate because I wasn't originally on the list. Rene promises to respond to the criticisms, but doesn't. In fact, they were never addressed.
- `The ROOT Wikipedia talk page <http://en.wikipedia.org/wiki/Talk:ROOT>`_: also contains some very informed discussion (and also some not so informed!)

But maybe I just like it because it includes phrases like this:

    In my experience, many people who use ROOT at least have vague feelings that it is making their life
    more difficult than it rightly should. Nearly everyone I know that writes code
    that other people use feel even more strongly that ROOT's poor design leads to
    productivity losses. *I grant that it is less frequent that someone levels
    the criticism as succintly and accurately as Andy.*

Ha! And now the original page:

--------------------------

This piece is aimed at researchers, primarily in high energy physics, who use
the ROOT analysis software and find it lacking in lots of ways. I'm going to
whinge about the things that annoy me in ROOT and then suggest a few of the
things that I do to minimise the pain. Suggestions and comments are very
welcome. Apologies in advance for the fact that this article may be implicitly
HEP-political, but I genuinely believe that ROOT's poor design is a very
dangerous thing for particle physics and the other disciplines that use it.

Before beginning, I should point out that these are simply my own views and that
I hold no animosity against the developers --- their design simply doesn't work
for me. Presumably there are many people "out there" who think ROOT an excellent
piece of software. In *complete* honesty, though, I have yet to meet any of
them. In fact, I've never had any complaints that this article mis-represents
ROOT, and I've had a fair bit of "fan mail", not to mention discussions with
well-respected developers and physicists who hold precisely the same views :-)
If you feel this way, then you might also be interested in my articles on
`dealing with some of ROOT's flaws </articles/basic-root.html>`_ and `wishlist
for things to be fixed </articles/root-wishlist>`_.


The problem(s) with ROOT
------------------------

ROOT can be an awkward piece of software --- but unless you want to use the
defunct PAW program for your data analysis there's not really anything else
around that handles histograms and tuples in the way that particle physicists
have come to expect.

Some of the *ideas* in ROOT are good --- a data analysis application, with an
open, robust (in principle) API, useable as stand-alone modular
libraries. Great! However ROOT has failed to meet its promise for several reasons:

* it retains a great deal of legacy behaviour from PAW, which is commonly
  acknowledged to be "rubbish, but we've inherited 20 years' worth of
  work-arounds"

* rather than focus on providing a stunningly good set of these core facilities,
  the development team has proceeded to add more and more arcane functionality
  to the ROOT kernel (e.g. for GUI-building). It looks to me like ROOT's design
  is largely driven by the needs of the Alice collaboration, with a monolithic
  "many into one must go" design model, which is quite depressingly inflexible
  and unscalable if true.

* ROOT's class structure is very broken: it bears all the hallmarks of a project
  to learn C++ and OO programming that was never thrown away (as they always
  should be) when the initial design mistakes were realised. I consider this one
  of the biggest problems; even if the code-bloat and interface mis-design
  issues are fixed, the inherently broken class structure and functional
  delegation are still there and fixing them will more-or-less involve a
  ground-up re-write and a breaking of backwards compatibility with existing
  "ROOT-leveraging" code. Or just use something else.

* insistence on re-inventing the wheel: there are plenty of external projects
  which supply alternative, better-developed and standardised functionality
  which has also been developed in the context of ROOT. Admittedly, in some
  cases this is due to ROOT having started first... but they should know when to
  swap.  Examples are the C++ STL objects, e.g. :code:`std::string`, and
  containers, which are not properly supported in ROOT (see the next point);
  data formats and interfaces like AIDA, FITS and HDF5; and code documentation
  with Doxygen (ROOT's own C++ documentation class is a travesty by comparison
  with Doxygen's syntax and flexibility).

* "(Masaharu) Goto considered harmful". The ROOT team continues to insist that
  ROOT's natural runtime environment is the CINT interpreter. I disagree on
  several levels --- first the practicalities: CINT cannot and is unlikely to
  *ever* properly interpret ANSI/ISO C++. Its current deficiencies make several
  possible ROOT improvements difficult or impossible, and have forced design
  decisions which would never have been made otherwise, most obviously the lack
  of real C++ templation support, and hence STL objects. Only pre-compiled
  faux-STL objects are possible from within CINT, and hence ROOT's own classes
  cannot be template-based or use the STL.  CINT also encourages sloppy coding
  style (no pointer/object distinction, no required semi-colons, ...) which
  makes conversion to proper C++ code non-trivial. A second level of criticism
  is that C++ is a deeply inappropriate language for a high-level activity like
  data analysis: it's syntactically complex and forces explicit memory
  management by the user (while histogramming! and made harder by ROOT's
  ownership semantics). This is largely alleviated by PyROOT, although CINT is
  for some reason still the main interface and the underlying classes are still
  sub-optimal.

I will now consider several of these points in more detail:

General design issues
---------------------

* Why, oh why is it called "ROOT"? If there is one name that is guaranteed to
  cause confusion, conflicts and general aggro, it's choosing the same name as
  the system admin account. I'm almost surprised that the Windows version isn't
  distributed under the name "Administrator". The only worse name I can think of
  is ":code:`/`".

* The whole system is huge and bloated. What most physicists want from ROOT is
  not a GUI-building system, but a statistical data analysis system. That would
  involve providing a large array of statistical analysis tools, wide support
  for input and output formats (including the AIDA interfaces and data formats
  like FITS, HDF4/5 and plain text comma/tab/etc.-delimited data). Basic 1, 2
  and 3D plotting, contour plots, pixel plots and suchlike would also be nice,
  but there are external systems that can do that very well given a standard
  output format, so why re-invent the wheel? (ROOT could always build its
  plotting functionality on external libraries.)

* Why is CINT's pseudo-"interpreted C++" considered a good user interface? C++
  is good if you want to write compiled programs, but I can't imagine it's ever
  been thought of as a *good* language for interactive commands: much of the
  syntax is designed to enforce strong type-safety and various code-reuse
  software engineering solutions that no-one wants to have to think about when
  they try to plot a dataset, or calculate a statistical measure. Why not write
  the backend code in C++, provide a Python-C++ interface for those who want to
  do things with full programmatic power (Python because it was actually
  designed as an interpreted language) and provide a simple "gnuplot-style"
  command interface for the basic stats, data I/O and plotting functionality?
  (Actually, ROOT does now have a Python interface, but the class structure is
  so poor that it doesn't make it much easier to use -- and you have to deal
  with type mismatches, too, since the binding hasn't been written very
  well. It's nicer to use... just. I think that the class structure would make a
  decent gnuplot-like interface hard to do, as well, hence my comment above that
  even if all the other issues are dealt with, the underlying classes are so bad
  that ROOT is probably unfixable without breaking all backwards compatibility).


Class structure issues
----------------------

* No native STL support, even where it could be introduced seamlessly,
  e.g. :code:`std::string` function arguments can transparently handle
  :code:`char*` old-style C strings and are much safer and more powerful. [1]_

* Perverse inheritance structure: is a 2D histogram *really* a kind of 1D
  histogram? ROOT thinks so, to the extent that 1D histograms (happily available
  in :code:`TH1F`, :code:`TH1I` and so-on flavours for floats, integers etc. ---
  a prime case for templation) contain an accessor method for the histogram's
  z-axis. Just don't touch that method if your histogram is *really* 1D! I would
  love to see a :code:`THistogram` abstract base class for all histograms (or
  even better, a :code:`ROOT::Histogram`, but namespaces also seem to have
  passed them by).

* No separation of data and presentation: if you want to ensure that the data in
  a histogram is unmangled by declaring it const, then you can't change its plot
  style either because there's no separation between the data part of a
  histogram and its presentation. Other systems do this much better, with some
  separation like :code:`Histogram` objects for the data container and e.g. a
  :code:`HistogramPainter` object which contains *all* the presentation
  aspects. This also adds the flexibility of modular design.

* Should classes have 300 methods? ROOT thinks so. This is largely due to a flat
  and monolithic design whereby hundreds of convenience methods are designed
  which simply pass on the work to other classes. For example, histograms can
  fit themselves to mathematical functions --- why not a :code:`Fitter` class?
  Well, there is one, but ROOT is "helpful" enough to hand all its methods on to
  unrelated classes like histograms, too. It breaks a major, empirically
  successful rule of software design: each object should do one job and do it
  well.

* Another rule of OO design broken --- ROOT will happily delete objects that
  it's given, even if you want to use them again. Take this, for example:

  .. code-block:: c++

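      #include <cstdlib>  // rand, RAND_MAX, EXIT_SUCCESS
      #include "TH1F.h"
      #include "THStack.h"

      void test(TH1* histo1, TH1* histo2) {
        THStack* hs = new THStack();
        // Pick one of the two histograms at random: rand() returns an int
        // in [0, RAND_MAX], so compare against half of that range.
        if (rand() < RAND_MAX / 2) hs->Add(histo1); else hs->Add(histo2);
        delete hs;  // the THStack destructor runs here
      }

      int main() {
        TH1* histo1 = new TH1F(/* ... */);
        TH1* histo2 = new TH1F(/* ... */);

        test(histo1, histo2);

        delete histo1;
        delete histo2;

        return EXIT_SUCCESS;
      }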

  The code will core dump either on :code:`delete histo1` or :code:`delete
  histo2`, because the :code:`THStack` destructor deletes the contained elements,
  even though it doesn't own them.  To use code like this, the :code:`test` method
  has to copy the passed histos, a needless waste of memory and CPU. Gah.
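
For contrast, a container that shares ownership rather than seizing it could be
sketched as follows. This is not ROOT code: :code:`Histo`, :code:`HistoStack`
and :code:`stackAndDiscard` are hypothetical stand-ins, and the use of
:code:`std::shared_ptr` is an assumption about one reasonable way to express
shared ownership in modern C++.

.. code-block:: c++

    #include <cstddef>
    #include <memory>
    #include <utility>
    #include <vector>

    // Hypothetical stand-in for a histogram: just a vector of bin contents.
    struct Histo {
      std::vector<double> bins;
    };

    // Hypothetical stack that shares ownership of its histograms with the caller.
    class HistoStack {
    public:
      void add(std::shared_ptr<const Histo> h) { hists_.push_back(std::move(h)); }
      std::size_t size() const { return hists_.size(); }

    private:
      std::vector<std::shared_ptr<const Histo>> hists_;
    };

    // Mirrors test() above: build a temporary stack, then let it go out of scope.
    // Returns the number of histograms the stack held just before destruction.
    std::size_t stackAndDiscard(const std::shared_ptr<const Histo>& h) {
      HistoStack hs;
      hs.add(h);
      return hs.size();
    }  // hs is destroyed here; the histogram survives because the caller holds it

Destroying the stack only drops its own reference, so the caller's histogram
stays valid afterwards and nothing needs to be copied.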


Functionality issues
--------------------

* Dreadful default plot style: you might think that, data presentation being
  almost the primary reason for ROOT's existence, it might be good at it. Well,
  for some reason the default plot style is unfeasibly ugly (grey background?!)
  and difficult to fix. In fact to fix it you have to go via several global ROOT
  objects. Gah.

* Awkward ntuple handling: in particular handling indexed tuple entries is a
  nightmare of obfuscation.

* What's with the "T" prefix on everything? Hello? Even CLHEP has got the hang
  of namespaces now: I would much rather deal with a :code:`ROOT::Tree` than a
  :code:`TTree`. Update: I think it's now :code:`ROOT::TTree`, which misses the
  point even further.

* Dataset error handling is very dangerous: binomial errors are calculated when
  a histogram is filled and aren't updated thereafter, presumably because the
  user might have over-ridden the error-settings by hand. This means that if you
  re-scale your histogram by 0.001, the errors are likely about 100-1000 times
  bigger than the data peaks! Solutions might be to always re-scale the data
  properly and to provide histograms with an error-calculating functor or member
  function (or a set of such things). That way a histogram could be sub-classed
  and the error handling over-ridden in a scalable way. There's a mismatch here
  between simple user interfaces and software engineering, but since it ends up
  mapping on to the same dichotomy between getting the wrong result or the right
  one, I know what I'd pick.

* Unusability of the ACLiC compiler: for increased performance, ROOT can call
  ``g++`` from within CINT and compile your ROOT macros. You'd think that that
  might involve taking your single file with a bunch of user macros and building
  a binary library file from them, i.e. adding the standard C++ and ROOT header
  :code:`#includes` and so-on behind the scenes so that any macro that will run
  in CINT can be compiled in ACLiC. But that isn't the case: ACLiC needs the
  full set of header declarations that a full C++ program needs to already be in
  the file to be compiled.  And it can't handle the splitting of user classes
  into header and implementation files, which seems to be necessary. In
  addition, if ACLiC fails to compile your macros file (probably for one of the
  above reasons e.g. missing :code:`#includes`), then debugging the failure
  point in ACLiC is very hard, specifically because it uses lots of temporary
  files but doesn't map the C++ compiler errors back to the CINT macro file, so
  the reported error won't be easily reconcilable with any of your input
  files. Aaargh.  In short, ACLiC requires you to have written your macros as if
  they're C++ programs to be compiled (with full C++ syntax strictness: none of
  the sloppiness encouraged by CINT will work), but actually makes things harder
  for you than if you ran the C++ compiler explicitly because it obfuscates the
  compiler output. Nice one, ACLiC.

* What's up with the whole "passing processing directives by string" rubbish?
  For example, to plot a histogram stack (on to the "current canvas" --- a
  typical example of global scope in action) I might call this monstrosity:

  .. code-block:: c++

      _hs->Draw("HIST,E,9,NOSTACK");

  What sort of argument is that? For starters I don't get to specify which
  :code:`TCanvas` to draw it on to; instead I have to do some sort of hideous
  :code:`gROOT->cd("mydirectoryname");` thing first. And second, there's
  absolutely no type safety: that string has to be decoded at runtime and the
  parsing rules are not clearly defined. A set of class enums or, better, a
  config object (or collection thereof) would be much safer.  Why would you do
  something as horrible as this? Yep: CINT and interactive use. Lovely.

* Global objects and some horrible concept of a "currently focussed directory"!
  Uurgh --- this sort of thing would be okay if all ROOT scripts were linear and
  no more than 20 lines long. But they're not and this is a truly nasty
  "feature".

* Why, in a C++ program, is there still a horrid mush of type-unsafety?  Reading
  objects out of ntuples requires lots of blind casts from ROOT TObject to
  whatever you *think* your persistified object is. This reminds me of C casts
  from :code:`void*`, and that's simply unacceptable in a C++
  system. Surely there are other C++ persistence interfaces that don't have this
  problem (using RTTI or similar)?
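
The error-handling point above can be made concrete with a small sketch. It
assumes a hypothetical :code:`WeightedHisto` class (not part of ROOT) that
stores the sum of weights and the sum of squared weights per bin and computes
the error on request, so that re-scaling cannot leave the errors behind.

.. code-block:: c++

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Hypothetical histogram with equal-width bins on [lo, hi).
    class WeightedHisto {
    public:
      WeightedHisto(std::size_t nbins, double lo, double hi)
          : lo_(lo), hi_(hi), sumw_(nbins, 0.0), sumw2_(nbins, 0.0) {}

      // Add one entry with the given weight; out-of-range values are ignored.
      void fill(double x, double w = 1.0) {
        if (sumw_.empty() || x < lo_ || x >= hi_) return;
        std::size_t i =
            static_cast<std::size_t>((x - lo_) / (hi_ - lo_) * sumw_.size());
        if (i >= sumw_.size()) i = sumw_.size() - 1;  // guard against rounding
        sumw_[i] += w;
        sumw2_[i] += w * w;
      }

      // Scale contents by c; squared weights scale by c*c, so errors scale by |c|.
      void scale(double c) {
        for (std::size_t i = 0; i < sumw_.size(); ++i) {
          sumw_[i] *= c;
          sumw2_[i] *= c * c;
        }
      }

      double content(std::size_t i) const { return sumw_[i]; }

      // The error is derived from the stored squared weights, never cached.
      double error(std::size_t i) const { return std::sqrt(sumw2_[i]); }

    private:
      double lo_, hi_;
      std::vector<double> sumw_, sumw2_;
    };

With 100 unit-weight entries in one bin the error is 10. After scaling by 0.001
the content becomes 0.1 and the error 0.01, so the relative error stays at 10%
instead of dwarfing the data.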

**In short, ROOT sucks and isn't likely to change its ways any time soon. Sorry HEP.**


What to do about all this?
--------------------------

The best thing to do, in my opinion, would be to take what there is of ROOT and
to split it into a kernel and a set of modules and for the whole thing to take
the form of a C++ library rather than an executable. The executable is really
secondary to the class structure. In addition, the class structure needs to be
overhauled, STL compliance needs to be introduced, standard I/O formats and
interfaces need to be developed, external solutions need to be dropped into
place in many cases, and so-on. In fact, pluggable architectures like this have
given rise to excellent collections of user contributed modules elsewhere, so
it's a potentially rewarding move from a community standpoint, too. I can't see
it happening :-(

Next-best, or possibly best given the infeasibility of
the above and the existence of better systems anyway, is to move your analysis
to a multi-stage one which ignores ROOT as much as possible, uses Hippodraw, JAS or the
BaBar StatPatternRecognition code [2]_ to do the statistical analysis, and uses
something like `PyX <http://pyx.sf.net>`_ or `jFig
<http://tech-www.informatik.uni-hamburg.de/applets/jfig/>`_ to
produce the publication-quality plots, again using a standard data file format
(or even just columned ASCII files) for communication in the final
step. Although these programs don't (currently) support 3D plots, I don't
believe that these often give information that can't be expressed more clearly
in several 2D plots. The exception is rendering of actual 3D systems like
detector structure, which admittedly can be useful in event reconstruction
analyses. **Update: I nowadays recommend SciPy, matplotlib and other
Python-based scientific tools to everyone unhappy with ROOT. They are simply
much better tools than anything HEP has yet produced.**
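
As an illustration of the "columned ASCII" hand-off mentioned above, the sketch
below writes bin edges, contents and errors as whitespace-separated columns.
The :code:`Bin` struct, the :code:`writeColumns` and :code:`toColumns` functions
and the column layout itself are all hypothetical choices, not a standard
format.

.. code-block:: c++

    #include <ostream>
    #include <sstream>
    #include <string>
    #include <vector>

    // Hypothetical per-bin record: edges, content and error.
    struct Bin {
      double low, high, content, error;
    };

    // Write one header line, then one line per bin, to any output stream.
    // Uses the stream's current precision; raise it for real data if needed.
    void writeColumns(std::ostream& out, const std::vector<Bin>& bins) {
      out << "# xlow xhigh content error\n";
      for (const Bin& b : bins)
        out << b.low << ' ' << b.high << ' ' << b.content << ' ' << b.error
            << '\n';
    }

    // Convenience wrapper returning the same text as a string.
    std::string toColumns(const std::vector<Bin>& bins) {
      std::ostringstream out;
      writeColumns(out, bins);
      return out.str();
    }

Passing a :code:`std::ofstream` to :code:`writeColumns` produces a file that the
plotting stage can read without knowing anything about the analysis stage. The
header starts with "#" on the assumption that the downstream tool treats such
lines as comments.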

Actually, I like this "modular" statistical analysis and presentation idea most
of all: I've only put "rehacking ROOT" as the most desirable solution due to its
large, established user base, since personally I'm more than happy to leave ROOT
alone entirely. You might find my `list of HEP software
</articles/hep-software.html>`_ to be useful if you are similarly-minded. I see
definite parallels here with the Unix "small tools, each of which does its job
well" philosophy; it's peculiar that high-energy physics has set its heart
so firmly on monolithic systems given a) its traditional centring around Unix
computing and b) the obvious success of the Unix philosophy. But maybe not that
surprising, given that many physicists treat computing methods with contempt, as
something that gets in the way of producing good work. Rant over!

As a next-to-next-best approach, if you really aren't *allowed* to use anything
other than ROOT (maybe you depend on a bunch of ROOT analysis macros written by
someone else, very probably you need to at least read ROOT data files), we can
try to use the good bits of ROOT and minimise the interaction with the lame
bits. This primarily involves ignoring CINT entirely and using ROOT as a library
--- note that you will still have to deal with the world's worst class
structure! Hence, in addition I try to write STL wrapper classes of my own when
possible. This tends to occur on an ad-hoc basis. Note that if ROOT had been
done right in the first place, no-one would ever have to do any of these
things. You can find some workarounds described in `my article on basic ROOT
usage </articles/basic-root.html>`_, which in fact consists entirely of
workarounds since *any* attempt to do robust statistical analysis in ROOT is
made hideously complicated by its flaws! If I haven't convinced you of that by
now, I never will!
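
An ad-hoc STL wrapper of the kind described above might look like the sketch
below. It is written as a template so that it compiles without ROOT. It assumes
only that the histogram type provides :code:`GetNbinsX()` and
:code:`GetBinContent(int)`, as ROOT's :code:`TH1` does, with in-range bins
numbered from 1 to N and bins 0 and N+1 holding underflow and overflow.
:code:`binContents` and :code:`FakeHisto` are hypothetical names.

.. code-block:: c++

    #include <cstddef>
    #include <vector>

    // Copy the in-range bin contents of a TH1-like histogram into a std::vector.
    template <typename Histo>
    std::vector<double> binContents(const Histo& h) {
      const int n = h.GetNbinsX();
      std::vector<double> out;
      out.reserve(static_cast<std::size_t>(n > 0 ? n : 0));
      for (int i = 1; i <= n; ++i)  // skip underflow (0) and overflow (n + 1)
        out.push_back(h.GetBinContent(i));
      return out;
    }

    // Minimal stand-in with the same two methods, used only to exercise the
    // wrapper without linking against ROOT.
    struct FakeHisto {
      std::vector<double> raw;  // raw.front() = underflow, raw.back() = overflow
      int GetNbinsX() const { return static_cast<int>(raw.size()) - 2; }
      double GetBinContent(int i) const {
        return raw[static_cast<std::size_t>(i)];
      }
    };

Once the contents are in a :code:`std::vector`, the rest of the analysis can use
standard algorithms and never touch the ROOT class hierarchy again.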

--------------------

Thanks for reading and please feed back your thoughts to me. Hopefully someone
will listen and ROOT can be made into a well-designed, robust data analysis
system for the LHC. [3]_


.. rubric:: Footnotes

.. [1] :code:`TString` is actually ok -- it can be implicitly constructed from
       both :code:`std::string` and :code:`char*`. But as of 2015 and ROOT6, it
       is still barely used in ROOT function signatures despite being the
       obvious ROOTy string type. I `reported this
       <http://savannah.web.cern.ch/savannah/HEP_Applications/savroot/bugs/99577.html>`_
       a few *years* ago...

.. [2] All these have since died, drained of nutrition by HEP's ROOT
       monoculture. But truly external tools like SciPy, NumPy, pytables,
       scikit-learn, Pandas, Julia, R, etc. are much better replacements. I like
       the ones that connect well to Python (and Cython, Numba, etc. deal nicely
       with performance bottlenecks). Doubtless this footnote will also go out
       of date rapidly... such is life, but I hope and expect the external
       situation will continue to *improve* and make HEP's resources seem
       increasingly anaemic.

.. [3] Nope.
