XML: restraining the knee-jerk reflex

XML is a useful technology, sometimes: that's about as positive as I can be about it these days. While there was a period when I got quite excited about the idea of a standard syntax for, well, everything, time tempers such enthusiasm. See, 90% of the time, XML is just too damn cumbersome. When what you want to say is

Param1 = 3

or something of that complexity, then having to write

<param name="Param1" value="3"/>

is just a bit hefty.
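
To make the comparison concrete, here is a minimal Python sketch that reads both forms into the same dictionary. The function names `read_plain` and `read_xml` and the `<params>` wrapper element are my own inventions for illustration, not part of any real format; the XML side uses the standard library's `xml.dom.minidom`.

```python
import xml.dom.minidom


def read_plain(text):
    """Parse 'Name = value' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, value = line.partition("=")
        params[name.strip()] = value.strip()
    return params


def read_xml(text):
    """Parse <param name="..." value="..."/> elements via the DOM."""
    doc = xml.dom.minidom.parseString(text)
    params = {}
    for node in doc.getElementsByTagName("param"):
        params[node.getAttribute("name")] = node.getAttribute("value")
    return params


plain = "Param1 = 3\n"
# The <params> root element is an assumption: a real file would need some wrapper.
marked_up = '<params><param name="Param1" value="3"/></params>'

# Both routes carry exactly the same information.
assert read_plain(plain) == read_xml(marked_up) == {"Param1": "3"}
```

Neither reader is long, but the plain-text one needs no library, and its input is the version you would actually want to type by hand.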

Anyone who's ever tried using Ant or Maven will surely sympathise with the idea that XML is a pain-in-the-arse way to write makefiles. And XSL transforms? Phew, glad I'm not going back into that arse-end of software engineering gone mad. It's also a pain in the arse to read XMLified data in C++ or Fortran. Hell, even Python and Java make you jump through SAX- or DOM-shaped hoops in their lowest-common-denominator implementations! Frankly, XML is a serious candidate for "no silver bullet"-type debunking: merely wrapping everything in angle brackets actually solves nothing, even if it looks totally SOAP-AJAX-Web 2.0, dude. Much of the time, something simpler is quite enough.

Unfortunately, the message hasn't quite drilled through to everybody's cortexes (cortices?) yet, and I saw several HEP presentations recently where some data was marked up in XML as if that was an achievement in itself. What do you want... a biscuit? Sadder still, I was told by a starry-eyed student that this was a great way to provide a unified data format. To take this out of the HEP context, let's say I invented a new XML-oriented data format, EML. It's short for "Everything Markup Language", dontcha know?

EML is super-whizzy-clever: it's in XML for a start, which means immediately that it's 21st century (or beyond) and future-proof, unlike all those column-delimited plain text data files that insufficiently technical people keep bandying about. You can parse it with all sorts of tools (but not without pain if you use an old-style language which actually compiles to machine code, duh), and there is future potential for some sort of Web services asynchronous HttpRequest coolness. Frankly, it's genius. And so simple! See, all you have to do to make a file in EML is take its normal binary representation, convert any accidental non-ASCII bits to XML entities, and slap <eml>...</eml> around it. Not forgetting an XML namespace declaration, of course: those are really useful, and really provide a good mechanism for schema evolution. Tada! Instant interoperability.

If this doesn't seem immediately stupid, please give yourself a good slap and get a job somewhere where you don't have the option to impose your half-wittedness on anyone impressionable enough to believe in this garbage --- the flow must be stemmed!

In short, if you have very hierarchically structured data, which is only likely to be processed in languages which provide easy XML parsing, and no-one is likely to have to write (or read, 90% of the time) the format by hand, then XML may be for you. Otherwise, you would do well to restrain that knee-jerk XML reflex and really think about how you would have best described your data in a world where XML never existed. Believe me, if this is news to you, you'll thank me for it.
