Sunday, January 27, 2008

the pains of xml processing

This is part one of my xml/e4x ramblings...

XML is all the rage these days. But, like many over-hyped technologies, it has become a victim of its own success, leading to all sorts of crazy ideas. If you find yourself using XML for anything other than document markup or data transport, worry.

The most common challenge you will face when dealing with XML is extracting the data. This is known as XML processing, and it has often been accomplished by using XSLT, DOM, or SAX.

Do you speak XSLT?

Learning XSLT is a bit like learning a foreign language:

  • declarative programming
  • strange and complex syntax
  • templates, patterns, and rules

The easiest way to scare you away from XSLT is with a tiny example. That's crazy, right? Using XSLT on XML looks a lot like "fighting fire with fire." Let's be honest, despite all the hype, XML is no joy ride; it can be so aggravating that it makes the Linux guy almost use the F-Word. Add XSLT to the mix, and who wouldn't start cursing like a sailor?

The dark days of DOM

So, with XSLT out of reach for most sane people, many turned to DOM (Document Object Model). The great thing about DOM is the consistent mapping between the structure of the XML and the structure of the DOM objects. Start at the root node, get the child nodes, then get the children of those child nodes. And on it goes.
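To make the "start at the root, get the children" idea concrete, here is a minimal sketch. It uses Python's standard xml.dom.minidom rather than browser JavaScript so it can run on its own. The sample document and the walk helper are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical sample document, invented for this illustration.
SAMPLE = "<note><to>Tove</to><from>Jani</from><body>Hello</body></note>"


def walk(node, depth=0, out=None):
    """Collect (depth, tag name) pairs for every element, root first."""
    if out is None:
        out = []
    if node.nodeType == node.ELEMENT_NODE:
        out.append((depth, node.tagName))
        # Recurse into the children; text nodes are skipped by the check above.
        for child in node.childNodes:
            walk(child, depth + 1, out)
    return out


doc = minidom.parseString(SAMPLE)
outline = walk(doc.documentElement)
for depth, name in outline:
    print("  " * depth + name)
```

The tree of DOM objects mirrors the nesting of the XML, which is exactly the consistent mapping described above.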

Unfortunately, the DOM syntax almost encourages ugly code. Even the more respectable sites were putting out crap examples like this:

document.getElementById("to").innerHTML =
    xmlDoc.getElementsByTagName("to")[0].childNodes[0].nodeValue;

If you've spent any time in DOM-land, you probably carry the shame of writing at least one of these brittle nuggets:

xml.firstChild.childNodes[2].lastChild.childNodes[4]
    .getAttribute('i_hope_this_xml_never_changes');
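For contrast, here is one way to make that kind of lookup a little less brittle: find elements by name instead of by position, and decide up front what happens when something is missing. This is only a sketch in Python's xml.dom.minidom. The first_text helper and the sample XML are my own inventions, not part of any DOM API.

```python
from xml.dom import minidom


def first_text(doc, tag_name, default=None):
    """Return the text of the first <tag_name> element, or default if absent."""
    matches = doc.getElementsByTagName(tag_name)
    if not matches:
        return default
    # Join the direct text children, so an empty element gives ""
    # instead of blowing up on childNodes[0].
    parts = [n.data for n in matches[0].childNodes if n.nodeType == n.TEXT_NODE]
    return "".join(parts)


# Hypothetical sample document, invented for this illustration.
doc = minidom.parseString("<note><to>Tove</to><cc/></note>")
to_value = first_text(doc, "to")                         # "Tove"
cc_value = first_text(doc, "cc")                         # "" (element exists but is empty)
missing = first_text(doc, "bcc", default="(none)")       # "(none)"
```

It is still DOM, and still wordy, but at least a reordered or missing element no longer turns into an index error three levels deep.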

SAX player

Now let's contrast the declarative style of XSLT with the event-based approach of the Simple API for XML (SAX). Using SAX reminds me of those old music boxes, the ones with the metal drums that turn, making the bumps pluck the metal forks to produce each note. In SAX, you create callback functions (the metal forks), and as the SAX parser (the metal drum) reads through the XML, it fires events (the bumps).

How was that for a strained analogy?

So, SAX has two big problems:

  1. it puts the burden on the programmer to keep track of their location within the document
  2. the event-based model can be off-putting to programmers who are more comfortable with typical step-by-step procedural coding
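Both problems show up even in a tiny handler. The sketch below uses Python's standard xml.sax module. The sample XML and the BookTitleHandler class are hypothetical, made up for illustration. All it wants is the titles of books (not CDs), and it still has to keep its own stack of open elements to know where it is.

```python
import xml.sax

# Hypothetical sample document, invented for this illustration.
SAMPLE = (
    b"<library>"
    b"<book><title>Dune</title></book>"
    b"<cd><title>Kind of Blue</title></cd>"
    b"</library>"
)


class BookTitleHandler(xml.sax.ContentHandler):
    """Collect the text of <title> elements that sit directly inside a <book>."""

    def __init__(self):
        super().__init__()
        self.path = []     # stack of open element names: our "location"
        self.buffer = []   # text pieces seen since the last start tag
        self.titles = []

    def startElement(self, name, attrs):
        self.path.append(name)
        self.buffer = []

    def characters(self, content):
        # The parser may deliver one text node in several chunks.
        self.buffer.append(content)

    def endElement(self, name):
        # SAX only tells us the element name; the parent comes from our stack.
        if self.path[-2:] == ["book", "title"]:
            self.titles.append("".join(self.buffer))
        self.path.pop()


handler = BookTitleHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.titles)  # prints ['Dune']
```

Notice that the logic for "a title inside a book" is smeared across three callbacks. That is the bookkeeping burden from item 1, and the inside-out control flow from item 2.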

Stay tuned next week for...e4x to the rescue (sort of)
