Thursday, May 8, 2008

e4x to the rescue (sort of)

Note: This is the follow-up to an old, poorly written rant on e4x in which I used the wrong terms and left huge holes in my arguments. I can only hope to do so poorly this time.

A simpler DOM

Many developers (myself included) have tried to scab together our own object-ish XML processing. Usually this meant converting the XML into a big "object tree" whose property names matched the names of the XML nodes. This would let us turn:

<root><foo>hello</foo></root>

into:

root.foo // returns "hello"
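For illustration, here is a minimal sketch of that kind of hand-rolled converter. It is not any particular library's code: `parseXml` and `toObjectTree` are hypothetical names, and the tiny regex-based parser assumes the input contains only nested elements and text (no attributes, comments, CDATA, or self-closing tags).

```javascript
// Hypothetical naive "object tree" converter, the kind of thing developers
// hand-rolled before e4x. Assumption: input has only nested elements and
// text; attributes, comments, CDATA, and self-closing tags are not handled.
function parseXml(xml) {
  // Split into tags ("<foo>", "</foo>") and text runs ("hello").
  const tokens = xml.match(/<\/?[^>]+>|[^<]+/g) || [];
  const stack = [{ name: "#document", children: [], text: "" }];
  for (const token of tokens) {
    if (token.startsWith("</")) {
      // Closing tag: finish the current element and attach it to its parent.
      const node = stack.pop();
      stack[stack.length - 1].children.push(node);
    } else if (token.startsWith("<")) {
      // Opening tag: start a new element.
      stack.push({ name: token.slice(1, -1).trim(), children: [], text: "" });
    } else {
      // Text: append to the element currently open.
      stack[stack.length - 1].text += token;
    }
  }
  return stack[0].children[0]; // the root element
}

// Property names become child element names; leaf elements collapse to text.
function toObjectTree(node) {
  if (node.children.length === 0) return node.text;
  const obj = {};
  for (const child of node.children) {
    obj[child.name] = toObjectTree(child);
  }
  return obj;
}

const root = toObjectTree(parseXml("<root><foo>hello</foo></root>"));
root.foo; // returns "hello"
```

Note that a second `<foo>` would silently overwrite the first, which leads straight into the problems below.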

And when we did this, we quickly ran into problems:

  • Attributes: What if an attribute had the same name as a child node?
  • Collections: It would be nice to skip array-like access for nodes that only have a single child, right?
  • Naming collisions: How do we make sure that the properties and methods we need on a node don't conflict with the names of its child nodes? For example, what if a node is named "children"?
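To make those three problems concrete, here is a hypothetical naive converter that works on an already-parsed node (built by hand, so no XML parser is needed) and attaches a `children()` helper to every result. The node shape `{ name, attributes, text, children }` and the `leaf` helper are assumptions made for this sketch.

```javascript
// Hypothetical naive converter, written only to show where it breaks.
// Input is an already-parsed node: { name, attributes, text, children }.
function toObjectTree(node) {
  const obj = {
    // A helper we'd like every converted node to expose.
    children: () => node.children.map((c) => c.name),
  };
  // Problem 1: attributes and child elements share one namespace.
  Object.assign(obj, node.attributes);
  const seen = new Set();
  for (const child of node.children) {
    // Leaf elements collapse to their text content.
    const value = child.children.length ? toObjectTree(child) : child.text;
    if (!seen.has(child.name)) {
      // First occurrence: stored as a plain value, silently replacing any
      // attribute or helper with the same name (problems 1 and 3).
      obj[child.name] = value;
      seen.add(child.name);
    } else if (Array.isArray(obj[child.name])) {
      obj[child.name].push(value);
    } else {
      // Problem 2: a repeated element only becomes an array on its second
      // occurrence, so callers can't know which shape they will get.
      obj[child.name] = [obj[child.name], value];
    }
  }
  return obj;
}

// Hand-built parse of:
// <item id="1"><id>42</id><tag>a</tag><tag>b</tag><note>hi</note><children>none</children></item>
const leaf = (name, text) => ({ name, attributes: {}, text, children: [] });
const item = toObjectTree({
  name: "item",
  attributes: { id: "1" },
  text: "",
  children: [
    leaf("id", "42"),
    leaf("tag", "a"),
    leaf("tag", "b"),
    leaf("note", "hi"),
    leaf("children", "none"),
  ],
});

item.id;              // "42": the id="1" attribute is gone
item.tag;             // ["a", "b"]: an array here...
item.note;            // "hi": ...but a plain string here
typeof item.children; // "string": the helper method was clobbered
```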

e4x brings object traversal to the masses

So e4x comes along, and it looks a lot like what developers had been scabbing together. It adds some nice syntax for "querying" and prefixes attributes with an @ symbol (so we don't have to worry about attribute names colliding with node names), but otherwise it's mostly the same.

The two places where they dropped the ball were collections and naming collisions. Let's ignore the collection problem for now and talk about their "solution" to the collisions that can occur between methods/properties and node names. They decided that all methods/properties of a node (children, length, toString, etc.) would be accessed as methods, with parentheses, and all child nodes would be accessed as properties, without parentheses.

It's too bad they didn't just use a prefix to distinguish XML nodes from methods/properties. Attributes are already prefixed with the @ symbol, so why not use a similar prefix (a # sign, perhaps) for XML nodes?
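For what it's worth, the prefix idea can be sketched in modern JavaScript with a `Proxy`. This is a hypothetical illustration, not e4x or any real library: `wrap` is an invented name, `#name` selects a child element, `@name` selects an attribute, and everything else is an ordinary property of the node. Because `#` and `@` can't appear in JavaScript identifiers, the sketch needs bracket access such as `x["#name"]`.

```javascript
// Hypothetical sketch of the "#" prefix idea (not e4x, not a real library).
// "#name" selects a child element, "@name" selects an attribute, and any
// other key is an ordinary built-in property of the node.
function wrap(node, parent = null) {
  // Built-ins are values, so they are plain getters (noun/verb rule).
  const builtIns = {
    get name() { return node.name; },
    get parent() { return parent; },
    get text() { return node.text ?? ""; },
  };
  const proxy = new Proxy(builtIns, {
    get(target, key, receiver) {
      if (typeof key === "string" && key.startsWith("#")) {
        // Simplification: return only the first matching child element.
        const child = node.children.find((c) => c.name === key.slice(1));
        return child ? wrap(child, proxy) : undefined;
      }
      if (typeof key === "string" && key.startsWith("@")) {
        return (node.attributes ?? {})[key.slice(1)];
      }
      return Reflect.get(target, key, receiver);
    },
  });
  return proxy;
}

// <foo id="7"><name>bar</name></foo>
const x = wrap({
  name: "foo",
  attributes: { id: "7" },
  children: [{ name: "name", text: "bar", attributes: {}, children: [] }],
});

x.name;                  // "foo": the node's own name (a built-in)
x["#name"].text;         // "bar": the <name> child element
x["@id"];                // "7": an attribute
x["#name"].parent === x; // true
```

Since node names always carry the prefix, a child element named `name` or `parent` can never collide with the built-ins, and those built-ins are free to follow the usual noun/verb convention.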

This is a big problem because it breaks with most other languages, which follow a rarely discussed rule of thumb (see the When to Use Properties vs. Methods section):

* methods/functions are verbs: perform an action
* properties are nouns/adjectives: read/write values

Let's look at some JavaScript, for example:

* String.length
* Document.links
* Window.parent
* String.toUpperCase()
* Document.write()
* Window.close()
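The same rule applies to classes you write yourself. Here is a minimal sketch using a hypothetical `Playlist` class: values are exposed as getters (read without parentheses, like `String.length`), and actions as methods (called with parentheses, like `Window.close()`).

```javascript
// Hypothetical class following the nouns-vs-verbs rule of thumb.
class Playlist {
  constructor() {
    this.songs = [];
  }

  // Nouns/adjectives: read as properties, no parentheses.
  get length() {
    return this.songs.length;
  }

  get isEmpty() {
    return this.songs.length === 0;
  }

  // Verbs: perform an action, called with parentheses.
  add(song) {
    this.songs.push(song);
    return this; // allows chaining: list.add("a").add("b")
  }

  clear() {
    this.songs = [];
  }
}

const list = new Playlist().add("a").add("b");
list.length;  // 2: a value, so no parentheses
list.clear(); // an action, so parentheses
list.isEmpty; // true
```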

Now let's see if e4x follows that same rule:

* XML.name()
* XML.length()
* XML.parent()
* XML.replace()
* XML.insertChildAfter()
* XML.contains()

Nope.

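And what happens when you try to implement e4x in a language like Ruby? Ruby has no properties (every obj.foo is a method call), and parentheses on method calls are optional, so node.name and node.name() are exactly the same call. The distinction e4x relies on simply doesn't exist there.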

In case you missed it, I had a bug where I wrote node.children instead of node.children() because I relied on that rule of thumb. It was another fine example of why consistency is so critical to usability.
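Plain JavaScript shows why this kind of slip is so easy to miss. The object below is a hypothetical stand-in, not e4x (in e4x, node.children without parentheses would instead select child elements named "children"); it only shows that reading a method without calling it doesn't fail, it quietly hands back something else.

```javascript
// Hypothetical node object whose built-in is exposed as a method, e4x-style.
const node = {
  kids: ["a", "b"],
  children() {
    return this.kids;
  },
};

const right = node.children(); // ["a", "b"]
const wrong = node.children;   // forgot the parentheses: no error is thrown

right.length; // 2
wrong.length; // 0: a function's length is its declared parameter count
```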

Random light switch trivia:
In the United States it is universal for the "on" position of a toggle switch to be "up," whereas in the UK, Australia, and New Zealand it is "down." -Wikipedia



Tom - May 9th, 2008 9:51 am

Good points. As you pointed out, the issue comes down to XML element name mapping in the object graph. Since they went down the road of using the exact names from the XML, they needed to make the built-in properties be methods to avoid name collisions with elements that might be named "parent" or "name", etc. (which are probably commonly found in XML).

e.g. given <foo><name>bar</name></foo>, x.name is different from x.name().

But you could also argue that constantly writing x.#foo.#bar is less convenient (and hence less usable) than x.foo.bar. It's really a trade-off between consistency in attribute/method conventions and the readability of the code/API.
