Anne van Kesteren

DOM: attributes sadness

19 August 2014

I have been reinstating “features” related to attribute handling in DOM. We thought we could get rid of them, but usage counters from Chrome and compatibility data from Gecko showed we could not. This is very sad so I thought I would share the pain.

A simple design for attributes would consist of each having a name and a value (both strings) and a simple map-like API on every element would be sufficient to deal with them. The getAttribute(name), setAttribute(name, value), and removeAttribute(name) methods. As well as a way to iterate through the names and values.

However, back in the day getAttribute(name) was required to return the empty string rather than null for a missing attribute, so hasAttribute(name) also exists. Fixing the specification to make getAttribute() return null was highly controversial back then. I even misguidedly ranted against developers who were making use of this feature as it prevented Opera from becoming standards compliant. “Please leave your sense of logic at the door, thanks!” was not a popular phrase back then.

Unfortunately namespaced attributes are a thing. And instead of simply adding a namespace field to our existing name and value, a namespace, namespace prefix, and local name field were added. Indeed, the local name is not necessarily equal to the name of an attribute. The idea was to have some kind of modality where before namespace and after namespace attributes would not really interact. That never happened of course. To deal with namespaces we have getAttributeNS(namespace, localName), setAttributeNS(namespace, name, value) (indeed, name, not localName, so bad), removeAttributeNS(namespace, localName), and hasAttributeNS(namespace, localName).

The real kicker is that the first four methods ignore the namespace fields, but can create attributes you cannot access with the *NS methods. There is no universal attribute API, though if you stay clear from namespaces everywhere you are probably mostly fine (except perhaps with SVG and such).

This was still too simple. There is also attributes which returns a NamedNodeMap (only used for attributes these days). And hasAttributes() which can tell you whether that map is empty or not. These two used to be on all nodes (to limit the amount of casting in Java), but we are moving them to element since that is where they make sense. NamedNodeMap contains a collection of zero or more Attr objects so you can inspect their individual fields. The map has a length property, an item(index) method, and is implemented with some kind of JavaScript proxy so attributes.name works, as well as attributes[0]. Good times. Attr objects also allow manipulation of an attribute's value. Due to mutation observers this requires an element field on attributes to point back to the element the attribute belongs to. Namespace prefix also used to be mutable field, but fortunately this was poorly implemented and recently killed.

The real reason attributes are so complicated, and more complicated still, ignoring namespaces for the moment, are DTDs. The SGML crowd was not brave enough to cut the lifeline when they did XML. Then XML got popular enough to end up in browsers and the DOM. This meant that attributes cannot contain just text, but also entity references. And therefore attributes became a type of node. Entity references were really never implemented and we managed to remove that cruft from the platform fortunately. However, attributes are still a type of node.

The last things we are investigating is whether attributes can stop having child nodes and perhaps stop being a node altogether. Meanwhile, we had to add createAttribute(localName) on document, getAttributeNode(name), setAttributeNode(attr), and removeAttributeNode(attr) on element, and getNamedItem(name), setNamedItem(attr), and removeNamedItem(name) on NamedNodeMap back as sites use these. Oh wait, and all their *NS counterparts of course, bar removeAttributeNodeNS().

Added together, we have twenty-five methods to deal with attributes rather than three. And attributes require six internal fields rather than two. And this is assuming we can get rid of child nodes and attributes being nodes, both semi-implemented today.