Anne van Kesteren

`xml:id`

15 September 2005

I have written about this attribute before. It is now a W3C Recommendation and already has quite a few implementations. No browser vendor has taken the time to implement it yet, though at least Opera and Mozilla plan to. As Norman Walsh notes, there are some problems in combination with Canonical XML, which was the main reason some people objected to the introduction of the attribute. (He explains the issues, don't worry.) Fortunately, those issues were resolved by concluding that Canonical XML is broken by design, so xml:id could advance to the final stage of the W3C process.

xml:id removes the need for DTDs that declare attributes of type ID. Another way to avoid DTDs for that purpose is to use namespace-specific bindings. For example, on every element that can validly appear in the XHTML namespace (http://www.w3.org/1999/xhtml) you can specify an attribute named id that must be recognized and treated as if it were of type ID. That means you can use getElementById to retrieve the element node it was specified on, or use the CSS ID selector to style it. The same applies to SVG, MathML and the current sXBL draft. They all define an attribute named id which must be treated as type ID. (Future versions of SVG are going to reuse xml:id. XHTML 5 and XHTML 2.0 might do the same.)

The advantage of xml:id is of course that it applies to any element node, whatever namespace it is in (or even if it is in no namespace at all). Unfortunately, this does not entirely remove the need for DTDs yet, because we still have entities. However, those are only relevant for XHTML and MathML. New languages are not likely to introduce new ones. Perhaps they will even be standardized in some way. Tim Bray had a proposal for that once.
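To make the "any element, any namespace" point concrete, here is a minimal sketch in Python of how a consumer without DTD support could build its own ID index from xml:id. It only assumes that the parser reports the attribute under the XML namespace, which Python's ElementTree does. The `build_id_index` helper, the sample document and the http://example.org/ns namespace are made up for illustration, and the sketch skips the value normalization and validity checks a real xml:id processor would perform.

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is always bound to this namespace, so no declaration is needed.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def build_id_index(root):
    """Map each xml:id value to the element that carries it (hypothetical helper)."""
    index = {}
    for element in root.iter():
        value = element.get(XML_ID)
        if value is None:
            continue
        if value in index:
            # IDs must be unique within a document.
            raise ValueError(f"duplicate xml:id: {value!r}")
        index[value] = element
    return index


# Illustrative document: one element in no namespace, one in the XHTML
# namespace, and one in a made-up namespace.
DOCUMENT = """\
<root>
  <item xml:id="first">no namespace</item>
  <p xmlns="http://www.w3.org/1999/xhtml" xml:id="second">XHTML namespace</p>
  <thing xmlns="http://example.org/ns" xml:id="third">made-up namespace</thing>
</root>"""

index = build_id_index(ET.fromstring(DOCUMENT))
print(index["first"].tag)
print(index["second"].tag)
print(index["third"].text)
```

The lookup does not care which vocabulary an element belongs to, which is exactly what a namespace-specific id attribute cannot offer.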

Comments

I like how there is no particular point to this story :). It is just that, a story, heh... Or at least to me it seems like that. Now I know what Canonical XML is, too ^_^.
~Grauw
Posted by Laurens Holst at 4:07AM
By the way, that article by Tim Bray is interesting, however... the design of UTF-8 is so extremely cool. It takes everything into account, with regard to easy processing in an 8-bit context, and backwards compatibility with existing 8-bit applications. There are tons of little fun things. For example, a 0 byte (or actually any byte in the range 0-127) always stands for the code point with that same value, so the end of a 0-terminated string can always be found by simply scanning for a 0 without special treatment of multi-byte sequences. Conversely, a 0 byte never occurs inside a multi-byte sequence, so it can never be mistaken for part of one. And if you index into (or split up) a string and end up in the middle of a multi-byte sequence, it is recognisable and you can skip ahead to the next ‘real’ character.
Unfortunately, Tim Bray’s solution breaks with that last one. If you have ‘xxx &amp; yyy’, and you index to byte 5, the end result will be ‘amp; yyy’ and not ‘& yyy’ or ‘ yyy’. So it might then be called ‘UTF-8+names’, but it isn’t really in the spirit of UTF-8. Doesn’t necessarily matter, as long as it’s practical, of course.
~Grauw
Posted by Laurens Holst at 4:37AM
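The properties described in the comment above can be checked with a short Python sketch. It is only an illustration under stated assumptions: `next_char_boundary` is a hypothetical helper, the sample strings are made up, and the entity example assumes the text was written as ‘xxx &amp;amp; yyy’ in the source.

```python
def next_char_boundary(data: bytes, index: int) -> int:
    """Return the first offset >= index where a UTF-8 character starts."""
    # Continuation bytes always have the bit pattern 10xxxxxx; skip over them.
    while index < len(data) and (data[index] & 0xC0) == 0x80:
        index += 1
    return index


# Bytes 0-127 never occur inside a multi-byte sequence, so scanning for a
# 0 terminator (or any ASCII byte) needs no knowledge of the encoding.
multibyte_only = "é€😀".encode("utf-8")
ascii_free = all(byte >= 0x80 for byte in multibyte_only)

# Landing in the middle of a character is detectable and recoverable.
text = "naïve café".encode("utf-8")
resynced = text[next_char_boundary(text, 3):].decode("utf-8")

# A named entity offers no such marker: slicing at byte 5 lands inside it.
entity_slice = "xxx &amp; yyy".encode("utf-8")[5:].decode("utf-8")

print(ascii_free)
print(resynced)
print(entity_slice)
```

Offset 3 of the second sample falls on the second byte of ‘ï’, so the helper skips ahead to the ‘v’. The entity slice has nothing comparable to detect, which is the objection raised above.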
(Future versions of SVG are going to reuse xml:id. XHTML 5 and XHTML 2.0 might do the same.)

I think you mean HTML 5, right? But HTML is not an application of XML, so it shouldn't/can't use xml:id IMO. :-S
Posted by minghong at 8:29AM
No, I meant XHTML 5. As you can clearly read in the specification, it also extends the http://www.w3.org/1999/xhtml namespace. Which is quite obvious, as browsers tend to implement some parts of the elements in HTML and XHTML in a common way.
Also, you can use xml:id in text/html documents, through the DOM. I was planning to write a story about that later on.
Laurens, does his proposal work in that way? Or does his proposal work with a set of known entities? I thought the latter and in that case nothing would break.
Posted by Anne at 2:26PM
Laurens, you are right. Perhaps this is not so optimal after all. Perhaps a specification should be developed for predefined entities from XHTML 1 and MathML 2 and XML parsers should be required to implement it.
Posted by Anne at 12:29AM