Anne van Kesteren

Extending a markup language

22 February 2005

Let’s say you created FooML. FooML has its own insignificant namespace, http://example.org/2005/02/22#foo-ns, and a couple of elements. Now, after you and others have gained some experience with FooML, you want to add some new features to the markup language and make some changes to the existing elements. You decide that the element BAR must be a child of the BAZ element and that documents that differ from that are non-conformant. Or, so to say, invalid. Since your language has already been around for a couple of months and is heavily used, there is a lot of legacy content that promptly becomes invalid. People blame FooML.
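To make the breakage concrete, here is a minimal sketch of what a checker for the new rule could look like. It assumes the elements are spelled `bar` and `baz` in the FooML namespace and that the root element is called `foo`; the root name and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

FOO_NS = "http://example.org/2005/02/22#foo-ns"  # namespace from the post


def misplaced_bars(xml_text):
    """Return every bar element whose parent is not a baz element."""
    root = ET.fromstring(xml_text)
    bar_tag = f"{{{FOO_NS}}}bar"
    baz_tag = f"{{{FOO_NS}}}baz"
    # ElementTree keeps no parent pointers, so walk parent/child pairs.
    offenders = [
        child
        for parent in root.iter()
        for child in parent
        if child.tag == bar_tag and parent.tag != baz_tag
    ]
    # A bar used as the root element has no parent at all, so it fails too.
    if root.tag == bar_tag:
        offenders.insert(0, root)
    return offenders


# Hypothetical documents: "foo" as the root element is an assumption.
legacy = f'<foo xmlns="{FOO_NS}"><bar/></foo>'
revised = f'<foo xmlns="{FOO_NS}"><baz><bar/></baz></foo>'
```

Under these assumptions `misplaced_bars(legacy)` reports one offending element and `misplaced_bars(revised)` reports none, which is exactly how existing content turns invalid overnight.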

(The fact that the namespace looks quite new and is actually based on today’s date does not make the story less relevant. If you think it does, substitute 2005 with 2004. Thanks.)

It is quite obvious that FooML lacks some kind of versioning. However, it is quite difficult to come up with a versioning system that is solid and agreed upon. The W3C did provide a policy for this, but specifications did not really follow it. Besides, it is questionable whether that policy makes sense. Introducing a new attribute in the XML namespace might make sense. (xml:id does that. And so did xml:base.)

Back to FooML. A VERSION attribute. Let’s introduce a VERSION attribute on the root element and solve this problem once and for all. We break legacy content this time, but for every future revision the version number changes and therefore software knows whether it does or does not support this version of FooML. This might actually work quite well, especially when you specify exactly what software should do when it encounters a version higher than it supports. For XML (please note that I’m now talking about a meta language, not a language) this is done quite effectively. When an XML parser encounters a version it does not support in the (optional, but recommended) XML declaration, it should stop parsing. The only flaw is that the XML declaration is optional and that a parser could therefore accidentally parse an XML 1.1 document while it is only an XML 1.0 parser.
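A rough sketch of the VERSION attribute idea, under a few assumptions that are not in the post: the attribute is written as an unprefixed `version` on the root element, its value has the form `major.minor`, and the tool stops processing as soon as the version is missing or higher than it supports. All names here are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the highest FooML version this imaginary tool understands.
SUPPORTED_VERSION = (1, 1)


class UnsupportedVersion(Exception):
    """Raised when a document must not be processed by this tool."""


def parse_version(text):
    """Turn 'major.minor' into a tuple of integers that compares correctly."""
    try:
        major, minor = text.split(".")
        return int(major), int(minor)
    except ValueError:
        raise UnsupportedVersion(f"malformed version: {text!r}") from None


def load_fooml(xml_text):
    root = ET.fromstring(xml_text)
    raw = root.get("version")
    if raw is None:
        # Legacy content without the attribute is rejected, as in the post.
        raise UnsupportedVersion("no version attribute on the root element")
    version = parse_version(raw)
    if version > SUPPORTED_VERSION:
        # The specified behaviour in this sketch: stop, do not guess.
        raise UnsupportedVersion(f"version {raw} is newer than supported")
    return root, version
```

Comparing tuples of integers rather than strings keeps a hypothetical version 1.10 ordered after 1.9, which a plain string comparison would get wrong.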

There is another way. I personally prefer this one, and although at the moment I cannot really recall the arguments for why I preferred it, I can at least explain it. The first thing you do is give the language you are developing its own MIME type. In our case, that will be application/fooml+xml, following conventions created in RFC 3023. (As a side note: this RFC is currently being updated to obsolete text/xml and text/xml-external-parsed-entity. Which is very nice of course, but not really relevant.) The second thing you do is create a dated namespace. Perhaps I added too many figures, but you get the point. There is no third thing, just a bit more explanation. For every change you make to your language in the future you are going to look at a couple of things. More specifically: how backwards compatible is my change? If it is not backwards compatible, your language needs a new namespace. If your change is backwards compatible, you can simply add it to the language. Of course, when you adopt such a versioning system it might be wise to document it and tell implementers exactly what they should do with unrecognized elements, attributes, et cetera. (This also allows extensions, although you probably want to specify that you can only extend the language using new namespaces for the added elements and/or attributes. Or, the horror, for newly added values. Oh yes, read all about QNames. They are evil.)
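The namespace approach could be sketched roughly as follows. The rule that unrecognized elements are skipped is only one possible policy a specification might pick, and the element names, the `KNOWN_ELEMENTS` table and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: one entry per dated namespace this tool supports. A change
# that is not backwards compatible would add a new namespace here.
KNOWN_ELEMENTS = {
    "http://example.org/2005/02/22#foo-ns": {"foo", "bar", "baz"},
}


def split_tag(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def process(xml_text):
    root = ET.fromstring(xml_text)
    root_ns, _ = split_tag(root.tag)
    if root_ns not in KNOWN_ELEMENTS:
        # Unknown namespace: an incompatible revision, or not FooML at all.
        raise ValueError(f"unsupported FooML namespace: {root_ns!r}")
    known = KNOWN_ELEMENTS[root_ns]
    handled, ignored = [], []
    for element in root.iter():
        namespace, local = split_tag(element.tag)
        if namespace == root_ns and local in known:
            handled.append(local)
        else:
            # Assumed policy: skip later additions and foreign extensions.
            ignored.append(element.tag)
    return handled, ignored
```

With this policy an older tool keeps working on documents that use backwards compatible additions or extensions from other namespaces, and refuses outright only when the root namespace itself has changed.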

If your language is already doing this, good, all the better. If your language is using none of the above, I would love to hear your story.

Comments

The only flaw is that the XML declaration is optional and that a parser could therefore accidentally parse an XML 1.1 document while it is only an XML 1.0 parser.

The XML declaration is only optional for XML 1.0 documents, so a parser cannot accidentally parse a 1.1 document as 1.0 unless it is a non-conformant parser. An author could, of course, neglect to identify the version of a 1.1 document by accident, but you can't fault the system because of easily avoidable human error. It's not an unreasonable requirement.
Posted by J. King at 12:08PM
I just read that if an author does that, it automatically becomes an XML 1.0 document, which is good. I should have known they had taken care of that problem. Oh well.
Posted by Anne at 5:15PM
You most certainly should not create a MIME type in the IETF tree. In this case you would need to choose the appropriate subtree depending on your situation, probably either the vendor or personal trees.
Posted by mike at 12:33AM