Anne van Kesteren

Why do we need MicroXML?

21 December 2010

James Clark — known for XSLT and RELAX NG — proposes MicroXML on his blog. It is not replacing XML, HTML, or JSON, but is intended to become yet another format. It should be simpler than XML in the sense that XML was simpler than SGML and it should have some convergence with HTML5.

Dropping a few features from XML would certainly make it less complex, for users and implementors alike. E.g. dropping the internal subset from DOCTYPEs would halve the number of tokenizer states. But I am not convinced that is really worth it. Going to XML from SGML made sense. Nobody managed to implement SGML fully. Implementing XML, while non-trivial, has been done a fair number of times.

James points out two problems with XHTML5, the XML syntax of HTML5: Gecko does not render it incrementally and XHTML does not work in Internet Explorer. Now adding a new language to the ecosystem, because fixing existing bugs is not deemed important enough, seems backwards. Furthermore, Internet Explorer will soon support XHTML and Gecko already has incremental rendering of XHTML, as do other browsers such as Opera and Chrome.

Whenever the debate comes around to HTML5 and XML there seem to be two important issues. HTML does not support XML namespaces and XML does not support HTML-style error handling. Neither of these is being addressed by MicroXML. Namespaces are classified as something that may warrant looking into later and HTML-style error handling is said to be something a companion specification could address.
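To make the error handling difference concrete, here is a rough sketch in Python using only the standard library. The names `MALFORMED`, `parse_as_xml`, `TagCollector`, and `parse_as_html` are made up for this illustration. Note that `html.parser` is only a lenient tokenizer, not the HTML5 parsing algorithm, so it stands in for "keeps going on bad input" rather than showing what a browser would actually build.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical input: the <b> element is never closed.
MALFORMED = "<p>Hello <b>world</p>"


def parse_as_xml(text):
    """Return True if text is well-formed XML, False on a fatal parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # An XML parser has to stop at the first well-formedness error.
        return False


class TagCollector(HTMLParser):
    """Record every start tag, end tag, and text run the tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_as_html(text):
    """Tokenize text leniently and return the recorded events."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.events


if __name__ == "__main__":
    print("well-formed XML:", parse_as_xml(MALFORMED))
    print("HTML tokens:", parse_as_html(MALFORMED))
```

The XML path rejects the whole document because of one missing end tag, while the lenient path still reports every tag and text run it saw. What a real HTML parser then does with those tokens is defined by the HTML5 specification and is not modelled here.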

I do not think XML is sufficiently complex to warrant a new language. If we should do anything, it should be XML5: replace XML with an XML that has HTML-style error handling (though a lot simpler) and that is backwards compatible with XML 1.x and Namespaces in XML 1.x as long as there is no external subset.