Anne van Kesteren

Mozilla FAQ on application/xhtml+xml

The Mozilla Web Author FAQ was recently updated with two sections about application/xhtml+xml. An interesting part of Should I serve application/xhtml+xml to Mozilla? is:

However, if you are using the usual HTML features (no MathML) and are serving your content as text/html to other browsers, there is no need to serve application/xhtml+xml to Mozilla. In fact, doing so would deprive the Mozilla users of incremental display, because incremental loading of XML documents has not been implemented yet. Serving valid HTML 4.01 as text/html ensures the widest browser and search engine support.

This is something I have known for a while, and it is one of the more important reasons to use HTML 4.01 on projects (when I make the calls). However, the story continues:

There is a fad of serving text/html to IE but serving the same markup with no added value as application/xhtml+xml to Mozilla. This is usually done without a mechanism that would ensure the well-formedness of the served documents. Mechanisms that ensure well-formed output include serializing from a document tree object model (e.g. DOM) and XSLT transformations that do not disable output escaping. When XHTML output has been retrofitted to a content management system that was not designed for XML from the ground up, the system usually ends up discriminating Mozilla users by serving tag soup labeled as XML to Mozilla (leading to a parse error) and serving the same soup labeled as tag soup to IE (not leading to a parse error).
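
To make those safe mechanisms concrete, here is a minimal sketch in Python (my own choice of language and library; the FAQ names neither): text appended to a DOM gets escaped by the serializer on the way out, so the output is well-formed by construction, which pasting strings together cannot guarantee.

    # Sketch: serializing from a DOM escapes markup characters automatically.
    from xml.dom.minidom import getDOMImplementation

    doc = getDOMImplementation().createDocument(None, 'html', None)
    p = doc.createElement('p')
    # The serializer escapes & and <; naive string concatenation would not.
    p.appendChild(doc.createTextNode('AT&T says "a < b"'))
    doc.documentElement.appendChild(p)

    print(doc.toxml())
    # <?xml version="1.0" ?><html><p>AT&amp;T says "a &lt; b"</p></html>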

You might want to read the differences between text/html and application/xhtml+xml parsing as well. You can read there that you really shouldn't rely on entities, except for the five predefined ones in XML. That's also something I have pointed out before, in for example Quick guide to XHTML and Well-Formed.
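
If you want to see that entity rule in action, here is another illustrative Python sketch: without a DTD, an XML parser knows only the five predefined entities; numeric character references always work, but HTML entities such as &nbsp; are fatal errors.

    # Sketch: only the five predefined entities are safe in XML without a DTD.
    from xml.parsers.expat import ParserCreate, ExpatError

    samples = ['<p>&amp; &lt; &gt; &quot; &apos;</p>',  # predefined: fine
               '<p>&#160;</p>',                         # numeric reference: fine
               '<p>&nbsp;</p>']                         # HTML entity: parse error

    for document in samples:
        try:
            ParserCreate().Parse(document, True)
            print('well-formed:', document)
        except ExpatError as error:
            print('parse error:', document, '-', error)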

Have fun with switching to HTML. I know I do.

Comments

  1. ...incremental loading of XML documents has not been implemented yet.

    This is an interesting one. What happens to a large document when only half of it has loaded, or it fails to load completely? With HTML, the browser can start to output it, but with XML, surely it cannot, as that would lead to invalid code. I have thought about this for a while but never came to a solution. Well, now it seems that Mozilla might be working on one. Of course, the answer is simple! Repair the document by closing any open tags, right up to the first one. This may already be happening with tag soup anyway. I'm glad to know they are working on this one.

    Posted by Chris Hester at

  2. Well, to my knowledge, it certainly isn't a priority for any of the hackers to get that right (or done differently). There are already parsers out there that work with incremental rendering, so I guess Mozilla could base its implementation on them instead of finding solutions of its own.

    Note also that incremental rendering has nothing to do with fixing the document.

    Posted by Anne at

  3. Anne, it still is a great pleasure to me, reading your weblog, and your decision to use HTML 4.01 in some contexts painted a large grin on my face.

    You defend your position as one of the wisest men on the net, and about the net, with humor, honor and surprise.

    Posted by ben at

  4. I've known about this incremental rendering thing for quite some time now, but never really thought it through. When sIFR is used in Mozilla on an XML page, the initialization has to be scheduled to happen on window.onload in order to work around this.

    Opera doesn't need this though, so that would lead me to conclude it's using incremental rendering already.

    Posted by Mark Wubben at

  5. Or your conclusions are based on wrong assumptions. Perhaps Mozilla has some bug with initialization. I can assure you there is no commonly used browser that does incremental rendering of XML.

    Posted by Anne at

  6. Well, I suppose I could have explained this better. sIFR allows you to initialize before the body tag has been closed. Without incremental rendering this means the page hasn't been styled yet, so sIFR won't work properly. In Opera this is not the case.

    (It could also be that the test file isn't treated as XML by Opera, I'll have to look into that.)

    Posted by Mark Wubben at

  7. Thanks for the publicity.

    For the record, the lack of incremental loading is not a parser limitation. Mozilla uses expat, which reports parsing events to the app as it parses. (The expat API is similar to SAX.) Bug 18333 puts the blame on the content sink (the thing that receives the parse events).
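
    In Python terms, that event model looks roughly like this (a minimal sketch against the expat binding, purely my illustration; Mozilla's code is C++): the parser fires callbacks as the bytes arrive, even mid-document.

        # Sketch: expat reports start/end/text events while data arrives in chunks.
        from xml.parsers.expat import ParserCreate

        parser = ParserCreate()
        parser.StartElementHandler = lambda name, attrs: print('start', name)
        parser.EndElementHandler = lambda name: print('end', name)
        parser.CharacterDataHandler = lambda text: print('text', repr(text))

        # Feed the document in two halves; events for the first half fire
        # before the second half has even arrived.
        parser.Parse('<html><p>Hel', False)
        parser.Parse('lo</p></html>', True)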

    Posted by Henri Sivonen at

  8. FYI, last time I checked (nested) tables don’t load incrementally either. Yet, those are still commonly used by many web sites.

    Have fun with switching to HTML. I know I do.

    If you just say this for the shock effect: that has been done before. It reeks a bit of Dive Into Mark. No offense, of course; I just hope you're not referring to this site, because that would be a shame: it works perfectly the way it does now, and XHTML is cool. The incremental loading thing will be fixed one day, and until then I'd hardly call it a big issue, as apparently no-one has really noticed it up till now.

    ~Grauw

    Posted by Laurens Holst at

  9. incremental loading of XML documents has not been implemented yet.

    Of course there are ways of writing an XML parser that can display a half-loaded document. And individual pages shouldn't be so large anyway. So, what's the big deal?

    And about the CMS problems, of course you will have them, unless you're using something like Fidelis or a hacked version of WordPress. I don't see this site (Anne's weblog) crashing in Mozilla, do I?

    What Mozilla is saying is basically this: Don't use application/xhtml+xml in a CMS not properly designed for XHTML on a site you're not testing in Mozilla regularly.

    I don't see them saying that you shouldn't use application/xhtml+xml at all. And they're also not saying that you shouldn't use XHTML sent as text/html (and there are some valid reasons for doing that).

    Posted by Charl van Niekerk at

  10. To elaborate on that last message...

    For an example with the tables vs. incremental loading, try msx.org on a slow connection or on a ‘Slow Server Day’. It will only show the navigation, and once the entire content is loaded, it ‘snaps’ into place. I don’t know the exact mechanism, but basically I think it doesn’t show a table cell (or row) until it’s fully loaded. If there’s another table inside, it will wait until that’s loaded.

    Now, as I said, there’s tons of sites like that out there, and incremental loading is not an issue for any of them. So I don’t think this ‘discovery’ really changes anything for XHTML. Especially since this is a bug, which can and will be fixed at some point in the future.

    ~Grauw

    p.s. (a sly dig) What's next, will you be abandoning your hobbies involving angle brackets? :)

    Posted by Laurens Holst at

  11. Oh Lord, how this argument about math being the only reason to use XHTML bores me to death. What a ridiculous notion. It will fade, like all stupid fashions/semi-religions of the day.

    Posted by Moose at

  12. Anne: With entities, did you mean character references?

    Numeric character references are safe, too.

    Mozilla advises sending documents as text/html. The two main reasons are: 1) there's no incremental loading yet, and 2) many documents aren't well-formed. But I don't see a line in which they say not to use XHTML. My advice: use XHTML and serve it as text/html. Validate with application/xhtml+xml. Now you have the best of both worlds.
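
    To sketch that workflow in code (my own illustration; the DEBUG switch and port are hypothetical): serve the pages as application/xhtml+xml on your development machine, so the XML parser catches well-formedness errors, and as text/html for the public.

        # Sketch: strict XML parsing while developing, text/html in production.
        import http.server

        DEBUG = True  # hypothetical switch: flip off for the live site

        class Handler(http.server.SimpleHTTPRequestHandler):
            # Override the MIME type used for .html files.
            extensions_map = {
                **http.server.SimpleHTTPRequestHandler.extensions_map,
                '.html': 'application/xhtml+xml' if DEBUG else 'text/html',
            }

        http.server.HTTPServer(('', 8000), Handler).serve_forever()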

    Posted by Michiel at

  13. Of course there are ways of writing an XML parser that can display a half-loaded document.

    As stated above, it is not a parser issue.

    And individual pages shouldn't be so large anyway. So, what's the big deal?

    Have you tried dial-up lately?

    And about the CMS problems, of course you will have them,

    It is possible to produce well-formed markup programmatically. The FAQ even mentions two methods. Seriously, piecing together markup as strings is not the best way.

    What Mozilla is saying is basically this: Don't use application/xhtml+xml in a CMS not properly designed for XHTML on a site you're not testing in Mozilla regularly.

    Actually, the point was that producing XML with unsuitable tools fails. It fails occasionally even for people who really try hard to get it right if they are using WP or MT and not XML tools. And no, people who use content management systems do not keep testing their own content even if they care.

    I don't see them saying that you shouldn't use application/xhtml+xml at all.

    The FAQ says pretty clearly that one should use application/xhtml+xml for XHTML+MathML.

    And they're also not saying that you shouldn't use XHTML sent as text/html

    Hmm. Perhaps a link to Sending XHTML as text/html Considered Harmful is needed after all. :-)

    (and there are some valid reasons for doing that).

    Such as?

    Posted by Henri Sivonen at

  14. Hmm. Perhaps a link to Sending XHTML as text/html Considered Harmful is needed after all. :-)

    GOTOs are harmful because their use results in write-only, bug-prone, unmaintainable code. Sending XHTML as text/html does not result in any of that. Though I admire Ian Hickson a great deal, I have to respectfully disagree with him on this.

    Posted by Leons Petrazickis at

  15. With entities, did you mean character references?

    No, I meant entities. Character references work fine in any XML document, entities from HTML or XHTML don't.

    Posted by Anne at

  16. As stated above, it is not a parser issue.

    Of course, there are various ways of bypassing this issue. Fixing the code before it is sent to the parser, or whatever. That isn't the point.

    Have you tried dial-up lately?

    Does anybody still use dial-up? :-)

    It is possible to produce well-formed markup programmatically. The FAQ even mentions two methods. Seriously, piecing together markup as strings is not the best way.

    You can make a plan to get your XML well-formed, but in the end you'll still need a good CMS to handle generating valid XHTML.

    Actually, the point was that producing XML with unsuitable tools fails.

    Basically, yes.

    It fails occasionally even for people who really try hard to get it right if they are using WP or MT and not XML tools.

    Very true, just remember that you can create some horrors in plain old HTML too that will mess a page up in any browser. That's why there's no excuse for not testing in various browsers.

    And no, people who use content management systems do not keep testing their own content even if they care.

    Like I said, it's necessary anyway and there's no excuse.

    The FAQ says pretty clearly that one should use application/xhtml+xml for XHTML+MathML.

    That's one reason to use it, yes.

    Hmm. Perhaps a link to Sending XHTML as text/html Considered Harmful is needed after all. :-)

    Henri, I have read that article months ago. And mind you, I agree with some of it, but not with all of it.

    (and there are some valid reasons for doing that).

    Such as?

    Forcing the user to create neater markup. If you've ever marked up some complex documents, you'll realise how difficult it is to keep all of your tags sorted. HTML is not strict enough, and more strictness helps in debugging.

    But of course, this is an authoring issue, not a browser issue.

    Posted by Charl van Niekerk at

  17. As stated above, it is not a parser issue.

    Of course, there are various ways of bypassing this issue. Fixing the code before it is sent to the parser, or whatever. That isn't the point.

    Incremental rendering does not require fixups before parsing. When a DOM tree is built from SAX-like parse events, the tree is in a state that is equivalent to the document tree of a well-formed document after each parse event callback once the root element node has been created. (The root element node is created when the start tag of the root element has been seen.)
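
    In Python terms the idea looks roughly like this (a minimal sketch with the stock SAX parser; Mozilla's content sink is C++, this is only my illustration): after every chunk fed to the parser, the tree built so far serializes as a proper tree, no repairs needed.

        # Sketch: a tree built from parse events is well-formed after each chunk.
        import xml.sax
        from xml.dom.minidom import getDOMImplementation

        class TreeBuilder(xml.sax.ContentHandler):
            def startDocument(self):
                self.doc = getDOMImplementation().createDocument(None, None, None)
                self.current = self.doc

            def startElement(self, name, attrs):
                element = self.doc.createElement(name)
                self.current.appendChild(element)
                self.current = element

            def endElement(self, name):
                self.current = self.current.parentNode

            def characters(self, content):
                self.current.appendChild(self.doc.createTextNode(content))

        builder = TreeBuilder()
        parser = xml.sax.make_parser()
        parser.setContentHandler(builder)
        for chunk in ('<html><p>Hello', '</p><p>world</p></html>'):
            parser.feed(chunk)
            # The partial tree is already renderable here:
            print(builder.doc.documentElement.toxml())
        parser.close()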

    Forcing the user to create neater markup. If you've ever marked up some complex documents, you'll realise how difficult it is to keep all of your tags sorted. HTML is not strict enough, and more strictness helps in debugging.

    You could use Page Valet in the Fussy Mode. The W3C Validator is over-hyped.

    Posted by Henri Sivonen at

  18. Funny, because with application/xhtml+xml there is no need for incremental rendering, in my experience (except perhaps on slow dialup when encountering giant pages): using application/xhtml+xml makes pages show up much faster than they do when using text/html.

    And Anne, are you ever going to get over your silly HTML fad?

    Posted by Faruk Ates at

  19. Incremental rendering does not require fixups before parsing. When a DOM tree is built from SAX-like parse events, the tree is in a state that is equivalent to the document tree of a well-formed document after each parse event callback once the root element node has been created. (The root element node is created when the start tag of the root element has been seen.)

    Yes, but then it is a parser issue because for a parser to behave like this you need to edit the parser. But I'm probably missing something.

    You could use Page Valet in the Fussy Mode.

    Thanks, that looks good. Will try it later.

    The W3C Validator is over-hyped.

    Definitely. And it's full of bugs too. ;-)

    Funny, because with application/xhtml+xml there is no need for incremental rendering, in my experience (except perhaps on slow dialup when encountering giant pages): using application/xhtml+xml makes pages show up much faster than they do when using text/html.

    Faruk, you've just got a big fan here. :-)

    As far as I know the point has always been to keep your pages so small that this wouldn't ever be an issue (even when using a dial-up). If individual pages are so long, they should probably be broken down. Rather fix the cause than the symptoms.

    And Anne, are you ever going to get over your silly HTML fad?

    My question too. But knowing Anne, probably not. ;-)

    Posted by Charl van Niekerk at

  20. The issue here is that a well-formed XML document can, by the nature of the internet and the transmission of data over modems, be received in bits. Therefore the browser must be able to deal with this situation. Ever seen a page where most of it loads, but you're stuck waiting for that last image or block of text? This happens even on broadband. The net is only as fast as the servers can run. So it is conceivable that the browser may use a method to 'repair' a partly-loaded XML document before it has fully loaded, in order to display it. Otherwise, no-one will use XML (including XHTML sent as application/xhtml+xml) if it means a complete blank page until the whole page has loaded. (If this is not what's meant by 'incremental loading', forgive me Anne.)

    Dial-up is still in vast use. Think about third-world countries. Also XML documents, by their very nature, tend to be large - imagine the database of a big company. I doubt such a file would ever load in one go. One reason why some of us have abandoned XML in favour of raw delimited data using text files. No need for all those wasteful tags unless you're sharing the data with everyone else. And probably faster parsing too.

    I have seen Opera display nested tables before a page was loaded. It stopped before all the cells in a row were complete. I remember seeing the background colour show through in the last cell on the row. Opera is able to render a page as soon as it gets it, which is wonderful. Of course you can delay this in the settings if it causes messy results. I have also seen Mozilla show PHP-generated tables before they were finished.

    Posted by Chris Hester at

  21. Think about third-world countries.

    In a third-world country here. On broadband.

    Here is how it works in Africa: Either you're sitting in a country such as Zimbabwe which is so totally retarded that you can't even buy a dial-up with a chest full of money, or you're sitting in a country such as South Africa where ADSL is readily available.

    Recently, South African Telecom ("Telkom") put up an advertisement about a surgeon sitting in Cape Town operating on a patient in Tanzania by using ISDN technology. Now there's big trouble - it's all a lie because Tanzania doesn't even have ISDN capability.

    Anyway, this is going completely off-topic.

    Posted by Charl van Niekerk at

  22. Yes, but then it is a parser issue because for a parser to behave like this you need to edit the parser. But I'm probably missing something.

    See comment #7.

    So it is conceivable that the browser may use a method to 'repair' a partly-loaded XML document before it has fully loaded, in order to display it.

    No need to repair. See comment #17.

    Posted by Henri Sivonen at

  23. Chris, I really have to disagree on the importance you give to incremental loading. That aside, Charl has a good point about that becoming an issue being a sign that you might have to rethink your pages for a bit :).

    About XML documents supposedly being heavy in size ‘by their very nature’: that is nonsense. There are many uses for XML, and especially when it is used for web pages I see no reason why the files wouldn’t be small like any other web page. XML files used by software to store data in are usually not meant to be presented online, and besides, it’s also common to have several small XML files (at my job we have both). By the way, if one really needs to store a serious amount of data, the best option is to just use database software like MySQL, which has all kinds of optimizations for faster searching and so on.

    The big advantage of XML is that it’s interoperable and has many parsing and transformation tools available, and additionally it is in a human-readable format, so you can easily edit it by hand. For that reason, I can see no logic at all in abandoning XML in favour of some limited, unknown, badly structured text-based format... So what if it's a few bytes more; since when are we back to 60MB hard disks? In this day and age, ‘byte-nibbling’ is usually just a waste of time (though I'm not saying it shouldn't be paid attention to).

    As for those 3rd world countries... I'd say that especially among dial-up users the IE share is above average, and as Microsoft Internet Explorer perfectly well renders the page incrementally, I'd say that problem is taken care of for now. If you insist, I have no moral objections against hereby officially recommending users of dial-up connections who find the loading time of my webpage an issue to use Internet Explorer or Opera to browse it :). Well, actually... can I take that back?

    Finally, about incremental loading of tables, you may have seen it working, which means that it works in some cases (never said it didn't), but when people use more complex table layouts like with nested tables as I mentioned, they definitely don't load incrementally (in Mozilla, at least). Try AnimeNfo, or search for ‘Gundam’ on AnimeNewsNetwork. On a slow connection or slow server day, of course. Go download a lot of anime fansubs from a peer-to-peer network to slow your connection down to a crawl or something, that's a good test :).

    ~Grauw

    Posted by Laurens Holst at

  24. Well, I switched off my UMTS connection, connected my computer via Bluetooth to a Siemens SX1 using GPRS, cleared Firefox 1.0's cache, and looked at my website. Surprise! It is XHTML 1.1, served as application/xhtml+xml, and it clearly rendered the main text first, with the menu on the right delayed for a second or so. I am not so sure Firefox does not support incremental rendering ;)

    Also, application/xhtml+xml has one big advantage for me: it is a "debugger", an easy way to keep your site valid all the time (and thus more accessible and better rated by search engines).

    Posted by Radek Hulán at

  25. Valid doesn't equal better rated and more accessible.

    Posted by Anne at