Anne van Kesteren

Perfect Markup

Be aware that the following is based on my opinion; you may have your own ideas about what a site with perfectly valid markup is. Please comment if you do, because it's always good to see things from a different point of view.

Second: this post is about XML files (XHTML), so it covers things that are not applicable to HTML. That said, I don't think there should be a discussion about whether to use HTML or XHTML, or about which of the two has better markup.

When it comes to XML, it is important that the browser gets it right and doesn't think you are still using HTML. If you use XHTML, you should serve it to the browser with the following MIME type: application/xhtml+xml. The problem is that some browsers don't support this MIME type. Probably the most annoying browser that doesn't support it is IE for Windows, which only understands text/html. This MIME type "problem" is one of the biggest reasons authors don't want to use XHTML, since they think they need to browser sniff. Fortunately, that is not true. Mozilla lists application/xhtml+xml in its Accept header, so you just have to check whether it is there. Mark Pilgrim has written some examples of this (in different server-side languages) at the bottom of an article for O'Reilly.
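The Accept header check described above can be sketched roughly as follows. This is a minimal Python illustration, not Mark Pilgrim's code; the function name `choose_content_type` is hypothetical, and it assumes the raw value of the request's Accept header is already available as a string.

```python
XHTML = "application/xhtml+xml"
HTML = "text/html"


def choose_content_type(accept_header):
    """Pick the MIME type to send, based only on the Accept header.

    Returns application/xhtml+xml when the client lists it explicitly,
    and falls back to text/html otherwise (for example for IE, which
    does not list it). No User-Agent sniffing is involved.
    """
    if accept_header and XHTML in accept_header.lower():
        return XHTML
    return HTML


if __name__ == "__main__":
    # Example Accept value in the style sent by Mozilla-like browsers.
    print(choose_content_type("text/xml,application/xhtml+xml,text/html;q=0.9"))
    # A client that only sends a wildcard gets text/html.
    print(choose_content_type("*/*"))
```

Note that this simple substring check ignores any q= values in the header; it only answers "is the type listed at all?".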

In my comments, Sean points out that you can also send application/xhtml+xml to the following online services/UAs:

It is up to you whether you want to sniff for these, since there is no easy solution. Well, the solution is "easy" (you just have to sniff for them), but you may not like sniffing. If you don't, you are best off with Mark Pilgrim's solution.

Some may say that the W3C only says that you should use application/xhtml+xml (XHTML Media Types [#summary]), but as Ian Hickson points out on the css-discuss list: A 'should' is as good as a 'must', really, unless you have very convincing reasons to ignore it.

Ian has a good point here, but it brings other complications with it: <?xml-stylesheet?>, anyone? The W3C says about this (I quoted this before, but I'm doing it again for better reference):

Authors SHOULD explicitly identify the XHTML namespace through the namespace declaration when they serve an XHTML Family document as 'application/xml' to facilitate the chance for reliable processing. The XML stylesheet PI SHOULD be used to associate style sheets.

Thus both sending as application/xhtml+xml and the use of the XML stylesheet PI are specified with SHOULD. Does this mean that if you use one you have to use the other? I think so, unless you have a really good reason not to. The following should also never be used; newer browsers (Mozilla) just ignore it, and the same thing can be done with either an XML prolog or an HTTP header:


The things above are the biggest changes if you are planning to use XHTML. If you are going to follow the specification a bit more closely, there are probably also some semantic elements you should/must use, like the DEL and INS elements. These can be really handy for marking up document changes. If you don't want to see those elements directly, you could use a user style sheet with the following rules: del{display:none;}ins{text-decoration:none;}. I think that's enough to de-style these elements. I'm not sure if Google uses these elements, but if it (or any other application) does, you really should use them, so Google can re-index that part of your site when it sees that something has been deleted or inserted. You should not use the s element like Jeffrey Zeldman does; this is semantically incorrect and is only useful for visual UAs, which will render the s element like this: s{text-decoration:line-through;}.

Other semantic elements, like headers, the link element, and the abbr and acronym elements, have all been "reviewed" lately.


  1. Mark points out at mezzoblue that we should also respect the q= settings

    What this means is that a UA can say I'll take this MIME type, but only if you haven't got this one. There aren't any that I know of that 'prefer' application/xhtml+xml to text/html.

    But, and this is the question I want to ask, even if there were, does it matter? This SHOULD be served as application/xhtml+xml. text/html is a hack.

    Posted by Sean at

  2. One problem I ran into recently was that IE6 drops out of standards-compliant mode when an XML declaration (<?xml ... ?>) is added. A reason for me not to include it.

    BTW, I noticed that you regularly mention "Mark Pelgrim", while it should really be "Mark Pilgrim"

    Posted by karma at

  3. For me that is the reason I use the XML prolog. It makes life easier, especially with the Box Model Hack. You only have to use the 'Be nice to Opera rule' and everything is fine.

    I agree with you, it should be sent as application/xhtml+xml if the UA accepts it, no matter what it prefers.

    Posted by Anne at

  4. What this means is that a UA can say I'll take this MIME-type, but only if you haven't got this one. There aren't any that I know of that 'prefer'

    This is of course only applicable if you provide multiple versions of the same document. The same goes for languages... If you have a multilingual site you can check q for the preferred language, otherwise use the default one.

    If you don't serve different versions, you've got no choice but to serve the page as it must. Just like the fact that I can only view this page in English.

    Posted by Bas Hamar de la Brethonière at

  5. I agree with you, Anne. I recently started to serve my pages as application/xhtml+xml, using a PHP script, to those browsers that accept it, while also sniffing for Opera 7 and the W3C Validator. Those browsers also get the XML prolog and stylesheet PI from me; others get text/html, no XML prolog, and a meta tag with the charset. I think that is a good way of dealing with this (esp. the IE mess). BTW, I am listed with the X-Philes now :-)

    Posted by Ben at

  6. If I were to sniff for Opera 7 and the W3C Validator (anything else worth sniffing for?), how would you recommend doing it in PHP? Basically, which header should I be looking at and which strings should I be looking for?

    Posted by Gary F at

  7. This is what I use:

    Posted by Ben at

  8. Thanks, Ben. I took the headers from your page, made them into Superglobals ($_SERVER['HTTP_USER_AGENT'] etc) and shoved them in. The validator is getting the right content-type, and so should Opera (although I can't for the life of me find the Page Info options).

    Thanks again!

    Posted by Gary F at
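
The q= point raised in the first comment can be sketched roughly as follows. This is a minimal Python illustration, not anyone's production code; the names `parse_accept` and `prefers_xhtml` are hypothetical, and the parsing is deliberately simplified (it ignores wildcards such as text/* and any parameters other than q).

```python
def parse_accept(header):
    """Map each media type in an Accept header to its q value (default 1.0)."""
    prefs = {}
    for part in header.split(","):
        fields = part.strip().split(";")
        media = fields[0].strip().lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        prefs[media] = q
    return prefs


def prefers_xhtml(header):
    """True if application/xhtml+xml is listed and ranked at least as high as text/html."""
    prefs = parse_accept(header or "")
    xhtml_q = prefs.get("application/xhtml+xml", 0.0)
    html_q = prefs.get("text/html", prefs.get("*/*", 0.0))
    return xhtml_q > 0 and xhtml_q >= html_q
```

Whether the q ranking should decide the outcome at all is exactly what the comments above debate; this sketch only shows what respecting it would involve.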