Anne van Kesteren

XHTML5

In the past I wrote quite a lot about HTML5. Apparently not many people understood that it also extends XHTML 1.0, so you can use its semantics in XML or in the more backwards-compatible HTML variant. Browser makers do care about XML. So if you have a document containing a heading and a paragraph, there are basically two serializations. One for XML:

<html xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <title>… </title>
 </head>
 <body>
  <h1>…</h1>
  <p>…</p>
 </body>
</html>

… and one for HTML:

<!DOCTYPE html>
<html>
 <head>
  <title>…</title>
 </head>
 <body>
  <h1>…</h1>
  <p>…</p>
 </body>
</html>

Some end tags will probably still be optional in the HTML serialization. Also, thanks to the HTML parsing section the DOM generated from an HTML document might resemble more closely that of an XHTML document having namespaces and such. There are some advantages to using the XML serialization: you can have table elements and lists within paragraphs, which is not possible in the HTML serialization due to parsing. So (X)HTML5 is a language that you can write down using either an XML or an HTML serialization, where the XML serialization has some advantages over the HTML one.
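
As a rough illustration of the XML side (a sketch, not taken from any specification): the snippet below feeds a document in the XML serialization to Python's standard xml.etree.ElementTree parser. The sample text content and the helper name q are made up for the example. A generic XML parser keeps the table where it was written, inside the paragraph, and reports every element in the http://www.w3.org/1999/xhtml namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample document in the XML serialization,
# with a table nested inside a paragraph.
doc = """<html xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <title>Example</title>
 </head>
 <body>
  <h1>Heading</h1>
  <p>Text <table><tr><td>cell</td></tr></table></p>
 </body>
</html>"""


def q(name):
    """Build the namespace-qualified tag name that ElementTree uses."""
    return "{%s}%s" % (XHTML_NS, name)


root = ET.fromstring(doc)

# Look up the paragraph, then the table that is its direct child.
p = root.find("%s/%s" % (q("body"), q("p")))
table = p.find(q("table"))

print(root.tag)            # namespace-qualified name of the root element
print(table is not None)   # whether the table is still a child of the paragraph
```

An HTML parser, as described above, would not produce this nesting for the same markup.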

Comments

  1. I'm not sure what you mean by saying that these are two different serializations of the same "language." If you can have a <table> as a child element of a <p> in one, but not the other, then it seems to me that these are different languages with different semantics for the <p> element (among other things).

    Also, thanks to the HTML parsing section the DOM generated from an HTML document might resemble more closely that of an XHTML document having namespaces and such.

    Can (elements from) namespaces other than the HTML namespace appear in the DOM? Or are all elements placed in the HTML namespace?

    Posted by Jacques Distler at

  2. The HTML serialization would be a subset I guess. But yeah, that might effectively rule it out of that definition. Regarding elements that go into the DOM: they will all be placed in the http://www.w3.org/1999/xhtml namespace. xml:lang, for example, will not be treated as if it were in the http://www.w3.org/XML/1998/namespace namespace. (Which introduces some problems that have to be sorted out, as colons can’t actually appear in such places.)

    Posted by Anne at

  3. But, what’s the point in ‘XHTML 5’ if you can’t serve it on the web? Because last time I checked, the no. 1 UA still doesn’t support the application/xhtml+xml MIME type, and serving as text/html with compatibility guidelines is supposedly bad. At least, that’s what some people keep loudly clamouring.

    ~Grauw

    Posted by Laurens Holst at

  4. The point is that you can use the semantics within XML. Of course, being compatible with the web means you should probably use the HTML serialization.

    Posted by Anne at

  5. Anne, it took me some time to follow your ideas on HTML, XHTML and XML on the net. But, as always since I first found your blog, you're goddamn right.

    So just that way: thanks a lot. Your work is highly inspiring and enlightening.

    Posted by ben at

  6. And what if the W3C HTML WG decides to release XHTML2 in the XHTML namespace? It’s their namespace, after all, and I’d personally applaud such a move. However, there could then be possible conflicting definitions between XHTML2 and HTML5, e.g. of <section>. So what does that mean for semantics?

    ~Grauw

    Posted by Laurens Holst at

  7. Please don't call it XHTML 5.0. There is likely to be an XHTML 3, 4 and 5 from the W3C and this will only cause confusion.

    Posted by Chris Hester at

  8. Laurens and Chris, no need to make duplicate comments. Laurens, I think that’s unlikely to happen, but the goal of the WHATWG is to get those “extensions” standardized at some point, and I’m sure that whenever that happens some coordination will take place. Note that the http://www.w3.org/1999/xhtml namespace was already extended by browser vendors with elements like blink, marquee and all that. Now we’re making semantic extensions on it as the W3C doesn’t want to keep things easy and backwards compatible.

    Posted by Anne at

  9. For webbuilders making a new website, what would be the best choice of HTML language?
    Be conservative and stick with HTML 4.01? Use XHTML 1.0 Strict with MIME-type text (wrong?) What would you say?

    Posted by Joop Laan at

  10. For webbuilders making a new website, what would be the best choice of HTML language?

    The best option is to use content negotiation to serve XHTML 1.0 Strict as application/xhtml+xml to browsers which support it, and HTML4.01 as text/html to others (just IE at this point, basically).

    If you can’t do that, but you need to cater to IE, then stick with HTML4.01 sent as text/html.

    Posted by Aristotle Pagaltzis at

  11. The best option is to use content negotiation to serve XHTML 1.0 Strict as application/xhtml+xml to browsers which support it, and HTML4.01 as text/html to others (just IE at this point, basically).

    What do you gain by doing so compared to serving HTML 4.01 as text/html to everyone?

    Should I serve application/xhtml+xml to Mozilla?

    Posted by Henri Sivonen at

  12. Ah, I said that already :).

    ~Grauw

    Posted by Laurens Holst at

  13. Do I lose anything by doing it that way? Any system I build is all XML in the bowels. Deriving HTML from that takes an extra step, and I only make the effort as a sop to IE users. If it weren’t for them, I’d serve application/xhtml+xml to everyone and spare myself the trouble.

    Posted by Aristotle Pagaltzis at

  14. I've been thinking about this since your post a few days ago and I see some potential problems, from an XML viewpoint....

    1. Why would I want to make an XML file that uses a bunch of elements that ARE NOT in the http://www.w3.org/1999/xhtml namespace, and bind them to that namespace? From an XML point of view, that doesn't make the languages compatible.
    2. How would any UA know whether a file is XHTML 1.x by the W3C or (X)HTML 5 (or even HTML 3.x from HTML 5 if one uses text/html) if the file has no DTD? (a UA could be a browser, XSLT file, PHP program, search engine spider, etc, etc)

    I'm asking these questions, because I want to support it. These two questions are what's stopping me. I can't see why HTML 5 would be the smarter choice.

    I know I don't like the idea of XHTML 2.0, with its 100 namespace dependencies. I don't have time to join the mailing lists and really get into things there. I'm starting to like the idea of Web Applications 1.0 since your previous post about it; I just need to know it's not going to make a confusing mess of my XML files. I must be missing something, because it seems the HTML 5 writer(s) don't quite grasp the usage of namespaces.

    Posted by Devon at

  15. Just a reminder, because there is something which is always surprising in this kind of discussion. Anne said:

    Now we’re making semantic extensions on it as the W3C doesn’t want to keep things easy and backwards compatible.

    Anne, as an employee of Opera participating in W3C WGs, you are the W3C. You are not alone in being the W3C, but all Members make up what the W3C is and decides, including Opera.

    Posted by karl at

  16. Do I lose anything by doing it that way?

    You lose incremental rendering in Firefox and Opera. You also lose DOM API compatibility. (You have to use namespace-unaware methods in IE and namespace-aware methods in XHTML-aware browsers.)

    Any system I build is all XML in the bowels. Deriving HTML from that takes an extra step

    That was not a big deal in systems I have built. And you are doing it for IE, so you have the code.

    Posted by Henri Sivonen at

  17. Anne, as an employee of Opera participating in W3C WGs, you are the W3C. You are not alone in being the W3C, but all Members make up what the W3C is and decides, including Opera.

    Doesn’t the WHATWG exist because the W3C membership voted Opera Software and Mozilla Foundation down and did not want this stuff to be worked on within the W3C?

    Posted by Henri Sivonen at

  18. karl, our proposals (made together with the Mozilla Foundation) for extending upon HTML, DOM and CSS got unfortunately dismissed at some workshop. It seems the W3C is partially doing it now anyway though. (And yeah, I’m part of those WGs. More on that later I hope.)

    Devon, your first question is a bit vague. Are you talking about the new elements in that namespace, like header? You would want that for richer semantics. Eventually this stuff is going to be supported by search engines, browser vendors, et cetera.

    DOCTYPEs do not help with distinguishing languages. (They only help the browser decide whether or not to switch to strict parsing and rendering mode.) Using namespaces is even worse, as xhtml2:code and xhtml:code have essentially the same semantics (you could use either one), but from a namespace point of view they are totally different elements. Sucks.

    Identification is really a non-issue, as down-level clients will just ignore the elements they don’t recognize.

    Posted by Anne at

  19. But, what’s the point in ‘XHTML 5’ [...] serving as text/html with compatibility guidelines is supposedly bad.

    Not just "bad" but wrong, at least for "XHTML 5". RFC 2854 specifically allows XHTML 1.0 to be transmitted as text/html, not any other version of XHTML. It also states:

    XHTML documents (optionally) start with an XML declaration which begins with "<?xml" and are required to have a DOCTYPE declaration "<!DOCTYPE html".

    Posted by Jim at

  20. The Web Apps ("(X)HTML5") spec specifically says that sending XHTML5 as text/html is non-conformant. Basically, HTML5 must be sent as text/html, and XHTML5 must be sent as an XML MIME type.

    Posted by Ian Hickson at

  21. karl, our proposals (made together with the Mozilla Foundation) for extending upon HTML, DOM and CSS got unfortunately dismissed at some workshop. It seems the W3C is partially doing it now anyway though. (And yeah, I’m part of those WGs. More on that later I hope.)

    Let me rephrase, if you will allow me, and correct me if I'm wrong; it's not out of malice, just a need to understand the positioning. Would this be accurate?

    karl, our proposals (from Opera, Mozilla Foundation and Apple, which are organizations/companies that are part of W3C) for extending upon HTML, DOM and CSS unfortunately got debated and created conflicts with other W3C Members at one workshop on Web apps.

    The thing is that the venue for W3C decisions made by AC Representatives (Advisory Committee) is the Activity Proposal.

    W3C starts an Activity based on interest from the Members and Team. W3C Members build interest around new work through discussions among Advisory Committee representatives, Chairs, and Team, and through the Submission process. The Team tracks Web developments inside and outside W3C, manages liaisons, and organizes Workshops.

    I'm just saying this because it seems a bit schizophrenic to me to say that the W3C (our, we) has been dismissed by the W3C. I would be perfectly OK with hearing that there are conflicts between W3C Members on the future of HTML. That makes perfect sense, and it's part of the work, as in any social group.

    On top of that, I really hope the W3C Members will find a solution to work together and produce something which encourages interoperability on the Web.

    Posted by karl at

  22. Karl, as you well know, an activity proposal only makes sense if there isn't already an activity for the proposal. There's already an activity for HTML... it's called the "HTML activity". As the W3C said when we contributed Web Forms 2 as a member submission:

    As an extension to HTML, Web Forms 2.0 has a clear relationship to the HTML Activity.

    [...]

    Furthermore, the consensus opinion of the members within the W3C HTML and Forms Working Groups is that the HTML 4.01 specification should not be extended in the way Web Forms 2.0 suggests.

    What we're proposing is to drop XHTML2 and replace it with HTML5. The consensus of the W3C is to do the opposite. That's why we're not doing this at the W3C.

    We have found a solution to work together and produce something which encourages interoperability on the Web. It just happens that the W3C isn't part of it.

    Posted by Ian Hickson at

  23. Ian, your message is perfectly valid and clearer than much of what I read here and there, which may seem confusing for readers outside of W3C life. I would sometimes prefer that people identify the "we", but it's a question of language I guess. In my own deformation ;) I would say: You MUST give the classes of products, as you just did for W3C HTML WG.

    Posted by karl at

  24. Do I lose anything by doing it that way?

    You lose incremental rendering in Firefox and Opera.

    Opera does support incremental rendering of content served as application/xhtml+xml.

    Posted by Martin at

  25. Opera does support incremental rendering of content served as application/xhtml+xml.

    Interesting. I tested with Hixie’s old test case which uses text/xml and Opera 9.0 build 3104 on OS X showed clearly different behavior between the HTML and XHTML cases.

    Posted by Henri Sivonen at

  26. Henri, we had some regression in Opera 9.0 on that due to switching to our own XML parser if I remember correctly. My Windows build (latest) renders the html and xml testcase there the same.

    Posted by Anne at

  27. it's an amazing debate

    I'm still left in the dark as to why the future X serializations shouldn't be served as text/html...

    I mean, I do understand the rationale behind it, but hate the frustration: on one hand I really appreciate the extra enforcements of X, on the other I need whatever I deliver to be backwards compatible with 10yo legacy browsers.

    Do you think this is likely to change? As in, XHTML5 SHOULD NOT be served as text/html but MAY BE?

    Posted by andrezero at
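
The content negotiation suggested in comment 10 could look roughly like the sketch below. It is written in Python with a hypothetical helper named pick_media_type, and it assumes that a browser which supports application/xhtml+xml lists that type explicitly in its Accept header; a bare */* is deliberately not treated as support. A real deployment would probably also want to send a Vary: Accept header and handle quality values more thoroughly.

```python
XHTML_TYPE = "application/xhtml+xml"
HTML_TYPE = "text/html"


def pick_media_type(accept_header):
    """Choose a response media type from an HTTP Accept header value.

    Returns application/xhtml+xml only when the header names that type
    explicitly with a quality value above zero; otherwise text/html.
    """
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if parts[0].lower() != XHTML_TYPE:
            continue
        quality = 1.0  # default when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not acceptable"
        if quality > 0:
            return XHTML_TYPE
    return HTML_TYPE


# Example header values (made up for illustration).
print(pick_media_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(pick_media_type("image/gif, image/jpeg, */*"))
```

Matching only on an explicit mention is a conservative choice: clients that send nothing but */* receive text/html, which is the fallback comment 10 describes for IE.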