Anne van Kesteren

HTML5

22 April 2005

(If you haven’t paid attention for the past few weeks or months: I’m going to talk about HTML5, which is currently called Web Applications 1.0 and is defined here.)

After the release of XHTML1 and of the plans for XHTML2 and XForms, nobody thought there was any future left for the text/html media type, but there is! The WHATWG recognizes that HTML and backwards compatibility are still important for the web. It focuses on making implementations of text/html interoperable by defining error handling. Because HTML was a language with no defined error handling, implementations had to find their own way of dealing with invalid markup (sometimes referred to as ‘tag soup’). As browsers, logically, each did this differently, web pages were shown in different ways: the DOM tree generated by parsing the document was not the same from one browser to the next.

When XML was developed this problem came up too, and the solution was draconian: on an error the XML parser basically throws a parsing error, leaving the end user with nothing. This has partly failed on the web (RSS), and even the upcoming Atom format does not say anything normative about it. A format that has defined non-draconian error handling and is a big success is CSS. Of course, some early adopters implemented parts incorrectly, but nowadays the forward-compatible parsing rules are implemented, and when the author makes a mistake the rest of the style sheet is still applied. That is a huge advantage over showing nothing at all or returning a parsing error.
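To make the contrast concrete, here is a minimal sketch in Python. It assumes the standard library's `xml.etree.ElementTree` as the strict XML parser and `html.parser.HTMLParser` as a stand-in for a tolerant text/html parser. That stand-in is only a tokenizer and does not implement the error handling HTML5 is going to define; the sample markup and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Deliberately malformed markup: <b> and <i> are misnested and <p> is never closed.
TAG_SOUP = "<p>Hello <b>bold <i>both</b> italic</i> world"


def parse_as_xml(markup):
    """Return the extracted text, or None when the XML parser rejects the input."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        # Draconian handling: one well-formedness error and the reader gets nothing.
        return None
    return "".join(root.itertext())


class TextCollector(HTMLParser):
    """Collects character data and ignores tag structure entirely."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_as_html(markup):
    """Return the text content; the tolerant parser does not raise on tag soup."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


xml_result = parse_as_xml(TAG_SOUP)
html_result = parse_as_html(TAG_SOUP)
print("XML :", xml_result)
print("HTML:", html_result)
```

The XML path yields nothing for the reader, while the tolerant path still recovers the text. What the tolerant path cannot tell you is which tree a browser builds from the misnested tags, and that is the part that needs a specification.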

As said, HTML5 will define forward (and backward) compatible parsing, meaning that HTML will become a usable format again, especially since it does not have draconian error handling, so the end user will always get the information. (You could call that user friendly.) Besides defining error handling, HTML5 will also clarify a lot from HTML4, define the relationship to the DOM and possibly introduce new functions and objects. It will include Web Forms 2, which was recently submitted to the W3C, the CANVAS element mentioned two days ago, and more elements aimed at today’s web. (More specifically: aimed at web applications instead of scientific documents.)

(That Safari and Mozilla implemented CANVAS is, by the way, against the specification: “It is very wrong to cite this as anything other than a work in progress. Do not implement this in a production product. It is not ready yet! At all!” Not that anyone cares, and not that the specification will not be updated to match what is implemented, but still.)

The specification will also update XHTML1, obviously. In doing so it will nullify the text/html compatibility statement and completely obsolete Appendix C. After all, who needs XHTML as text/html when error handling for HTML is defined? And also, who still believes that those parts of the XHTML1 specification were not made in error? Anyway, the good news is that the specification is made in such a way that it is both HTML and XML compatible, although the XML version does offer more possibilities, such as including a list inside a paragraph, something that is impossible in HTML given its parsing rules.
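The list-inside-a-paragraph case can be sketched as follows. The XML half uses Python's real `xml.etree.ElementTree`. The text/html half is a hypothetical toy tree builder, not the algorithm from the specification: `ToyTreeBuilder` and its `CLOSES_P` set are invented here, and the only rule modelled is that a block-level start tag such as `<ul>` implicitly closes an open `<p>`.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

MARKUP = "<p>Shopping:<ul><li>milk</li></ul></p>"

# As XML the nesting is taken literally: the list is a child of the paragraph.
xml_root = ET.fromstring(MARKUP)
xml_children = [child.tag for child in xml_root]


class ToyTreeBuilder(HTMLParser):
    """Hypothetical, heavily simplified tree builder (not the spec's algorithm)."""

    # Assumed subset of start tags that imply </p>; the real list is longer.
    CLOSES_P = {"ul", "ol", "p", "div", "table"}

    def __init__(self):
        super().__init__()
        self.root = ET.Element("body")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag in self.CLOSES_P and self.stack[-1].tag == "p":
            self.stack.pop()  # implied </p>
        element = ET.SubElement(self.stack[-1], tag)
        self.stack.append(element)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; stray end tags are ignored.
        for index in range(len(self.stack) - 1, 0, -1):
            if self.stack[index].tag == tag:
                del self.stack[index:]
                break


builder = ToyTreeBuilder()
builder.feed(MARKUP)
builder.close()
html_children = [child.tag for child in builder.root]

print("XML: children of <p> =", xml_children)
print("text/html sketch: children of <body> =", html_children)
```

In the XML tree the `ul` stays inside the `p`. In the sketch the paragraph is closed as soon as the list starts, so `p` and `ul` end up as siblings and the author's `</p>` no longer matches anything.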

With apologies to Mark Pilgrim I’m going to call myself “Mr. HTML5” and follow the specification for a bit.

Comments

Of course the draconian error handling wouldn't have been a problem if web developers actually looked at their own pages when posting or just used proper editing software in the first place, but then again, I guess these things aren't going to change soon. :-(
"Anyway, the good news is that the specification is made in such a way that it is both HTML and XML compatible although the XML version does give more possibilities like including a list inside a paragraph. Something that is impossible in HTML given its parsing rules."

This worries me a bit, though. I can only imagine that there will be a lot of uninformed users that will use some stupid content negotiation script and that won't test a page properly both when sent as text/html and when sent as XML.
Sending it as XML is useless in a backwards-compatible environment, and since being backwards-compatible is what HTML 5 is all about, I just hope people stick with text/html exclusively for HTML 5 (in most cases).
Having HTML 5 as valid XML does, obviously, have significant advantages though when you want to write your own parsing tools. I believe XML parsing tools are much simpler to create than SGML parsing tools. Anyway, I'm also going to watch HTML 5 closely (I'm already on the WHATWG's mailing list). :-)
Posted by Charl van Niekerk at 10:03PM
Anne, HTML5 also includes the Web Forms 2.0 specification. See this email for clarification:
http://listserver.dreamhost.com/pipermail/whatwg-whatwg.org/2005-April/003746.html
Posted by Dean Edwards at 10:05PM
Charl, Well, you might want to use HTML5 semantics in an XML environment for example. Doing content negotiation is indeed not very clever as it does not offer you any advantages. Note that this is the same with XHTML1 and HTML4 (or XHTML1 as text/html) but that is just a bit more difficult to see.
Note also that, although I haven’t mentioned it in the article, HTML5 will not be SGML based. (I wanted to, but it would have become too long.)
Dean, I actually mentioned that in my post ;-)
Posted by Anne at 10:09PM
I almost forgot about combining the namespaces, although I would rather stick to the already well-supported XHTML for now in feeds and stuff.
Luckily, the WHATWG do seem to advise the optimal usage of content types.
Posted by Charl van Niekerk at 10:49PM
In general this rule of thumb is a good starting point: "Be liberal in what you accept, be conservative in what you send."
XML is an elegant system for a more civilized age. But clumsy people prefer clumsy and random systems. It's the tragedy of the web that most people expect/demand the browsers to be liberal, but refuse to be conservative in what they send.
Posted by Gideon at 10:50PM
Charl, I mentioned that too. However, I was thinking about CDATA sections, which are not really compatible with HTML.
Gideon, are you calling CSS random? And the people who like it ‘clumsy’? When HTML has defined error handling, is it still random?
Posted by Anne at 11:05PM
Anne, yes, you mentioned WF2 in your post, halfway down. In my mind it is the most important component of HTML5. That is why it was produced first and will be implemented first. I think it should be very clear that WF2 is a vital component of HTML5. I've seen other people linking to the WA1 spec as HTML5 as well. I don't want WF2 to get lost in the HTML5 buzz.
Posted by Dean Edwards at 11:17PM
There's still plenty of semantic elements lacking from that specification, which is pretty much the same situation as with XHTML 2. Mostly though, I'm not seeing why they say in that Spec that XHTML 2 has less semantic elements (suitable for various purposes) than this HTML 5 spec. I'm looking at both, and I see more in XHTML 2... Except for the m element, which I can see a lot of use for indeed. But personally I'd have preferred mark or so, but that's personal preference...
On the whole, however, I'm really wondering now... because nothing in that document nor in your post seems to be clear on this: HTML5 comes in 2 flavors, correct? XML and HTML. Do NEITHER versions have draconian error handling? Are either versions guaranteed to be XML-parseable when they're valid (and well-formed)?
As it is now, I see no improvement at all for the sake of interoperability. It'd be great if someone could enlighten me on that, but right now I only see XHTML 2 being very useful for that, and HTML 5 being indecisive (and thus, not useful for what I want).
Posted by Faruk Ates at 12:18AM
You should do a better job at reading I guess. First of all, HTML5 is not finished yet. Secondly, HTML5 does define more semantic elements like ASIDE and ARTICLE, but it will also define CALENDAR, CARD, DATAGRID and other elements that represent what web sites need. XHTML2 defines document semantics, mostly. Thirdly, of course the XML version has draconian handling, otherwise it would not be called XML. Fourthly, the improvement for interoperability is purely for text/html and will be defined in the parsing section eventually.
And finally, you can send in your comments on the public mailing list. Thanks.
(Also, you can use semantics on this weblog instead of Caps Lock.)
Posted by Anne at 12:49AM
I know it's not finished, but that spec was just not clear on this, because now I still have no clue how the improved interoperability for the text/html version would work. And quite frankly, I have my doubts for it.
And yes, I forgot... my god the one occurrence *gasp*...
Posted by Faruk Ates at 1:11AM
XHTML 2 has Metainformation Attributes which is a much more extensible solution for semantics. Also attributes are better than elements because structure and semantics are orthogonal.
You can see that because there is code and blockcode, and also quote and blockquote. If it were up to me every element that is just semantics and not text structure should be removed from HTML.
Posted by Sjoerd Visscher at 3:30AM
I don’t like HTML5 as much as I used to... there are a few things in it which I find unnecessary or where the wrong choices were made.
Personally, I’d just prefer to have XHTML 2.0 finished and implemented ^_^. It’s nice :).
~Grauw
Posted by Laurens Holst at 4:27AM
Faruk, it will define what the HTML DOM will look like in cases where a required end tag is missing or where elements are incorrectly nested, et cetera. When that is defined, authors will know what to expect when they mess something up, just like with CSS.
Sjoerd, yes; what if the world was perfect :-) I agree though that semantics and structure could be separated more, but that is just useful for the few who actually get it I guess. It isn’t as practical, which is what HTML should be.
Laurens, what are you babbling? Wrong choices made? You are talking like it is finished while there isn’t even a stable draft released. (Something like the first working draft the W3C would release.)
Posted by Anne at 4:39AM
Well, HTML5 looks interesting, but will it be ready for use any time soon? Just as a guess, I'd say that by the time it is ready and widely supported, XHTML2 will be widely supported too.
You can probably pick what standard I'd rather use (XHTML2).
Posted by The Wolf at 9:56AM