Anne van Kesteren

The future of HTML

Contrary to popular belief, HTML 5.0 is not XHTML. The W3C may claim that the latest version of HTML, an SGML-based markup language, is XHTML, an XML-based markup language, but we all know that isn't the case. (If you don't believe that the W3C is claiming that, follow http://www.w3.org/TR/html/, linked from the HTML 4.01 Specification:

Latest version of HTML:
http://www.w3.org/TR/html

Indeed, that document is currently the same as the XHTML™ 1.0 The Extensible HyperText Markup Language (Second Edition) document, also known as A Reformulation of HTML 4 in XML 1.0.) Fortunately, the WHAT Working Group was established to do some further work on HTML. After all, HTML is backwards compatible and XHTML is not. It is also a good thing because the WHATWG deals with browsers that will be important for the next 5 years, whereas the W3C thinks Internet Explorer 6.0 is irrelevant. Don't get me wrong: having HTML in an XML form is great for mixing it with RDF, SVG, et cetera. We just don't need XHTML for web pages on the client side at the moment. Having XHTML on the server and transforming it with XSLT to HTML is a good thing.

So what is WHAT doing? The WHATWG (sometimes referred to as WHATTF) is busy creating several drafts that aim to improve Web Forms and Web Applications. Elika Etemad (also known as fantasai) maintains syntax.whatwg.org, which will eventually define the syntax and DTDs (yeah, RelaxNG and other things are being looked at as well) for the new specifications. The extensions to web forms are simply brilliant. type="number" makes things a lot easier, as does type="uri" (yes, type="email" exists as well). What I especially like about syntax.whatwg.org is that things like <br />, which used to be valid, are now invalid because no browser implemented them correctly (before you scream, I'm talking about HTML).

Web Applications defines that the ID, TITLE, and CLASS attributes apply to all elements, which is basically what every browser has more or less implemented already. This is the approach taken with most things: describe them in detail, based on feedback and real-world implementations. Of course, new attributes and elements are introduced as well, along with error handling, DOM methods and how CSS applies to them. (Mostly the CSS3 User Interface Module, with pseudo-classes like :valid and :checked.)

On the public mailing lists there is some discussion on introducing elements like SECTION (actually, that one is in the draft), H, NOTE, CONTENT, SIDEBAR, HEADER, FOOTER, NAME and QM. I probably missed some elements. However, if you have any suggestions or thoughts, just propose them on the WHATWG mailing list or write them down here and I will send a mail. This includes new elements, attributes, standardizing really useful things, and anything else that could make HTML 5.0 even better!

Comments

  1. <br /> was never valid HTML (or, at least, never meant what everyone thinks it means).

    Posted by Jacques Distler at

  2. one thing i can't wrap my head around (admittedly, i haven't been following WHATWG): first, there's talk of how today's browsers don't understand XMLisms like <br /> and treat them as tag soup, and then there's the introduction of new elements which old browsers would treat as tag soup as well? or am i missing something? (not trying to troll, just trying to understand why the latter is better than the former)

    Posted by patrick h. lauke at

  3. The general rule for handling elements you don't understand is to ignore them and render their content. This is perfectly unambiguous in XML (where all elements have explicit start and end tags). It seems to me to be potentially ambiguous in HTML, where end (and sometimes even start) tags are frequently optional. What's the document tree for

    <html> <foo> a <p> b <bar> c </html>

    Without knowing about the content model for the elements <foo> and <bar>, I can't tell you.

    Posted by Jacques Distler at
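
The ambiguity in the comment above can be illustrated with a rough sketch. It uses Python's standard html.parser to list the tags in the example, then nests them under two different assumptions about <foo> and <bar>. The names TagLogger, build_tree, outline and tree_for, and the idea of reducing a content model to a set of "empty" element names, are illustrative assumptions and not anything the WHATWG drafts define.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record start/end tags in the order the tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def build_tree(events, empty_elements):
    """Nest elements, treating names in `empty_elements` as having no content.

    `empty_elements` is a hypothetical stand-in for a content model; a real
    DTD also says which end tags may be omitted and what may nest where.
    """
    root = ("#root", [])
    stack = [root]
    for kind, tag in events:
        if kind == "start":
            node = (tag, [])
            stack[-1][1].append(node)
            if tag not in empty_elements:
                stack.append(node)
        else:
            # Close back to the nearest open element with this name, if any.
            open_names = [name for name, _ in stack[1:]]
            if tag in open_names:
                while stack[-1][0] != tag:
                    stack.pop()
                stack.pop()
    return root


def outline(node):
    """Render a node as name(child child ...), ignoring text."""
    name, children = node
    if not children:
        return name
    return name + "(" + " ".join(outline(child) for child in children) + ")"


def tree_for(markup, empty_elements):
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return outline(build_tree(logger.events, empty_elements))
```

With the markup from the comment, tree_for(markup, set()) gives #root(html(foo(p(bar)))), while tree_for(markup, {"foo", "bar"}) gives #root(html(foo p(bar))). The same tags produce two different trees, depending only on what is assumed about the unknown elements.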

  4. I didn't mean the XML form of <br />, Patrick; I meant the SGML version. Read Sending XHTML as text/html Considered Harmful for an example. (So browsers never supported that part of HTML, which is based on SGML, and the WHATWG updates that to state that the BR element written in such a form doesn't mean anything special anymore.)

    Introducing new elements will not cause much harm, since they will basically be ignored. At least, they will be specified in a way that older browsers won't have trouble with them, as has always been done with new HTML versions. (The Q element breaks a bit with that though; hence the QM element.)

    Posted by Anne at

  5. things like <br />, which used to be valid, are now invalid because no browser implemented them correctly (before you scream, I'm talking about HTML).

    WHAT-TF? How can the WHAT-WG make that perfectly valid SGML syntax invalid? Unless they're redefining SGML too, that is just not possible. It's irrelevant that no current browser supports it, that is, that none treats the / as the end of the element or outputs the >. I think HTML 4 Appendix B deals with it just fine, and that's exactly how the WHAT-WG should do it. I'll bring that up on the mailing list later, after I finally get around to finishing reading this spec. (I know, I've been saying that for the last 2 months; I'm just lazy! ;-) )

    Posted by Lachlan Hunt at
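
To make the two readings of <br /> in this thread concrete, here is a minimal sketch. The tag-soup side uses Python's standard html.parser as a stand-in for how browsers behave; the SGML side is a hand-written approximation of the shorthand reading described above, where the / ends the element and the > is left over as text. The names SoupTokens, tag_soup_reading and sgml_net_reading are made up for this example, and the regular expression only covers attribute-less empty elements.

```python
import re
from html.parser import HTMLParser


class SoupTokens(HTMLParser):
    """Collect start tags and text the way a tag-soup tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append("<" + tag + ">")

    def handle_data(self, data):
        self.tokens.append(data)


def tag_soup_reading(markup):
    """What a lenient parser sees: the trailing slash is simply dropped."""
    parser = SoupTokens()
    parser.feed(markup)
    parser.close()
    return "".join(parser.tokens)


def sgml_net_reading(markup):
    """Approximate the strict SGML reading for an empty element.

    Assumption: "<br /" already ends the element, so the ">" that follows
    is character data. Only tags without attributes are handled.
    """
    return re.sub(r"<(\w+)\s*/>", r"<\1>&gt;", markup)
```

For the input a<br />b, tag_soup_reading returns a<br>b and sgml_net_reading returns a<br>&gt;b. The second result is the one with the stray > that no mainstream browser renders.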

  6. They can; apparently you can specify such things in the DTD.

    Posted by Anne at

  7. It is also a good thing because the WHATWG deals with browsers that will be important for the next 5 years, whereas the W3C thinks Internet Explorer 6.0 is irrelevant.

    So how will the WHATWG make Internet Explorer understand the new element types and attributes? They won't? Well then what advantages does "HTML 5.0" have over XHTML x.x and HTML 4.01 with conneg?

    What I especially like about syntax.whatwg.org is that things like <br />, which used to be valid, are now invalid because no browser implemented them correctly (before you scream, I'm talking about HTML).

    Firstly, it is not true that no browser implements it properly; Emacs-W3 does. Secondly, I thought you said "HTML 5.0" was backwards-compatible? Or is backwards-compatibility defined as "not breaking Internet Explorer and Mozilla"?

    On the public mailing lists there is some discussion on introducing elements like SECTION (actually, that one is in the draft), H, NOTE, CONTENT, SIDEBAR

    CONTENT? Isn't that just a synonym for BODY?

    SIDEBAR? What happens when you decide it's better presented as a drop-down? Wouldn't a better name be AUX for "auxiliary information"? Or, to pre-empt the most common use, wouldn't a NAVIGATION element type be useful?

    Posted by Jim Dabell at

  8. Jim, you know how the WHATWG is going to solve the implementation in Internet Explorer, so why bring it up again? Backwards compatibility is indeed defined in a sensible way. MENU has been added. A SIDEBAR, however, usually includes more than just the navigation.

    Posted by Anne at

  9. Having XHTML on the server and transforming it with XSLT to HTML is a good thing.

    Why would XHTML on the server be a good thing? I would rather see a more semantic XML language there. In fact, I think XHTML is pretty limited for semantically describing content, especially if it's not for use in a web browser. For now I would use my own XML markup if nobody else needs access to it, and for the future I hope for an XML dialect that makes use of object-oriented XML and inheritance. That would also be a better base for distributing the same content to other user agents, applications and devices.

    Posted by Jurriaan at

  10. Jurriaan, agreed that on the server (where Google doesn't reach) XML can be far more semantic. In practice, however, only a small percentage of websites actually use semantics, and a far, far smaller percentage use them correctly. Using HTML correctly is quite tough.

    Posted by Anne at

  11. Unless otherwise stated, XML elements defined in this specification are elements in the http://www.w3.org/1999/xhtml namespace, and attributes defined in this specification have no namespace.

    Why? Namespaces are a very good way of handling this sort of thing.

    Posted by Grant Watson at