Anne van Kesteren

DOCTYPE of HTML5

On IM the other day, Simon Pieters pointed out the ‘new’ DOCTYPE for HTML5. The old one was <!DOCTYPE html5> (finally a DOCTYPE you don’t really have to think about), but as Simon discovered, that triggered quirks mode in Firefox. As the sole purpose of DOCTYPEs these days is to trigger (full) standards mode, it was obvious HTML5 couldn’t use that one.

You might object to the statement that DOCTYPEs are purely for triggering a rendering mode, but if you think about it you know it’s true. Although DOCTYPEs could be used for versioning the language, and that was actually one of their purposes, it didn’t really work out well. I clarified my point of view on that last year when I said that MIME types matter; DOCTYPEs don’t. That’s still true. So in the end you can use a DOCTYPE to trigger quirks or standards mode, and sometimes a semi-standards-compliant mode when you can’t handle the real deal.

There is one other purpose: validation. But as browsers use neither SGML parsers nor DTD parsers (doing so would make your site and the W3C’s unreachable), that purpose is mostly a myth. Validators are useful syntax checkers but they could just check what DOCTYPE you declared and map a DTD to it. And in fact, that’s what they already do. I currently use <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"> as DOCTYPE without referencing a DTD. I do validate.
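
The "check what DOCTYPE you declared and map a DTD to it" step can be sketched roughly as below. This is only an illustration, not how any particular validator is implemented: the names CATALOG, DOCTYPE_RE and lookup_dtd are made up for the example, the regular expression handles only the simple double-quoted PUBLIC form, and a real validator uses a full SGML catalog rather than a three-entry dictionary.

```python
import re

# Hypothetical miniature catalog: formal public identifier -> DTD location.
# The URLs are the system identifiers the W3C published for these DTDs;
# a real catalog has many more entries.
CATALOG = {
    "-//W3C//DTD HTML 4.01//EN": "http://www.w3.org/TR/html4/strict.dtd",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "http://www.w3.org/TR/html4/loose.dtd",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
}

# Captures the name after DOCTYPE and, if present, a double-quoted public identifier.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+([^\s>]+)(?:\s+PUBLIC\s+"([^"]*)")?',
    re.IGNORECASE,
)


def lookup_dtd(document):
    """Return (root_name, public_id, dtd_url), or None if there is no DOCTYPE.

    public_id and dtd_url are None when the DOCTYPE has no public identifier
    or the identifier is not in the catalog.
    """
    match = DOCTYPE_RE.search(document)
    if match is None:
        return None
    root, public_id = match.group(1), match.group(2)
    return root, public_id, CATALOG.get(public_id)


if __name__ == "__main__":
    print(lookup_dtd('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">'))
    print(lookup_dtd("<!DOCTYPE html>"))
```

With the DOCTYPE quoted above, the lookup finds the strict DTD even though the document itself never names one, which is why such a page can still be validated.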

Now you may know that the part right after !DOCTYPE (html5 above and html here) is actually a reference to the root element of the language. As html5 is not a root element (it can never be one in text/html, and in application/xml there is no such thing as DOCTYPE switching), it almost logically triggers quirks mode. Well, perhaps not entirely logically, as you could expect browsers to treat everything with a DOCTYPE in standards mode, with several notable exceptions. Anyway, taking these new findings into account, the DOCTYPE for HTML5 (only relevant in text/html for triggering standards mode) is even better than before:

<!DOCTYPE html>

Let’s hope it stays like this. (All the references to ‘one’ (single, sole, only, et cetera) in this post are of course part of the countdown ;-))
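
For the curious, the switching described in this post can be modelled with a toy function. It is a deliberately simplified sketch of the Firefox behaviour reported here, not a browser's actual algorithm: the names DOCTYPE_NAME_RE and rendering_mode are invented for the example, the lists of public identifiers that real browsers consult are ignored entirely, and other browsers may decide differently for a name like html5.

```python
import re

# Captures the name that follows DOCTYPE at the start of the document.
DOCTYPE_NAME_RE = re.compile(r"^\s*<!DOCTYPE\s+([^\s>]+)", re.IGNORECASE)


def rendering_mode(content_type, document):
    """Toy model: return "quirks" or "standards" for a document.

    Assumptions (simplifications for illustration):
    - DOCTYPE switching only happens for text/html; anything else is
      treated as "standards".
    - A missing DOCTYPE, or one whose name is not "html", means "quirks".
    - Public and system identifiers are not looked at.
    """
    if content_type.lower() != "text/html":
        return "standards"
    match = DOCTYPE_NAME_RE.match(document)
    if match is None:
        return "quirks"
    if match.group(1).lower() != "html":
        return "quirks"
    return "standards"


if __name__ == "__main__":
    for doctype in ("<!DOCTYPE html5>", "<!DOCTYPE html>"):
        print(doctype, "->", rendering_mode("text/html", doctype))
```

Under these assumptions the old <!DOCTYPE html5> lands in quirks mode while <!DOCTYPE html> gets standards mode, and the same markup served as application/xml is never switched at all.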

Comments

  1. Wikid. A DOCTYPE that can be remembered!

    Posted by Mathias Bynens at

  2. Although DOCTYPEs could be used for versioning the language [...]

    It can do no such thing, even if that is what the W3C, and just about every (X)HTML tutorial tells people. A small example:

    <!DOCTYPE foo PUBLIC "-//W3C//DTD HTML 4.01//EN" [
       <!ELEMENT foo - O EMPTY>
    ]>
    <foo>

    But there is no element type foo in HTML 4.

    I currently use <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"> as DOCTYPE without referencing a DTD.

    It does. The formal public identifier ("-//W3C//DTD HTML 4.01//EN") is used by the validator to look up the DTD in a catalog.

    Posted by David Håsäther at

  3. Hmm. Isn't it necessary to know more details of the markup language you use, which would make <!DOCTYPE html> shortsighted? For example, if there is another HTML version altering some elements, what DOCTYPE should it use then? ...!?

    Posted by Jens Meiert at

  4. but as Simon discovered that triggered quirks mode in every single browser.

    Then he obviously did not test in IE6. An unknown doctype triggers standards-compliant mode in this version of Internet Explorer. A quick test with margin:auto confirms this.

    Validators are useful syntax checkers but they could just check what DOCTYPE you declared and map a DTD to it. And in fact, that's what they already do.

    I see no evil in this.

    Posted by Rimantas at

  5. Actually, I only mentioned that it triggered quirks mode in Firefox. It triggers Standards mode in IE and Opera. Here's a test case. :)

    Posted by zcorpan at

  6. David, it does not reference one explicitly, which would be necessary for browsers for example. Or for general SGML parsers.

    Jens, like I said, versioning is obsolete.

    Rimantas, I agree it is perfectly fine. I updated the thing about triggering quirks mode by the way.

    Posted by Anne at

  7. Nonsense. Doctypes are also used to declare entities and default values for missing attributes.

    Also, although I agree that doctypes shouldn't be used for versioning languages, I disagree that media types help in this regard. In fact, they don't help at all! Is that an XHTML 1.1 document or an XHTML 1.0 document with the application/xhtml+xml type? Is this an RSS 0.92 or an RSS 2.0 file? You have to open them to find out.

    Media types also provide little help when mixing languages, and the registration step is too centralized and hard for little languages to worry about. It also duplicates information: For example, I shouldn't need a media type for XHTML, MathML, or RDF/XML when there's already a namespace which specifies the same information.

    As a result of judicious use of MIME outside of email, you have to know the context of a file to determine how to process it (i.e. its type). This application/xml file is a Google sitemap, that application/xml file is an RSS feed, and the application/xml file over there is an XHTML file. Not to mention the mess with movies and codecs (effectively different types folded into the same format and media type).

    MIME is the worst possible solution. It works well with the typical HTML document and associated images, stylesheets, and JavaScript applications. For everything else media types suck.

    Posted by Jimmy Cerra at

  8. Agree with Jimmy Cerra. Since the DOCTYPE is unrecognized, it is perfectly correct to render it in quirks mode.

    Posted by minghong at

  9. Also although I agree that doctypes shouldn't be used for versioning languages, …

    Since they cannot, as David already pointed out correctly, that's very generous. :)

    BTW, hey there: <!doctype root> (surprise).

    But don't let yourself get fooled kids, the whole shebang <!doctype html system> is still needed for validation (ask your closest system with a sensible catalogue, nudge nudge, wink wink).

    Posted by Eric at

  10. Minghong, I think DOCTYPE switching is a bad practice too, for XML documents. If they must, browsers should switch based on the presence or absence of an xmlns declaration on the root element for XHTML documents. Of course I wish they wouldn't switch, but we are in the real world where most interesting pages don't validate to the standards. For HTML documents, DOCTYPE switching is okay since it doesn't break anything. SGML applications are not meant to be combined, especially HTML.

    Posted by Jimmy Cerra at

  11. David and Eric, why does your example mean you can't use the doctype for version information? You are adding elements with undefined semantics, but that doesn't change the semantics of the rest of the elements.

    Technically, adding anything that changes the type of any HTML markup means your document is no longer HTML. It is something else. Your example is like saying in XML:

    <foo xmlns="http://www.w3.org/1999/xhtml" />

    Saying foo is in the spec, even in a machine-processable format, doesn't make it so. In an open-world interpretation, your new elements have undefined semantics. In a closed-world interpretation, your document is no longer HTML when you add anything not explicitly allowed in the spec (even entities). Hopefully HTML 5 will explicitly define the semantics of content and entities not defined in the spec.

    Posted by Jimmy Cerra at

  12. David and Eric, why does your example mean you can't use the doctype for version information?

    Because, as the above example shows, it isn't HTML 4.01 even if you might be fooled into thinking so when looking at the FPI.

    Posted by David Håsäther at

  13. It looks like Firefox's HTML parser treats any DOCTYPE which doesn't contain HTML as the root element as badly formed, thus triggering quirks mode: nsParser.cpp#821, nsParser.cpp#1136.

    Posted by FP at

  14. David and Eric, why does your example mean you can't use the doctype for version information?

    There is no such thing as 'version information' by ISO8879-defined means to start with, and no way to 'name the DTD' (with a doctype declaration, read up on architectures). The document type declaration knows a couple of notations to basically do one and the same thing, including external subsets, none of which is more important than the other or any internal subset.

    Enter W3C specs and voodoo programming. Speaking of which, '<!doctype html>' is correctly 'recognized' by Mozilla since approximately 1.5 ;-).

    Posted by Eric at

  15. Actually the DOCTYPE will probably become just "<!DOCTYPE HTML>", since if we used "<!DOCTYPE HTML5>" we'd have to change the DOCTYPE with every version revision which would be a pain.

    ...and it triggers quirks mode in Mozilla, which is a more direct concern.

    There's no versioning problem. We will never require UAs to do different things based on what version of HTML they find a page claiming to use. It's bad enough that UAs have been forced into quirks vs standards mode versioning.

    Posted by Ian Hickson at

  16. Jimmy Cerra, I think DOCTYPE sniffing only occurs for text/html. For "application/xhtml+xml", obviously there is only one choice: XHTML 1.1 (maybe XHTML 1.0… whatever). In XHTML, if the DOCTYPE is not recognized, the whole document is just rendered as generic XML (not treated as HTML).

    Since HTML5 is served as text/html, it is destined to be sniffed.

    Posted by minghong at

  17. I wrote more about using DOCTYPEs for versioning information since this is becoming off-topic for this blog entry. I want to know how my analysis is incorrect, if it is incorrect. Saying "read up on architectures" is arrogant without explicitly saying what you are referring to.

    Eric, what you said makes no sense to me. Can you rephrase it?

    David, I never said it was HTML 4.01. In fact, I agree that it isn't HTML 4.01. Using DOCTYPE declarations for specifying version information has specific requirements which your counter-example doesn't satisfy.

    Posted by Jimmy Cerra at

  18. David, I never said it was HTML 4.01.

    I know. My mistake. What I meant was really "...one might be fooled...", not you specifically.

    There is some more info about this subject in a Usenet post by Eliot Kimber, which also includes info about architectures.

    Posted by David Håsäther at