Anne van Kesteren

Standards

A few days ago, while I was offline, Keith made this post about 100% validation. Apparently The Web Standards Project stated that non-validating sites can't claim to use web standards. If you follow the links, you can see that WaSP validates and Keith does not. In the comments on Keith's post, all kinds of people give their own view and link "kindly" to their own site to promote an article they have written. Other people are telling you that you should ignore me. You hear that? Go away! The same person considers switching back to (the counter-productive) HTML a bad thing, although it doesn't matter at all. Actually, HTML is the preferred language, since we are not ready for XHTML, but let's not go there.

I also notice that people in the comments mix up terms like 'valid', 'invalid', 'well-formed' and 'ill-formed' and are comparing XML and XHTML in a way that isn't appropriate. (Actually, you can't really compare them.) A quote that illustrates this:

Of course, that’s only if the document is invalid XML – it may be valid XML and invalid XHTML, but then it’ll still display properly.

However, let's continue. I don't really care if a page doesn't validate. Invalid markup might make your site render worse or become unusable, especially if you are not encoding ampersands, since with unencoded ampersands the resolved URI might be a bit different from what you intended. (Encoded ampersands are important for the browsers we are trying to stay backwards compatible with; the problem is that most sites are not backwards compatible because they don't encode them.) Unencoded ampersands, however, are most likely not the fault of some designer, but of the programmer of the software that is being used. (Has anyone reached out to such a community or company to tell them about it?)
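To make the ampersand problem concrete, here is a minimal Python sketch. The URL and the helper name href_attribute are made up for illustration, and html.unescape is only a rough stand-in for how a lenient browser might resolve entity references; real browsers differ in the details.

```python
from html import escape, unescape


def href_attribute(url: str) -> str:
    """Build an href attribute with the URL escaped for use in markup."""
    return 'href="' + escape(url, quote=True) + '"'


# Hypothetical URL whose second parameter happens to look like an entity name.
raw_url = "/search?q=standards&copy=1"

# Pasted into markup unencoded, "&copy" can be read as the copyright sign,
# so the resolved URI is no longer the one that was intended.
misread_url = unescape(raw_url)

# Encoded properly, the URL survives a round trip through entity decoding.
safe_attribute = href_attribute(raw_url)
round_trip_url = unescape(escape(raw_url, quote=True))

print(misread_url)
print(safe_attribute)
print(round_trip_url == raw_url)
```

The point of the sketch is only that the encoding has to happen in whatever software generates the links, which is why the programmer rather than the designer is usually the one who can fix it.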

If you are using XHTML (and you really shouldn't), you must make sure your site is well-formed. That is a bit different from valid. It means, for example, that you are allowed to use the EMBED element (which might be standardized in HTML 5.0, by the way) as long as you use it according to the XML syntax. When you are using HTML you can do the same thing. When using the text/html MIME type you should stick as close to validation as possible, but you can enter some additional code if desired, like adding an ID attribute to the root element in HTML 4.01. Every browser supports it, but it isn't standardized (yet). (HTML 5.0 will have that.)
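The difference between well-formed and valid can be sketched with Python's standard XML parser, which checks well-formedness only and knows nothing about the XHTML DTD. The function name is_well_formed, the sample documents and the file name movie.swf are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Uses EMBED, which is not in the XHTML 1.0 DTD, so this would not validate.
# It follows the XML syntax, though, so it is well-formed.
with_embed = (
    f'<html xmlns="{XHTML_NS}"><head><title>t</title></head>'
    '<body><embed src="movie.swf"/></body></html>'
)

# The P element is never closed: acceptable in HTML, fatal in XML.
unclosed_paragraph = (
    f'<html xmlns="{XHTML_NS}"><head><title>t</title></head>'
    "<body><p>text</body></html>"
)

print(is_well_formed(with_embed))
print(is_well_formed(unclosed_paragraph))
```

Checking validity would additionally require a validating parser and the DTD, which this sketch deliberately leaves out.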

When you are really using XHTML, which is different from most XHTML, you just have to make sure you get the namespace right; the rest is up to you. Note that when you leave out an element in (real) XHTML (I dislike putting the term "real" in front of it) it really isn't there. This is different from HTML, where you can try to leave out an element by omitting its start and end tags, but it will still be there. (Try omitting both the start and end tags of the BODY element and apply the following style declaration to that document: body{background:lime}.)
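The XML half of that claim can be checked with a short Python sketch; the sample document is an assumption. The standard library has no tree-building HTML parser that adds implied elements, so the HTML half is left to the browser experiment described above.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# An XHTML document, parsed as XML, in which BODY has been left out entirely.
document = (
    f'<html xmlns="{XHTML_NS}">'
    "<head><title>No body</title></head>"
    "</html>"
)

root = ET.fromstring(document)

# The namespace is part of the element's identity in the parsed tree.
print(root.tag)

# The XML parser does not invent the missing element: the lookup finds nothing,
# so a rule such as body{background:lime} would have nothing to apply to.
body = root.find(f"{{{XHTML_NS}}}body")
print(body)
```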

Enough about HTML. It's great that 1% of the web is adopting better practices and more and more people and companies become aware of the advantages of better coding (semantic markup). The problem is: there is more. Besides CSS and not using tables (for layout), things like HTTP exist. And although many people don't acknowledge it or simply don't know it, don't want to explain it or whatever, you need to know it. That is probably a pity for some, since building a website becomes more and more complex every time "someone" adds another layer. First you just pressed publish in Adobe GoLive. Now you are making a graphic, converting it yourself to HTML, CSS and images (3 layers) and you are complaining on your weblog that web standards are hard and CSS isn't ready for prime time.

So anyway, there is a fourth layer, HTTP, and some people can tell you about other layers as well, like JavaScript. (Do you know XBL? A server-side scripting language? Et cetera.) HTTP is underused. A lot of people can't configure their own server and therefore never learn about HTTP and won't use it in the future (even large companies). They think you change the character encoding of a document using a META element and that changing the DOCTYPE is enough. Wrong. Using the correct encoding is essential. Knowledge about MIME types is important. Syndication (Atom) is important.
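Both the MIME type and the character encoding travel in the HTTP Content-Type header, and a charset given there takes precedence over a META element in the document. As a minimal sketch, assuming the header values shown are what a server sends, this is how the two pieces can be read out in Python; the helper name parse_content_type is made up.

```python
from email.message import Message


def parse_content_type(header_value: str):
    """Split a Content-Type header value into (media type, charset or None)."""
    message = Message()
    message["Content-Type"] = header_value
    return message.get_content_type(), message.get_content_charset()


# Hypothetical header values as a server might send them.
as_xhtml = parse_content_type("application/xhtml+xml; charset=UTF-8")
as_html_without_charset = parse_content_type("text/html")

print(as_xhtml)
print(as_html_without_charset)
```

When the header carries no charset, as in the second case, the user agent has to fall back on other hints, which is where the META element comes in.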

Blah. There is no point.

Comments

  1. I think I found a great server-side PHP script... content-type negotiation that actually makes the page serve up XHTML (complete with proper code, MIME, and doctype for XHTML and validates as such) for any browser that understands "application/xhtml+xml" and serves HTML 4.01 Transitional to anything else...and both validate perfectly

    mime-types

    my site can't be said to be XHTML or HTML... it's either/or ! (I haven't linked to it here because the content is a little sensitive, but I'll share it via e-mail if anybody cares)

    Posted by Emily at

  2. I, personally, look forward to HTML 5, because it will provide the "Web Standards ≣ XHTML" crowd with a stark choice:

    1. Continue using a markup language that sounds sexy because its name starts with an "X", but none of whose XML-ish features they actually use.
    2. Switch to a markup language that has the shiny new features that they want, but which is ipso facto backward and retrograde because its name does not start with an "X".
    3. Descend into utter tag-soupery by using HTML 5 features in their XHTML documents. Some will justify this as adherence to "real-world Standards."

    [An entirely different sort of irony would be presented by the arrival of XHTML 2.0. To avail themselves of that latest and greatest markup language would require converting to an XML MIME type. Unfortunately, XHTML 2 won't happen any time soon.]

    In the meantime, I suggest you find another topic to advocate. The people churning out tag-soup XHTML are pretty set in their ways. You're not going to convince them to do otherwise by mere force of logical argument.

    Posted by Jacques Distler at

  3. Please forgive me if I have missed something, but isn't HTML 5 first going to have to be finished, and then submitted to a standards body for standardization (and assuming that it is accepted and adopted by a standards body) before we can use it?

    Posted by Peter Winnberg at

  4. Speaking of layers, some of us bypassed GoLive entirely and go back to the days of Netscape 2 and writing all our code in our favorite text editors (or Notepad? Hah.) Actually, that's not that much different from what I'm doing NOW.

    Honestly, while I think WYSIWYG tools may be helpful, they have a downside of enticing a lot of people into thinking that web design and development are a lot easier and less complicated than they actually are. And there's nothing I hate more than seeing people in this line of work who don't fully comprehend the medium they're working with.

    Posted by Josh Mast at

  5. Peter, like I said: but you can enter some additional code if desired. Validation is not really important, you just have to make sure you know what you are doing (most people do not, like Josh pointed out).

    Posted by Anne at

  6. I do actually have to disagree with validation not being important, because when I need to write a piece of software that parses and displays any particular kind of SGML/XML document, it would be nice to only have to read the specification and follow that precisely and know that I can now parse/display any relevant document on the Internet. If all of the browsers available today would have done that well, no cross-browser testing would have been necessary, except for the fact that some browsers might have some real bugs.

    On the other side, that isn't the situation anyway, so that's only a dream. The only real reason for me to use valid code is probably out of political advocacy, and sometimes to get things working and for functionality reasons, you may need to use invalid code. That's probably perfectly fine then, as long as you still try to code cross-browser.

    One thing I really do agree with, although I said different things in the past, HTML is probably more practical to use right now than XHTML, because when we're trying to make backwards-compatible websites, we can't use all of the nice XHTML features such as combining namespaces anyway. But that isn't to say that we should completely throw XHTML out of the door.

    It's easy to convert static documents with a tool like Tidy, but dynamic documents are a different thing altogether. Therefore, for future compatibility, it wouldn't necessarily be a bad idea to start converting. I don't know.

    Posted by Charl van Niekerk at

  7. I don't believe HTML 5 is due, unless someone can prove me wrong. The W3C abandoned HTML 4 to work on XHTML. Their next version will be XHTML 2.0. The last version of the HTML spec was in December 1999! See here:

    http://www.w3c.org/MarkUp/Activity

    There's no point to HTML any more, as it cannot be extended in the way XHTML can.

    Posted by Chris Hester at

  8. Chris, I don't think they are trying to continue with HTML in the long run. It seems like they are trying to see what can be done today in the most popular browsers on the market, and then they try to make a specification out of it. Stuff like the id and class attributes on the root element, which are not in HTML 4 but are nevertheless supported by all of the major browsers (it seems).

    In other words, I think HTML 5.0 does have its place in the short run, but maybe not in the long run when XHTML becomes properly supported.

    Posted by Charl van Niekerk at

  9. 3. Descend into utter tag-soupery by using HTML 5 features in their XHTML documents. Some will justify this as adherence to "real-world Standards."

    I'm going to go out on a limb and predict that once "HTML5" has even a smidge of browser support, you'll start seeing scads of web dev articles saying, "Look at all these cool new features!!" Once that happens, most of the world will merrily proceed to use pieces of HTML5 (at least the elements and attributes that degrade nicely), regardless of the DOCTYPE that they slapped at the top of their pages.

    [An entirely different sort of irony would be presented by the arrival of XHTML 2.0. To avail themselves of that latest and greatest markup language would require converting to an XML MIME type...]

    Has anyone at the W3C made some sort of official statement that an XML MIME type will be required for XHTML 2.0? Yes, that requirement would be in line with the philosophy driving XHTML 2.0, but I've never heard anyone actually say that. But then I'm lazy and don't pay much attention to the W3C lists.

    Posted by Evan at

  10. Has anyone at the W3C made some sort of official statement that an XML MIME type will be required for XHTML 2.0? Yes, that requirement would be in line with the philosophy driving XHTML 2.0, but I've never heard anyone actually say that. But then I'm lazy and don't pay much attention to the W3C lists.

    My understanding is that there is still some debate between using application/xhtml+xml and inventing some new MIME type (so that servers can do XHTML 1.x/2.x content-negotiation with clients). Everyone agrees that XHTML 2 MUST NOT be sent with MIME type text/html.

    [M]ost of the world will merrily proceed to use pieces of HTML5 (at least the elements and attributes that degrade nicely), regardless of the DOCTYPE that they slapped at the top of their pages.

    No question. But I'm curious about the folks, who like to believe that the code they churn out has something to do with "Web Standards." What will they do and how will they rationalize their decision?

    Posted by Jacques Distler at

  11. The W3C screwed up big time in my opinion:

    The XHTML1-specs say that XHTML1 should be backwards-compatible by sending it with a text/html mime-type. But when we look at the HTML/SGML specs and think about the practical use of mime-types this just doesn't seem possible.

    XML was meant to be extensible, not to be backwards-compatible. XHTML being backwards-compatible or extensible (being both seems impossible) is too confusing for most webdevelopers and web-engineers. This is why I think XHTML1 isn't a nice XML-subset for the use of extensibility. This also means that I'm not interested in XHTML1 anymore.

    In my opinion this is only a small problem compared to the following:

    XML was meant to be extensible. That's a nice thing, of course. For a few years now the W3C has been trying to create modules so it would be easier to implement or mix different subsets. That seems nice, but instead of separating these modules they are mixing them up. Some examples:

    Maybe I should switch(-back) to more stable(?) ways of creating web-applications (for example Java or Flash).

    Posted by Jerome at

  12. The XHTML1-specs say that XHTML1 should be backwards-compatible by sending it with a text/html mime-type.

    Really? I'd say they state that you may serve XHTML 1.0 as text/html if you write it in a backwards-compatible way. They also say that you should use application/xhtml+xml for all XHTML doctypes (in the non-normative note).

    Posted by Tommy Olsson at

  13. Ok, my mistake, replace "should" with "could" in that sentence. :)

    Posted by Jerome at

  14. Has anyone at the W3C made some sort of official statement that an XML MIME type will be required for XHTML 2.0?

    The application/xhtml+xml mime-type is already a requirement for XHTML 1.1 - a fact which many standards advocates tend to forget...

    Posted by Ben at

  15. No, the mime type is recommended, but people may still serve XHTML as 'text/html' according to this W3C page:

    XHTML Media Types

    Now that there are at least four possibilities on media type labeling for XHTML Family documents - 'text/html', 'application/xhtml+xml', and generic XML media types 'application/xml' and 'text/xml'.

    The 'text/html' media type [RFC2854] is primarily for HTML, not for XHTML. In general, this media type is NOT suitable for XHTML. However, as [RFC2854] says, "[XHTML1] defines a profile of use of XHTML which is compatible with HTML 4.01 and which may also be labeled as text/html".

    'application/xhtml+xml' SHOULD be used for serving XHTML documents to XHTML user agents. Authors who wish to support both XHTML and HTML user agents MAY utilize content negotiation by serving HTML documents as 'text/html' and XHTML documents as 'application/xhtml+xml'.

    Also worth noting from that page:

    XHTML documents served as 'text/html' will not be processed as XML [XML10], e.g. well-formedness errors may not be detected by user agents. Also be aware that HTML rules will be applied for DOM and style sheets (see C.11 and C.13 of [XHTML1] respectively).

    Authors should also be careful about character encoding issues. A typical misunderstanding is that since an XHTML document is an XML document, the character encoding of an XHTML document should be treated as UTF-8 or UTF-16 in the absence of an explicit character encoding information. This is NOT the case when an XHTML document is served as 'text/html'

    Posted by Chris Hester at

  16. Chris, take a look at this quote for example:

    3.5. Summary: XHTML Basic/1.1 should not be served as text/html.

    Ok, that seems reasonable, but what about the SGML Open Catalog Entry?

    This is only one thing which is confusing me (and other developers). I don't think we can rely on the documents at the W3C since those documents are too confusing.

    Posted by Jerome at

  17. My understanding is that there is still some debate between using application/xhtml+xml and inventing some new MIME type (so that servers can do XHTML 1.x/2.x content-negotiation with clients). Everyone agrees that XHTML 2 MUST NOT be sent with MIME type text/html.

    If that's true, then XHTML 2 is DOA. Some of the new elements and attributes might survive and work their way into the ecosystem, depending on what the browser vendors do. But XHTML 2 as envisioned by its designers will never be anything other than a curiosity, a museum piece.

    Posted by Evan at

  18. As we discussed long ago, XHTML 2 introduces a new and incompatible model for the <p> element. In HTML, the closing tag to the <p> element is optional. You know that the <p> comes to a close whenever you encounter another block-level element. Paragraphs cannot contain other block-level elements.

    XHTML 1 requires an explicit </p> tag, but maintains the restriction that you may not nest other block-level elements inside a <p> element. With XHTML 2, this restriction is lifted. You can nest other block-level elements inside a <p> element.

    You can never hand XHTML 2 to the tag-soup parser. It will not know when to end a paragraph!

    This was obvious from the start, and nothing that has happened subsequently has changed (or could change) it.

    Posted by Jacques Distler at

  19. Yup, I remember this discussion. Doesn't change the fact that someone, somewhere is going to try throwing XHTML 2 elements and attributes into a tag-soup parser.

    I know that you believe that the parsers will have to ignore this markup. However, I believe that if some browser vendor sees a competitive advantage in adding some components of XHTML 2 to their tag-soup parser ("IE 8.0! Now with href-anywhere Technology!(TM)"), they will do it.

    Thus, XHTML 2.0 as envisioned by its designers is DOA. Either I am right, in which case little pieces of XHTML 2 may live on in one mutated form or another. Or you are right, in which case XHTML 2 will be relegated to static museum piece status.

    By the way, anyone who thinks people are actually going to be able to implement real XHTML 2 sites ought to take a serious look at the backlash in the articles Anne is linking to. "Who cares about MIME-types and XML parsing?" "Just ignore those standards-zealots." My favorite bit was the comparison between the X-Philes and audiophiles -- as if conforming to XML standards was some sort of "aesthetic" choice. Oh, how I wish that were true! But that ship sailed a long time ago.

    Posted by Evan at

  20. My favorite bit was the comparison between the X-Philes and audiophiles -- as if conforming to XML standards was some sort of "aesthetic" choice.

    Yes, that was a hilarious example of the foolish depths to which some self-described(!) Web Standards advocates have sunk. That's why I look forward to the arrival of HTML5 (and/or XHTML2), where they will be forced to confront the fundamental illogic of their position.

    Thus, XHTML 2.0 as envisioned by its designers is DOA. Either I am right, in which case little pieces of XHTML 2 may live on in one mutated form or another. Or you are right, in which case XHTML 2 will be relegated to static museum piece status.

    I prefer "niche status" to "museum piece status." There are not a lot of people who really need MathML (another XML-only web technology), but those who do can now count on pretty widespread browser support (IE/6 with a plugin and the Mozilla family). I think XHTML2 will be similar. Not too many people actually need it, and I betcha there will be a Mozilla implementation by the time it's finalized.

    Posted by Jacques Distler at

  21. Allow me to use a megaphone here:

    THERE WILL BE NO HTML 5!

    Unless anyone has a link to prove someone is working on this, it is a no-go. The W3C have abandoned HTML by reformatting it as XML and calling it XHTML.

    Posted by Chris Hester at

  22. Chris, the W3C is not the only organization in the world. Try to look further and please read It's just a NOTE, don't quote (XHTML Media Types). Thanks.

    Posted by Anne at

  23. Chris,

    There is no official HTML 5 effort going on at the W3C, as far as I know. "HTML 5" is just an informal and somewhat cheeky way of saying, "the ongoing work of the Web Hypertext Application Technology Working Group." As the group's front page states:

    Many of the members of this working group are active supporters and members of the W3C and other standardization bodies. We plan to submit our work for standardization to a standards body when it has reached an appropriate level of maturity. The current focus is rapid, open development and iteration to reach that level.

    We don't know whether the W3C will accept this standard and allow them to use some derivative of the "HTML" name. Whatever the standard's eventual name may be, Mozilla, Opera, and Safari seem likely to try to implement it. Therefore, it would behoove us all to stay apprised.

    Posted by Evan at

  24. To clarify still further, the WhatWG intends, ultimately, to submit their work for ratification by the W3C in two forms

    1. As a revision of HTML (what we have been referring to as "HTML5").
    2. As a module for XHTML. This will make their work available to authors using markup languages based on XHTML 1.1 and later.

    In no event will this stuff be available in any fashion in XHTML 1.0 documents. My prediction is that this will not deter advocates of "real-world Standards" in the slightest.

    Posted by Jacques Distler at

  25. Yes, that was a hilarious example of the foolish depths to which some self-described(!) Web Standards advocates have sunk.

    I understand that MIMEs and DOCTYPEs and whatever more are important, but Faruk does so too. He's trying to explain that The X-Philes are a bit elitist and too active on this. Believe me, he does know the importance of MIME-types et al. I just don't know where you get the part that he is a self-described Web Standards advocate.

    Posted by Rob Mientjes at

  26. Faruk, as I understand his position, is of the opinion that, in the cause of advancing "Web Standards," even badly written XHTML is preferable to (better written) HTML4.

    That is crazy. HTML4 is every bit as much a Web Standard and is, as Anne tirelessly points out, more appropriate for the vast majority of web applications today.

    Faruk argues that XHTML is easier to "sell" to clients. That is, he believes one can con them into buying into Web Standards by promising them all of the benefits that XHTML has over boring old HTML.

    Lying to clients is never a recipe for long term success. Promising them the XML-ish benefits of XHTML, without being able to reliably deliver those benefits, is a recipe for client disappointment, frustration and anger down the road.

    And it is thus that we get grotesqueries like the movies.com redesign, which was recently announced to much fanfare.

    On launch, the main page had some 600-odd validation errors. Merely changing the DOCTYPE to HTML4 cut that number down to 200 errors. The main page has since been cleaned up (a lot), but it's still true that you can cut the number of errors by 70% by simply changing the DOCTYPE to HTML4 (at current writing, 195 errors → 60 errors). The internal pages are as bad as ever, but, again, you can cut the errors by 60-70% by simply changing to an HTML4 DOCTYPE.

    That's some serious XHTML-fetishism going on, when otherwise intelligent people slap an XHTML DOCTYPE on a document that, by any stretch of the imagination, ought to be called HTML4, and call the result a "victory" for Web Standards.

    Posted by Jacques Distler at

  27. Jacques, that's exactly the topic of my latest post, I've added our opinion in the nick of time. I too think that Faruk is wrong on that. Valid code and semantics are far more important than the mere usage of XHTML because it sells better.

    Posted by Rob Mientjes at

  28. Faruk argues that XHTML is easier to "sell" to clients. That is, he believes one can con them into buying into Web Standards by promising them all of the benefits that XHTML has over boring old HTML.

    What benefits? Even I, a die-hard XHTML user, am beginning to wonder about this one.

    Posted by Chris Hester at

  29. Just to reiterate, I disagree somewhat with Anne. I don't think that XHTML as text/html is inherently bad. I just think that it confers no benefits whatsoever over HTML4. Anyone who tells you it is better is almost certainly selling you snake-oil.

    And sometimes, as in the above-cited example, it is measurably worse (3 times worse, in fact) than HTML4.

    Posted by Jacques Distler at

  30. The discussion on the use of HTML and XHTML continues across various websites. Let me give you a short list of links to articles to consider: [...]

    Posted by Continuing the discussion - Greek to me at

  31. What is a "web standard?"

    Posted by Chris Hester at

  32. Ref: What is a "web standard?"

    <offtopic>I thought it was fairly common knowledge the W3C Recommendations were just that 'Recommendations'. Though I slightly-disagree - or misread - the WaSP layman explanation of Media Type; since that was just a "W3C Note" in the first instance not a normative specification.</offtopic>

    Posted by Robert Wellock at

  33. Just to put this "Recommendation" thing in perspective, the IETF Standards documents are called "RFC"s. RFC stands for "Request for Comments", which sounds even wimpier than "Recommendation." Nonetheless, the entire internet runs on the basis of those RFCs.

    Posted by Jacques Distler at

  34. Jacques, your accusations are in poor taste and not very respectable in nature. Someone in your position would do well to know better than make such insinuations about my business approaches.

    I never, never, never said that one should lie to clients. I never even suggested that one should sell XHTML because of its "awesome benefits". I don't con anyone, I don't promote stuff that isn't true, or anything like what you insinuate.

    I don't promise XHTML-benefits to customers, I simply give them what they ask for: an accessible, modern website that uses XHTML. They ask, I deliver. I simply don't go out on a limb on my own to try and convince them not to use XHTML, because I'm of the firm opinion that such an act would be counter-productive, and so far I've only seen evidence that supports my case.

    Now, let's be nice again. I don't try to tell you how you should be a respected mathematician, so please, don't try and tell me how to do business. I'm fine with discussing the topic of XHTML with you, but not if you're going to sling mud at me; that's a waste of my time.

    Posted by Faruk Ates at

  35. I have to agree with Faruk there. Nobody is trying to 'con' anyone into doing things. If a client asks for XHTML, naturally we are not going to tell them HTML 4.01 is better. We point them at further benefits of XHTML, and go with that!

    Posted by Hayo at

  36. Just as a tiny addendum to Hayo: actually, I don't point them at further benefits of XHTML, because for the customer there are no direct benefits. The further benefits of XHTML exist for us, the webdevelopers of the modern internet, and most customers don't care to know that. Some, however, do enjoy hearing about improved forward-compatibility so then I'll admit that I enlighten them about such technical details. :)

    Posted by Faruk Ates at

  37. Forward-compatibility is an added benefit for the client; potential redesigns can be done a bit cheaper. Thanks for the addendum.

    Posted by Hayo at

  38. Ah, so now forward-compatibility is restricted to XHTML? Please…

    Posted by Anne at

  39. Not necessarily restricted perhaps, but XHTML is more forwards-compatible than HTML is, by a long shot. It's the whole reason that it's called eXtensible HTML in the first place, Anne...

    Posted by Faruk Ates at

  40. HTML browsers accept any input, correct or incorrect, and try to make something sensible of it. This error-correction makes browsers very hard to write, especially if all browsers are expected to do the same thing. It has also meant that huge numbers of HTML documents are incorrect, because since they display OK in the browser, the author isn't aware of the errors. This makes it incredibly difficult to write new web user agents since documents claiming to be HTML are often so poor.

    In the light of this quote from the W3C you could say, yes, it is restricted to XHTML (if only for the poor browser developers).

    Posted by Hayo at

  41. Yeah, let all browser makers only accept XHTML (as application/xhtml+xml). That would be the most braindead decision to make.

    Posted by Anne at

  42. I apologise. #40 was meant tongue-in-cheek. Still, I'd rather see that happen than the rubbish the web is now.

    Posted by Hayo at