Anne van Kesteren

The feedback on HTML

24 November 2006

For anyone who’s interested in the rechartering of the HTML Working Group, I suggest reading the thread on www-archive. Unfortunately it doesn’t contain all the details, which mostly go to W3C Member-only lists, but it gives some insight.

Meanwhile the WHATWG started a blog as a result of the request for feedback on HTML. There is also a #whatwg channel on the Freenode IRC server and a WHATWG Wiki if you need to draft a proposal.

What struck me with the feedback was the people who want to abandon this whole HTML thing. I wonder why. Figuring out HTML is important if we want to save the data the world is creating right now. I suppose the main reason people are using XHTML at the moment is because they’ve been misinformed. There are other suggested reasons though for using XHTML while sending it as text/html:

The validator is stricter;
It’s forward compatible;
It’s compatible with small devices;
It’s compatible with the “XML tool chain.”

I hope we can all agree that sending XHTML as application/xhtml+xml is silly. At least for now. It’s not supported by Internet Explorer, Google, or older user agents, and it creates issues for mobile user agents. Then some people argued that the W3C should work solely on XHTML 2.0, which seems even weirder. I mean, even fewer user agents (zero for all practical purposes) support that. It’s also incompatible with XHTML 1.0 yet uses the same namespace (as of recently) and is far less detailed and implementable than, say, (X)HTML5.
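To make the media-type point concrete: the same bytes end up in different parsers depending on how they are labelled. Below is a minimal Python sketch, assuming the standard library's `xml.etree.ElementTree` as a stand-in for a browser's XML parser and `html.parser.HTMLParser` as a stand-in for a forgiving text/html parser. Neither is what any browser actually uses, and the names `parses_as_xml`, `TagCollector` and `tags_seen_as_html` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Markup with an unclosed <br> and an undeclared entity, the kind of thing
# found in pages that claim to be XHTML but are served as text/html.
MARKUP = "<p>Caf&eacute; menu<br>Open daily</p>"


def parses_as_xml(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Lenient stand-in parser: records start tags, never rejects the input."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tags_seen_as_html(markup):
    """Feed the markup to the lenient parser and return the start tags seen."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags
```

The XML path rejects this input outright, while the lenient path carries on and reports the tags it found. That difference is why the label on the document matters.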

Comments

There are other suggested reasons though for using XHTML while sending it as text/html:

... I hope we can all agree that sending XHTML as application/xhtml+xml is silly.

Hmmm....? And here I thought sending XHTML as text/html was silly.

Posted by Jacques Distler at 5:22AM
Well I always liked sending XHTML as text/html but stayed out of the argument. XML is easier to handle with other tools. Are we now officially allowed to send XHTML as text/html? ;-)
Posted by Dean Edwards at 6:04AM
The validator is stricter;

Such an illusion. The Validator doesn’t even check documents for well-formedness.
Posted by Henri Sivonen at 6:33AM
For the record, sending XHTML as text/html is silly too, indeed. I’m just listing some things on how people try to justify doing that. For some reason people get it in their heads that it has the same “meaning” regardless of the media type. It’s probably not actively harmful if you realize that, although you are setting a bad example. Fortunately the next version of (X)HTML will clear this up.
Posted by Anne van Kesteren at 7:49AM
I call flamebait.
Posted by Sam Ruby at 7:51AM
sending XHTML as application/xhtml+xml is silly

sending XHTML as text/html is silly

So we can say that sending XHTML in any way is silly? :)
Posted by Jozef Benko at 3:51PM
It may be flamebait, but I learned something useful from Henri's comment.

Posted by Jacques Distler at 4:52PM
Hmm XHTML is XML so I reckon you could (should?) send it as text/xml. But then again I might be totally wrong.
Posted by Yorian van Leeuwen at 8:49PM
I believe for now, the only good thing to do is have pages on the latest HTML doctype.
XHTML is useless, for the moment.
Posted by Yahia at 9:03PM
Here you can see XHTML which works in IE (handled with XML parser). BTW: that page is not indexed by Google.
Posted by Jozef Benko at 10:25PM
BTW: that page is not indexed by Google.

It is, actually. Google even offers a [severely broken] HTML translation.
Posted by ACJ at 11:13PM
@Anne: Your post is interesting because it shows that people may hold opinions going in opposite directions. You prefer to say that they are "misinformed", or that something is "silly"; that usually doesn't help the discussion (speaking as someone who has to handle this all the time on the battlefield). But you forget that these people use these technologies every day. The problem when we design technologies is to cope with all arguments and try to find a solution which answers different needs. If one answer is not possible, there is then a need for different solutions depending on what you want to achieve.
Just think positively: it is indeed important to have an HTML parsing model for old documents on the Web that will help to turn tag soup into valid HTML or XHTML documents, and some people need and want to move to XHTML as XML for their own purposes. Embracing diversity is always better for an ecosystem. If we take too hard a line one way or the other we will fail, and lose a lot of useful energy.
@henri, @jacques: The source code is available and there is a bug list to be fixed. You are welcome to join, Henri. I think it will help a lot more to have good will for the development and to move to a more modular and flexible environment: Unicorn. Fixes are done step by step, little by little. But since some people prefer to shoot rather than help fix the code, it is indeed going slowly. You can see this in the validator to-do list. Welcome.
Posted by Karl Dubost at 8:28AM
Karl: the point with XHTML (1.x) is that it has been misused and abused from day 1 because people see it as the next version of HTML, which it just isn't. And now, with so many sites claiming to be XHTML that aren't, we are stuck with a broken technology and browsers unable ever to implement the full HTML specification.
That said, I would like to add that draconian error handling for a mere markup language is still a bad idea in my eyes.
Posted by Tino Zijdel at 8:58AM
And here I thought I was using XHTML because of all the wonderful XML tools that enables me to use.
I think some moderate course of action is required here. Like it or not, XHTML is here to stay, it's become too engrained in the teaching process.
All of my books include it, yes even incorrectly in one of them, due to my own initial misunderstanding of the spec, but even that boils down to UAs implementing a better error reporting process. The media type is part of the problem, but in my mind, the YSOD is the biggest show stopper for proper media types, setting aside IE and all the others that don't even support the media type.
Personally, I couldn't care less about media types. I'd rather not have the possibility of a YSOD, although at the same time I would like to see and find well-formedness errors. In that light I think XHTML should take some inspiration from server-side languages. In PHP you can set the level of error output, or whether there is any at all. You can change how rigid it is with several levels. Why can't the same be done for client-side markup? Firefox and others have error consoles; that's the place for well-formedness errors. When that fails, IMO, the content should just be treated like tag soup.
Client-side XHTML should have that kind of error handling. Oh, I've got entities in my XML document that aren't defined in a DOCTYPE. Rather than throwing a well-formedness error, why can't I set an HTTP header that says ignore the entity errors and treat that bit like HTML, but still look for other well-formedness errors? Or another error level that says, just ignore well-formedness errors and treat the markup like tag soup, but log the error to the error console. The problem with all this is the people writing these specifications are too pedantic!
As far as XHTML as HTML goes, in the end, it still works. It may be churning through as tag soup, but I don't care. It is still more rigid, flexible and transformable than plain old HTML, at least as far as my applications and CMS tools are concerned. I will never revert to HTML 4 or even HTML 5.
Maybe XHTML 5 is the compromise we need, I haven't read anything about it though, but, for better or worse XHTML is here and I think accepting that is the first step to reconciling the problems with it.
I think the biggest failure of the standardization process is that the people who define standards do not recognize that the overwhelming majority of developers out there are not pedantic, will never rigidly follow any specification, and when presented with something that works will more than likely simply go with it without ever giving it another moment's thought, be it technically "the right way" or "the wrong way". I am not one of those people, but I deal with those people every day, and unfortunately, it's those people that are ultimately going to be using these standards the most. We are the minority.
Standards, IMO, should be written from the standpoint that everything will go wrong and the lot of it will be ignored. As far as XHTML goes, I think some flexible error handling combined with flexible error reporting is the way to go. It's designed to be rigid, and it can be rigid, but we also have an existing fallback mechanism that already works beautifully, HTML and all.
Posted by Richard York at 1:13PM
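The configurable error level proposed in this comment can be sketched roughly as follows. Everything here is hypothetical: no browser or HTTP header offers such a setting, `ErrorLevel` and `choose_rendering` are invented names, and Python's `xml.etree.ElementTree` merely stands in for a browser's XML parser.

```python
import logging
import xml.etree.ElementTree as ET
from enum import Enum

# Stand-in for a browser's error console (hypothetical).
log = logging.getLogger("markup-errors")


class ErrorLevel(Enum):
    STRICT = "strict"      # refuse to render on any well-formedness error
    FALLBACK = "fallback"  # log the error, then treat the page as tag soup


def choose_rendering(markup, level):
    """Return 'xml', 'tag-soup' or 'error-page' for the given markup."""
    try:
        ET.fromstring(markup)
        return "xml"
    except ET.ParseError as exc:
        if level is ErrorLevel.STRICT:
            # Today's behaviour for application/xhtml+xml: show an error page.
            return "error-page"
        # Proposed behaviour: report to the console and fall back.
        log.warning("well-formedness error: %s", exc)
        return "tag-soup"
```

With `ErrorLevel.FALLBACK`, a broken page is still rendered and the author can find the error in the log, which is the combination the comment asks for.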
@henri, @jacques: The source code is available and there is a bug list to be fixed. You are welcome to join, Henri. I think it will help a lot more to have good will for the development and to move to a more modular and flexible environment: Unicorn. Fixes are done step by step, little by little. But since some people prefer to shoot rather than help fix the code, it is indeed going slowly. You can see this in the validator to-do list. Welcome.

Sometimes, even with Open Source, it is easier to move things forward by creating a new piece of software with a new foundation than to incrementally patch an existing piece of software. (Before someone makes the obvious comparison to incrementally improving markup languages: Markup languages have many more points of deployment than software running online validation services.)
Also, parsing XML using an XML parser is not a peripheral issue suitable to be left to contributors from outside the core team. It is a pretty fundamental thing, and the fact that the core team hasn’t fixed it in 6 or 8 years (depending on when you start counting) tells something.
Admitting bugs is the first step towards fixing them. I have actually tried to approach this issue through the bug database by first trying to get the documentation fixed. I even provided a suggested fix. No success, yet.
Full disclosure: I suspect my validation service may have an ill-formedness detection bug in the area of what characters exactly are allowed in XML element and attribute names. There’s a call to java.lang.Character.isLetter(), which smells like a Unicode versioning bug. I need to review the code at some point.
Posted by Henri Sivonen at 7:22AM
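The kind of mismatch suspected in this comment can be illustrated with a small sketch. The validator in question is written in Java; here Python's `str.isalpha()` plays the role of a generic Unicode "is letter" test, which follows whatever Unicode version the runtime ships with. The ranges below are the NameStartChar ranges from XML 1.0 (Fifth Edition), a later edition than the one current when this thread was written, so treat them as an assumption chosen for brevity. `is_name_start_char` is a made-up helper name.

```python
# NameStartChar code point ranges, taken from XML 1.0 (Fifth Edition).
_NAME_START_RANGES = (
    (0x3A, 0x3A),      # ":"
    (0x41, 0x5A),      # A-Z
    (0x5F, 0x5F),      # "_"
    (0x61, 0x7A),      # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
)


def is_name_start_char(ch):
    """Check a single character against the fixed XML range table."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _NAME_START_RANGES)


def disagreements(chars):
    """Return the characters on which the two tests give different answers."""
    return [ch for ch in chars if is_name_start_char(ch) != ch.isalpha()]
```

For example, the underscore may start an XML name but is not a letter, while U+00AA (the feminine ordinal indicator) is a letter but falls outside the ranges above. A generic letter test therefore cannot substitute for the table in the XML specification.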
Henri: It is always easier to work alone. It is not that much creating a new piece of software that you have achieved. It is that you are working in a small community with not many users (yet) and with no conflicting needs (yet). But as always, I will prefer the solution of working together, which means compromises. Btw in the link you have given, you tried to fix an introductory message…, so much for fixing real code bugs.
Posted by Karl Dubost at 11:25AM
Henri: It is always easier to work alone. It is not that much creating a new piece of software that you have achieved. It is that you are working in a small community with not many users (yet) and with no conflicting needs (yet). But as always, I will prefer the solution of working together, which means compromises.

Working together is valuable, but pulling a single piece of software in conflicting directions is unlikely to work out.
First, I was specifically not doing DTD validation and specifically not using an SGML parser. The W3C Validator does SGML DTD validation. Later, my focus shifted to supporting a spec whose development W3C members had specifically voted down. Moreover, I intended to add non-schema checks for conformance requirements that weren’t expressible in a schema, but the line of the W3C validator team through the years has been that the W3C Validator is a pure validator and doing checks beyond the validation formalism is not OK.
Considering that there is no back-end code in common at all between what I am seeking to accomplish and what the W3C Validator already does, I believe that my decision to create a new front end in the programming language of the back-end infrastructure that I am building on was technically the right choice.

Btw in the link you have given, you tried to fix an introductory message…, so much for fixing real code bugs.

The bug report illustrates the attitude (both mine and that of the W3C validator team) and shows that even fixing something simpler than code hasn’t worked out.
Posted by Henri Sivonen at 5:26PM
"The bug report illustrates the attitude". Indeed. I could not say better.
Posted by Karl Dubost at 6:36AM
I suppose the main reason people are using XHTML at the moment is because they’ve been misinformed.

And this is said by someone who appears to think that the web is a giant DOM? Your claims about the web, HTML5 and browsers puzzled more than one web developer just a few months ago on the MathML list. Remember?
You always claim to work for Opera. Well, the Opera page (http://www.opera.com/) is using an XHTML strict doctype, an XML declaration, and the xhtml namespace. Are you suggesting that Opera people are misinformed?
If your reply is yes, how can misinformed people try to guide the next web at WhatWG?
If the reply is no, what then is your point?
What struck me with the feedback was the people who want to abandon this whole HTML thing. I wonder why.

Because HTML achieved its limits?
It is interesting that most people are very critical of the W3C plans for a new HTML and of the whole WhatWG "Web applications" work.
Just take a look at the many comments along the lines of "I don’t think HTML5 is a good idea." on various blogs (Berea, The Web Standards Project, Lachy’s Log, etc.) and on other sites including the W3C QA, where many people are saying:
"Oh no... This is not helpful."
"Why add yet another version to a spec that we know should have died long ago? This will not help. You should go the opposite direction."
"I don't think that this is a good idea,"
"I think it's not a good idea to continue the development of HTML"
Yes, some people like HTML5, but how many? Can you measure it? My perception is that fewer than 30% of people appear to be really interested in HTML5 over XML.
Posted by Juan R. at 7:38PM
Juan, I’m not working on the web team that does the Opera home page. I suppose I could bring it up, but I’d rather focus on the larger issue here.
Regarding the negative feedback: there was lots of positive feedback as well. I think I already explained why it’s a good idea regardless of what becomes the next authoring format. And don’t forget that HTML5 has an XML serialization as well. I suppose that wasn’t really clear from the blog posts.
Posted by Anne van Kesteren at 1:07AM