Anne van Kesteren

Wasting resources

19 May 2007

One of the reasons it takes browser vendors a long time to implement new specifications is that they have to invest lots of resources into reverse engineering behavior the web depends on. One example is standards that do not accurately reflect the real world and therefore cannot be implemented as written (CSS2, HTML4, HTTP, et cetera). Another example is proprietary extensions to the DOM from other browser vendors that do not have any publicly available documentation on how to implement them interoperably: innerHTML and document.all from Internet Explorer, for instance, or Range.createContextualFragment from Firefox. (HTML5 defines innerHTML.) To sum up, these are the reasons we waste a lot of time:

Unclear specifications;
Specifications that do not match the real world;
Non-existent specifications for widely used features.

This is a problem for most widely used web technologies: CSS, DOM and HTML. (Although people often blame ECMAScript / JavaScript, the real problem is the DOM and the various other APIs, such as XMLHttpRequest.) This is, I think, one of the more important reasons browser vendors are pushing hard for the next version of HTML to match the real world more closely. Reverse engineering costs a lot of money and time. At some point we would like to do a little less of it and be able to invest more time in new functionality.

Besides wasting resources, other arguments are data persistence and keeping the market open. A century from now we still want to be able to read the content we publish today, whether or not we forgot to escape an ampersand when writing or dynamically generating it. Keeping the market open has to do with documenting how you would need to implement things in order to compete with Internet Explorer, Firefox, Safari, Opera et cetera. The more things we leave undefined, the harder it gets to keep up. These arguments do not apply solely to HTML, of course; we will also have to fix CSS, the DOM and other specifications in due course.

Comments

I am starting to think we should legalise theft and murder - that way laws would match the real world.
Posted by Rimantas at 4:46AM
It’s not about legalising theft and murder, it’s about defining them.
If you leave theft and murder undefined, the kind of penalty thieves and murderers face varies a lot across judges. In fact, the problem even starts with deciding what kind of action should be called a theft or a murder.
Having a clear definition of what thefts and murders are, and also of how they should be handled by judges, leads to a more predictable world for felons and law-abiding citizens alike. More importantly though, the judges’ lives get a lot easier.
Posted by Arne Johannessen at 6:21PM
With a goal of interoperability, one of the most important things to document - and come to terms with - then, would be quirks mode. But are vendors willing to document this? Currently they do not document it very well at all!
Quirks-mode and its relationship to the box-model in Internet Explorer is a well-known issue. But quirks-mode and the content-model are scarcely documented. The funny thing is that the content-model of P, as defined in the WHATWG's HTML5, seems to be equal to the content-model of P:quirks-mode, except that the WHATWG currently has made an exception from its own content-model for historical reasons. Hence <P> <TABLE>[...] is not allowed to be rendered as a P with a TABLE, but must be rendered as one P-paragraph and one TABLE-paragraph. That is in direct contradiction to how all browsers do it! (Section 8.1.2.5 in the HTML5 spec.)
It seems to me that there is very little point in saying that P has a content-model which allows TABLE, unless the specified content-model is planned to be taken into use. So, will it only be used in quirks-mode, if quirks-mode gets defined? Or will we be able to choose the content-model in a more defined way? What do «vendors» think about that?
Posted by Leif Halvard Silli at 1:47AM
With a goal of interoperability, one of the most important things to document - and come to terms with - then, would be quirks mode.

Indeed. I'm planning to research and document quirks mode, not only for HTML5 but also for CSS and other things. I hope that HTML quirks will be specified in the HTML5 spec so that HTML handling will only have one mode. Making CSS quirks go into the CSS 2.1 spec probably wouldn't work, so I might write a separate spec that trumps CSS 2.1 for quirks mode.
Posted by zcorpan at 4:42AM
Zcorpan, did you mean «spec that trumps CSS 2.1 into quirks mode»? If so, I have two challenges for you:
1. Should it be possible to affect the content-model via CSS? Isn't that really about semantics?
2. Is there really a link between the IE content-model and quirks-mode?
Personally, I would like to govern the content-model via CSS. But the main thing is to be able to govern it in a logical way – as author. Quite often the page looks the same regardless of what the content-model of P looks like. If it can be governed via CSS, then you can «repair» the page when the browser clutters it for you. If you cannot govern it via CSS, then you must turn to «modes» again, I'm afraid.
To the second question: The «standard» for quirks-mode is defined by IE. And IE always follows the same content-model, whether in Quirks-Mode or not. :-) Did you know that? So the issue with the content-model is not linked to Quirks-Mode at all. Except in Opera, Safari and Firefox, that is. (Which is why I like to use DOCTYPEs that put IE in standards mode and the others in Quirks mode.)
This ought to be a strong argument against those so-called «historic reasons» which HTML5 currently speaks about. The historic reasons speak for the fact that there should be no exceptions for the content model of P.
Posted by Leif Halvard Silli at 8:00AM
Zcorpan, did you mean «spec that trumps CSS 2.1 into quirks mode»?

No, I meant a spec that trumps the rules of CSS 2.1, but only when you're in quirks mode. Such as the fact that unitless length values are px in quirks mode.
Should it be possible to affect the content-model via CSS?

When you say "content model", do you mean HTML parsing rules? Or content model as in what content is allowed to be used in a particular element for authors? Either way, you can't change it with CSS.
Posted by zcorpan at 9:50AM
Zcorpan: Our interpretation of «trump» is the same. It is of course the interpretation of CSS that must be trumped into quirks-mode - not CSS itself.
As for «content model», the HTML5 proposal lacks a definition of it, it seems. But as Restrictions on content models is a sub-section of Writing HTML documents, it applies to both parsing and authoring. (However, the restrictions on the content models are probably only meant to be valid for HTML5, not for XHTML5 documents.)
The question is whether Quirks-Mode is about CSS interpretation only, or whether it is also about content models/parsing rules.
The prevailing view is that Quirks-Mode is about CSS. And that is great! Let's stick to that. But then P must be allowed to contain Structured inline-level elements (TABLE,OL etc), not only in XHTML5 but also in HTML5 documents. Do you agree?
The question is whether this will break many web pages out there. Since most authors test their pages at least in Internet Explorer, and since IE allows P to contain TABLE etc regardless of quirks or strict rendering mode, this would break very few pages. (Plus I could give you a whole heap of other reasons that also speak for the argument that extremely few pages would break.)
Posted by Leif Halvard Silli at 9:49PM
Content model in HTML5 is authoring requirements only, not parsing. Since authoring requirements have nothing to do with what UAs have to implement, I don't see why it is relevant here.
Quirks mode is not only about CSS currently, but if HTML5 specs all the necessary HTML quirks then what is left to spec is CSS quirks, it seems. That would also result in quirks mode indeed only being about CSS handling.
In quirks mode, <p><table> will result in a table that is child of the paragraph, and so the HTML5 parsing spec will hopefully require that (per the current spec they would be siblings). <p><ol> is a different case.
Posted by zcorpan at 7:34AM
So we share hope when it comes to <P><TABLE> in HTML5! Regarding <P><OL>, there is no theoretical difference. It is only that <P><TABLE> has better support. About content model, the proposal says:
Some normal elements also have yet more restrictions on what content they are allowed to hold, beyond the restrictions imposed by the content model and those described in this paragraph. Those restrictions are described below.

The wording here implies that «content model» is one thing, and «author requirements» another, somewhat narrower thing. May I interpret your point to be that UAs which fully support HTML5 must be able to parse and show a document according to the content model as defined for each element in section 3 about «Semantics and structure of HTML elements», regardless of the author restrictions (so that the extra restrictions for authors may sooner or later be lifted)?
Posted by Leif Halvard Silli at 4:10AM
You're reading the wrong sections. What you are reading applies to authors, not to UAs. The "Parsing HTML documents" section is what applies to UAs, and is what is relevant here.
"Content model" is a concept that says what authors are allowed to put inside a given element. It has no implications whatsoever on UAs. There is no relationship between authoring requirements and UA requirements.
I don't really care (much) about whether or not authors will be allowed to nest tables in paragraphs. I care about the HTML5 parsing section reflecting what browsers are already doing, taking quirks mode into account.
The "extra restrictions" are there because of how today's browsers parse HTML. They imply </p> when they see an <ol>, and so, in HTML, it isn't possible to have an OL as child of P.
Posted by zcorpan at 4:36AM
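To make the parsing difference discussed above concrete, here is a minimal toy sketch in Python. It is not the HTML5 parsing algorithm: it handles only start tags, ignores end tags, text and the "button scope" rules, and the names build_tree and shape are hypothetical. It simply assumes the behavior zcorpan describes: in quirks mode a <table> stays inside an open <p>, while an <ol> always implies </p>.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def build_tree(start_tags, quirks_mode):
    """Toy tree builder (hypothetical): start tags only, no end tags or text."""
    root = Node("body")
    stack = [root]  # stack of open elements; stack[-1] is the current node

    def has_open_p():
        return any(node.name == "p" for node in stack)

    def close_p():
        # Simplified "implied </p>": pop elements until a <p> has been popped.
        while stack[-1] is not root:
            if stack.pop().name == "p":
                break

    for tag in start_tags:
        if tag == "ol" and has_open_p():
            close_p()  # <ol> implies </p> in every mode
        elif tag == "table" and not quirks_mode and has_open_p():
            close_p()  # <table> implies </p> only outside quirks mode
        node = Node(tag)
        stack[-1].children.append(node)
        stack.append(node)
    return root


def shape(node):
    """Return the tree as nested (name, children) tuples for easy comparison."""
    return (node.name, [shape(child) for child in node.children])
```

With ["p", "table"] and quirks_mode=True, the table ends up as a child of the paragraph; with quirks_mode=False the two become siblings, and ["p", "ol"] gives siblings in either mode.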
On the contrary, there is a relationship. It is just that processors do not need to listen when the spec speaks to creators and vice versa. And therefore, when <p><table> becomes «permitted», both section 8.1 (Writing) and section 8.2 (Parsing) must be updated, while section 3 (Semantics/Structure) can stay as it is.
Posted by Leif Halvard Silli at 6:11AM