Anne van Kesteren

XHTML 2.0: Working Draft 7

28 May 2005

As last time, here is a list of the new stuff in the now seventh reincarnation of the XHTML 2.0 working draft, along with things I want to point out in general that may not be new. The new draft is not a Last Call draft, by the way. I would hereby like to thank the HTML WG for acknowledging that it is not finished yet. Anyway, here we go:

As hinted before, the ROLE attribute is now part of the specification.
The SCRIPT element has been renamed to HANDLER. The reason is mainly that XHTML 2.0 uses XML Events to specify ‘event handling’.
XHTML 2.0 will actually use XFrames. I thought that specification was close to death as it is clearly ill-defined. (There are some implementations though. One is based on Flash.)
Although the ugliest move ever with respect to namespaces will probably go through, XHTML 2.0 does need multiple namespaces to be declared on the HTML element, because it needs to point to some Schema. Fortunately, this is only for strictly conforming documents.
A DOCTYPE is a should. Surely someone needs to point out to them that DOCTYPEs are overrated, useless and not used at all by browsers. (Well, Mozilla compares the DOCTYPE declaration to an internal array and resolves some entities if there is a match, but does no validation at all.)
XHTML 2.0 might use xml:id if it’s stable enough. (As CR basically means call for implementations and there are already some implementations of xml:id I don’t think this will be a problem. Unless of course implementations that implement XHTML 2.0 don’t implement it…)
The TITLE element no longer allows elements as its children.
I believe the previous draft already had one, but it appears the VERSION attribute is back. This time with the intention to make it normative and usable. (They better leave out xhtml2 from the namespace then…)
The title of a document is metadata about the document, and so a title like <title>About W3C</title> is equivalent to <meta about="" property="title">About W3C</meta>.
Why aren’t HEAD and BODY dropped again?
The P element can now contain the TABLE element. So as Daniel Glazman puts it: What happens when I paste a TABLE at the end of a paragraph? Is it part of the paragraph or not?
There is a fancy non-presentational LAYOUT attribute. It’s kind of like xml:space, but named differently. The default value depends on the element you apply it to.
XHTML 2.0 still doesn’t define the interaction of the old header types (they are still here) with the new H and SECTION.
The P element can only contain another P element when it’s not its direct child.
The SEPARATOR (also known as SEPERATOR) element is still here. It looks like it currently separates in two ways that would make it particularly tough to implement. Either it’s between two P elements or it is inside a LI element that is part of a couple of LI elements. See the difference?
Web 2 is coming closer. XHTML 2.0 introduces a FULL attribute to be applied on the ABBR element. The FULL attribute is actually an IDREF, but defined as URI (which can be an IRI) to make it confusing. It can be used to point to a previous or upcoming element that contains an expansion of the abbreviation. (I wonder if you could point to another ABBR element that contains the expansion denoted by a TITLE attribute or its META element equivalent.)
CITE can contain a CITE attribute. How’s that for confusion! Unfortunately there is still no way to link a CITE and a QUOTE element. CITE is also underdefined in my opinion.
By the way, the semantics of QUOTE are dubious. Are quotation marks part of the content or not? You can’t tell, because both are allowed.
There is still no clear use case for the A element. OBJECT seems to have some improvements. Or actually, specific things that apply solely to the OBJECT element.
Continuing an OL element after a paragraph of text still involves the VALUE attribute on the LI element. No way to say that this list is a continuation of the previous. Again, an IDREF would be nice here.
A HREFMEDIA attribute. I wonder if this was invented by Opera ;-)
TARGET is back! There is no defined interaction with any of the elements though. It is solely there for XFrames, a specification I hope will be banished.
A new ENCODING attribute to define what encoding the embedded resource may have and what encoding you prefer.
Similar to that attribute, SRCTYPE has also been introduced to do what you would expect it to do.
The IMG element is back! Mark Pilgrim can revise his XHTML 2.0 migration issues article and Jeffrey Zeldman can cross the street again. Or maybe not, as it is totally, utterly, incompatible with the older IMG element defined in HTML 4.01 and does not add anything interesting over the SPAN element with the embedding attribute collection applied to it.

How neat, most elements can have a MEDIA attribute that makes their content ignored when the MEDIA does not apply. My favorite example, which I used to implement using CSS, can now be done in million dollar markup:

```
<section>
 <h>Contact</h>
 <div media="screen">
  Contact form to be inserted here.
 </div>
 <div media="print">
  <address layout="relevant">Anne van Kesteren
Crosestijn 4629 …</address>
 </div>
</section>
```
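
To make the "content ignored when the MEDIA does not apply" idea concrete, here is a minimal Python sketch of how a processor might prune such content. The function name `prune_for_media` is made up, the sample markup is a simplified stand-in for the example above, and treating MEDIA as a plain comma-separated list of media types is an assumption rather than what the draft specifies.

```python
import xml.etree.ElementTree as ET


def prune_for_media(root, medium):
    """Remove, in place, every descendant whose media attribute excludes `medium`."""
    # Snapshot the elements first so removing children does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            media = child.get("media")
            if media is None:
                continue  # no restriction: the element applies to every medium
            allowed = {m.strip().lower() for m in media.split(",")}
            if medium not in allowed and "all" not in allowed:
                parent.remove(child)
    return root


# Hypothetical sample, modelled on the markup above.
SAMPLE = """<section>
 <h>Contact</h>
 <div media="screen">Contact form to be inserted here.</div>
 <div media="print"><address>Postal address goes here.</address></div>
</section>"""

screen_view = prune_for_media(ET.fromstring(SAMPLE), "screen")
print(ET.tostring(screen_view, encoding="unicode"))
```

Run for `"screen"`, the print-only DIV disappears from the tree; run for `"print"`, the contact form does.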

The LINK element can now contain itself. And META. META can contain inline elements and text.
The REV attribute still sucks. And its added value is yet unproven in my opinion.
There is an ACCESS element. It has a KEY attribute, which basically implies the ACCESSKEY attribute is back in a different jacket here to ruin your world.
Does XHTML Ruby inherit XHTML 2.0’s namespace or not? If so, more namespace clutter to deal with.
The STYLE attribute is still here. Ugh!
Using the LINK element for embedding style sheets is impossible. That’s one improvement.
XForms inherits the namespace of XHTML 2.0, XML Events does not. The former sucks, the latter looks like a poor choice from their point of view.
They fixed some mistakes in the style sheet I pointed out. Hurray.

Writing this and reading the draft takes about forty-five minutes. Marking this up is like hell. XHTML 2.0 is irrelevant. The semantic web can kiss my ass.

Comments

The P element can now contain the TABLE element. So as Daniel Glazman puts it: What happens when I paste a TABLE at the end of a paragraph? Is it part of the paragraph or not?

I don't understand this. How can there be any question about it?
Posted by David Håsäther at 6:27PM
The P element can now contain the TABLE element. So as Daniel Glazman puts it: What happens when I paste a TABLE at the end of a paragraph? Is it part of the paragraph or not?

I don't understand this. How can there be any question about it?

He is assuming a word processor-style caret. With a caret like that, you can’t tell whether the caret is at the end of a paragraph but within it or between paragraphs. You need a Mathematica-style caret for that. (Vertical caret inside block and horizontal between blocks.)
Posted by Henri Sivonen at 6:56PM
I think this is the part of Daniel Glazman's post Anne refers to:

Imagine you have a paragraph, with red background color. And you have an unordered list in your clipboard. You place the caret at the end of the paragraph and paste your list. Where does it end up? In the paragraph or after it? Red background or not?

Posted by ghola at 7:05PM
The title of a document is metadata about the document, and so a title like <title>About W3C</title> is equivalent to <meta about="" property="title">About W3C</meta>.

Not just that, it is also equivalent to something like the following, as Steven Pemberton showed in his XTech lecture:
```
<section>
   <h property="title">About W3C</h>
</section>
```
Don’t hold me to the exact syntax, but something like that is possible, removing the duplication between the <title> and the first section name one often has.
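
As a rough illustration of that equivalence, here is a small Python sketch that reads the title from either form. The helper name `document_title` is hypothetical, namespaces are left out for brevity, and preferring `<title>` when both are present is purely an assumption of the sketch, not something taken from the draft.

```python
import xml.etree.ElementTree as ET


def document_title(root):
    """Return the title from <title>, else from any element with property="title"."""
    title = root.find(".//title")
    if title is not None and title.text:
        return title.text.strip()
    # Fall back to the metadata form, e.g. <meta property="title"> or <h property="title">.
    for element in root.iter():
        if element.get("property") == "title":
            return "".join(element.itertext()).strip()
    return None


with_title = ET.fromstring("<html><head><title>About W3C</title></head><body/></html>")
with_property = ET.fromstring(
    '<html><body><section><h property="title">About W3C</h></section></body></html>'
)
print(document_title(with_title))
print(document_title(with_property))
```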
~Grauw
Posted by Laurens Holst at 4:22AM
The semantic web can kiss my ass.

So totally how I feel about things these days...
Posted by Foofy at 6:11AM
The semantic web can kiss my ass.

Sad. Very.
Posted by Faruk Ateş at 6:53PM
I understand where Daniel Glazman is coming from, but from a semantic standpoint, tables inside paragraphs are total nonsense. I don't think I've ever seen tabular data inside a paragraph in my entire life. He just wants to avoid confusion in a WYSIWYG interface, and personally, I think that can be resolved with a little intelligent design on the part of programmers.
Posted by Matthew Raymond at 7:31PM
Laurens, that was known before XTech. He mentioned it in the XHTML 2.0 and XForms presentation for example. I probably should have mentioned it. Now that I think of it, what happens when both are specified? Is the precedence specified somewhere?
Posted by Anne at 7:58PM
I understand where Daniel Glazman is coming from, but from a semantic standpoint, tables inside paragraphs are total nonsense. I don't think I've ever seen tabular data inside a paragraph in my entire life. He just wants to avoid confusion in a WYSIWYG interface, and personally, I think that can be resolved with a little intelligent design on the part of programmers.

Why for? Who is to say that the tables of data you see between paragraphs of text are not actually part of that paragraph? Sure, on the web you can just see in the source that they aren't, but think of magazines and such. Plenty of cases there where you find a paragraph, a table, and then more text. Why should that be 2 separate paragraphs? From a semantic point of view, that doesn't make sense at all.
Just pick up a generic Dungeons & Dragons Roleplaying book and you'll find tons and tons of paragraphs with tables inside them, which definitely belong to the paragraph.
Posted by Faruk Ateş at 8:36PM
Scanned quickly, this looks like TSML 2. Did I miss something?
Posted by Korbo at 1:25AM
A DOCTYPE is a should. Surely someone needs to point out to them that DOCTYPEs are overrated, useless and not used at all by browsers.

Tell them to read A Plea to Implementors and Spec Writers Working with XML.

Marking this up is like hell.

Have some Markdown. Or for your blog, maybe PHP-Markdown.
I have a script globally bound to a shortcut, which pipes my current clipboard contents through Markdown and puts the result back in the clipboard. So writing HTML in forms is much like writing email. Marking everything up properly takes almost no time.

XHTML 2.0 is irrelevant.

Sounds like it, yes. I don’t know the reasoning behind any of these changes, but a lot of them sound pretty ridiculous. And the magic namespacing hack is the worst idea I’ve ever seen and positively hideous.
I still think XHTML 1.0 Strict is the best choice overall. No, HTML 4.01 Strict is not better supported. That is a lie. No browser can parse all valid HTML 4.01 documents, because none of them have a fully compliant SGML parser. Several do have fully compliant XML parsers.
Posted by Aristotle Pagaltzis at 1:48AM
HTML 4.01 as SGML is also overrated. Markdown doesn’t come close to what I need and still requires a lot of editing. I need something that recognizes sentences, abbreviations, technical terms, et cetera.
Posted by Anne at 2:21PM
Ah. I forgot that I use a hack with mine: a postprocessor that transforms links with abbr URIs to <abbr> tags, i.e. <a href="abbr:Hypertext Markup Language">HTML</a> becomes <abbr title="Hypertext Markup Language">HTML</abbr>. And I append a text file with a large bunch of link definitions like
```
HTML: abbr:Hypertext Markup Language
XML: abbr:Extensible Markup Language
```
to the bottom of every piece of text I pull through the Markdown meatgrinder. That way I can simply write [HTML][] and it turns into the properly marked up initialism. It’s truly a hack, but it works very well.
Just a thought.
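
For anyone who wants to try this, a postprocessor along those lines might look like the Python sketch below. It is not the actual script described above: the function name `links_to_abbr` is invented, and it assumes the Markdown output contains links in exactly the form `<a href="abbr:…">…</a>`, with no other attributes and no escaping of the expansion.

```python
import re

# Matches links whose href uses the made-up "abbr:" scheme.
ABBR_LINK = re.compile(r'<a href="abbr:([^"]*)">(.*?)</a>', re.DOTALL)


def links_to_abbr(markup):
    """Rewrite <a href="abbr:Expansion">X</a> as <abbr title="Expansion">X</abbr>."""
    return ABBR_LINK.sub(r'<abbr title="\1">\2</abbr>', markup)


print(links_to_abbr('<a href="abbr:Hypertext Markup Language">HTML</a> is a markup language.'))
```

Ordinary links are left alone, because their href does not start with `abbr:`.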
As far as HTML-as-SGML language is concerned: if that is overrated (oh, I absolutely think so as well; but it’s the letter of the spec), will HTML5 be based on something else entirely? And if so, what?
Posted by Aristotle Pagaltzis at 3:43PM
HTML 4.01 as SGML is also overrated. Markdown doesn’t come close to what I need and still requires a lot of editing. I need something that recognizes sentences, abbreviations, technical terms, et cetera.

What I often do (mainly for big posts) is use the Snelsite WYSIWYG editor (Editize, which you can use by going into one of our Snelsite Demo admin panels) to write out the post. That allows me to write the main content, nicely put in paragraphs, with easy link-making and everything. Then, when done writing, I switch to code view, copy-paste it all to my kurafire.net textfield and fix up things that Editize doesn't support (if I'm using any of those in the post). Such things are rel="friend met" and dfn's around the first uses of abbreviations.
Additionally, I have my CMS setup to do some basic replaces in the output: %XHTML% becomes <abbr title="eXtensible HyperText Markup Language">XHTML</abbr>, which allows me to just use %XHTML% and the like when writing posts. Very handy.
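
A replacement pass like that can be sketched in a few lines of Python. This is only an illustration, not the CMS code in question: the table `ABBREVIATIONS`, the function `expand_tokens` and the rule that a token is letters and digits between two percent signs are all assumptions.

```python
import re
from html import escape

# Hypothetical lookup table; a real one would hold many more entries.
ABBREVIATIONS = {
    "XHTML": "eXtensible HyperText Markup Language",
    "HTML": "Hypertext Markup Language",
}

TOKEN = re.compile(r"%([A-Za-z0-9]+)%")


def expand_tokens(text, table=ABBREVIATIONS):
    """Replace %NAME% with an <abbr> element when NAME is in the table."""

    def replace(match):
        name = match.group(1)
        if name not in table:
            return match.group(0)  # unknown token: leave the text untouched
        return '<abbr title="{}">{}</abbr>'.format(escape(table[name], quote=True), name)

    return TOKEN.sub(replace, text)


print(expand_tokens("I write %XHTML% by hand, 100% of the time."))
```

A lone percent sign, as in "100%", is left alone because it is not followed by a name and a closing percent sign.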
Perhaps this has given you some ideas on how to write and markup posts quicker. :)
Posted by Faruk Ateş at 4:46PM
HTML 5.0 — if named that way — will probably be based upon its own meta language. Similar to SGML, but more restricted and with some different parsing rules. It will also include graceful error handling.
Posted by Anne at 5:40PM
May I suggest HTML Tidy? Then users can input tag soup and that will be converted into Appendix C compliant XHTML, which means that it will solve your reply-script problems as well (you don't need to convert tags to lowercase, and <BR> will be converted to <br /> upon Preview).
Posted by zcorpan at 6:57PM
Use DocBook with Xopus.
Posted by Sjoerd Visscher at 7:25PM
Using the LINK element for embedding style sheets is impossible. That’s one improvement.

And what about alternate stylesheets? I think a stylesheet is a document; you can't drop rel="alternate stylesheet" and keep rel="alternate".
Posted by V at 2:15PM
That’s not an argument. rel="alternate" can be used for lots of things regardless of style sheets. Alternate style sheets can be taken care of using processing instructions. Besides rel="alternate stylesheet" was poor design. (The rel attribute was supposed to be a space separated list of values that applied to the resource, where here both apply to the resource simultaneously.)
Posted by Anne at 3:32PM
For example:
- <link rel="alternate" media="print" type="application/pdf" href="article05.pdf" />
- <link rel="stylesheet" media="print" type="text/css" href="articles.css" />
Aren't these links of the same level? Why, in XHTML 2.0, would the second link be handled by the @media of CSS instead?
Both links say: this document has a print version.
Posted by V at 9:25PM
```
<?xml-stylesheet href="foo" type="text/css" media="print" alternate="no"?>
```
Is one way to do it. And your example is nice, but you have to see it a bit differently. One is actually the same document in a different format. The other is purely stylistic information for a different — or similar — medium.
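
To show how a consumer might read such processing instructions, here is a hedged Python sketch using the standard library's `xml.dom.minidom`. The function name `stylesheet_links` and the file names in the sample are made up, and the regular expression is a simplification of the pseudo-attribute syntax.

```python
import re
from xml.dom import Node, minidom

# Simplified pattern for pseudo-attributes such as href="foo" or media='print'.
PSEUDO_ATTR = re.compile(r'([\w-]+)\s*=\s*(?:"([^"]*)"|\'([^\']*)\')')


def stylesheet_links(xml_text):
    """Return one dict of pseudo-attributes per top-level xml-stylesheet instruction."""
    document = minidom.parseString(xml_text)
    links = []
    for node in document.childNodes:
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE and node.target == "xml-stylesheet":
            attrs = {}
            for name, double_quoted, single_quoted in PSEUDO_ATTR.findall(node.data):
                attrs[name] = double_quoted or single_quoted
            links.append(attrs)
    return links


# Hypothetical document with one style sheet per medium.
SAMPLE = (
    '<?xml-stylesheet href="screen.css" type="text/css" media="screen"?>\n'
    '<?xml-stylesheet href="print.css" type="text/css" media="print" alternate="no"?>\n'
    "<html/>"
)

for link in stylesheet_links(SAMPLE):
    print(link["href"], link.get("media"))
```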
Posted by Anne at 9:39PM
I need something that recognizes sentences, abbreviations, technical terms, et cetera.

Make it yourself! :) My own weblog already automatically transforms regular ' quotes into ‘ and ’ by applying some fuzzy logic (although you can of course just type those yourself using ALTGR+9 and ALTGR+0, as I do nowadays, but there’s no key combination for pretty "s unfortunately), and I also plan to automatically add <abbr> tags around known abbreviations at some point in the future.
Based on a couple of hashtables, and some fuzzy logic, that should work pretty well. Conceptually it’s pretty much the same as converting paragraphs to <p> tags and other line breaks to <br />s.
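
As an illustration of the quote part, here is a deliberately naive Python sketch. It is not the weblog's actual logic: the function name `curl_single_quotes` and the positional rule (a quote at the start, or after whitespace or an opening bracket, opens; everything else closes) are assumptions, and the rule gets cases like '90s wrong.

```python
import re


def curl_single_quotes(text):
    """Turn straight ' into ‘ or ’ using a simple positional heuristic."""
    # A quote at the start of the text, or after whitespace or an opening bracket, opens.
    text = re.sub(r"(?:^|(?<=[\s(\[{]))'", "‘", text)
    # Every remaining straight quote is treated as a closing quote or an apostrophe.
    return text.replace("'", "’")


print(curl_single_quotes("He said 'hello' and it's fine."))
```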
But the best solution would just be to use an editor. I use Dreamweaver’s WYSIWYG editing as well for editing content; it is much more convenient. Of course it doesn’t integrate nicely with my weblog system, and I would like it to have a function to automatically mark up abbreviations and such. Maybe with some work it could be done in JavaScript, although it would be difficult. Perhaps using contentEditable.
~Grauw
Posted by Laurens Holst at 8:15PM