As last time, a list of the new stuff in the now seventh reincarnation of the XHTML 2.0 working draft and things I want to point out in general that may not be new. The new draft is not a Last Call draft, by the way. I would hereby like to thank the HTML WG for acknowledging that it is not finished yet. Anyway, here we go:
The ROLE attribute is now part of the specification.
The SCRIPT element has been renamed to HANDLER. The reason is mainly that XHTML 2.0 uses XML Events to specify ‘event handling’.
HTML element, because it needs to point to some Schema. Fortunately, this is only for strictly conforming documents.
A DOCTYPE is a should. Surely someone needs to point out to them that DOCTYPEs are overrated, useless and are not used at all by browsers. (Well, Mozilla compares the DOCTYPE declaration to an internal array and resolves some entities if there is a match, but does no validation at all.)
xml:id if it’s stable enough. (As CR basically means call for implementations and there are already some implementations of xml:id, I don’t think this will be a problem. Unless of course implementations that implement XHTML 2.0 don’t implement it…)
The TITLE element no longer allows elements as its children.
The VERSION attribute is back. This time with the intention to make it normative and usable. (They better leave out xhtml2 from the namespace then…)
The title of a document is metadata about the document, and so a title like <title>About W3C</title> is equivalent to <meta about="" property="title">About W3C</meta>.
HEAD and BODY dropped again?
The P element can now contain the TABLE element. So as Daniel Glazman puts it: What happens when I paste a TABLE at the end of a paragraph? Is it part of the paragraph or not?
LAYOUT attribute. It’s kind of like xml:space, but named differently. The default value depends on the element you apply it to.
H and SECTION.
The P element can only contain another P element when it’s not its direct child.
The SEPARATOR (also known as SEPERATOR) element is still here. It looks like it currently separates in two ways that would make it particularly tough to implement. Either it’s between two P elements or it is inside a LI element that is part of a couple of LI elements. See the difference?
FULL attribute to be applied on the ABBR element. The FULL attribute is actually an IDREF, but defined as URI (which can be an IRI) to make it confusing. It can be used to point to a previous or upcoming element that contains an expansion of the abbreviation. (I wonder if you could point to another ABBR element that contains the expansion denoted by a TITLE attribute or its META element equivalent.)
CITE can contain a CITE attribute. How’s that for confusion! Unfortunately there is still no way to link a CITE and a QUOTE element. It is also underdefined in my opinion.
QUOTE are dubious. Are quotation marks part of the content or not? You can’t tell, because both are allowed.
A element. OBJECT seems to have some improvements. Or actually, specific things that apply solely to the OBJECT element.
OL element after a paragraph of text still involves the VALUE attribute on the LI element. No way to say that this list is a continuation of the previous. Again, an IDREF would be nice here.
HREFMEDIA attribute. I wonder if this was invented by Opera ;-)
TARGET is back! There is no defined interaction with any of the elements though. It is solely there for XFrames, a specification I hope will be banished.
ENCODING attribute to define what encoding the embedded resource may have and what encoding you prefer.
SRCTYPE has also been introduced to do what you expect it will do.
The IMG element is back! Mark Pilgrim can revise his XHTML 2.0 migration issues article and Jeffrey Zeldman can cross the street again. Or maybe not, as it is totally, utterly, incompatible with the older IMG element defined in HTML 4.01 and does not add anything interesting over the SPAN element with the embedding attribute collection applied to it.
How neat, most elements can have a MEDIA attribute that makes their content ignored when the MEDIA does not apply. My favorite example, which I used to implement using CSS, can now be done in million dollar markup:
<section>
  <h>Contact</h>
  <div media="screen"> Contact form to be inserted here. </div>
  <div media="print">
    <address layout="relevant">Anne van Kesteren Crosestijn 4629 …</address>
  </div>
</section>
The LINK element can now contain itself. And META. META can contain inline elements and text.
The REV attribute still sucks. And its added value is yet unproven in my opinion.
ACCESS element. It has a KEY attribute, which basically implies the ACCESSKEY attribute is back in a different jacket here to ruin your world.
The STYLE attribute is still here. Ugh!
Using the LINK element for embedding style sheets is impossible. That’s one improvement.
Writing this and reading the draft takes about forty-five minutes. Marking this up is like hell. XHTML 2.0 is irrelevant. The semantic web can kiss my ass.
The P element can now contain the TABLE element. So as Daniel Glazman puts it: What happens when I paste a TABLE at the end of a paragraph? Is it part of the paragraph or not?
I don't understand this. How can there be any question about it?
The P element can now contain the TABLE element. So as Daniel Glazman puts it: What happens when I paste a TABLE at the end of a paragraph? Is it part of the paragraph or not?
I don't understand this. How can there be any question about it?
He is assuming a word processor-style caret. With a caret like that, you can’t tell whether the caret is at the end of a paragraph but within it or between paragraphs. You need a Mathematica-style caret for that. (Vertical caret inside block and horizontal between blocks.)
I think this is the part of Daniel Glazman's post Anne refers to:
Imagine you have a paragraph, with red background color. And you have an unordered list in your clipboard. You place the caret at the end of the paragraph and paste your list. Where does it end up? In the paragraph or after it? Red background or not?
The title of a document is metadata about the document, and so a title like <title>About W3C</title> is equivalent to <meta about="" property="title">About W3C</meta>.
Not just that, it is also equivalent to something like the following, as Steven Pemberton showed in his XTech lecture:
<section> <h property="title">About W3C</h>
Don’t hold me to the exact syntax, but something like that is possible, removing the duplication between the <title> and the first section name one often has.
~Grauw
The semantic web can kiss my ass.
So totally how I feel about things these days...
The semantic web can kiss my ass.
Sad. Very.
I understand where Daniel Glazman is coming from, but from a semantic standpoint, tables inside paragraphs are total nonsense. I don't think I've ever seen tabular data inside a paragraph in my entire life. He just wants to avoid confusion in a WYSIWYG interface, and personally, I think that can be resolved with a little intelligent design on the part of programmers.
Laurens, that was known before XTech. He mentioned it in the XHTML 2.0 and XForms presentation, for example. I probably should have mentioned it. Now that I think of it, what happens when both are specified? Is the precedence specified somewhere?
I understand where Daniel Glazman is coming from, but from a semantic standpoint, tables inside paragraphs are total nonsense. I don't think I've ever seen tabular data inside a paragraph in my entire life. He just wants to avoid confusion in a WYSIWYG interface, and personally, I think that can be resolved with a little intelligent design on the part of programmers.
Why for? Who is to say that the tables of data you see between paragraphs of text are not actually part of that paragraph? Sure, on the web you can just see in the source that they aren't, but think of magazines and such. Plenty of cases there where you find a paragraph, a table, and then more text. Why should that be 2 separate paragraphs? From a semantic point of view, that doesn't make sense at all.
Just pick up a generic Dungeons & Dragons Roleplaying book and you'll find tons and tons of paragraphs with tables inside them, which definitely belong to the paragraph.
Scanned quickly, this looks like TSML 2. Did I miss something?
A DOCTYPE is a should. Surely someone needs to point out to them that DOCTYPEs are overrated, useless and are not used at all by browsers.
Tell them to read A Plea to Implementors and Spec Writers Working with XML.
Marking this up is like hell.
Have some Markdown. Or for your blog, maybe PHP-Markdown.
I have a script globally bound to a shortcut, which pipes my current clipboard contents through Markdown and puts the result back in the clipboard. So writing HTML in forms is much like writing email. Marking everything up properly takes almost no time.
XHTML 2.0 is irrelevant.
Sounds like it, yes. I don’t know the reasoning behind any of these changes, but a lot of them sound pretty ridiculous. And the magic namespacing hack is the worst idea I’ve ever seen and positively hideous.
I still think XHTML 1.0 Strict is the best choice overall. No, HTML 4.01 Strict is not better supported. That is a lie. No browser can parse all valid HTML 4.01 documents, because none of them have a fully compliant SGML parser. Several do have fully compliant XML parsers.
HTML 4.01 as SGML is also overrated. Markdown doesn’t come close to what I need and still requires a lot of editing. I need something that recognizes sentences, abbreviations, technical terms, et cetera.
Ah. I forgot that I use a hack with mine: a postprocessor that transforms links with abbr URIs to <abbr> tags, i.e. <a href="abbr:Hypertext Markup Language">HTML</a> becomes <abbr title="Hypertext Markup Language">HTML</abbr>. And I append a text file with a large bunch of link definitions like
HTML: abbr:Hypertext Markup Language
XML: abbr:Extensible Markup Language
to the bottom of every piece of text I pull through the Markdown meatgrinder. That way I can simply write [HTML][] and it turns into the properly marked up initialism. It’s truly a hack, but it works very well.
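A minimal sketch of what such a postprocessor might look like, in Python. This is not the commenter's actual script; the function name `abbr_links_to_tags` is made up for illustration, and it assumes the Markdown converter emits the link exactly as `<a href="abbr:…">…</a>` with no extra attributes and no URL-encoding of the expansion.

```python
import re

# Matches <a href="abbr:EXPANSION">TEXT</a> as assumed to be emitted by the
# Markdown converter (hypothetical exact shape: one href attribute, no others).
ABBR_LINK = re.compile(r'<a href="abbr:([^"]*)">(.*?)</a>', re.DOTALL)


def abbr_links_to_tags(html_text: str) -> str:
    """Rewrite links that use the abbr: pseudo-URI into <abbr> elements."""
    return ABBR_LINK.sub(r'<abbr title="\1">\2</abbr>', html_text)


if __name__ == "__main__":
    sample = '<p><a href="abbr:Hypertext Markup Language">HTML</a> is fine.</p>'
    print(abbr_links_to_tags(sample))
```

Ordinary links are left alone because their href does not start with `abbr:`.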
Just a thought.
As far as HTML-as-SGML language is concerned: if that is overrated (oh, I absolutely think so as well; but it’s the letter of the spec), will HTML5 be based on something else entirely? And if so, what?
HTML 4.01 as SGML is also overrated. Markdown doesn’t come close to what I need and still requires a lot of editing. I need something that recognizes sentences, abbreviations, technical terms, et cetera.
What I often do (mainly for big posts) is use the Snelsite WYSIWYG editor (Editize, which you can try by going into one of our Snelsite Demo admin panels) to write out the post. That allows me to write the main content, nicely put in paragraphs, with easy link-making and everything. Then, when done writing, I switch to code view, copy-paste it all to my kurafire.net textfield and fix up things that Editize doesn't support (if I'm using any of those in the post). Such things are rel="friend met" and dfn's around the first uses of abbreviations.
Additionally, I have my CMS set up to do some basic replaces in the output: %XHTML% becomes <abbr title="eXtensible HyperText Markup Language">XHTML</abbr>, which allows me to just use %XHTML% and the like when writing posts. Very handy.
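For the curious, a rough sketch of such an output replace in Python. The CMS in question is not shown here, so the table `ABBREVIATIONS`, its HTML entry, and the function name `expand_placeholders` are assumptions for illustration only; the XHTML expansion is the one quoted above.

```python
import html
import re

# Hypothetical lookup table; only the XHTML entry comes from the comment above.
ABBREVIATIONS = {
    "XHTML": "eXtensible HyperText Markup Language",
    "HTML": "HyperText Markup Language",  # assumed extra entry
}

# A placeholder is a run of letters or digits between two percent signs.
PLACEHOLDER = re.compile(r"%([A-Za-z0-9]+)%")


def expand_placeholders(text: str) -> str:
    """Replace %KEY% with an <abbr> element when KEY is a known abbreviation."""

    def replace(match: re.Match) -> str:
        key = match.group(1)
        title = ABBREVIATIONS.get(key)
        if title is None:
            # Unknown placeholder: leave the text exactly as written.
            return match.group(0)
        return '<abbr title="{}">{}</abbr>'.format(html.escape(title), key)

    return PLACEHOLDER.sub(replace, text)


if __name__ == "__main__":
    print(expand_placeholders("I write %XHTML% every day."))
```

Leaving unknown placeholders untouched means a stray pair of percent signs in a post does not silently disappear.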
Perhaps this has given you some ideas on how to write and markup posts quicker. :)
HTML 5.0 — if named that way — will probably be based upon its own meta language. Similar to SGML, but more restricted and with some different parsing rules. It will also include graceful error handling.
May I suggest HTML Tidy? Then users can input tagsoup and that will be converted into Appendix C compliant XHTML, which means that it will solve your reply-script problems as well (you don't need to convert tags to lowercase, and <BR> will be converted to <br /> upon Preview).
Using the LINK element for embedding style sheets is impossible. That’s one improvement.
And what about alternate style sheets? I think a style sheet is a document; you can't drop rel="alternate stylesheet" and keep rel="alternate".
That’s not an argument. rel="alternate" can be used for lots of things regardless of style sheets. Alternate style sheets can be taken care of using processing instructions. Besides, rel="alternate stylesheet" was poor design. (The rel attribute was supposed to be a space separated list of values that applied to the resource, where here both apply to the resource simultaneously.)
For example:
Aren't these links of the same kind? Why, in XHTML 2.0, would the second link be handled through @media in CSS?
Both links say: this document has a print version.
<?xml-stylesheet href="foo" type="text/css" media="print" alternate="no"?>
Is one way to do it. And your example is nice, but you have to see it a bit differently. One is actually the same document in a different format. The other is purely stylistic information for a different — or similar — medium.
I need something that recognizes sentences, abbreviations, technical terms, et cetera.
Make it yourself! :) My own weblog already automatically transforms regular ' quotes into ‘ and ’ by applying some fuzzy logic (although you can of course just type those yourself using ALTGR+9 and ALTGR+0, as I do nowadays, but there’s no key combination for pretty "s unfortunately), and I also plan to automatically add <abbr> tags around known abbreviations at some point in the future.
Based on a couple of hashtables, and some fuzzy logic, that should work pretty well. Conceptually it’s pretty much the same as converting paragraphs to <p> tags and other line breaks to <br />s.
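To make the idea concrete, here is a small Python sketch of the hashtable approach combined with the paragraph and line-break conversion. It is only an illustration under assumptions: the table `KNOWN_ABBREVIATIONS` and the function names are invented here, and the "fuzzy logic" is reduced to plain whole-word matching.

```python
import html
import re

# Hypothetical hashtable of known abbreviations and their expansions.
KNOWN_ABBREVIATIONS = {
    "HTML": "Hypertext Markup Language",
    "XML": "Extensible Markup Language",
}

# Whole-word match only, so the HTML inside XHTML is not picked up.
_ABBR_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in KNOWN_ABBREVIATIONS) + r")\b"
)


def mark_up_abbreviations(escaped_text: str) -> str:
    """Wrap every known abbreviation in an <abbr> element."""

    def wrap(match: re.Match) -> str:
        key = match.group(1)
        title = html.escape(KNOWN_ABBREVIATIONS[key])
        return '<abbr title="{}">{}</abbr>'.format(title, key)

    return _ABBR_PATTERN.sub(wrap, escaped_text)


def text_to_html(text: str) -> str:
    """Blank lines separate paragraphs; single newlines become <br />."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    result = []
    for paragraph in paragraphs:
        # Escape first, so user input cannot inject markup of its own.
        body = mark_up_abbreviations(html.escape(paragraph.strip(), quote=False))
        body = body.replace("\n", "<br />\n")
        result.append("<p>" + body + "</p>")
    return "\n".join(result)


if __name__ == "__main__":
    print(text_to_html("HTML rocks.\nSo does XML.\n\nXHTML too."))
```

Working on the plain text before any tags are added avoids the classic problem of accidentally matching abbreviations inside attribute values.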
But the best solution would just be to use an editor. I use Dreamweaver’s WYSIWYG editing as well for editing content; it is much more convenient. It of course doesn’t integrate nicely with my weblog system, though, and I would like it to have a function to automatically mark up abbreviations and such. Maybe with some work it could be done in JavaScript, although it would be difficult. Perhaps using contentEditable.
~Grauw