Anne van Kesteren

XHTML 2.0: Working Draft 7

As last time, here is a list of what is new in the now seventh reincarnation of the XHTML 2.0 working draft, plus some things I want to point out in general that may not be new. The new draft is not a Last Call draft, by the way. I would hereby like to thank the HTML WG for acknowledging that it is not finished yet. Anyway, here we go:

Writing this and reading the draft takes about forty-five minutes. Marking this up is like hell. XHTML 2.0 is irrelevant. The semantic web can kiss my ass.

Comments

  1. The P element can now contain the TABLE element. So as Daniel Glazman puts it: What happens when I paste a TABLE at the end of a paragraph? Is it part of the paragraph or not?

    I don't understand this. How can there be any question about it?

    Posted by David Håsäther at

  2. The P element can now contain the TABLE element. So as Daniel Glazman puts it: What happens when I paste a TABLE at the end of a paragraph? Is it part of the paragraph or not?

    I don't understand this. How can there be any question about it?

    He is assuming a word processor-style caret. With a caret like that, you can’t tell whether it sits inside the paragraph at its end or between two paragraphs. You need a Mathematica-style caret for that: a vertical caret inside a block and a horizontal one between blocks.

    Posted by Henri Sivonen at

  3. I think this is the part of Daniel Glazman's post Anne refers to:

    Imagine you have a paragraph, with red background color. And you have an unordered list in your clipboard. You place the caret at the end of the paragraph and paste your list. Where does it end up? In the paragraph or after it? Red background or not?

    Posted by ghola at

  4. The title of a document is metadata about the document, and so a title like <title>About W3C</title> is equivalent to <meta about="" property="title">About W3C</meta>.

    Not just that, it is also equivalent to something like the following, as Steven Pemberton showed in his XTech lecture:

    <section>
       <h property="title">About W3C</h>
    </section>

    Don’t hold me to the exact syntax, but something like that is possible, removing the duplication between the <title> and the first section name one often has.

    ~Grauw

    Posted by Laurens Holst at

  5. The semantic web can kiss my ass.

    So totally how I feel about things these days...

    Posted by Foofy at

  6. The semantic web can kiss my ass.

    Sad. Very.

    Posted by Faruk Ateş at

  7. I understand where Daniel Glazman is coming from, but from a semantic standpoint, tables inside paragraphs are total nonsense. I don't think I've ever seen tabular data inside a paragraph in my entire life. He just wants to avoid confusion in a WYSIWYG interface, and personally, I think that can be resolved with a little intelligent design on the part of programmers.

    Posted by Matthew Raymond at

  8. Laurens, that was known before XTech. He mentioned it in the XHTML 2.0 and XForms presentation, for example. I probably should have mentioned it. Now that I think of it, what happens when both are specified? Is precedence defined somewhere?

    Posted by Anne at

  9. I understand where Daniel Glazman is coming from, but from a semantic standpoint, tables inside paragraphs are total nonsense. I don't think I've ever seen tabular data inside a paragraph in my entire life. He just wants to avoid confusion in a WYSIWYG interface, and personally, I think that can be resolved with a little intelligent design on the part of programmers.

    Why? Who is to say that the tables of data you see between paragraphs of text are not actually part of that paragraph? Sure, on the web you can just see in the source that they aren't, but think of magazines and such. There are plenty of cases there where you find a paragraph, a table, and then more text. Why should that be two separate paragraphs? From a semantic point of view, that doesn't make sense at all.

    Just pick up a generic Dungeons & Dragons roleplaying book and you'll find tons and tons of paragraphs with tables inside them, which definitely belong to the paragraph.

    Posted by Faruk Ateş at

  10. From a quick scan, this looks like TSML 2. Did I miss something?

    Posted by Korbo at

  11. A DOCTYPE is a should. Surely someone needs to point out to them that DOCTYPEs are overrated, useless, and not used at all by browsers.

    Tell them to read A Plea to Implementors and Spec Writers Working with XML.

    Marking this up is like hell.

    Have some Markdown. Or for your blog, maybe PHP-Markdown.

    I have a script globally bound to a shortcut, which pipes my current clipboard contents through Markdown and puts the result back in the clipboard. So writing HTML in forms is much like writing email. Marking everything up properly takes almost no time.

    XHTML 2.0 is irrelevant.

    Sounds like it, yes. I don’t know the reasoning behind any of these changes, but a lot of them sound pretty ridiculous. And the magic namespacing hack is the worst idea I’ve ever seen and positively hideous.

    I still think XHTML 1.0 Strict is the best choice overall. No, HTML 4.01 Strict is not better supported. That is a lie. No browser can parse all valid HTML 4.01 documents, because none of them have a fully compliant SGML parser. Several do have fully compliant XML parsers.

    Posted by Aristotle Pagaltzis at

  12. HTML 4.01 as SGML is also overrated. Markdown doesn’t come close to what I need and still requires a lot of editing. I need something that recognizes sentences, abbreviations, technical terms, et cetera.

    Posted by Anne at

  13. Ah. I forgot that I use a hack with mine: a postprocessor that transforms links with abbr URIs into <abbr> tags, i.e. <a href="abbr:Hypertext Markup Language">HTML</a> becomes <abbr title="Hypertext Markup Language">HTML</abbr>. And I include a text file with a large bunch of link definitions like

    HTML: abbr:Hypertext Markup Language
    XML: abbr:Extensible Markup Language
    

    to the bottom of every piece of text I pull through the Markdown meatgrinder. That way I can simply write [HTML][] and it turns into the properly marked up initialism. It’s truly a hack, but it works very well.

    Just a thought.

    As for HTML as an SGML language: if that is overrated (oh, I absolutely think so as well; but it’s the letter of the spec), will HTML5 be based on something else entirely? And if so, what?

    Posted by Aristotle Pagaltzis at

  14. HTML 4.01 as SGML is also overrated. Markdown doesn’t come close to what I need and still requires a lot of editing. I need something that recognizes sentences, abbreviations, technical terms, et cetera.

    What I often do (mainly for big posts) is use the Snelsite WYSIWYG editor (Editize, which you can use by going into one of our Snelsite Demo admin panels) to write out the post. That allows me to write the main content, nicely put in paragraphs, with easy link-making and everything. Then, when done writing, I switch to code view, copy-paste it all into my kurafire.net textfield and fix up things that Editize doesn't support (if I'm using any of those in the post). Such things include rel="friend met" and dfn elements around the first uses of abbreviations.

    Additionally, I have my CMS set up to do some basic replacements in the output: %XHTML% becomes <abbr title="eXtensible HyperText Markup Language">XHTML</abbr>, which allows me to just use %XHTML% and the like when writing posts. Very handy.

    Perhaps this has given you some ideas on how to write and markup posts quicker. :)

    Posted by Faruk Ateş at

  15. HTML 5.0 — if named that way — will probably be based upon its own meta language. Similar to SGML, but more restricted and with some different parsing rules. It will also include graceful error handling.

    Posted by Anne at

  16. May I suggest HTML Tidy? Then users can input tag soup and it will be converted into Appendix C-compliant XHTML, which means it will solve your reply-script problems as well (you don't need to convert tags to lowercase, and <BR> will be converted to <br /> upon Preview).

    Posted by zcorpan at

  17. Use DocBook with Xopus.

    Posted by Sjoerd Visscher at

  18. Using the LINK element for embedding style sheets is impossible. That’s one improvement.

    And what about alternate style sheets? I think a style sheet is a document; you can't drop rel="alternate stylesheet" and keep rel="alternate".

    Posted by V at

  19. That’s not an argument. rel="alternate" can be used for lots of things regardless of style sheets. Alternate style sheets can be taken care of using processing instructions. Besides, rel="alternate stylesheet" was poor design. (The rel attribute was supposed to be a space-separated list of values that applied to the resource, where here both apply to the resource simultaneously.)

    Posted by Anne at

  20. For example:

    Aren't these links on the same level? Why, in XHTML 2.0, would the second link be tied in through a CSS @media rule?

    Both links say: this document has a print version.

    Posted by V at

  21. <?xml-stylesheet href="foo" type="text/css" media="print" alternate="no"?>

    Is one way to do it. And your example is nice, but you have to see it a bit differently. One is actually the same document in a different format. The other is purely stylistic information for a different — or similar — medium.

    Posted by Anne at

  22. I need something that recognizes sentences, abbreviations, technical terms, et cetera.

    Make it yourself! :) My own weblog already automatically transforms regular ' quotes into ‘ and ’ by applying some fuzzy logic (although you can of course type those yourself using AltGr+9 and AltGr+0, as I do nowadays; unfortunately there is no key combination for curly double quotes), and I also plan to automatically add <abbr> tags around known abbreviations at some point in the future.

    Based on a couple of hashtables, and some fuzzy logic, that should work pretty well. Conceptually it’s pretty much the same as converting paragraphs to <p> tags and other line breaks to <br />s.

    But the best solution would just be to use an editor. I also use Dreamweaver’s WYSIWYG mode for editing content; it is much more convenient. Although it of course doesn’t integrate nicely with my weblog system, and I would like it to have a function to automatically mark up abbreviations and such. Maybe with some work it could be done in JavaScript, although it would be difficult. Perhaps using contentEditable.

    ~Grauw

    Posted by Laurens Holst at
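Henri's caret distinction in comment 2 can be made concrete with a small model. The sketch below is hypothetical (Python, with made-up `Paragraph`, `Table`, `InBlock` and `BetweenBlocks` types, not any real editor's API): a caret is either inside a block at some child offset or between two top-level blocks, and that alone decides whether a pasted table joins the paragraph.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Table:
    caption: str


@dataclass
class Paragraph:
    # Inline content plus nested blocks, since the draft lets P contain TABLE.
    children: List[object] = field(default_factory=list)


@dataclass
class InBlock:
    """Vertical caret: inside paragraph number `block`, before child `offset`."""
    block: int
    offset: int


@dataclass
class BetweenBlocks:
    """Horizontal caret: between top-level blocks, before block `index`."""
    index: int


Caret = Union[InBlock, BetweenBlocks]


def paste(doc: List[object], caret: Caret, item: object) -> None:
    """Insert `item` at `caret`; the caret kind alone decides where it ends up."""
    if isinstance(caret, InBlock):
        doc[caret.block].children.insert(caret.offset, item)
    else:
        doc.insert(caret.index, item)


# Both carets would be drawn at the same spot, right after "Some text."
doc = [Paragraph(["Some text."])]
paste(doc, InBlock(block=0, offset=1), Table("inside the paragraph"))
paste(doc, BetweenBlocks(index=1), Table("after the paragraph"))
```

Drawn as a single word processor-style caret, both positions look identical, which is exactly the ambiguity Daniel Glazman describes; the model only helps if the editor shows the two kinds differently.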
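The abbr-URI postprocessor from comment 13 boils down to a single substitution over the generated HTML. A minimal sketch in Python, assuming Markdown emits double-quoted href attributes and plain link text; the function name `abbr_links_to_abbr` is made up for illustration:

```python
import re
from urllib.parse import unquote

# Assumes double-quoted hrefs and plain link text with no nested tags.
ABBR_LINK = re.compile(r'<a href="abbr:([^"]*)">([^<]*)</a>')


def abbr_links_to_abbr(markup):
    """Rewrite <a href="abbr:EXPANSION">TEXT</a> as <abbr title="EXPANSION">TEXT</abbr>."""
    def repl(match):
        # unquote() undoes percent-encoding, in case spaces became %20.
        title = unquote(match.group(1))
        return f'<abbr title="{title}">{match.group(2)}</abbr>'
    return ABBR_LINK.sub(repl, markup)


# Ordinary links are left alone; only abbr: pseudo-links are rewritten.
html_in = ('<p><a href="abbr:Hypertext Markup Language">HTML</a> and '
           '<a href="http://example.com/">a normal link</a></p>')
html_out = abbr_links_to_abbr(html_in)
```

A regex is enough here only because the input is Markdown's own predictable output; arbitrary HTML would call for a real parser.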
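Faruk's %XHTML% replacement in comment 14 is the same idea applied to the source text instead of to Markdown's output. A sketch, assuming placeholders are letters and digits between percent signs and a hand-maintained lookup table (the `ABBREVIATIONS` entries are illustrative, not his actual list):

```python
import re

# Illustrative table; the real list is whatever the CMS keeps.
ABBREVIATIONS = {
    "XHTML": "eXtensible HyperText Markup Language",
    "CSS": "Cascading Style Sheets",
}

PLACEHOLDER = re.compile(r"%([A-Za-z0-9]+)%")


def expand_placeholders(text):
    """Replace %KEY% with an <abbr> element when KEY is a known abbreviation."""
    def repl(match):
        key = match.group(1)
        if key in ABBREVIATIONS:
            return f'<abbr title="{ABBREVIATIONS[key]}">{key}</abbr>'
        return match.group(0)  # leave unknown placeholders untouched
    return PLACEHOLDER.sub(repl, text)
```

Unknown keys pass through unchanged, so a stray pair of percent signs in ordinary text is not mangled.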
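Anne's processing-instruction suggestion in comment 21 relies on the pseudo-attributes of `<?xml-stylesheet?>`. The sketch below reads them with Python's standard `xml.dom.minidom`, then picks the non-alternate sheets for a given medium. The helper names and the sample document are assumptions for illustration, and the media matching is deliberately simplified to exact keywords plus `all`.

```python
import re
from xml.dom import minidom

# Pseudo-attributes are plain text to the XML parser, so pull them out with a
# regex that accepts either double or single quotes.
PSEUDO_ATTR = re.compile(r"""(\w+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def stylesheet_pis(xml_text):
    """Return the pseudo-attributes of every xml-stylesheet PI in the prolog."""
    doc = minidom.parseString(xml_text)
    sheets = []
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            pairs = PSEUDO_ATTR.findall(node.data)
            sheets.append({name: dq or sq for name, dq, sq in pairs})
    return sheets


def persistent_sheets_for(sheets, medium):
    """hrefs of non-alternate sheets whose media list covers `medium`."""
    hrefs = []
    for sheet in sheets:
        if sheet.get("alternate", "no") == "yes":
            continue  # alternate sheets are only applied when the user picks one
        media = [m.strip() for m in sheet.get("media", "all").split(",")]
        if medium in media or "all" in media:
            hrefs.append(sheet["href"])
    return hrefs


SAMPLE = """<?xml version="1.0"?>
<?xml-stylesheet href="screen.css" type="text/css"?>
<?xml-stylesheet href="foo" type="text/css" media="print" alternate="no"?>
<?xml-stylesheet href="fancy.css" type="text/css" title="Fancy" alternate="yes"?>
<html/>
"""
```

For `print`, this yields `screen.css` (no media, so it applies everywhere) and `foo`; the titled alternate sheet is skipped until a user selects it.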
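The fuzzy logic for curly single quotes in comment 22 can be approximated by looking at the character before each straight quote. A minimal sketch in Python; this is a guess at the general approach, not Laurens' actual code:

```python
# Characters after which a straight ' is treated as an opening quote.
OPENING_CONTEXT = set(" \t\n([{\u201c\u2018")


def curl_single_quotes(text):
    """Turn straight ' into an opening or closing curly quote based on the previous character."""
    out = []
    prev = ""
    for ch in text:
        if ch == "'":
            opening = prev == "" or prev in OPENING_CONTEXT
            out.append("\u2018" if opening else "\u2019")
        else:
            out.append(ch)
        prev = ch
    return "".join(out)
```

Leading apostrophes such as ’90s come out as opening quotes under this rule, which is where the fuzziness has to be tuned further.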
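The paragraph conversion Laurens compares this to in comment 22 is the simplest of these filters. A sketch, assuming `\n` line endings and input that is already safe to emit as HTML (no escaping is done); the function name is hypothetical:

```python
import re


def paragraphs_to_html(text):
    """Blank-line-separated blocks become <p> elements; other newlines become <br />."""
    text = text.strip()
    if not text:
        return ""
    blocks = re.split(r"\n\s*\n", text)
    html_blocks = []
    for block in blocks:
        lines = [line.strip() for line in block.split("\n")]
        html_blocks.append("<p>" + "<br />\n".join(lines) + "</p>")
    return "\n".join(html_blocks)
```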