Anne van Kesteren

Primitive markup

23 October 2004

After reading a post on abbreviations, I wonder why people won't accept, or don't see, that HTML is primitive and only meant as a general document language. Things you can't do semantically:

Make a piece of text less important than the surrounding text.
Footnotes.
~~Resources of larger quotations. (The ones you actually want to have inside the BLOCKQUOTE element instead of in the CITE attribute of that element.)~~ It appears fantasai has found a solution for this. Putting ADDRESS as "last-child" of the BLOCKQUOTE element seems to be the most correct way to do this.
Let a search engine know that you are talking about mathematical formulas or scientific notation, like H₂O. (An extension for mathematical formulas has been created by the W3C, called MathML, though that is an XML language and can only be used together with other XML-based languages, like XHTML.)
Et cetera.
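
For the struck-through quotation item, the ADDRESS-as-last-child idea could look roughly like the sketch below. The quoted text and the source name are placeholders made up for illustration; they are not taken from fantasai's post.

```html
<!-- Sketch only: the quotation and its source are hypothetical placeholders. -->
<blockquote>
	<p>HTML is a general document language.</p>
	<!-- The source sits inside the quotation, as its last child. -->
	<address>Example Author, Example Article</address>
</blockquote>
```

A style sheet could then pick out the source with a CSS3 selector such as `blockquote > address:last-child`.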

The point is that HTML is not very descriptive. You know it's a header by the H3 element, but you don't know if it's meant as a header of a particular section or as a general header. You don't know which section the header applies to, et cetera. (You can't use XPath to select the header that belongs to a certain paragraph, since there is no connection between the two.) The second point is that this is also one of the very nice features of HTML. It's not complex and could be easy to learn if people knew a bit about semantics, which most people don't.

This was written a while ago, but I never took the time to finish it. Besides, I forgot about it since I have about 20 text files on my desktop.

Comments

You should never have rubbish lying around on your desktop. That's what «My Documents» and folders elsewhere are for. The desktop should be clean, even though it is often the default place to put all downloaded content. Well, that's my opinion, anyhow.
When it comes to the actual content of your entry, you are of course correct. It's going to be interesting to see how XHTML 2.0 gets adopted, both in browsers and by authors. I guess it will take a long time before we can use it on commercial websites, due to Internet Explorer and Microsoft's specification-fright.
Posted by Asbjørn Ulsberg at 12:09AM
> You know it's a header by the H3 element, but you don't know if it's meant as a header of a particular section or as a general header.

That's where div comes in, and that's why div (and span) can carry semantics (especially when used in combination with a class or ID that is described in an external HTML meta profile).
XHTML2 has a pretty neat compromise in the form of the section block element (which is still very generic, but slightly more descriptive than div).
My 2 euro cents.
Posted by ACJ at 12:24AM
ACJ, that's exactly what I think too, but a division doesn't have a semantic purpose (as you put it). It has a structural purpose.
I always use these examples to clarify this problem:
```
<h1>heading1</h1>
<p>
	this paragraph belongs to heading1
</p>
<h2>heading2</h2>
<p>
	this paragraph belongs to (heading1 and) heading2
</p>
<p>
	this paragraph belongs to (heading1 and) heading2
</p>
```
```
<h1>heading1</h1>
<p>
	this paragraph belongs to heading1
</p>
<div>
	<h2>heading2</h2>
	<p>
		this paragraph belongs to (heading1 and) heading2
	</p>
</div>
<p>
	this paragraph belongs to heading1
</p>
```
But I don't understand why "section" is more descriptive than "div". I don't think this is important anyway. It's more important that browsers know how to parse these elements.
Posted by Jerome at 1:44AM

Jerome, XHTML2 seems to use a totally different structure, at least from what I've seen of it.

```
<section>
	<heading>Heading 1</heading>
	<p>Nonsense bla</p>
	<section>
		<heading>Heading 2</heading>
		<p>Blabla yadda yadda</p>
	</section>
</section>
```

Which roughly translates to XHTML1 as the following:

```
<h1>Heading 1</h1>
<p>Nonsense bla</p>
<h2>Heading 2</h2>
<p>Blabla yadda yadda</p>
```

So, well, it gets more bloated.

Posted by Rob Mientjes at 1:58AM

Jerome,

> The section element, in conjunction with the h element, offers a mechanism for structuring documents into sections.

…is slightly more specific and descriptive than…

> The div element, in conjunction with the id and class attributes, offers a generic mechanism for adding extra structure to documents.

…and actually adds weight in the sense of meaning (read: semantics) of the element. (I realize I'm being a bit vague in my explanation; I hope it's sufficient.)
The examples that you provide, by the way, illustrate exactly what I mean, and how I like to structure documents myself. (Though, ironically, I dropped that very mechanism on my personal site recently; which is now more like Anne's.)
Rob, from the current XHTML 2 working draft:

> There are two styles of headings in XHTML [2]: the numbered versions h1, h2 etc., and the structured version h, which is used in combination with the section element.

Anne, I think it sucks I can't use ins to mark up changes I make to quotes properly, and hr to break up my comment properly. ;p
Posted by ACJ at 7:54AM
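
For reference, the structured style from the working draft quoted above would presumably look something like the sketch below. The heading text and paragraphs are placeholders borrowed from Rob's example, and the draft may still change.

```html
<!-- Sketch of the section/h style described in the XHTML 2 working draft. -->
<section>
	<h>Heading 1</h>
	<p>Nonsense bla</p>
	<!-- Nesting depth, not a number in the element name, sets the heading level. -->
	<section>
		<h>Heading 2</h>
		<p>Blabla yadda yadda</p>
	</section>
</section>
```

Here each heading is tied to its paragraphs by the enclosing section, which is the connection the numbered h1, h2 style does not express.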
i've been saying for ages on various fora and mailing lists that (x)html is an extremely generalised, simple markup that will never manage to adequately describe real world content...but i do find it endearing when i see people arguing the hell out of the "most correctest" way of wrapping some obscure content in this or that element.
in the majority of cases, it comes down to "triage", choosing something that fits well enough, being consistent in that choice, and not worrying too much about it, in my opinion
Posted by patrick h. lauke at 10:29AM
ACJ, I knew it was something with an h :P
Sorry, but I didn't know the normal hn was still possible.
Posted by Rob Mientjes at 6:00PM
I think it all comes down to need. Do we really need all of this additional semantic markup?
Why not just leave off abbr elements? Because it's cool to be able to provide expansions on largely unknown abbreviations so that more people can understand and learn them.
And the class attribute on the abbr element is completely necessary since that will make sure compliant aural user agents will pronounce the abbreviation correctly.
In many other cases though, I think people are going overboard. We should not lose sight of practicality when searching for the "ultimate" markup. I think semantic markup has a lot to do with accessibility. Maybe we should rather work towards "perfect accessibility" instead of "perfect markup", since the latter is practically impossible anyway.
And if you need to provide additional semantics, why not just combine namespaces and switch to XHTML? It's a poor excuse IMHO to say that IE doesn't support it.
I can understand that this is impossible when it comes to commercial sites, but for personal sites, why not switch to XHTML and display an incompatibility error to IE users? How do people ever think we will migrate the web when we don't give IE users reason to switch?
I'm not even using XHTML on my weblog, and I'm already displaying a warning to IE users, just because of IE having poor CSS support!
Posted by Charl van Niekerk at 1:38PM
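
A rough sketch of the abbr point in the comment above, assuming a CSS 2 aural style sheet. The class name `initialism` and the example abbreviation are made up for illustration, and whether a given aural user agent honours the rule is not something this sketch can promise.

```html
<!-- Sketch only: "initialism" is a hypothetical class name. -->
<style type="text/css" media="aural">
	/* Ask aural user agents to spell the abbreviation letter by letter. */
	abbr.initialism { speak: spell-out; }
</style>

<p>
	The <abbr class="initialism" title="World Wide Web Consortium">W3C</abbr>
	publishes the specification.
</p>
```

The title attribute carries the expansion for readers, while the class gives the style sheet something to hook the pronunciation rule onto.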