Anne van Kesteren

Markup content models

5 May 2005

The first question of today/tonight is: Does your markup still validate when you remove all DIV elements? This implies that the following markup (snippet) is non-conforming:

<body>
 <div>
  <img src="foo" alt="bar">
 </div>

Quite often I encounter ‘such hacks’, and although they are less evil than abusing markup, I don’t think they should be considered correct either. After all, the DIV element is a meaningless wrapper element. The markup it contains should be able to stand on its own, as if the DIV element’s start and end tags were removed. I’m currently using a single DIV element for the header, and if I removed it my markup would still stand and validate.
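The thought experiment above can be approximated mechanically. What follows is a minimal Python sketch, not a validator: it parses a fragment with the standard library's `html.parser`, ignores every DIV start and end tag, and lists the inline elements that would then sit directly inside BODY, which HTML 4.01 Strict does not allow. The `INLINE` and `VOID` sets are deliberately incomplete, and the names `DivlessBodyCheck` and `inline_children_of_body` are made up for this illustration.

```python
from html.parser import HTMLParser

# Incomplete subset of the HTML 4 %inline; elements (assumption for this sketch).
INLINE = {"a", "abbr", "em", "img", "q", "span", "strong"}
# Elements without an end tag; they are never pushed on the stack.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class DivlessBodyCheck(HTMLParser):
    """Collect inline elements that become direct BODY children once DIVs are unwrapped."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements, with DIVs left out
        self.offenders = []  # inline elements found directly inside BODY

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            return  # pretend the DIV start tag is not there
        if self.stack and self.stack[-1] == "body" and tag in INLINE:
            self.offenders.append(tag)
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag == "div" or tag not in self.stack:
            return
        # Pop open elements up to and including the matching one.
        while self.stack.pop() != tag:
            pass


def inline_children_of_body(markup):
    """Return the inline elements that end up directly in BODY without DIVs."""
    checker = DivlessBodyCheck()
    checker.feed(markup)
    checker.close()
    return checker.offenders
```

Fed the snippet above, `inline_children_of_body` reports the IMG element; wrap the image in a P and the list comes back empty. Bare text directly inside BODY is equally non-conforming in Strict but is not detected by this sketch.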

Another point about content models is that some elements allow mixed content models. If we take the DIV element again (other examples are the LI and DD elements), you can see that its content model includes both inline-level and block-level elements. Note that these are terms from the HTML4 specification and do not necessarily imply any specific rendering. I think this mixed content model is allowed and valid because of limitations of DTDs. Otherwise it would probably be either inline or block level, as is currently being specified in HTML5 (though not for the DIV element, as that element is not specified and might never be).

That would invalidate markup such as:

<a href="thunderbird/"><img src="../../images/product-thunderbird.png" alt="" height="60" width="60"></a>
<h3><a href="thunderbird/" class="producttitle"><strong>Thunderbird</strong>: Reclaim Your Inbox</a></h3>

… and other things that do not really make sense. Elements that make this even more complicated are INS and DEL, which are both block-level and inline-level elements themselves and can contain both types of content. Not to mention the fact that the XML part of HTML5 might start allowing BLOCKQUOTE inside P and even BLOCKQUOTE inside Q inside P. Watch this list.
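The stricter model mentioned above, where a container holds either inline-level or block-level children but never both, can be sketched as a check too. This is only an illustration under stated assumptions: the `BLOCK`, `INLINE` and `VOID` sets are incomplete subsets of the HTML 4 categories, text nodes (which count as inline content) are ignored, and `MixedContentCheck` and `mixed_parents` are hypothetical names.

```python
from html.parser import HTMLParser

# Incomplete subsets of the HTML 4 %block; and %inline; elements (assumptions).
BLOCK = {"blockquote", "div", "dl", "h1", "h2", "h3", "h4", "h5", "h6",
         "ol", "p", "pre", "table", "ul"}
INLINE = {"a", "abbr", "em", "img", "q", "span", "strong"}
# Elements without an end tag; they are never pushed on the stack.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class MixedContentCheck(HTMLParser):
    """Record elements whose direct children are both block-level and inline-level."""

    def __init__(self):
        super().__init__()
        self.stack = []  # entries are [tag, set of child kinds seen so far]
        self.mixed = []  # elements that had both kinds of children

    def handle_starttag(self, tag, attrs):
        kind = "block" if tag in BLOCK else "inline" if tag in INLINE else None
        if self.stack and kind:
            self.stack[-1][1].add(kind)  # note the kind on the parent
        if tag not in VOID:
            self.stack.append([tag, set()])

    def handle_endtag(self, tag):
        if not any(entry[0] == tag for entry in self.stack):
            return  # stray end tag, ignore it
        while True:
            name, kinds = self.stack.pop()
            if len(kinds) == 2:
                self.mixed.append(name)
            if name == tag:
                break


def mixed_parents(markup):
    """Return the elements that mix block-level and inline-level children."""
    checker = MixedContentCheck()
    checker.feed(markup)
    checker.close()
    return checker.mixed
```

Wrapping the two lines of markup above in a single DIV and passing the result to `mixed_parents` flags that DIV, because it has an A and an H3 as siblings. The H3 itself is fine, since it only contains inline content.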

Comments

Just in addition: According to the HTML 4 Recommendation, block-level content is not allowed inside the INS and DEL elements. In XHTML it is allowed.
Posted by Markus Wulftange at 5:06AM
Guilty as charged. However, I am not convinced of my being utterly evil, or dumb for that matter. Considering that the object element can contain multiple block-level elements such as tables and paragraphs (for each bad example I can give you one very good example if there is need), I think the object is of a flexible nature, as are del and ins. I see nothing inherently wrong in an object with alternate content directly embedded in a section wrapper which itself directly resides in the body element.
Besides, in XHTML body is just another wrapper, so the restriction that all its inline children must reside in a block-level element is too strict. I think of it as a remnant of the olden HTML.
Posted by Moose at 5:33AM
After all, the DIV element is a meaningless wrapper element.

I beg to disagree here: div can be a meaningless wrapper, but its first goal (like span's) is to be a "generic" element, to use when the existing markup can't fit your semantic or logical needs.
As I see it, a div can be semantically meaningless, but it can also have whatever semantic meaning you give it.
Finally, I don't really get why a link on an image "doesn't make sense"...
Posted by Masklinn at 5:43AM
Markus, you are mistaken:
The INS and DEL elements must not contain block-level content when these elements behave as inline elements.

This means that if you put an INS or DEL element inside a block-level element, you are not allowed to put block-level content inside it. But if you put either of the two elements outside any block-level element, you can put block-level content inside it. Hence, the following is legal:
```
<ins>
 <p>
  Some text
 </p>
</ins>
```
Posted by Arve at 7:16AM
[D]oes your markup still validate when you remove all DIV elements? Yep. ☺
Posted by ACJ at 7:27AM
I think a better statement is that you should be able to replace all DIVs with semantically correct block level elements, should such elements come into existence.
For instance, let’s pretend I'm writing a simple essay containing several paragraphs of text, but the imaginary version of HTML I'm using has no defined element for paragraphs. What should I do?
I could join the "everything in the world is a list" crowd and put each paragraph in an LI. Doing so would conform to your rule about removing DIVs and still validating. But I think it would also be pretty dumb.
In my mind, the better option is to put each of them inside a generic element (DIV, with class="p") until a better, more specific element (P, obviously) comes along.
Posted by Joel at 9:10AM
Why not make your own elements and put them into the DOCTYPE? 99.99% of all browsers correctly interpret your intent today.
Posted by Jimmy Cerra at 12:31PM
While I understand your rationale, Anne, I don't think it can be applied to every situation. (X)HTML has a very limited repertoire of semantic elements. The DIV element type can, aside from grouping page divisions, be used as a semantically neutral container when nothing else is appropriate.
I think it's better to use a DIV to wrap an image and a link, than to use a P. The DIV is neutral, but the P is plain wrong if the content is not a paragraph.
I do agree, though, that you shouldn't mix block-level and inline elements within the same parent. That applies to other elements as well, such as LI.
Posted by Tommy Olsson at 1:07PM
What about turning ins and del into generic attributes? E.g.:
```
New stuffs... and old stuffs...
```
Posted by minghong at 1:35PM
About the example on top of Anne's post, are we looking at an evil hack, or a fault in the specification? I do think that this is a valid reason to use a transitional doctype over a strict one, while writing everything else as if it were strict (see what Moose wrote for what looks like the same reasoning as mine). Besides, as already mentioned by Joel, DIV is as semantically meaningful as you want.
As for myself, my markup would still validate, but it would fail to display fashionably in IE; I guess you all know why.
Posted by Frenzie at 3:28PM
Not to mention the fact that the XML part of HTML5 might start allowing BLOCKQUOTE inside P and even BLOCKQUOTE inside Q inside P. Watch this list.

Oh my god, EW! And here I was, being happy about blockquote being all perfect (to me) as it is...
Posted by Faruk Ates at 3:43PM
I don't agree that a div is semantically completely meaningless. It still groups things together, and if they semantically don't belong together then they shouldn't be in the same element. (Whatever element that is.)
But I do agree that divs should only contain block level elements, although that might be too much to ask in practice. But mixing inline and block level content should certainly never happen.
minghong: XHTML 2 defines the edit attribute. Its values are inserted, deleted, changed and moved.
Posted by Sjoerd Visscher at 3:45PM
Basically I agree, it is definitely a good thought-provoking impulse since there are many people juggling with div elements.
But, though div elements are indeed semantically meaningless (they only exist for adding structure), I’m not entirely sure if it is not a different situation anyway when they are used together with class or id attributes, which might bring in some semantics - div class="error" seems to provide structure as well as certain semantics.
Posted by Jens Meiert at 4:47PM
On the matter of div semantics: don’t forget extra weight can be added to classes and ids (and relations) via html meta profiles, and thereby to the semantic value of the elements they are applied to.
Posted by ACJ at 7:04PM
What semantics and such mean when most people discuss markup beats me, but as far as content models are concerned, semantic or presentational concepts are irrelevant; what matters here is type (id est, an element type that may only have %inline; content still has a mixed content model). Mixed content is the SGML-derived term for an element type’s content model allowing both element content and parsed character data (#PCDATA). The upshot is that whitespace between tags is irrelevant in element content but counts as #PCDATA in mixed content.
To illustrate the usual pitfall:
1. not so good
2. no good at all
(This is not possible in XML, by the way.)
Posted by Eric at 9:22PM
What about turning ins and del into generic attributes?

This is being done for XHTML 2.0. See the edit attributes module.
Posted by J. King at 10:50PM
I have to disagree here. Just because the div element doesn't have any semantics doesn't mean it doesn't have any structure. In my mind, it provides a kind of generic structural building block with which you split the sections of your document up. It's a container. So it's okay for inline elements to reside inside top-level divs, because the div itself provides the necessary structure.
And secondly, presentational models (block, inline) are just that -- presentational. So why we're splitting markup up into presentational boxes really confuses me: block or inline determines how something is displayed, so why is it a case for validation? This is a point wider than the focus of this article I guess, but it certainly has implications here.
Posted by David House at 6:47PM