I hope that Jacques Distler doesn't mind if I quote his comment here. I could have typed something myself, but his summary is pretty clear:
Tags vs elements:
<p>Some text</p> is an element.
In HTML some elements (in some circumstances) may omit their closing tags.
In X(HT)ML, all elements must have closing tags, though empty elements can have abbreviated "short" tags:
<p></p> is equivalent to <p />. Note that this abbreviation is, strictly speaking, not legal HTML. In HTML, the closing tag is often optional (and hence can just be omitted), but it can't be abbreviated as above.
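A minimal sketch of the XML side of this, using Python's standard `xml.etree.ElementTree` as a stand-in for any off-the-shelf XML parser (the tool choice is an assumption, not something from the thread). It shows that `<p></p>` and `<p />` produce the same element, and that leaving an end tag out is simply a well-formedness error. `is_well_formed` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# The long and short spellings of an empty element.
long_form = ET.fromstring("<p></p>")
short_form = ET.fromstring("<p />")

# Both give an element named "p" with no text and no children,
# and both serialize to the same bytes.
print(long_form.tag, long_form.text, len(long_form))      # p None 0
print(short_form.tag, short_form.text, len(short_form))   # p None 0
print(ET.tostring(long_form) == ET.tostring(short_form))  # True


def is_well_formed(markup):
    """Return True if the XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# XML has no rules for inferring an omitted end tag: the parser meets
# </div> while <p> is still open and reports an error.
print(is_well_formed("<div><p>Some text</p></div>"))  # True
print(is_well_formed("<div><p>Some text</div>"))      # False
```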
For example, you may not omit the HTML (and BODY and HEAD) element in HTML 4.01, but you may omit both start and end tags.
I don't really get this part. Omitting both the start and end tags of HTML and BODY is practically impossible (if you want a well-structured document).
See one of my @import tests for an example of that.
HTML allows you to omit the (start or) end tags when it is obvious where the element (begins or) ends.
For instance, consider
<p>Some text
<p>Some more text
<ul>
  <li>A list item
  <li>Another list item
</ul>
This is precisely equivalent to
<p>Some text</p>
<p>Some more text</p>
<ul>
  <li>A list item</li>
  <li>Another list item</li>
</ul>
We can omit the closing </p> tags because paragraphs cannot contain other block-level elements. Hence we know that the paragraph element ends as soon as we encounter the start tag of another block-level element.
Similarly, we can omit the closing </li> tag above because we know that the list item element comes to an end whenever we encounter another <li> tag or the closing tag of the list.
These rules are perfectly unambiguous. They may be harder to learn than the rules of XML, but they are every bit as well-defined. They are also purportedly harder to parse, but that is silliness. You parse XML with an off-the-shelf XML parser, and you parse HTML with an off-the-shelf SGML parser. Both run very fast and sport similar APIs.
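The paragraph and list-item rules are mechanical enough to write down as code. What follows is a rough sketch in Python, not how any particular SGML parser works: it uses the standard library's `html.parser.HTMLParser` purely as a tokenizer (it does not apply these rules itself) and hard-codes the two rules described above. `BLOCK_LEVEL`, `EndTagInferrer` and `make_end_tags_explicit` are hypothetical names, and the block-level set is a small illustrative subset; a real SGML parser derives this behavior from the tag-omission flags and content models in the HTML 4.01 DTD.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately tiny subset of the rules: only these elements
# end an open paragraph, and only P and LI get their end tags inferred.
BLOCK_LEVEL = {"p", "ul", "ol", "div", "pre", "blockquote", "table"}


class EndTagInferrer(HTMLParser):
    """Rewrite markup so that omitted </p> and </li> end tags are explicit.

    Attributes and comments are dropped to keep the sketch short.
    """

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open element names
        self.pieces = []         # normalized output fragments

    def _emit_end(self):
        self.pieces.append(f"</{self.open_elements.pop()}>")

    def _top_is(self, tag):
        return bool(self.open_elements) and self.open_elements[-1] == tag

    def handle_starttag(self, tag, attrs):
        # Paragraphs cannot contain block-level elements, so one ends the P.
        if tag in BLOCK_LEVEL and self._top_is("p"):
            self._emit_end()
        # A new list item ends the previous one.
        if tag == "li" and self._top_is("li"):
            self._emit_end()
        self.open_elements.append(tag)
        self.pieces.append(f"<{tag}>")

    def handle_endtag(self, tag):
        # The list's own end tag also ends its last list item.
        if tag in ("ul", "ol") and self._top_is("li"):
            self._emit_end()
        if self._top_is(tag):
            self._emit_end()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.pieces.append(text)

    def close(self):
        super().close()
        # End of input ends anything still open, e.g. a final <p>.
        while self.open_elements:
            self._emit_end()


def make_end_tags_explicit(markup):
    parser = EndTagInferrer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.pieces)


omitted = ("<p>Some text <p>Some more text <ul> "
           "<li>A list item <li>Another list item </ul>")
explicit = ("<p>Some text</p> <p>Some more text</p> <ul> "
            "<li>A list item</li> <li>Another list item</li> </ul>")
print(make_end_tags_explicit(omitted))
print(make_end_tags_explicit(omitted) == make_end_tags_explicit(explicit))  # True
```

Run on the snippet with omitted end tags, the sketch prints the fully tagged version, and both snippets normalize to the same string, which is the "precisely equivalent" claim in executable form.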
By the way, this page is currently invalid because of a stray Windows-1252 character in what is purportedly an ISO-8859-1 document (per your Content-Type HTTP header).
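The thread doesn't say which character was at fault, so the sketch below assumes a typical culprit: the Windows-1252 left double quotation mark, byte 0x93. In Windows-1252 that byte is a printable curly quote; read as ISO-8859-1 it becomes the C1 control character U+0093, and HTML 4.01's SGML declaration marks the whole 128–159 range as unused, so a validator that trusts the header flags the page. `find_c1_bytes` is a hypothetical helper, and re-encoding with numeric character references is just one possible fix, not necessarily what Movable Type or WordPress do.

```python
# Assumed example: 0x93 is the Windows-1252 "left double quotation mark".
stray = b"\x93"
print(hex(ord(stray.decode("cp1252"))))      # 0x201c, a curly quote
print(hex(ord(stray.decode("iso-8859-1"))))  # 0x93, a C1 control character


def find_c1_bytes(data: bytes) -> list[int]:
    """Hypothetical helper: offsets of bytes in the range 0x80-0x9F, which are
    C1 control characters in ISO-8859-1 but mostly printable in Windows-1252."""
    return [i for i, b in enumerate(data) if 0x80 <= b <= 0x9F]


# Text typed with curly quotes and saved as Windows-1252.
page = "He said \u201chello\u201d".encode("cp1252")
print(find_c1_bytes(page))  # [8, 14]

# One possible fix: decode with the real encoding, then re-encode as
# ISO-8859-1, turning characters it lacks into numeric character references.
fixed = page.decode("cp1252").encode("iso-8859-1", errors="xmlcharrefreplace")
print(fixed)  # b'He said &#8220;hello&#8221;'
```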
I've fixed this problem in MovableType. I would have thought WordPress would have an out-of-the-box solution, as XHTML validity is one of its selling points.
Fixed that, thanks!
Oh, and by the way, WordPress seems to swallow carriage returns. The above snippets of HTML had carriage returns in the <pre> element. These previewed OK, but disappeared from the posted version of the comment.
I presume that's a "bug", not a "feature."
OK, after reading the specification I understand it a lot better: Elements (the link might be useful).
This is something I've tried to explain to folks without much success. The notion of an element being present in a document without being present in the source code is not something people generally understand, to the point that the W3C actually changed the structure of tables to be less strict in [XHTML1.0] than it is in [HTML 4.01] with regard to the
As for the element termination issue, I did a comparison between HTML and XHTML parsing to illustrate the problem of structure trees when you are using autotermination of non-empty tags (p) in tag-soup parsers compared with XML parsers (well-formed XML, invalid HTML and XHTML, radically different structure trees).
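To make the "radically different structure trees" point concrete, here is a rough sketch (not the comparison mentioned above, which isn't reproduced here). It parses one snippet that is well-formed XML but invalid HTML twice: once with Python's `xml.etree.ElementTree`, and once with a hypothetical `TagSoupTreeBuilder` that applies a single HTML rule (a `<ul>` start tag closes an open `<p>`) and silently drops the resulting stray `</p>`. Both class and helper names are illustrative.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Well-formed XML, but invalid HTML 4.01: a P may not contain a UL.
MARKUP = "<div><p>Intro<ul><li>Item</li></ul></p></div>"


def outline(elem, depth=0):
    """List element names, indented by nesting depth (text is ignored)."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


class TagSoupTreeBuilder(HTMLParser):
    """Hypothetical, minimal HTML-style tree builder: a <ul> start tag
    implicitly closes an open <p>, and end tags for elements that are not
    open are silently dropped."""

    def __init__(self):
        super().__init__()
        self.document = ET.Element("document")
        self.open_elements = [self.document]

    def handle_starttag(self, tag, attrs):
        if tag == "ul" and self.open_elements[-1].tag == "p":
            self.open_elements.pop()
        self.open_elements.append(ET.SubElement(self.open_elements[-1], tag))

    def handle_endtag(self, tag):
        open_tags = [e.tag for e in self.open_elements[1:]]
        if tag not in open_tags:
            return  # stray end tag, e.g. the </p> after the list
        while self.open_elements[-1].tag != tag:
            self.open_elements.pop()
        self.open_elements.pop()


xml_tree = ET.fromstring(MARKUP)

builder = TagSoupTreeBuilder()
builder.feed(MARKUP)
builder.close()
soup_tree = builder.document[0]  # the <div>

print("\n".join(outline(xml_tree)))   # ul nested inside p
print("\n".join(outline(soup_tree)))  # p and ul are siblings
```

Browsers that follow the HTML5 parsing algorithm treat the stray `</p>` differently again: instead of dropping it, they insert an empty `p` element after the list, giving yet a third tree.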
As for the difference between tags and elements, I must admit I still sometimes use the word "tag" when I mean "element", just from habit. When you've been using HTML since 1995, it's hard to relearn terminology that is so commonly used incorrectly.
HTML is also an element. An element comprises the start tag, end tag, contents and attributes.
Bah, it should be possible to edit comments! That should read HTML (which is an element, with a start tag, end tag, contents and attributes)!