Anne van Kesteren

invalid after validated

19 September 2003

It takes some time to learn this (I think): the validator is nice, but it is not the whole story. Validating is not everything and can be confusing sometimes. This has been addressed many times before (spelling and grammar are not the same thing), but you can actually write invalid XHTML and still validate. Everyone should be told this. Don't trust the validator, don't write for the validator, just use your intelligence. The validator can be used to check if there are any unescaped ampersands or if your blockquote element contains a p element or equivalent.

This morning I was testing what nonsense the validator accepts (this is valid XHTML Transitional against the DTD and well-formed, but not valid against the specification):

```
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>Test</title></head>
<body bgcolor="#000000">
 <p>
  <del>
   <center>
    <font color="#fffff" face="verdana">No the world has left us.</font>
   </center>
  </del>
 </p>
</body>
</html>
```

The center element (how horrible) is a block-level element and shouldn't be allowed within a paragraph. But since I wrapped it in the del element, which can be either inline or block-level, this is OK according to the DTD I used. This shows the limits of the techniques we are using now and why we need XML Schema (Primer, Structures, Datatypes), which I think is perfectly capable of handling these limits of the Document Type Definition.
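The rule that a paragraph must not contain block-level elements at any depth can be checked outside the DTD. What follows is only a minimal Python sketch, not something the validator does: the function name `block_elements_inside_paragraphs` is hypothetical, and `BLOCK_LEVEL` is an assumed, deliberately partial list of block-level element names.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Assumption: a partial list of block-level elements, enough for this example.
BLOCK_LEVEL = {"p", "div", "center", "blockquote", "ul", "ol", "table", "pre"}


def local_name(element):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return element.tag.rsplit("}", 1)[-1]


def block_elements_inside_paragraphs(xhtml_source):
    """Return the names of block-level elements found at any depth inside a p."""
    root = ET.fromstring(xhtml_source)
    found = []
    for paragraph in root.iter(f"{{{XHTML_NS}}}p"):
        for descendant in paragraph.iter():
            # iter() also yields the paragraph itself, so skip it.
            if descendant is not paragraph and local_name(descendant) in BLOCK_LEVEL:
                found.append(local_name(descendant))
    return found


# The Transitional example from above, without the DOCTYPE line.
DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>Test</title></head>
<body bgcolor="#000000">
 <p>
  <del>
   <center>
    <font color="#fffff" face="verdana">No the world has left us.</font>
   </center>
  </del>
 </p>
</body>
</html>"""

print(block_elements_inside_paragraphs(DOCUMENT))  # ['center']
```

The check looks at all descendants rather than only direct children, which is exactly the part a DTD content model cannot express once del sits in between.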

Can you believe that the following is valid, while the specification tells us this:

Links and anchors defined by the A element must not be nested; an A element must not contain any other A elements.
Since the DTD defines the LINK element to be empty, LINK elements may not be nested either.

```
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/tr/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head><title>Test</title></head>
<body>
 <p>
  <a href="">Outside
   <del>
    <a href="">Nested</a>
   </del>
  </a>
 </p>
</body>
</html>
```
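Since the validator accepts this, the prose rule about nested links has to be checked some other way. Here is a minimal Python sketch using only the standard library; the function name `find_nested_anchors` is hypothetical and this is not how the W3C validator works.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
A_TAG = f"{{{XHTML_NS}}}a"


def find_nested_anchors(xhtml_source):
    """Return the leading text of every a element that has an a ancestor."""
    root = ET.fromstring(xhtml_source)
    nested = []

    def walk(element, inside_anchor):
        is_anchor = element.tag == A_TAG
        if is_anchor and inside_anchor:
            nested.append((element.text or "").strip())
        for child in element:
            # Once we are inside an a element, every descendant is too,
            # no matter how many del/em/strong elements sit in between.
            walk(child, inside_anchor or is_anchor)

    walk(root, False)
    return nested


# The XHTML 1.1 example from above, without the DOCTYPE line.
DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head><title>Test</title></head>
<body>
 <p>
  <a href="">Outside
   <del>
    <a href="">Nested</a>
   </del>
  </a>
 </p>
</body>
</html>"""

print(find_nested_anchors(DOCUMENT))  # ['Nested']
```

The walk carries an "inside an a element" flag down the tree, which is the kind of ancestor-dependent rule a DTD content model has no way to state.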

General conclusion: "Know the specification; use the validator only for a last check".

Unfortunately, that is not what happens. Lots of people find the W3C site difficult, especially when it comes to finding the information they need. The reason could be that there is so much information available. A tip for those who can't find anything: "Use search". That sounds simple and it really is. Also use the right keywords: if you are looking for a description of the a element, search for HTML401 a element. If you are searching for a CSS property, use keywords as well; it will make your life easier. The W3C has a lot of information, so make use of it.

(If you comment, please make use of abbr/acronym; you don't need to provide a title attribute. Always use the p element, thanks.)

UPDATE: I'm horrified and confused by what a DTD (or even XML Schema) allows us to do:

```
<p>
 <a href="#this">a
  <em>
   <a href="#is">b
    <strong>
     <a href="#difficult">c</a>
    </strong>
   b</a>
  </em>
 a</a>
</p>
```

Comments

I guess I miss your point. The DTD is the Specification (in machine-readable form). Doing something (no matter how perverse) which is allowed by the DTD is "valid".
There are lots of limitations of the Validator. For instance, it does not check that attributes have legal values. (In other words, there are aspects of the DTD that it does not check.) But "valid against the DTD but invalid against the Specification" seems to me to be an oxymoron.
Posted by Jacques Distler at 9:05PM
I should also have pointed to the clearly-stated limitations of the XML support of the W3C Validator.
Posted by Jacques Distler at 9:08PM
I disagree,
The DTD is not the specification. Because of the limits of the DTD, it is not yet possible to be sure whether your document is valid. Just look at my second example. The DTD says it is OK, but the specification disagrees.
IMO XML Schema is the solution to these problems. Just look at this: Modularization of XHTML in XML Schema.
Posted by Anne at 12:07AM
The validator can be used to check if ... your blockquote element contains a p element or equivalent.

Are you saying this is a bad thing? It's the way blockquotes are illustrated in the HTML 4.01 specification (http://www.w3.org/TR/html401/struct/text.html#h-9.2.2)
Posted by Adam Rice at 2:03AM
Perhaps you simply need to clarify what you are claiming. Let us accept, for the sake of discussion, that we wish to declare the construction of your second example to be "invalid". Do you claim that:
1. This is possible using XML Schema, but not using DTDs?
2. It's possible using DTDs, but the XHTML 1.0 Transitional is wrong?
3. The DTD is correct (and forbids this construction), but the W3C Validator does not apply it correctly?
Posted by Jacques Distler at 2:25AM
Adam,
It's a good thing. I was just illustrating what it can be used for.
Jacques,
I wish both my examples were invalid. And technically, they are. The p element can't contain block-level elements, including p itself. Why on earth is center allowed if I separate the two with del?
My point is that the DTD does not validate all things that are pointed out in the specification. This can cause confusion. Besides that, I was also surprised to find out that the font element was still supported. I like the validator, but I wanted to say that it is much more important to understand what is going on, than just validating, since that can lead to incorrect documents, as shown in my examples.
And I think that validating could be improved in the future by using XML Schema instead of a DTD.
Posted by Anne at 3:15AM
Jacques, I think Anne is trying to make point 3, since it's the only one that seems to make sense to me. But to be certain I would have to check the DTD of XHTML 1.x and I don't have the time right now. Maybe later.
Anne, I think you've made a good point, but maybe not very relevant. When the time has come that this is our greatest error, you may smack it in our face. For now I would be glad just to see every page validate in the validator. This doesn't mean that we shouldn't use our heads though, but the mass is still trying to validate in the first place, let's give 'm some time.
Besides that: it's not a mistake which will be made that often, or are there any other elements - apart from del - which cause the validator to incorrectly mark a page valid?
Posted by Bas Hamar de la Brethonière at 4:49AM
Jacques, I think Anne is trying to make point 3, since it's the only one that seems to make sense to me.

You could think that, but you'd be wrong. Here is what the DTD says.
```
<!ENTITY % inline "a | %special; | %fontstyle; | %phrase; | %inline.forms;">
<!ENTITY % Flow "(#PCDATA | %block; | form | %inline; | %misc;)*">
<!ELEMENT del %Flow;>
<!ENTITY % misc.inline "ins | del | script">
<!ENTITY % a.content
   "(#PCDATA | %special; | %fontstyle; | %phrase; | %inline.forms; | %misc.inline;)*">
<!ELEMENT a %a.content;>
```
Or, in plain English, a can contain elements of type a.content, which include elements of type misc.inline. Elements of type misc.inline include the element del. The element del can contain elements of type "Flow", which include elements of type "inline". And elements of type "inline" include ... a.
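The chain described here can also be followed mechanically. Below is a minimal Python sketch that assumes a hand-written, heavily simplified table of which elements may contain which. `MAY_CONTAIN` and `find_containment_chain` are hypothetical names, and the table covers only what the excerpt above shows; it is not a transcription of the full DTD.

```python
from collections import deque

# Simplified from the declarations quoted above.
MAY_CONTAIN = {
    # a -> %a.content; -> %misc.inline; -> ins | del | script
    "a": {"ins", "del", "script"},
    # del -> %Flow; -> %inline; -> a
    "del": {"a"},
    # The excerpt does not show content models for these two,
    # so they are left unexpanded in this sketch.
    "ins": set(),
    "script": set(),
}


def find_containment_chain(start, may_contain):
    """Breadth-first search for the shortest chain start > ... > start."""
    queue = deque([[start]])
    seen = set()
    while queue:
        path = queue.popleft()
        for child in sorted(may_contain.get(path[-1], ())):
            if child == start:
                return path + [child]
            if child not in seen:
                seen.add(child)
                queue.append(path + [child])
    return None


print(" > ".join(find_containment_chain("a", MAY_CONTAIN)))  # a > del > a
```

Any chain the search returns is a nesting the DTD permits, whatever the prose of the specification says about it.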
Now, Anne would probably say that this DTD does not reflect the intentions of its authors (in the same way that you might try to argue in Court that a certain law does not reflect the intentions of the legislators who enacted it). But he will get just as far with the Validator as you would with the Court (which is charged with enforcing the Law as written, not some mythical set of "intentions" of the Legislature).
If the legislators feel they got it wrong, they can always amend the Law, and the W3C, if they feel similarly inclined, can publish an amended DTD. It wouldn't be the first time...
Posted by Jacques Distler at 7:24AM
If that explanation was too complicated, here's the executive summary: the DTD says that a can appear as a child element of del, and it also says that del can appear as a child element of a. Putting these together, a can appear as a grandchild of a.
If that reminds you of the classic Country Song, I'm my own Grandpa, you're not alone.
Anyway, maybe Anne can explain to us how to avoid that possibility with XML Schema, since it is pretty hard to avoid with DTDs.
If you can't come up with an XML Schema which does the job (and, even if you could, I think many people would be surprised and troubled to find an XML dialect that could not be described by a DTD), then I would call the proscription of the above construction in the verbiage issued by the W3C to be meaningless gibberish. It does not correspond to any formal description of this XML dialect.
Posted by Jacques Distler at 12:19PM
Try this version.
Posted by Jacques Distler at 12:27PM
I checked the new XML Schema for the modularization and the changes that are made compared to the current DTD. It is not addressed and the current XML Schema still allows this.
I think it is possible, but it is probably too complex to be worth it. I also still think that the guidelines are different from the DTD. I don't think the validator is wrong, BTW.
I now remember I had such a 'problem' before: PNG, object and IE.
Posted by Anne at 12:57PM
but the mass is still trying to validate in the first place, let's give 'm some time.

Absolutely right: so many web sites are still getting the very basics wrong (no doctype, no charset, unescaped ampersands etc) that they're unlikely to run into problems like you have described. Nice work, but don't sweat it.
You might want to check a page I put together called ValiDAQ - it lists the top-ten invalid web pages that I have discovered so far.
Posted by Tim at 6:33PM
I like the ValiDAQ. I have seen it before (you should check the page again since number 1 now has 1159 errors ;)).
And I know I go too far sometimes, but such things create discussion and open up different points of view on interesting subjects. Most people who read my weblog already know the easy stuff.
Posted by Anne at 6:49PM