Anne van Kesteren

GREATER-THAN SIGN (U+003E) and XML

The more you learn about a certain technology, the more scary details you discover. Attribute-value normalization, for one, but that is not today's topic. At the entmoot, Martin Reurings brought up U+003E, also known as ‘>’ or &gt;. In the same sentence he said XML. This is an XML document:

<test>&lt;test></test>

(Note that an XML document is, by definition, well-formed.)
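A quick way to see this in practice, assuming Python's standard-library `xml.etree.ElementTree` (which uses the expat parser underneath), is to parse the document above and look at its text. Only `<` was escaped, yet the trailing `>` comes back as ordinary character data.

```python
import xml.etree.ElementTree as ET

# Only "<" is escaped; the ">" after "test" is plain character data.
root = ET.fromstring("<test>&lt;test></test>")

print(root.tag)   # test
print(root.text)  # <test>
print(len(root))  # 0: there is no nested <test> element
```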

Laurens brought up that > as such could not appear unescaped inside attribute values. We all thought that was quite logical, as it would probably make XML easier to parse. Wrong. Perhaps not about the parsing argument, but certainly about the XML specification. The following is another XML document:

<test test=">"/>
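"/>">
The same kind of check works for attributes. The sketch below uses a hypothetical helper, `is_well_formed`, that simply wraps `ET.fromstring` from Python's standard library. Expat is one conforming parser rather than the specification itself, but it agrees: a bare `>` in an attribute value is accepted, while a bare `<` is rejected.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if expat (via ElementTree) accepts `text`."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed('<test test=">"/>'))  # True: bare ">" in an attribute value
print(is_well_formed('<test test="<"/>'))  # False: bare "<" is forbidden there

# The attribute value is exactly the single character ">".
print(ET.fromstring('<test test=">"/>').get("test"))  # >
```")
</test>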

Read the section on character data and markup. Note that it says > must (in the RFC 2119 sense) be escaped when it appears in the literal sequence ]]> in content. Note that the statement does not apply to attribute values. Note that this is the only case where you would need to escape >. Note that the sequence ]]> is not very common, except in this article.
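To make the rule concrete, here is a minimal sketch of an escaper for element content that applies only the escapes XML 1.0 requires: `&` and `<` always, and `>` only where it would complete `]]>`. The function name `escape_content_minimal` is hypothetical. For comparison it uses Python's `xml.sax.saxutils.escape`, which conservatively escapes every `>`.

```python
import xml.etree.ElementTree as ET
from xml.sax import saxutils


def escape_content_minimal(text: str) -> str:
    """Hypothetical helper: escape element content minimally.

    "&" and "<" are always escaped; ">" is escaped only where it would
    otherwise complete the literal sequence "]]>".
    """
    out = text.replace("&", "&amp;").replace("<", "&lt;")
    # The replacements above never produce ">", so any "]]>" left here
    # came from the input and must have its ">" escaped.
    return out.replace("]]>", "]]&gt;")


text = "if a > b ]]> stop & <go>"
minimal = escape_content_minimal(text)
conservative = saxutils.escape(text)  # escapes every ">" as "&gt;"

print(minimal)       # if a > b ]]&gt; stop &amp; &lt;go>
print(conservative)  # if a &gt; b ]]&gt; stop &amp; &lt;go&gt;

# Both spellings parse back to the same character data.
for escaped in (minimal, conservative):
    assert ET.fromstring(f"<t>{escaped}</t>").text == text
```

Both outputs are well-formed; escaping every `>` simply means never having to think about `]]>`.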

More on XML and the word Bozo by Henri Sivonen. He does escape >. So do I. The Atom specification does not. XML is confusing.