Anne van Kesteren

Markup design: elements or attributes?

Having started a permathread on www-html, I think I know enough to say something about metadata and content attributes such as HREFLANG, TYPE, CITE, ALT, TITLE, the upcoming HREFTYPE and similar attributes in (X)HTML. Most of these attributes would be better off as elements. Let me explain. HREFLANG is sometimes used on weblogs and other websites to mark links to pages written in another language. Often you see CSS constructs like this one:

a[hreflang]::after { content: " [" attr(hreflang) "]"; }

Now, tell me: is that presentational, or is it (semi-)important information that should be shown to the user? The latter, obviously, since you (ab)used CSS to display it. So you are actually hiding this information in attributes (without a second language such as CSS, you would never know it is there). This happens quite often. Perhaps a better solution would be:

<a href="http://example.org/link.nl">
 ...
 <lang>nl</lang>
</a>
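One way to see what "hiding information in attributes" means in practice is to look at what a plain text-only rendering picks up. The sketch below assumes Python's xml.etree.ElementTree as a stand-in for a very simple user agent; the `lang` element is the proposal above (not part of any (X)HTML version), and `visible_text` is a hypothetical helper. The element content ends up in the text; the HREFLANG value does not.

```python
import xml.etree.ElementTree as ET

# Attribute version: the language exists only as metadata on the element.
as_attribute = ET.fromstring(
    '<a href="http://example.org/link.nl" hreflang="nl">Dutch article</a>'
)

# Element version, using the hypothetical <lang> child proposed above.
as_element = ET.fromstring(
    '<a href="http://example.org/link.nl">Dutch article <lang>nl</lang></a>'
)

def visible_text(element):
    """Concatenate all character content, roughly what a text-only agent shows.
    Attribute values are never part of this text."""
    return "".join(element.itertext())

print(visible_text(as_attribute))  # Dutch article
print(visible_text(as_element))    # Dutch article nl
```

The attribute value is still there (`as_attribute.get("hreflang")`), but only code that goes looking for it will ever find it.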

Note that I haven't included '[' and ']', since those are in fact presentational. This way the language of the link is shown by default, and it can be hidden using CSS if you don't think that information is important. If you think about this even harder, and a bit beyond the "screen" media type, you might find that the following makes more sense:

<a>
 <text>...</text>
 <href>http://example.org/link.nl</href>
 <lang>nl</lang>
</a>

Because in the "print" media type you often want to show the link itself along with its text, and perhaps the language. I'm not 100% sure about this example myself, though. As for the other attributes, ALT (on IMG) is more or less superseded by the OBJECT element, which takes its alternative content as element content, although ALT is still valid:

<object data="graph" type="image/png"> <!-- DATA becomes SRC in XHTML 2.0 -->
 <table>
  ...
 </table>
</object>

Note that ALT, and TITLE as well, are a bit special compared with the other attributes mentioned, since they are not about metadata but contain part of the document content. Their values can be in another language, and they can contain abbreviations or phrases that need emphasis. None of that can be marked up, because attribute values are plain character data. Fortunately, ALT is gone in XHTML 2.0, but TITLE remains. TITLE actually has more problems than the ones mentioned above: it isn't properly defined, and authors are often confused about what to do with it. In reply to a mail of mine proposing a DESCRIPTION element to replace the TITLE attribute, Jukka explains more of its flaws.
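The difference is easy to demonstrate with any XML parser. This sketch assumes Python's xml.etree.ElementTree, and the sample markup is made up for illustration: markup inside ALT is not well-formed at all, escaping it only produces a flat string, while the fallback content of OBJECT keeps its structure, including language information.

```python
import xml.etree.ElementTree as ET

# 1. Markup inside an attribute value is not well-formed XML:
#    '<' may not appear in an attribute value, so parsing fails.
try:
    ET.fromstring('<img src="graph.png" alt="Sales <em>doubled</em>"/>')
    alt_rejected = False
except ET.ParseError:
    alt_rejected = True

# 2. Escaping the markup makes it parse, but the value is now just characters:
#    there is no <em> element anywhere in the tree.
img = ET.fromstring('<img src="graph.png" alt="Sales &lt;em&gt;doubled&lt;/em&gt;"/>')
alt_value = img.get("alt")  # the literal string 'Sales <em>doubled</em>'

# 3. Fallback content of OBJECT is ordinary element content, so it can
#    carry emphasis, abbreviations and language information.
obj = ET.fromstring(
    '<object data="graph.png" type="image/png">'
    'Sales <em>doubled</em> in the <abbr xml:lang="en">EU</abbr>'
    '</object>'
)
emphasised = obj.find("em").text  # 'doubled'
```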

On to the CITE attribute. CITE contains a URI pointing to the source of a quotation; nothing special. Example:

<blockquote cite="http://example.org/articles/2004#0726-firefox">
 <p>...</p>
 <p>...</p>
</blockquote>

While it is nice that we can record where the quotation comes from, the information does nothing by itself: being an attribute, it is not rendered. Of course we can use JavaScript or CSS generated content to expose it, but that doesn't change the fact that this is important information hidden away in an attribute. What we actually want is an element in which we can put the source along with its URI, probably as a direct child of BLOCKQUOTE, as proposed in this mail to www-html.
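To make the point concrete: with CITE as an attribute, showing the source takes an extra processing step, and all that step has to work with is a bare URI. The sketch below does with Python's xml.etree.ElementTree roughly what a script or a transformation would do; the `credit` element it creates is purely hypothetical, standing in for the child element proposed in the mail, and `expose_cite` is an illustrative helper.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<blockquote cite="http://example.org/articles/2004#0726-firefox">'
    '<p>First paragraph.</p><p>Second paragraph.</p>'
    '</blockquote>'
)

def expose_cite(blockquote):
    """Copy the CITE attribute into a visible child element.

    <credit> is a hypothetical element name used only for illustration.
    The URI is the only information available: an attribute offers no
    place for the title or author of the source.
    """
    uri = blockquote.get("cite")
    if uri is not None:
        credit = ET.SubElement(blockquote, "credit")
        credit.text = uri
    return blockquote

expose_cite(doc)
print(ET.tostring(doc, encoding="unicode"))
```

Had the source been an element from the start, no such step would be needed, and it could hold a proper title and link instead of just the URI.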

The same applies to TYPE and HREFTYPE. TYPE is a bit odd since it is used on more elements than just A, which is probably why it was renamed to HREFTYPE for links in XHTML 2.0, while TYPE can still be used on, for example, the STYLE element. Using CSS, you could extract the information TYPE contains and render a link as "Report 2004-07 (PDF)":

a[type="application/pdf"]::after { content: " (PDF)"; }

Again, it is more usable to have an element, or to simply write the type out and forget about TYPE. HREFTYPE, however, is a bit different. In the current XHTML 2.0 draft, HREFTYPE and HREFLANG actually do something: they modify the Accept and Accept-Language headers, respectively. This is fundamentally flawed. If you want to point to a document in a particular type or language, you should point to that specific version, not to the general one. What you actually have is a set of documents on the same subject, each in a different language or media type. (Suppose your link says that following it must give me the Dutch XHTML version of a document. Fine. I follow it and get that version. Then I pass the link on to someone else, not by copying your link (with its metadata) but by copying the general URI from the address bar. He will probably get a different document and tell me it doesn't work. Clear? There are other points to be made about URI design, but those are saved for a future post.)
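The copied-link problem can be seen without any network access. In the sketch below (Python's urllib.request; `request_for_link` is a hypothetical helper that only mimics the header behaviour the draft describes, and the URI is made up), two links with different metadata produce requests for exactly the same URI. The variant the server returns depends on headers that are lost as soon as someone copies the address from the address bar.

```python
import urllib.request

def request_for_link(href, hreflang=None, hreftype=None):
    """Build the request a user agent might send when following a link,
    turning HREFLANG/HREFTYPE into Accept-Language/Accept headers.
    (Hypothetical helper mimicking the XHTML 2.0 draft's description.)"""
    headers = {}
    if hreflang:
        headers["Accept-Language"] = hreflang
    if hreftype:
        headers["Accept"] = hreftype
    # urllib normalises header names with str.capitalize(),
    # so 'Accept-Language' is stored as 'Accept-language'.
    return urllib.request.Request(href, headers=headers)

dutch_xhtml = request_for_link(
    "http://example.org/report", hreflang="nl", hreftype="application/xhtml+xml"
)
copied_link = request_for_link("http://example.org/report")  # pasted from the address bar

# Same URI in both cases ...
print(dutch_xhtml.full_url == copied_link.full_url)  # True
# ... but only the first request asks for the Dutch XHTML variant.
print(dutch_xhtml.header_items())
print(copied_link.header_items())  # []
```

Giving each variant its own URI (say, a separate address for the Dutch XHTML version) removes the dependency on headers entirely.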

So, in more general terms: no piece of information that is part of the content should be in an attribute at all. This leads to the conclusion that TITLE and ALT are evil by design. When you want attribute metadata to show up in the document, you don't want attributes, but elements. Thanks for reading.

Many thanks to Jukka Korpela for going into great detail when explaining his thoughts about "this issue".

Comments

  1. Well said.

    I especially agree with you that having title as an attribute is wrong. Now you can't emphasize, set languages, etc. This is not only wrong when it comes to semantics, but also when it comes to needed functionality.

    About XHTML 2.0's implementation of hreflang and hreftype: they make something very simple very complex. Not practical IMHO.

    Posted by Charl van Niekerk at

  2. Good article, Anne.

    I have always liked the idea of allowing the title element inside any element, rather than only in the head (as is currently the case). Not that the title in the head should lose its special meaning, but I have always felt the title attribute is far too limited (mostly for the reasons you mentioned). <dc:title/>, perhaps?

    On my personal website I use a lot of <p class="description"> elements that I render in small type. It works, but it is not as pretty as a description element would be.

    Posted by ACJ at

  3. I think too much time is spent arguing the merits of X and Y and not enough is spent implementing real stuff. This is where I'm lost on the spec debates. Just give me a spec and I'll implement it.

    Posted by Randy Charles Morin at

  4. When you want attribute metadata to show up in the document, you don't want attributes, but elements.

    In several places you suggest promoting URLs from metadata to something that actually shows up in the document. But this raises the question: what should be displayed in a document? Pursuing your approach to the end, there would be no need for attributes at all, as everything could at some point be important to display.

    You invite us, for example, to widen our horizon on media types: but how many media types would want, in your example, a URL to show up in the document? Screen, handheld, speech, braille and embossed certainly not! Perhaps projection, possibly print. This makes me think that having the href displayed in a document is the exception rather than the rule.

    An approach that in my view makes more sense is to include as content of an element all the information that human beings commonly deal with (that is, for instance, the content they might quote to somebody else when talking about what they saw or heard), and as attributes all machine-related stuff: the href of a link, the hreftype, and actually even the hreflang (it would make more sense to write Dutch as the content of an element; nl is not a human-readable format, and would in any case require a nested ABBR).

    Posted by csant at

    (Excuse me for the length of this rant.)

    Of course one of the fundamental flaws in this system is a much criticized (and rightly so) design decision of XML: supporting both elements and attributes and the differences between them.

    What do we use attributes for? Some say they should be used for 'meta data', while others say they should be used for 'property-like values'. Still others just don't use attributes. Because what is the fundamental problem with XML attributes? They cannot have structure and they cannot be transparently interchanged with elements.

    The consequences of this design decision are enormous. Once you have chosen to use an attribute for some value, you cannot switch (or not 'just like that' anyway) to using an element (and vice versa). Why? Because XML is designed that way. So choose wisely or you will have to change code at a later time.

    Once you have chosen to use an attribute for some value, that value cannot be given any internal structure. On the other hand, if you had chosen to use an element, this would not be a problem. Why this distinction? It is not an uncommon course of events to want to add structure to an attribute's value at a later time. And then you're stuck.

    So we have to think carefully and not use attributes for anything we can possibly imagine having an internal structure. Hmmm, this seems like an argument against using attributes at all. The other 'rule', to use attributes for meta data only, is flawed too. Because what is meta data? Some people would consider certain values meta data that others would not. This could even change over the lifetime and use cases of a document. (Besides, meta data can have structure too.)

    A case against heavy use of elements (vs attributes) was made in the thread linked above, based on the argument that it would be significantly harder to implement tools to handle these documents. If that is really so (and I am not convinced it is), why does XML exist in this form anyway? If it were not wise to make heavy use of elements, XML would surely be a big failure.

    Having made some of the problems with elements/attributes clear, let's look at the impact of this system.

    For example, the SVG spec uses attributes (heavily) for values with internal structure (vector paths, and inline styles which are also in XHTML1 and XSL-FO). The main argument people use is that it is easier to read and to write. But unfortunately, it is very much harder to process (with elements, all you'd need is an XML parser which you already were using anyway since you were processing XML; with structured attribute values, you'd need an additional parsing algorithm for those values).

    Another example is the HREFLANG attribute in the latest XHTML 2.0 WD (also noted here) where it may contain a comma-separated list of language codes. Anne wonders why we may not use a space-separated list, but that is not fixing the core problem here. The real problem is the guy who once decided an attribute would suffice here, because he thereby decided the value could never have meaningful internal structure (exactly the thing we try to add now).

    Yet another example of this mistake is XSL with the embedded concrete syntax of XPath. What are the remaining benefits of XML for a programming language (which are questionable in that case anyway) when a significant part of the semantics is put away in a string representation?

    If you use XML, try to make use of its power. When you come to the point where you are storing structured data as a textual string in an element, something went wrong. Because at that point you lost the advantage of all the available XML parsers. You lost the advantage of XML's capability to define structure. If you have enough reason for this ('elements are too hard to type', 'attributes read easier', etc (sidepoint: should XML be written or read by people anyway?)): fine, but then don't use XML.

    Another problem often raised (and in the thread too) is error handling. Ian suggested the success of HTML was made possible (among others) by the "handle all input at all costs" attitude. He might be right. Anyway, with XHTML we're beginning to see some problems. Because it is an XML application, no attempts should be made to process non-wellformed documents (even if it would be trivial for the application to display most of the document). But is that really what we want (and therefore: do we really want HTML as an XML application)?

    Mark Pilgrim raised some additional issues with XML on the web. Of course some of these apply to XHTML.

    The conclusion of all this could be that XMLifying HTML was not the right thing to do. Another possible conclusion could be that it was the right thing to do, but we're not doing it the right way. Anyway it seems clear to me some things went wrong here.

    Posted by Martijn Vermaat at

  6. An approach that in my view makes more sense is to include as content of an element all that information that commonly human beings deal with [..]; and as attributes all machine-related stuff

    Yet another try to make yourself comfortable with the distinction between elements and attributes. Just like the 'meta/non-meta' approach it is flawed, for two reasons:

    1. The distinction is not at all clear and will cause a lot of discussion about borderline cases.
    2. Even if the distinction were clear, XML does not allow you to base the element/attribute decision on it, because there are other, fundamental differences between the two forms by definition (for instance, the aforementioned fact that attributes cannot have structure).

    Posted by Martijn Vermaat at

    A lot of interesting points are raised here. I don't agree that we should have stayed with HTML at all, because XHTML has too many advantages (especially when you're looking into the future).

    At the end of the day it is, for me, more about functionality than semantics. In a lot of cases attributes decrease the functionality. This is seen with the title (and alt) attributes as Anne said, and I also thought of another one: the summary attribute on a table. I honestly can't believe they ever made that an attribute!

    I don't think attributes should be thrown out completely, but when it is right to use an attribute and when it is not is, I guess, a matter of opinion.

    For me, as long as a value doesn't need structure, and it doesn't need any other related values (like xml:lang) to accompany it, I don't see functional loss. But title, summary, and those were definitely a mistake IMHO.

    Off topic: And oh yes, after Martijn's comments, I don't feel bad about having made looong comments in the past on Anne's weblog anymore. :-)

    Posted by Charl van Niekerk at

  8. For your information, I haven't mentioned it (among other changes), but SUMMARY is an element in XHTML 2.0.

    Furthermore, I would love to hear those advantages of XHTML, since I think they are all based on the XML ideology.

    Posted by Anne at

    Anne, you mention that we could use JS or CSS to make the attributes readable by humans, but XHTML 2.0 is XML and is meant to be read by an XML parser.

    So along with our XHTML-document we could send an XSL-file or the user may have his own XSL to do a transformation of all those attributes to readable text.

    Posted by Martin Hintzmann Andersen at

  10. For your information, I haven't mentioned it (among other changes), but SUMMARY is an element in XHTML 2.0.

    Thank goodness! *Sigh of relief* I should rather start reading that draft before making more comments. :-)

    Furthermore, I would love to hear those advantages of XHTML, since I think they are all based on the XML ideology.

    It is much faster to parse, simpler, you can combine namespaces with MathML, SVG, RDF, etc. But you should know those better than me, Anne! ;-)

    I don't understand this sudden retro movement back to HTML. I don't think all of those people at the W3C are wrong for leaving HTML behind and moving into the future with XHTML. Unless I am missing something.

    Posted by Charl van Niekerk at

  11. Now, tell me: is that presentational or is it (semi-)important information that should be showed to the user? The latter, obviously, since you (ab)used CSS to display that content. So you are actually hiding this information in attributes (without a second language you will never be aware that it exists).

    This isn't a flaw of HTML or XHTML or XML at all. It's a flaw in the default handling of all browsers for those properties. I don't believe the W3C made a recommendation as to how to parse things such as language, cite, etc. So, the browser makers never told the browser to do anything with it. That, and few people seem to actually care about it, so things are unlikely to change.

    I fail to see how this is abusing style sheets, simply because the default handling is missing. In Opera, all elements are controlled by a default style sheet, which you are free to edit if your heart desires.

    Posted by chris at

  12. If you have enough reason for this ('elements are too hard to type', 'attributes read easier', etc (sidepoint: should XML be written or read by people anyway?)): fine, but then don't use XML.

    Exactly! I agree to a very large extent with much of what has been said in this discussion:

    The above list does not exactly promote attribute usage. Maybe that's because attribute usage is a bit hairy. If you think so, XML shouldn't be the right answer, but unfortunately, it is. The problem with all XML alternatives is their support, or should I rather say: non-support.

    A very good alternative to XML, which lacks any support for special-syntaxed attributes, is Enamel. It has all the advantages of XML in that:

    But there are a couple of things in which Enamel differs (for the better) from XML:

    Lastly, Enamel has a couple of disadvantages compared with XML:

    Enough about Enamel; XML is more than good enough for years to come. The problem is that its design somewhat encourages attribute usage, even when attribute usage is stupid. XML itself doesn't, afaik, say anything about when to use attributes or elements, so this is left to each XML dialect to decide. Such missing definitions lead to chaos. Chaos is bad.

    Posted by Asbjørn Ulsberg at