Anne van Kesteren

HTML5: custom data

4 April 2008

Sometimes HTML is not enough and you need a mechanism to include some custom data in the document. In 2005, A List Apart published “Validating a Custom DTD”, which illustrated how you could add custom attributes to HTML and have them validate. The problem with the approach outlined in that article is that DTDs are a thing from the past and that only the W3C validator cares about them. (Newer validators, such as Validator.nu, don’t have this issue.) Browsers ignore DTDs; the only reason they still look at the doctype of a page is to determine the rendering mode. So if you use a custom DTD to add, say, a required attribute, a browser simply sees a required attribute, as if HTML itself had defined it. If a future version of HTML then introduces an attribute with the same name but different semantics, your page might behave strangely in future browsers.

There is a custom data proposal for HTML5 that allows authors to add custom data to their pages without interfering with future extensions to HTML. The idea is that all attributes starting with data- are reserved for Web authors and they can do whatever they like with them. (They are not intended for browser extensions, et cetera.) In addition, there will be a DOM attribute, dataset, that allows easier access to these attributes. For an attribute data-opacity, you can use dataset.opacity instead of getAttribute("data-opacity") and setAttribute("data-opacity", x).
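To make the dataset mapping concrete, here is a minimal sketch in TypeScript. FakeElement, toAttributeName, toPropertyName and dataKeys are hypothetical names for illustration, not the real DOM API, and a Map stands in for an element's attribute list. The sketch assumes the naming rule the HTML specification later settled on: the data- prefix is dropped and each hyphen followed by a lowercase ASCII letter becomes that letter in uppercase, so data-btl-id corresponds to dataset.btlId. It skips edge cases the real API handles, such as rejecting property names that contain a hyphen followed by a lowercase letter.

```typescript
// Hypothetical: data-foo-bar -> fooBar (drop "data-", turn "-x" into "X" for x in a-z).
// Returns null for attributes that are not custom data attributes.
function toPropertyName(attr: string): string | null {
  if (!attr.startsWith("data-")) return null;
  return attr.slice(5).replace(/-([a-z])/g, (_m: string, c: string) => c.toUpperCase());
}

// Hypothetical: fooBar -> data-foo-bar (each A-Z becomes "-" plus its lowercase form).
function toAttributeName(prop: string): string {
  return "data-" + prop.replace(/[A-Z]/g, (c: string) => "-" + c.toLowerCase());
}

// Hypothetical stand-in for a DOM element; not the real Element interface.
class FakeElement {
  private attrs = new Map<string, string>();

  getAttribute(name: string): string | null {
    return this.attrs.get(name) ?? null;
  }

  setAttribute(name: string, value: string): void {
    this.attrs.set(name, String(value));
  }

  // A dataset-like view: reads and writes go straight through to the
  // underlying data-* attributes, so there is no separate storage.
  get dataset(): Record<string, string | undefined> {
    return new Proxy({} as Record<string, string | undefined>, {
      get: (_target, prop) =>
        typeof prop === "string" ? this.attrs.get(toAttributeName(prop)) : undefined,
      set: (_target, prop, value) => {
        if (typeof prop !== "string") return false;
        this.attrs.set(toAttributeName(prop), String(value));
        return true;
      },
    });
  }

  // Property names as they would appear on dataset, in attribute order.
  dataKeys(): string[] {
    const keys: string[] = [];
    for (const name of this.attrs.keys()) {
      const prop = toPropertyName(name);
      if (prop !== null) keys.push(prop);
    }
    return keys;
  }
}

// Usage: both access routes see the same attribute.
const el = new FakeElement();
el.dataset.opacity = "0.5";
console.log(el.getAttribute("data-opacity")); // 0.5
el.setAttribute("data-btl-id", "42");
console.log(el.dataset.btlId); // 42
```

Because both routes read and write the same underlying attribute in this model, dataset is a convenience over getAttribute() and setAttribute() rather than a second place to keep data.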

Namespaces were considered, but they are harder to integrate into the existing HTML environment and would also make authoring harder.

Now in the specification: embedding custom non-visible data.

Comments

DTDs are a thing from the past and that only the W3C validator cares about them

I'm not sure I fully agree with this statement. I still see DTDs frequently used in XML documents; their primary use lies in the ability to define named character entities. The original HTML4 specification defined its named character entities using DTD catalogs. If what you say is true for HTML5, this means that named character entities have become a "part" of the language itself, and they aren't coming from DTDs anymore. Is this the case?
Posted by Edward Z. Yang at 2:19AM
Yes. (It has been true for HTML as a language practiced on the Web for quite a while now so making that more formal makes sense.)
Posted by Anne van Kesteren at 2:38AM
Wouldn’t it be better to just ask the user to prefix their custom attributes, like CSS does?
I did that recently in a control I created. I needed to store a data-bound ID on various parts of the HTML inside it, which I did using a btl_data-id attribute (btl being the prefix here). That seems better than only allowing the data- prefix, which seems unnecessarily restrictive and has a greater potential for conflicts when running several controls from different frameworks/authors alongside each other.
Or, maybe adding accessor methods for RDFa to the HTML DOM would provide the desired functionality (just a wild thought that I didn’t think very carefully about!).
~Grauw
Posted by Laurens Holst at 6:50AM
Laurens, I think you'd write that as data-btl-id if you wanted to follow the data-foo proposal.
Posted by Ben 'Cerbera' Millard at 1:45PM
Fantastic idea.
Posted by Andrew Dupont at 1:50PM
Laurens, that would also force the use of the underscore. On top of that, it becomes more difficult for authors to figure out what they have to do. As for RDF, that would be an order of magnitude more complicated and go far beyond what is actually needed as a solution here.
Posted by Anne van Kesteren at 2:40PM
That sounds like an excellent idea and a solid implementation!
Posted by James John Malcolm at 7:37PM
I don't understand why there is a new syntax for accessing these attributes. I think it's more annoying than useful to bind data-opacity to dataSet.opacity (dataSet-opacity would be more accurate). That reminds me of the "incompatibilities" in mapping some CSS properties to JavaScript. And why not simply use the already existing getAttribute() and setAttribute()? You are introducing another point of confusion.
Posted by François Piat at 9:14PM
How about x- instead of the verbose (and not always appropriately named) data-? The corresponding DOM object could be called extattr instead of dataset, which would also be more accurate in various cases.
Posted by Aristotle Pagaltzis at 9:32PM
I'm liking X/HTML 5 more and more all the time. I didn't like it in the beginning.
Posted by Devon Young at 4:09AM
François, it is mostly for ease of authoring and to encourage authors to use these attributes instead of attributes that might clash with future versions of HTML.
Aristotle, the problem with that is that the attributes are not experimental, which is what x- is often used for. They are simply representing custom data.
Posted by Anne van Kesteren at 4:42AM