Anne van Kesteren

`rel=canonical`

16 February 2009

I wonder why the canonical extension to the rel attribute was not first proposed on an open forum. Someone might have bothered to point out that it is almost the same as, if not identical to, the self value Atom uses. Also, Google, Microsoft, et al.: there is a registry for extensions to the rel attribute. Take note.

The extension itself seems only marginally useful. In the extreme case you would have to use it on every page, because someone could put a question mark at the end of the URL followed by a bunch of useless parameters that do not affect anything at all. In most other cases redirects would probably be better. The Wikipedia scenario is somewhat compelling though.
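To make the "useless parameters" case concrete, here is a minimal sketch of how a site might compute the value to put in rel=canonical. It is only an illustration under stated assumptions: the function names canonical_url and canonical_link, the idea of a per-page set of "significant" parameters, and the example.com URLs are all hypothetical and not part of the search engines' announcement.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url, significant_params):
    """Return url keeping only the query parameters that affect the content.

    significant_params is a hypothetical per-page set of parameter names;
    everything else in the query string (and any fragment) is dropped.
    """
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in significant_params
    ]
    # Sort so that ?a=1&b=2 and ?b=2&a=1 map to the same canonical URL.
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link(url, significant_params):
    """Build the link element for the page head, escaping & in the href."""
    href = canonical_url(url, significant_params)
    return '<link rel="canonical" href="%s">' % escape(href, quote=True)


if __name__ == "__main__":
    print(canonical_link("http://example.com/article?id=7&junk=1", {"id"}))
```

A page whose content depends on no parameters at all would pass an empty set, and every decorated variant of its URL would then collapse to the bare path.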

Comments

It's unfortunate 'self' isn't in the registry.
Posted by Ben at 1:29AM
Yeah, it's not really clear whether HTML and Atom share the same value space in that sense. (Which should not stop us from using the same names if they are identical in meaning.)
Posted by Anne van Kesteren at 1:48AM
Atom's link/rel and HTML's link rel live in different namespaces, use different value registries, have different datatypes, and are defined in specifications which do not reference each other. Is there any reason to think that they share the same value space, other than the fact that they share the same (local-)name?
Posted by Mark at 2:25AM
Just the value alternate and the fact that Atom also uses the link element with the rel attribute. Well, and there are some people from the HTTP WG who are trying to combine Atom, HTML, and HTTP Link header relationships into a single thing. (I’m neutral on that idea at the moment and opposed to the way it is suggested to function.)
Posted by Anne van Kesteren at 3:10AM
This would be useful on most pages in web apps, I think. Wikipedia has a very nice, clean division between "pages" and "editor-only junk", with basically no gimmicks or options on the pages, and so it's one of the least compelling use-cases (although it's certainly convenient for redirects). A much more compelling use-case is your typical forum link like
http://www.twcenter.net/forums/showthread.php?t=124773&page=5&perpage=10&highlight=browser
which as far as the search engine is concerned, should probably be treated like
http://www.twcenter.net/forums/showthread.php?t=124773&page=3
or some similarly non-obvious thing. I've long thought a directive like this would be useful. No objections about "Why didn't they propose it publicly first?", but I've gotten used to that for this kind of search engine feature by now. At least they cooperated with each other instead of all making up a slightly different thing.
(Wow, never tried posting here before. The requirement to manually enter valid XHTML must cut down on uninformed comments, not to mention spam bots. But really, & isn't allowed?)
Posted by Aryeh Gregor at 6:12AM
Not meaning to be flamebait, but has Google ever had any interaction with the standardization process? It seems to me like they just do whatever they want as long as it doesn't directly hurt anyone else.
Also, I don't quite get the point of this. Aren't URLs supposed to be canonical in their nature? I could point you to the appropriate literature, but you've probably already read it.
Posted by Alan Trick at 6:39AM
Aren't URLs supposed to be canonical [...]?

Well said!
Posted by Ben Millard at 8:04AM
@Alan: As in any organization, commercial or not-for-profit, some people participate and coordinate and some ignore. This is especially true for a big organization. Like Anne, I would have preferred that Google, Yahoo and Microsoft had used an open standards organization such as the W3C, or even the IETF, for making their proposal. But it often happens that a group of like-minded people create their own thing. This time it is search engine companies; in the past it has been blog companies (nofollow). We have also had browser vendors (WHATWG) and social network services (Social Graph). Now that the damage is done, maybe they could at least write a simple W3C Note or a W3C Member Submission.
Posted by Karl at 2:49PM
WHATWG at least allows for public participation. I think it therefore is something distinct from what has happened here, but feel free to disagree.
Posted by Anne van Kesteren at 4:12PM
Some CMSs were created before Google and unfortunately haven't modified their core software to suit SEO. This is actually pretty helpful for the real-world use case where a publisher using one of these systems needs to fix duplicate content problems inherent in the CMS.
Posted by KH at 5:05AM
I was just re-reading http://diveintomark.org/archives/2002/12/29/million_dollar_markup which reminds me of rel=bookmark.
"Now, there is a way to specify permalinks in HTML, but virtually nobody uses it. On each actual permalink link, you can specify rel="bookmark"." ...
Do the two overlap?
Posted by Dan Brickley at 5:29AM
I wonder why the canonical extension to the rel attribute was not first proposed on an open forum.
Whatever Google demands, Google gets. So why would they bother asking anyone else for input? ;)
Posted by Ben Buchanan at 12:00PM
For me rel="self" should return that exact resource, complete with sort orders, search terms, highlighting and other formatting cruft. rel="canonical" however should have the same meaning even if the formatting changes slightly. rel="shortlink" (my current focus) would then be a short version of the canonical URL, useful for space-constrained (e.g. microblogging, mobile) and manual entry (e.g. printed, spoken) applications.
Of course that's not taking into account versioning information such as "this link is to a specific version and will never change even if the document is updated" vs "this is the most current version of the document" (the former being useful for quoting Wikipedia, for example).
Sam
Posted by Sam Johnston at 9:00PM