HREFLANG and TYPE are both attributes for the A and LINK elements. TYPE is an attribute for OBJECT as well. None of these attributes apply to AREA; I wonder why. Although embedding can be interesting, I'm going to focus on applying them to LINK and A elements. Both have some nice features, especially in combination with CSS generated content:
a[hreflang]::after{ content:" ["attr(hreflang)"]"; }
Using the TYPE attribute you could tell the user the linked file is a PDF, so he will know that Mozilla almost crashes using the Acrobat 6.0 plugin and that it will take some time:
a[type="application/pdf"]::after{ content:" (PDF)"; }
Or the more ironical example of showing IE users that they can't view the site, because it uses XHTML:
a[type="application/xhtml+xml"]::after{ content:" (XHTML)"; }
(Before you comment on that, reread the word "ironical". Thanks.) Those examples show the value of the attributes, so why do I consider them harmful? Let me explain: you never know for sure what content you will get back. This is, obviously, entirely a matter of content negotiation. For example: you have an English weblog, but your browser of choice has been configured to say it prefers Dutch content over English content, because you are from The Netherlands. So you visit a site (Google pages are the best example) and you see it has Dutch content. Great, you can read that very easily, it is interesting as well, so let's link to it!
<a href="http://www.google.com/press/zeitgeist.html" hreflang="nl">Google Zeitgeist</a>
As well as pointing to that link, I mention that I miss some browser statistics on that page, which would be nonsense to English users, who will see that information on the page. This would also be an argument that Google needs better architecture, especially since I can't retrieve all information without modifying my browser's Accept-Language header, but I actually wanted to show that saying "that content is Dutch" is stupid, since you never know for sure. It might have been a useful attribute if HREFLANG were specified in such a way that it would temporarily alter the Accept-Language header of the user clicking the link, which would allow constructs like the following:
<link rel="alternate" href="/" hreflang="fr"/>
<link rel="alternate" href="/" hreflang="de"/>
<link rel="alternate" href="/" hreflang="nl"/>
Now that would have been nice, but HREFLANG doesn't do that; it specifies the base language of the resource designated by HREF, which isn't useful. So HREFLANG is harmful, since guessing what the language might be is stupid; you can't rely on it. (Unless you stress test the target document with all possible Accept-Language headers, including languages that are 'x-' prefixed (which makes it impossible to test), and it keeps returning only one document in one language.)
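To make the point concrete, here is a minimal sketch of how a server might pick a language from the Accept-Language header. The function names and the simplified matching rule (exact tag, then primary subtag) are assumptions made for illustration, not how Google or any particular server does it. It shows why the author of a link cannot know which variant the person following it will get.

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language value, best first."""
    prefs = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without an explicit q-value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs.append((-quality, position, tag.strip().lower()))
    # Sort by quality (descending), then by original position; drop q=0 entries.
    return [tag for neg_q, _, tag in sorted(prefs) if neg_q < 0]


def negotiate(header, available, default):
    """Pick the first preferred language the server has, else the default."""
    for tag in parse_accept_language(header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # "en-us" falls back to "en"
        if primary in available:
            return primary
    return default


# The same URL, two different visitors, two different languages:
dutch_reader = negotiate("nl,en;q=0.8", ["en", "nl"], "en")
english_reader = negotiate("en-US,en;q=0.9", ["en", "nl"], "en")
```

With these inputs the first visitor gets the Dutch variant and the second the English one, so an hreflang="nl" on the link is only correct for one of them.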
Now is probably a good time to say that all of the above applies to TYPE as well. Try making a link to my weblog using TYPE ;-). Note that this use of TYPE is different from the use of TYPE on OBJECT; see the current XHTML 2.0 TYPE attribute description for how that works. Note that in XHTML 1.0 it isn't possible to make a list of possible MIME types or use asterisks; we can use nested OBJECT elements though. So this applies to the TYPE attribute on the A and LINK elements only, since you can't give metadata about a link when that metadata relies on content negotiation. Again, it would be nice if TYPE could change the Accept header:
<link rel="alternate" href="/" type="application/xhtml+xml"/>
<link rel="alternate" href="/" type="application/atom+xml"/>
But I may as well dream on.
Even without bringing content negotiation into the picture it does seem hard to keep these attributes updated on links to sites that you do not control yourself.
Don't forget that you could easily get a list of available languages for a page by looking at that page's <link hreflang=""> elements. However, most pages don't have proper link elements anyway (including that page at Google).
Also, keep in mind that the available languages for a page can change very easily over time. For example, somebody could just have made a perfect list of all the available languages for a particular page, and just after that I upload an Afrikaans version of the page. That means that the list would already be outdated. Too complicated if you ask me.
The beauty of using HTTP_ACCEPT_LANGUAGE is that you can automatically have a page displayed in the language of your preference without any further human intervention. However, if you follow Anne's idea of having hreflang influence that, the beauty is taken away.
For example, Anne could link to the Dutch version of any particular page, but I would prefer to see the Afrikaans version. So when I click on the link, I want to be taken to the Afrikaans version, not the Dutch one.
So, as far as I'm concerned, hreflang is only useful on link elements (for when you provide a list of possible languages for any particular page on that page itself, where it can easily be updated as necessary).
But if you look at a site that uses content negotiation in a "real" way this is not a problem.
Take http://www.debian.org for example: when I go to that URL I get a Swedish page. The Vary HTTP header field indicates that content negotiation is used, and the server also sends a Content-Location HTTP header field to indicate the "real" URL of the Swedish index page (/index.sv.html).
So if I could link to that site using the type and hreflang attributes it would look like this:
<a href="http://www.debian.org/index.sv.html" hreflang="sv" type="text/html">
And it would work just fine.
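As a rough illustration of the headers described above, the sketch below builds the response headers a server might send for a negotiated resource. The function name and the name.language.extension pattern are assumptions modelled on the /index.sv.html example; the exact headers debian.org sends are not reproduced here.

```python
def negotiated_response_headers(stem, language, extension, media_type):
    """Headers a server might send for a negotiated resource (illustrative)."""
    return {
        "Content-Type": media_type,
        # The variant-specific URL, which a link with hreflang/type can target.
        "Content-Location": f"{stem}.{language}.{extension}",
        # Tells caches which request header influenced the choice of variant.
        "Vary": "Accept-Language",
    }


headers = negotiated_response_headers("/index", "sv", "html", "text/html")
```

A link that targets the Content-Location value rather than the negotiated URL always reaches the same variant, which is why hreflang and type are safe on it.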
But let's say that I have a site that sends XHTML 1.1 to people that accept it and HTML 4.01 to the rest, and that uses the Content-Location header field in the same way as debian.org does. Google, for example, would then link to URLs that look like this: http://example.com/index.html and http://example.com/index.xhtml instead of my nice clean URL.
So doesn't this come down to: do you want to have something in the URL to indicate content type or not?
I'm trying not to go too far off-topic, so here goes... In reply to comment 3:
I think this opens up a far bigger can of worms. One of the things I am also concerned about is the indexing by Google. How will Google find these different versions of the same page? Technically, Google must know about both the text/html and application/xhtml+xml versions and about all of the different available languages.
Maybe the best way to do this is by putting querystring parameters in the URL. Then you can see that they aren't really necessary, but that they MAY (as defined by some RFC) be put inside the URL if desired.
For example:
<a href="http://www.ccc.de/?language=en" hreflang="en">Chaos Computer Club in English</a>
<a href="http://www.ccc.de/?language=de" hreflang="de">Chaos Computer Club in German</a>
Then you could also have:
<a href="http://annevankesteren.nl/?content-type=application/xhtml+xml" type="application/xhtml+xml">Anne van Kesteren (XHTML)</a>
<a href="http://annevankesteren.nl/?content-type=text/html" type="text/html">Anne van Kesteren (HTML)</a>
Or of course a combination:
<a href="http://annevankesteren.nl/?content-type=application/xhtml+xml&language=en" type="application/xhtml+xml" hreflang="en">Anne van Kesteren in English (XHTML)</a>
Note that I did not mark up the abbreviations properly, but that isn't meant to be part of the example.
The point is that, if the querystring values are specified correctly, then they should be used. Those that are not specified, can be left up to content negotiation if necessary.
But I think for normal links it is better to leave all of this up to content negotiation, unless there is some special reason not to. For example, if I ask Google for pages only in Afrikaans, it might decide to return the extra language querystring parameter inside the URLs.
The most important thing is to have proper link elements to specify all alternatives. Stress testing wastes bandwidth IMHO.
Documents should reside at one unique location and stay there.
Different language versions of documents are different documents, and therefore should reside at different locations.
This basically is what Petter Winnberg wrote. If people would follow these guidelines, HREFLANG and TYPE wouldn't be harmful.
Don't you just love thinking about utopia?
If a resource is available in multiple languages, negotiating which language the client wants is a fine method, very RESTful, and not bad at all. But what most people seem to forget is announcing, at the resource URI, which languages the resource is available in. Content-Language is, afaik, supposed to solve that, and if clients supported it, the user could be given the available choices before the final GET was executed.
Now, there's no negotiation. A resource which is available in many languages only serves the language it thinks is best for the client, even if the client would actually want something else. But if the user doesn't know what else is available, that negotiation is really a one-step process, like saying «I want red or green apples, in that specific order» and get green apples in return.
If the resource beforehand (by doing a HEAD on it) could state that «I have yellow and green apples», the user could be presented these options, and choose one of them. Maybe the user would rather want yellow than green apples, for instance. Then, the chosen option could be ranked first in the Accept-Language header on the following GET request. The yellow apple would then be returned to the user.
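The last step of that idea, ranking the chosen option first on the follow-up request, could look roughly like the sketch below. It assumes the client has already learned which languages are on offer (how the server would advertise them is left open here), and the helper names and the 0.1 step between q-values are choices made for this illustration only.

```python
def promote(chosen, ordered_tags):
    """Move the language the user picked to the front of the preference list."""
    return [chosen] + [tag for tag in ordered_tags if tag != chosen]


def build_accept_language(ordered_tags):
    """Build an Accept-Language value with descending q-values."""
    parts = []
    for index, tag in enumerate(ordered_tags):
        quality = max(1.0 - 0.1 * index, 0.1)  # never drop below q=0.1
        # The first tag gets the implicit q=1 and needs no parameter.
        parts.append(tag if index == 0 else f"{tag};q={quality:.1f}")
    return ", ".join(parts)


# The user normally prefers Swedish, then English, but picks Dutch this once.
one_off_header = build_accept_language(promote("nl", ["sv", "en"]))
```

The user's stored preferences stay untouched; only the single follow-up GET carries the reordered header.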
In reply to comment 5 & 6:
I don't think keeping separate documents for different languages is optimal. You are then losing out on the benefits of automatic content negotiation.
When I go to Google, it is in Afrikaans because I set my preferences in my browser correctly. I don't want to have to click on something to get my language of choice every time. I think that this would be highly impractical, irritating, and unnecessary.
I personally would like the computer to decide for me by default, unless I specify otherwise. The computer knows what my typical language preferences are by the settings I set, so why is human intervention necessary every time?
However, a manual override is always good. The list of available languages can be in the link elements, and they can then be displayed to me in some way through my browser. I can then select an alternate language from the list if I like.
The beauty of using HTTP_ACCEPT_LANGUAGE is that you can automatically have a page displayed in the language of your preference without any further human intervention. However, if you follow Anne's idea of having hreflang influence that, the beauty is taken away.
Why is it taken away? You get the page you want first, of course (using content negotiation). But if you want to view the English content you would at least have the possibility.
But if you look at a site that uses content negotiation in a "real" way this is not a problem.
The problem is that no browser supports Content-Location, except for Opera, with bugs. If I read the comments on the Mozilla bug thread correctly, it seems that the RFC is incorrect.
Google must know about both the text/html and application/xhtml+xml versions and about all of the different available languages.
Why? I don't think Google has application/xhtml+xml in the Accept header, so it will never get that version.
The point is that, if the querystring values are specified correctly, then they should be used. Those that are not specified, can be left up to content negotiation if necessary.
That isn't different from the solution provided earlier, using extensions for differentiation.
The most important thing is to have proper link elements to specify all alternatives. Stress testing wastes bandwidth IMHO.
Stress testing is impossible, like I said. Having the alternatives specified doesn't work when the URL is the same, though.
Different language versions of documents are different documents, and therefore should reside at different locations.
+1 (You have the same opinion for different types of documents having the same content, right?)
I don't think keeping separate documents for different languages is optimal. You are then losing out on the benefits of automatic content negotiation.
Not at all, how would you lose these advantages? You could still use content negotiation even if you have multiple separate documents.
When I go to Google, it is in Afrikaans because I set my preferences in my browser correctly. I don't want to have to click on something to get my language of choice every time. I think that this would be highly impractical, irritating, and unnecessary.
You misunderstand the concept. Google will give you the localized version, but each version will have a permanent location as well, allowing you to choose the English version too.
Anne, as long as you have one URL for each different alternate version there is no problem.
Then you can specify link rel="alternate" (along with type and hreflang) without any problem.
This is the problem:
HREFLANG and/or TYPE.
Different language versions of documents are different documents, and therefore should reside at different locations.
+1 (You have the same opinion for different types of documents having the same content, right?)
Yes, Atom files, for example, should have a different location (preferably by adding .atom).
Links are not very reliable. It seems that half of the links referred to from scientific documents since around 1995 are no longer accessible today, making a lot of scientific publications hard to understand because the information they rely on is simply not there. Given this relative unreliability even for links without a hreflang or type attribute, consider the number of times a MIME type or a language might change [1].
However, does something like this fall under reliability? I don't think so. Just because you have once chosen a language or a MIME-type does in no way mean that you would be unreliable by changing that. You are only unreliable if you remove the way to access a page at a certain location (of course not meaning IP-addresses).
Still, this doesn't render these attributes useless. As stated by Charl van Niekerk, they are definitely useful for linking to and from pages on your own site. As with almost any element and attribute, these can be abused (just think of the alt attribute on the img tag).
------------
[1] If you consider the latter unlikely, just take me or Anne. How unlikely would it be for us to switch from English to Dutch or the other way around? Or to German? Only a little less likely than changing MIME-type. Hereby I just mean that www.google.com would display Dutch as default instead of English and the English version would reside at www.google.co.uk (stupid example, but I needed something).
The beauty of using HTTP_ACCEPT_LANGUAGE is that you can automatically have a page displayed in the language of your preference without any further human intervention. However, if you follow Anne's idea of having hreflang influence that, the beauty is taken away.
Why is it taken away? You get the page you want first, of course (using content negotiation).
Anne, I don't really understand what you mean.
My point was, that if you specify what language the user must view the page in, you are taking away the advantage of having the user's user agent figure out what language is best for him. Of course he can later choose a different language, but I would suggest rather linking to a language-neutral URL and let the user agent or the user figure it out himself.
But if you want to view the English content you would at least have the possibility.
This is precisely what I described in previous comments.
Why? I don't think Google has application/xhtml+xml in the Accept header, so it will never get that version.
I know, but it could have. And I believe that it will have one day, when XHTML is commonly used and understood.
There could be many reasons why one would only like to retrieve XHTML documents. For example, I might want to know how many people have Afrikaans websites with XHTML on them.
The point is that, if the querystring values are specified correctly, then they should be used. Those that are not specified, can be left up to content negotiation if necessary.
That isn't different from the solution provided earlier, using extensions for differentiation.
It is a little different. File extensions aren't properly defined. The full stop that is used to divide the name of a file from its extension isn't limited to that use. You could also have:
http://www.google.com/some.uri.that/doesnot.exist
In the above example, is exist a file extension?
In other words, full stops must be seen as part of the document path.
Querystring parameters, however, are a different story. They are not meant to be part of the document path. However, the boundaries for this aren't properly set either.
For example, many content management systems abuse querystrings by serving completely different documents depending on querystring parameters. I think this also probably has to do with URI semantics and the abuse of them. But then, these things aren't properly defined either (as far as I know).
There is always the Vary HTTP header to keep in mind. However, it doesn't specify how far the differences within the document go. In other words, is it only different content types or completely different documents? I think this is a little under-defined as well.
I don't think keeping separate documents for different languages is optimal. You are then losing out on the benefits of automatic content negotiation.
Not at all, how would you lose these advantages? You could still use content negotiation even if you have multiple separate documents.
Sorry, I think I didn't express myself properly. What I meant by different documents was actually different file names, which brings me back to preferring querystrings.
You misunderstand the concept. Google will give you the localized version, but each version will have a permanent location as well, allowing you to choose the English version too.
What I was suggesting was something like this:
http://www.google.com/?language=af
http://www.google.com/?language=en
Therefore, every language does have a different location, but if you type in http://www.google.com it will be left up to content negotiation.
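A minimal sketch of that scheme: an explicit language querystring parameter wins, and a URL without it falls back to the Accept-Language header. The function name, the default, and the simplified header handling (q-values are ignored and tags are tried in the order given) are assumptions made to keep the example short.

```python
from urllib.parse import parse_qs, urlsplit


def language_for(url, accept_language, available, default="en"):
    """Choose a language: explicit ?language= first, then the request header."""
    query = parse_qs(urlsplit(url).query)
    requested = query.get("language", [None])[0]
    if requested in available:
        return requested  # the URL names one specific variant
    # No usable parameter: fall back to (simplified) content negotiation.
    for part in accept_language.split(","):
        tag = part.split(";")[0].strip().lower()
        if tag in available:
            return tag
    return default


available = ["af", "en"]
explicit = language_for("http://www.google.com/?language=af", "en", available)
negotiated = language_for("http://www.google.com", "af, en;q=0.5", available)
```

Here the first URL always yields Afrikaans regardless of browser settings, while the bare URL depends on the visitor's header.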
I think the only question that remains is whether different translations of one document should each be seen as a document of its own. If they are seen as separate documents, giving each a separate filename is probably best. Otherwise, use querystrings.
For example, many content management systems abuse querystrings by serving completely different documents depending on querystring parameters. I think this also probably has to do with URI semantics and the abuse of them. But then, these things aren't properly defined either (as far as I know).
And that is exactly the reason one should use different files, using something like name.language.type for files, since we are talking about different documents here. The English version is different from the Dutch one, and Atom has different semantics compared to XHTML.
I think different translations are separate documents. Google does form a sort of exception/shady area here, because except for a few little details (Google search in English as opposed to Google zoeken in Dutch) the page is 99% the same after entering a search term.
Frenzie, please see the example I posted in the entry (from Google), I don't call that the same page.
And that is exactly the reason one should use different files, using something like name.language.type for files.
Ok, I see your point. I agree.
Anne, it's my fault, I had the window open for about an hour before I returned to it and pressed post. It's a bit stupid, because a lot of people could have posted in between.
I just noticed a remarkable difference between Google English and Google Dutch, just check this:
Results 1 - 10 of about 6,010,000 for zoek. (0.23 seconds)
Resultaten 1 - 10 van circa 6,190,000 voor zoek (0.16 seconden)
Looking at the results, it also appears that Google Dutch automatically searches for Dutch results, even though I didn't tell it to do so (zoeken in pagina's in het Nederlands was off).
The Zeitgeist is of course very different, but I would have expected the search results to be the same, apart from some language-specific details (search, images and such), when entering the same search query. I was wrong. Of course there is a need for different versions in the way you stated, but I thought it would have been something of a shady area if the results were the same, which they are not. Therefore, it's not a shady area, and the need for two different locations is much clearer.
I would have expected the search results to be the same except for some language specific details (search, images and such) if entering the same search query.
Though this is getting a bit off-topic: Google's various versions not only differ language-wise but also content-wise. Google removes pages from their index as required by local law, which leads to the indexes available to users of a specific language-version (whether they are from that country or not) being of different sizes.