Anne van Kesteren

Marking up an abbreviation is complicated

The importance of abbr is underestimated. Most people do not use it, either because of browser compatibility issues (I thought only IE/Windows doesn't support it, but that's a big one) or because they haven't seen it in use and therefore don't know about it. The second is of course related to the first, and to the fact that the W3C doesn't do anything to promote web standards, but that's a whole different story/item.

Most people who use some kind of abbreviation markup use acronym to mark up all abbreviations. This is nonsense of course, since acronyms are a subset of abbreviations. Acronyms are supposed to be words themselves (if I'm correct), like WaSP, and therefore they shouldn't be spelled out like other abbreviations. If you have an aural style sheet you could have something like this (the default style sheet of an aural browser could also look like this; it's actually quite obvious):

abbr{
 speak:spell-out;
}
acronym{
 speak:normal;
}

Maybe that makes it clear why you have to distinguish abbr and acronym for now... In XHTML2.0 acronym has disappeared, which is a good thing. There has been a lot of confusion between the two elements we have now, and the distinction doesn't make sense. The one is a subset of the other, and there are other kinds of abbreviations too, I read somewhere (list.w3.org...), am I right? So if you're ever unsure which one to choose, choose abbr, since acronym is a subset. I also thought of this (I'm not sure whether it is correct under the current specification):

<acronym><abbr title="Web Standards Project">WaSP</abbr></acronym>

A lot of people mark up an abbreviation with abbr or acronym only the first time it appears in a post/message/item/thread, which is in my opinion evil. It could be that this is a recommendation of the W3C, and if so, I disagree. If an abbreviation like CSS appears again in that post, how on earth does an aural browser know it has to spell out that word? The least an author could do is make it clear by wrapping abbr around it and leaving the title attribute out. You should do that, I think; let me know if you have something against this.
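
A minimal sketch of what I mean, assuming a post that mentions CSS twice (the sentences themselves are made up for illustration): the first occurrence carries the title, later ones keep the abbr element but drop the title.

```html
<!-- First occurrence: abbr with a title giving the expansion. -->
<p><abbr title="Cascading Style Sheets">CSS</abbr> separates presentation from markup.</p>

<!-- Later occurrence (illustrative text): still abbr, so an aural
     browser can treat it as an abbreviation, but no title this time. -->
<p>Later in the same post <abbr>CSS</abbr> comes up again.</p>
```

Both occurrences match an abbr selector, so a rule like the aural one above would apply to each of them.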

Last but not least: in XHTML2.0 we still have to distinguish abbr and acronym; well, actually we have to distinguish the abbreviations that get spelled out from the ones that are just read as words. My solution would be a simple class, which adds some semantic value and is already widely supported (although I don't care about backwards compatibility). It would look like this:

<abbr title="Web Standards Project" class="word">WaSP</abbr>

This way we are still able to style this aurally.
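
Roughly, and only as a sketch: the class name word is just my own convention from the example above, not anything from a specification, and I'm assuming the CSS2 speak property and the aural media type here. The paragraph text is illustrative.

```html
<!-- Sketch only: "word" is an author-chosen class, not a standard value. -->
<style type="text/css" media="aural">
  /* Default: spell abbreviations out letter by letter. */
  abbr {
    speak: spell-out;
  }
  /* Abbreviations pronounced as words are read normally. */
  abbr.word {
    speak: normal;
  }
</style>

<p>
  <abbr title="Web Standards Project" class="word">WaSP</abbr> promotes
  <abbr title="Cascading Style Sheets">CSS</abbr>.
</p>
```

Since abbr.word is more specific than abbr, the second rule wins for WaSP regardless of the order of the rules, while CSS still gets spelled out.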

Comments

  1. I do not completely agree with you. In the first place, I don't think of acronym being a subset of abbr. They're two different things. Take "modem" or "radar" for example. Both are acronyms, but I would never put those inside an abbr-tag because it would be overkill. This is my opinion, of course.

    Secondly, I disagree with the W3C: one should not take acronym out of the specification. And I don't agree with you about distinguishing acronyms and abbreviations with classes. Classes and ids are NOT meant to add semantic value; structural markup should be encoded in the X(HT)ML itself. Its attributes can add extra meaning to it, but not classes, since they require stylesheets for this meaning, which some browsers (like Lynx) will ignore.

    But it's a fairly interesting discussion, and maybe I'm just being stubborn. ;)

    Posted by Bas Hamar de la Brethonière at

  2. Aside from Emacspeak (GPL for Linux), I don't think there are any screenreaders out there with Aural Stylesheet support.

    That doesn't mean we shouldn't be distinguishing between <abbr> and <acronym> or writing aural stylesheets. We should be throwing our support behind Emacspeak and its users. But 99% of screenreader users are using something like JAWS, which sits on top of IE and has all of IE's limitations (and will cost you $1000 to buy a copy to test with).

    Posted by Jacques Distler at

  3. The semantics of these two tags seem to be quite different in French.
    Abbreviations are words that are shortened
    e.g. : smthg = something.

    An acronym is when you pick the first letters of several words.
    e.g. : UFO = Unidentified Flying Object.
    In French it doesn't matter whether you can read it as a word or have to spell it out.

    Posted by mauriz at

  4. I think the specification is clear enough on this point (but, yeah, it could be clearer). acronym is a subset of abbr - it's an abbreviation that can be pronounced as a word. I do believe it would be a good thing to keep them, as they are different enough.

    I am with Bas in saying that to nest them would be overkill, and distinguishing 'em by class would not be a good thing. I like the way it is now, and it should stay that way in XHTML2.

    Posted by Ben at

  5. If you're going to omit @title from your abbr and acronym tags after the first occurrence, it might be a good idea to style abbr[title] and acronym[title] instead (for visual browsers, obviously).

    Posted by Sean at

  6. You have an interesting point there, Anne. I had thought it best to lose the distinction and any possible confusion with it. But, as you say, it could be used to give pronunciation guidance.

    The example of an acronym that always occurs to me is ANZAC (Australian and New Zealand Army Corps).

    I just tried a sentence that uses the word, "The Victoria Cross was awarded to 10 ANZACs during the Gallipoli campaign.", in a text-to-speech demo service (speech demo).

    It got it right. I suppose the capabilities of the TTS software are also relevant.

    Posted by Michael at

  7. For the same reason, it is important to mark up text (even single words) in a language other than the primary language of the document as:

    <span xml:lang="fr">d&eacute;j&agrave;-vu</span>

    if you want them to be read correctly.

    Posted by Jacques Distler at

  8. There is of course the question of whether native English speakers are used to hearing, say, 'Björn Borg' pronounced correctly. That might actually make it less accessible for some more problematic foreign words.

    Posted by Sean at

  9. For the record (and as mauriz mentioned), an abbreviation is any shortened form and an acronym is a type of abbreviation formed with first letters. "Pronounceability" is not implied by either term, and I don't believe it should be the responsibility of content markup to distinguish whether there is a correct way to say WSDL.

    The only relevant distinction that comes to mind would be display-related: acronyms are typically rendered in all uppercase. The infamous "radar" is a special case of an acronym being promoted to full word status, and "modem" is an abbreviation, not an acronym.

    Posted by hans at

  10. More specifically, a type of abbreviation formed with first letters where each letter is pronounced is an initialism. Acronyms can be formed with first letters (or not), but most definitions say an acronym is a word and imply it should be pronounced as such.

    For a more in-depth discussion of these nuances, you may want to read an article (and the ensuing comments) I wrote for evolt.org last year, HTML is not an acronym.

    Personally, I was happy to see acronym dropped from XHTML2.

    Posted by Craig at

  11. If I read all this, how should a simple web developer like me distinguish all this? If "modem" is not an acronym but an abbreviation, how should I style that for my aural users (yes, I know I haven't got any)? This looks terrible to me, and I'm now more in favour of the class attribute than ever, since I then only have to distinguish a word from a "not-word". That isn't really semantic (it would be better to distinguish all kinds of abbreviations), but the current method just s***s.

    Posted by Anne at

  12. Modem is an acronym. My Collins English Dictionary says an acronym is: a pronounceable name made up of a series of initial letters or parts of words.

    Posted by Ben at

  13. ... even single words

    Does that go for names, too? So would I mark up, for example, the following like this?

    See <span xml:lang="tr">Tantek &Ccedil;elik's</span> recent piece on Tim Berners-Lee's book.

    Posted by Michael at

  14. Anne, I'd say for acronyms, abbreviations, or foreign words that have become common usage, don't worry about marking them up. For example, most dictionaries list "modem," "scuba," and "radar" as nouns, not abbreviations.

    For foreign words, most style guides suggest that if a foreign word is a familiar part of English, you don't treat it as foreign. The same is true of people's names and most place names. The style guide for Canada's national newspaper, The Globe and Mail, says: "A good indication of whether a word or expression has entered the language is whether it appears in our English dictionary".

    Posted by Craig at

  15. Does that go for names?

    I guess it ought to.

    I can imagine a screenreader set on 'English' pronouncing "Tantek" more or less correctly (or, at least, as well as I can), but "Çelik"? I assume it would need a 'hint' (this, here, is a Turkish name) in order not to totally spooge the pronunciation.

    This makes for an interesting question as to what you should do with words transliterated from non-Roman alphabets. How can you hint a screenreader to pronounce them correctly?

    Posted by Jacques Distler at

  16. If it ought to be done for names, then it means a lot more marking-up ...

    Another interesting question is what happens if I mark up, say, Cicero as lang="la" when the usual pronunciation is different in English, French (and indeed modern Italian), any of which the rest of the text might be in.

    In the spirit of separating content and style, am I identifying the language when I mark up a word or phrase rather than mandating a pronunciation?

    Perhaps there could be other reasons for identifying it.

    If it were a familiar word/phrase and the author hadn't provided a translation, a user who nevertheless didn't know it could look at the source and see which dictionary to go to.

    Maybe it could be an indication of the pronunciation rather than a means of mandating it. So theoretically an English language screen-reader could and perhaps should interpret:

    <span xml:lang="fr">d&eacute;j&agrave;-vu</span>

    as "dayzhah voo".

    It used to be said in England that it was "incorrect" and "vulgar" to "speak more than one language at once", meaning: use authentic pronunciation when conversing in a language, but don't show off when it's just a scrap in the middle of a sentence in another language.
    Is there something in that?

    Latin is, I think, a special case. There are dozens of Latin words/phrases that are severely anglicized: Caesar ("seizer"), via ("vye-er"), carpe diem ("carpy dye-em") ... and so on ad nauseam.

    By all means mark up a block of Latin. But better not to mark up words such as those, if it meant a screenreader confused people with "authentic" pronunciations.

    Posted by Michael at

  17. What I see repeatedly is that many people think from the perspective of their own language, which is mostly English. This is quite understandable, but there are circumstances in which you would expect things to work differently.

    No offence to the English people here, but you are all barbarians when it comes to foreign languages! (Not all of you, I know.) You are not only trying to pronounce words English-like, but you are even changing names into more English ones if it suits you. "Homer"? The guy's name is "Homerus"!

    The point I'm trying to make is that in Dutch a word from a foreign language will be pronounced as it would have been in the language it came from; the same goes for names. So in our case (and probably many other cases) marking up these words is not an irrational thing to do, and screen readers would most definitely (sp?) benefit from such markup.

    Just to be clear, I did not intend to offend anyone.

    Posted by Bas Hamar de la Brethonière at