Anne van Kesteren

Re: URI design

There have been several interesting comments and questions on my post about URI design, and I thought I would address some of them in a separate post to keep things simple and clear. First, there was some confusion about the term URI. Anyone who shares that confusion might want to read Uniform Resource Locator, a slide from a presentation called Basic Internet Definitions brought to you by Google. You might want to read the following documents as well for additional background information (keep in mind that most of them are not new):

Some people already want to use the UTF-8 version of URIs, namely Internationalized Resource Identifiers (IRIs). The problem is that not all browsers support them correctly yet and they aren't standardized yet. You should probably stick with either US-ASCII names for the location or use percent-encoding, like: m%C3%A8re. The problem with the latter is that it isn't exactly readable, unless the browser is going to display these things differently.
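To make the encoded form concrete, here is a minimal sketch in Python (my choice of language for illustration, not something the original post used) that produces the m%C3%A8re form from the readable name and turns it back again. It relies only on the standard library's urllib.parse module.

```python
from urllib.parse import quote, unquote

# Percent-encode the UTF-8 bytes of a non-ASCII path segment.
segment = "mère"
encoded = quote(segment)    # quote() encodes text as UTF-8 by default
decoded = unquote(encoded)  # unquote() reverses the encoding

print(encoded)  # m%C3%A8re
print(decoded)  # mère
```

The readable name and the encoded name identify the same resource; only the encoded one is safe to put on the wire as plain US-ASCII.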

I don't think there is much to say about file extensions (there is, but the answer is clear to me). I don't really understand why people think that documents should not have file extensions while images may. These people seem to be under the impression that you need a lot of mod_rewrite rules to drop the extensions. That is wrong. If you turn MultiViews on, you have it. This can potentially increase accessibility a bit if you take the proper steps. Let's say you have an animated GIF that tells a little story. The file is called 'little-story.gif'. For people who cannot see images, or whose software cannot render them, you have created a simple text file (an HTML file would be even better, obviously, but you wanted to do it quickly) called 'little-story.txt'. Since you are running a proper web server you can refer to it as 'little-story' and, depending on the user's preferences (technically: the Accept header), the user gets the version that suits them. This continues in XHTML 2.0, where you could have 'little-story.svg', 'little-story.swf' and 'little-story.gif' and use the following markup:

<section src="little-story" type="image/svg+xml,application/x-shockwave-flash;q=.5,image/*;q=.1">
 <h>…</h>
 <p>…</p>
 <p>…</p>
</section>

This actually means that the thing you point to doesn't necessarily have to be there. You might know this works quite well with mod_rewrite, but as you can see there are more tricks you can use. Your URI structure can be totally independent of your file structure, and that is a good thing. Your file structure may be based on directories with the editor's name in them, but you don't want to have such a URI structure. It is not permanent.
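The negotiation described above, where 'little-story' resolves to a different file depending on the Accept header, can be sketched roughly as follows. This is a simplified illustration in Python, not Apache's actual algorithm (which also weighs language, encoding and server-side quality values). The function names and the VARIANTS table are hypothetical and exist only for this example.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # a range without an explicit q parameter counts as 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def quality(media_type, ranges):
    """Return the q value of the most specific range matching media_type."""
    main_type = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2
        elif media_range == main_type + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def negotiate(variants, accept_header):
    """Pick the file whose type the client prefers, or None if none fits."""
    ranges = parse_accept(accept_header)
    best_file, best_q = None, 0.0
    for filename, media_type in variants.items():
        q = quality(media_type, ranges)
        if q > best_q:
            best_file, best_q = filename, q
    return best_file


# Hypothetical variants stored on disk for the URI 'little-story'.
VARIANTS = {
    "little-story.gif": "image/gif",
    "little-story.txt": "text/plain",
}

print(negotiate(VARIANTS, "text/plain, image/*;q=0.1"))  # little-story.txt
print(negotiate(VARIANTS, "image/gif, */*;q=0.5"))       # little-story.gif
```

The point of the sketch is that the URI names the story, while the choice of file behind it is made per request, so files can be added or replaced without the URI changing.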

Re: lowercase, using lowercase characters in your URIs makes them easier to remember (consistency is key). Some people like camel case, but a completely lowercase URI that uses hyphens to separate words is clearer and more readable. Of course, we shouldn't do this just because the industry leader indexes hyphens better. (Following the "standards" of Microsoft isn't considered best practice either.) Hyphens are simply easier to type and easier to read than either camel case or underscores.
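As a rough illustration of that convention, the following Python sketch turns a title into a lowercase, hyphen-separated name. The slugify helper is hypothetical, and it deliberately keeps only ASCII letters and digits, which is an assumption made to stay in line with the US-ASCII advice above.

```python
import re


def slugify(title):
    """Lowercase a title and join its words with single hyphens."""
    # Keep runs of ASCII letters and digits; everything else separates words.
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


print(slugify("Re: URI design"))  # re-uri-design
```

Because punctuation and whitespace both act as separators, the result never contains doubled or trailing hyphens.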

Finally, there was some discussion about /file versus /dir/. See the difference? One has a slash on the end, the other doesn't. If you are going to set up a new site and want the URI to be perfect, you might be interested in the following Apache modules:

I should probably write this down somewhere general, but this weblog is subjective with some objective points and it is mostly "in my humble opinion". Thanks.

Comments

  1. Regarding hyphens and lower-case URIs, there is one case where they do not work so well: hyphenated words and hyphens in phrases. Hyphens between a lot of words add to the length of the URI too. Consider these two approaches:

    http://www.designdetector.com/archives/04/07/AQuestionOfHTMLPart3-Definitions.php

    http://www.designdetector.com/archives/04/07/a-question-of-html-part-3-definitions.php

    The hyphen between the last two words is lost. However, the URI is easier to type in (no SHIFT key required).

    Here's another example. Which is better?

    http://www.example.com/ISawANewly-MadeBunToEatInTheShopFor69p.php

    http://www.example.com/i-saw-a-newly-made-bun-to-eat-in-the-shop-for-69p.php

    Certainly the latter is easier to read and type in! But if capital letters are important, some meaning might be lost.

    Posted by Chris Hester at

  2. Thanks for the resources. I was having a similar thought when I wrote a blog on Web application URLs.

    I agree that lower case is a must. Although can't they be neatly lowered? They seem to be when I type a domain in caps in my URL bar.

    With content negotiation I would say there is absolutely no need for file extensions. I try not to use them on my images.

    My dilemma is this. How do I know the URL http://annevankesteren.nl/ is a blog?

    Posted by Kai Hendry at

  3. Chris, you are trying to have the document title inside the URI. I consider that bad practice, unless the title is quite short, like the title of this post. (I have made posts in the past with hyphenated titles, among others, but since slugs were discovered and implemented I don't see them that much anymore.) Note that having a '.php' extension is suboptimal.

    Kai, domain names and domain extensions are case-insensitive.

    Posted by Anne at

  4. Ah, now I understand! Thanks, Anne!

    So a URL can't contain something like #somebookmark at the end. Therefore, if you want to include bookmarks then you must speak about URI.

    However, remember that this is also apparently a URI:

    tag:diveintomark.org,2004-08-11:/archives/20040811182856

    (The above URI is taken from Mark Pilgrim's Atom Feed.)

    That was my original pain, because a URI can practically be anything, as long as it's unique. I prefer being more specific when I can.

    The concept of the URN sounds nice, since I don't like http://. For some reason it just feels technology-specific to me, and could (possibly) be subject to change sometime in the future. That hardly seems "permanent" to me. But anyway, let me rather not start again. :-)

    Posted by Charl van Niekerk at

  5. Confusion over "URI" is why, every time I upgrade WordPress, I change the comment form so the label is "Website." "URI" might be the most precise term, but people were entering things like "URI?" and "huh?" More people know what a URL is, but everyone knows what a website is.

    Sometimes it's best to just go with the layman's terms.

    Posted by Kelson at

  6. "Finally, there was some discussion about /file versus /dir/. See the difference? One has a slash on the end, the other doesn't."

    You know, I was really hoping this would go somewhere; I've been wondering about this for quite a while. A slash on the end can't be the only difference, can it? :-)

    Posted by dolphinling at

  7. "You know, I was really hoping this would go somewhere; I've been wondering about this for quite a while. A slash on the end can't be the only difference, can it? :-)"

    No, since /index.php/ will, in most cases, open the file index.php. Even /index.php/args will work.

    Also, if the server is configured that way, which a lot of servers are, /dir will open the directory dir or the file /dir/index.php.

    I believe this is not the standard configuration (and in hyperlinks you should certainly use a trailing slash only for directories), but on a lot of server configurations both cases will work. :-)

    Posted by Tom at

  8. Yes, some servers are configured the wrong way. This one is too, but that is just because I don't want my own server yet.

    It's like markup: you can use it incorrectly and still make it work.

    Posted by Anne at

  9. I did the whole no-extension thing for about a year on my site. Every image, CSS file, JavaScript file, every referenced resource was negotiated by Apache. At first I was concerned about performance, but the hit was negligible, if anything. However, I stopped doing it for two main reasons:

    There you have it. Web stats programs that work off Apache log files have no way to distinguish the type of extensionless resources. At the time stats were more important to me than they are now, and I switched back to the old way of doing things. Now my site is somewhat of a mix. Having been on both sides of the fence, I'm pretty neutral; I appreciate cleaner URIs, but I don't think most people will get much benefit out of negotiation for media resources.

    Posted by Matt at

  10. Anne: "Chris, you are trying to have the document title inside the URI. I consider that bad practice, unless the title is quite short, like the title from this post."

    I thought it was good practice, so the user knew what the link was for. Sometimes I got lazy and just made a link called, say "validator.php". I then had to think exactly what it was. For someone new to my website, surely an exact title is the way to go?

    Posted by Chris Hester at

  11. For those who are interested: I wrote a nice little overview about trailing slashes on Fiftyfoureleven.com.

    Posted by Faruk Ates at