Anne van Kesteren

The ID element in Atom

20 August 2004

An Atom ID element contains a URI (IRIs are not a standard) that is currently bound to a set of rules on which there is some consensus. It must be universally unique, it must not change over time and it must not be relative. There is only one thing I am unsure about: should IDs be compared as strings or as URIs?

http://example.org/
http://EXAMPLE.org/

Those two are exactly the same URI, but different strings. You might want to read Identifying Atom for more (fun) examples. Because it seems to be very difficult to normalize URIs, some people want string comparison instead, but if that is the case… why should we use URIs in the first place?
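To make the difference concrete, here is a minimal Python sketch of the two comparison strategies. The helper names (normalize, same_string, same_uri) are made up for this illustration, and the normalization is deliberately partial: it only lowercases the scheme and the host, and assumes the URI carries no userinfo part. Real URI normalization involves more steps (percent-encoding, default ports, dot segments), which is exactly why it is considered difficult.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(uri: str) -> str:
    """Lowercase the scheme and host; leave path, query and fragment alone.

    Assumption: no userinfo ("user:password@") is present, because that
    part is case-sensitive and lowercasing the whole authority would
    change it.
    """
    parts = urlsplit(uri)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path,
        parts.query,
        parts.fragment,
    ))


def same_string(a: str, b: str) -> bool:
    """String comparison: character for character."""
    return a == b


def same_uri(a: str, b: str) -> bool:
    """URI comparison (partial): compare after normalizing both sides."""
    return normalize(a) == normalize(b)


if __name__ == "__main__":
    a = "http://example.org/"
    b = "http://EXAMPLE.org/"
    print(same_string(a, b))  # False
    print(same_uri(a, b))     # True
```

Note that the path is left untouched, so two IDs that differ only in the case of the path still count as different under both strategies.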

Apparently different language environments return different results. Java is totally borked.

Comments

Why we should use URIs is a question I've asked several times, but I have not yet gotten any comprehensible answer to it. Some people are of the opinion that URIs are the only way to make something globally unique, but I disagree. Any syntax would do, and URIs are infested with all kinds of different things, like encoding problems, dereferenceability (applies only to URLs, but URLs are URIs) etc.
I think we should go for URNs. Yes, they are also URIs, but they are more explicit as identifiers and not at all directly dereferenceable as such. I don't know how IRIs apply to URNs, but I would guess IRI-based URNs would be more i18n-friendly, and thus a better choice than URI-based URNs.
I would agree that the syntax definitions of URIs are good for identification purposes, but all the other baggage, like normalization, escaping etc., is just stupid to have to pay attention to.
Posted by Asbjørn Ulsberg at 6:24PM
Is Java just as borked as Opera? :-)
Good web servers will simply redirect if your case is incorrect on file/directory names. And if you forget that trailing slash, no problem. And even if you replace a slash with a backslash (oh my goodness!), it should still work.
If you ask me, a web server/operating system that does not do this is written poorly. And many (most?) are.
These things do add complexity, but whoever said that being a programmer was easy? :-)
In the end, usability rules! And you will just have to live with the fact that a URI != a string. :-)
Posted by Charl van Niekerk at 8:28PM
Charl, you miss the point. This is about URIs as identifiers. This has nothing, absolutely nothing, to do with web servers.
Posted by Anne at 9:18PM
I know. I think I just expressed myself wrong. I wasn't trying to advise people on how to write web servers; maybe I got sidetracked a little bit.
My actual point was that expecting URIs to be simple strings is impractical.
"some people want string comparison instead"

I was just trying to say that there is no simple way out. :-)
Posted by Charl van Niekerk at 9:48PM
URL vs URI has me all sorts of confused.

-Shade
Posted by Shade at 6:16AM
The (current) Atom pre-draft 0.3 spec says atom:id "MUST be a URI."
This completely negates using string comparison to check it.
Posted by Devon at 9:25AM
It's all muddly and sloppy because we're looking for consensus and engineering, not elegant architecture. Our esteemed Co-Chair will not accept any solution that doesn't allow (and probably prefer) HTTP URIs as identifiers. The reality of the IETF means we can't use an unregistered scheme, and probably can't register a scheme. The reality of our experience with RSS's guid (and Atom 0.3's id, for that matter) means we can't just leave it up to people to decide on their own how to make something unique and stable. The utter brokenness of common tools means that even if we mandate nothing but HTTP URIs, we still can't count on successful comparison unless we either mandate canonical URIs (which we may or may not get, no matter what we insist upon), or say that they have to be URIs (for Tim, for the IETF, for a chance of uniqueness) which must be treated as strings (for Java and C#). Engineering, not architecture.
Posted by Phil Ringnalda at 9:37AM