Anne van Kesteren

URL equivalence

Years ago Sam Ruby posted URI Equivalence (thanks Robbert!). I have been studying URLs lately in order to write a new URL standard. It turns out that something so fundamental to the platform is non-interoperable in various ways. Quelle surprise! And yes, the plan is to do away with IRI/URI and just call them all URLs. Anyway:

First URL                    Second URL                   Equal?
http://example.com/          http://example.com           true
HTTP://example.com/          http://example.com/          true
http://example.com/          http://example.com:/         true
http://example.com/          http://example.com:80/       true
http://example.com/          http://Example.com/          true
http://example.com/~smith/   http://example.com/%7Esmith/ false
http://example.com/~smith/   http://example.com/%7esmith/ false
http://example.com/%7Esmith/ http://example.com/%7esmith/ false
http://example.com/%C3%87    http://example.com/C%CC%A7   false

The last four are false because browsers (apart from Chrome) do not unescape URL escapes. Well, the last one is really false because Unicode normalization is not performed throughout the web platform: %C3%87 is the precomposed Ç (U+00C7), while C%CC%A7 is a plain C followed by a combining cedilla (U+0327). These two are equal, as Ç expands to %C3%87 during parsing:

http://example.com/%C3%87    http://example.com/Ç         true
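
If you want to poke at this yourself, here is a minimal sketch. It assumes a runtime whose URL object follows the URL Standard (a recent Node.js, for instance). The helper name equivalent and the rows array are made up for illustration, and comparing the serialized href is just one plausible definition of "equal", not the only one.

```typescript
// Hypothetical helper: two URLs count as equivalent when they
// serialize to the same string after parsing.
function equivalent(a: string, b: string): boolean {
  return new URL(a).href === new URL(b).href;
}

// The comparisons from this post: [first URL, second URL, expected result].
const rows: [string, string, boolean][] = [
  ["http://example.com/", "http://example.com", true],
  ["HTTP://example.com/", "http://example.com/", true],
  ["http://example.com/", "http://example.com:/", true],
  ["http://example.com/", "http://example.com:80/", true],
  ["http://example.com/", "http://Example.com/", true],
  ["http://example.com/~smith/", "http://example.com/%7Esmith/", false],
  ["http://example.com/~smith/", "http://example.com/%7esmith/", false],
  ["http://example.com/%7Esmith/", "http://example.com/%7esmith/", false],
  ["http://example.com/%C3%87", "http://example.com/C%CC%A7", false],
  ["http://example.com/%C3%87", "http://example.com/Ç", true],
];

// Print each pair with the computed and the expected result side by side.
for (const [a, b, expected] of rows) {
  console.log(a, b, equivalent(a, b), expected);
}
```

The parser lowercases the scheme and host, drops an empty or default port, and percent-encodes Ç as UTF-8, but it leaves existing escapes in the path alone. That is why the ~smith and %7E rows stay different. A runtime that unescapes them would give different answers for those rows.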