Anne van Kesteren

Network effects affecting standards

29 April 2016

In the context of another WebKit issue around URL parsing, Alexey pointed me to WebKit bug 116887. It handily demonstrates why the web needs the URL Standard. The server in question reacts differently to %7B than it does to a literal {. It expects the latter, even though the IETF STD does not even mention that code point.
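
To make that mismatch concrete, here is a minimal sketch in Python. It is not the code of the server from the bug; the route "/items/{id}" and both function names are made up for illustration. It only shows how a server that compares the raw request path can tell %7B and { apart, while one that percent-decodes first cannot.

```python
from urllib.parse import unquote

# Hypothetical route, assumed for this sketch only.
ROUTE = "/items/{id}"


def naive_lookup(raw_path: str) -> bool:
    """Compare the path exactly as the client sent it."""
    return raw_path == ROUTE


def decoding_lookup(raw_path: str) -> bool:
    """Percent-decode the path before comparing."""
    return unquote(raw_path) == ROUTE


# A client that sends a literal "{" matches either way.
print(naive_lookup("/items/{id}"))         # True
print(decoding_lookup("/items/{id}"))      # True

# A client that sends "%7B" only matches the decoding server.
print(naive_lookup("/items/%7Bid%7D"))     # False
print(decoding_lookup("/items/%7Bid%7D"))  # True
```

Whether a given client sends { or %7B is exactly the kind of detail the naive server ends up depending on.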

Partly to blame here are the browsers. In the early days, code shipped without much quality assurance and many features got added in a short period of time. While standards evolved, there was not much of a feedback loop going on with the browsers. There was no organized testing effort either, so the mismatch grew.

On the other side, you have the standards folks ignoring the browsers. While browsers did not necessarily partake in the standards debate that much back then, they have had an enormous influence on the web. They are the single most important piece of client software out there. They are even kind of a meta-client: they are used to fetch clients that in turn talk to the internet. As an example, the Firefox meta-client can be used to get and use the FastMail email client.

And this kind of dominance means that it does not matter much what standards say; what matters is what the most-used clients ship. When you are putting together some server software and have deadlines to make, you typically do not start by reading standards. You figure out what bits you get from the client and operate on that. And that is typically rather simplistic. You would not use a URL parser, but rather various kinds of string manipulation. Dare I say, regular expressions.
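
As a rough illustration of that difference, the sketch below extracts a host from a URL twice: once with the sort of string splitting you write under deadline, and once with a parser. Both function names and the example URLs are made up. Python's urllib.parse is used only because it is at hand; it is not an implementation of the URL Standard and its results can differ from what browsers do.

```python
from typing import Optional
from urllib.parse import urlsplit


def host_by_string_ops(url: str) -> str:
    """Deadline-driven shortcut: whatever sits between '://' and the next '/'."""
    return url.split("://", 1)[1].split("/", 1)[0]


def host_by_parser(url: str) -> Optional[str]:
    """Let a URL parser separate credentials, host, and port."""
    return urlsplit(url).hostname


simple = "https://example.com/a"
tricky = "https://user:pw@Example.COM:8443/path?x=1"

# On the input the developer tested with, both approaches agree.
print(host_by_string_ops(simple))  # example.com
print(host_by_parser(simple))      # example.com

# On a less common input, the shortcut returns credentials and port too.
print(host_by_string_ops(tricky))  # user:pw@Example.COM:8443
print(host_by_parser(tricky))      # example.com
```

The shortcut works for whatever the most-used client happens to send, which is why it ships, and why server behavior ends up tracking clients rather than standards.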

This might ring some bells. If it did, that is because this story also applies to HTML parsing, text encodings, cookies, and basically anything that browsers have deployed at scale and developers have made use of. This is why standards are hardly ever finished. Most of them require decades of iteration to get the details right, but as you know, that does not mean you cannot start using any of it right now. And getting the details right is important. We need interoperable URL parsing for security, for developers to build upon it without tons of cross-browser workarounds, and to elevate the overall abstraction level at which engineering needs to happen.