Anne van Kesteren

URL: Unicode IDNA Compatibility Processing (Unicode Technical Standard #46)

Previously, in reverse chronological order: URL: IDNA2003, IDNA Hell, URL: IDNA2008, and URL: domain names.

The change to the URL Standard to use Unicode IDNA Compatibility Processing was made a while back. The reasoning is that it provides an interface compatible with IDNA2003, including almost identical processing, but is based on the IDNA2008 dataset. That means that lowercasing still happens. Mapping code points still happens. URLs will remain cool. The URL Standard has the integration details.


Objects live in Vats. Realms are a legacy accident due to poor browser engineering and do not control the object addressing space. Vats can only communicate through message passing. Messages clone objects, sometimes transferring their underlying data. Ever-so-slowly ECMAScript is growing more powerful to describe the web platform. This kind of modeling is what excites me these days.

Thanks to Allen Wirfs-Brock for correcting my errors in writing this and providing these cool analogies:

(Mark Miller did this long ago in E and the web has a somewhat sloppier version of it, through workers, structured cloning, and having multiple globals, but on the upside is deployed widely.)

Contributing to standards

I was asked how one contributes to standards. Before anything else, it is worth watching Domenic Denicola’s presentation on making friends and influencing standards bodies. It is awesome and will teach you a great deal.

I think the core thing to understand when considering contributing to web standards is that they are created by communities. Typically there are a few people leading the charge and many people contributing with critique, research, and tests. Usually there is a combination of mailing lists, IRC, and the occasional face-to-face meeting, to keep everyone roughly synchronized.

A lot of discussion still happens through email and given the volumes you need to filter it to some extent. An effective way of doing this is by paying more attention to the peers you know and trust and checking from time to time to see if that list of peers needs adjusting. E.g. if you follow the development of JavaScript you want to read email from Allen Wirfs-Brock and Brendan Eich. If you follow development of HTML you want to read email from Ian Hickson. You’ll quickly find out Boris Zbarsky is insightful irrespective of the mailing list involved. If the people are not immediately obvious to you, you can always ask on IRC. These people will often reply to the key points within a thread and make it immediately obvious what it is about and why it might be worth paying attention to. That way you save yourself some time reading the whole thing. Of course you will need to judge for yourself how to filter, but some amount of filtering will be required if you want to keep up with the community and also do some work.

You want to figure out what community to participate in:

Unfortunately there’s a myriad of other smaller lists for particular APIs. Usually the standard you care about has relevant pointers. If it doesn’t, please file a bug or let someone else know as it definitely should.

Studying the output of the community (the standard and tests) and its ongoing progress (the mailing list) is a good way to get a feel for how things work and what you should pay attention to. It can help to read the WHATWG FAQ too as it documents answers to many common questions. Having familiarized yourself with the material and the environment you should feel more than ready to start participating more actively, in particular if you see something worthy of improvement.

Monkey patch

There appears to be a trend where specifications monkey patch a base specification. A monkey patch is a subtle change to an existing algorithm that is only observable if you have seen both the new and the base specification. Some examples: Custom Elements attempts to redefine the createElement() method; Resource Timing adds a hook into each fetching end-point within a document without actually defining this in any amount of detail; Content Security Policy hijacks JavaScript’s eval(). (Using dated TR/ URLs here as an exception so these examples remain useful going forward.)

Apparently it is not clear that this is bad design. We should avoid monkey patching (hereafter patching). It has at least these problems:

Note that it is fine to have extension points. Both adopting and cloning of nodes can be hooked into by other specifications (and soon JavaScript for Custom Elements). Explicit extension points make the model clear. If adopting was instead merely patched from HTML’s img element definition it would not be clear for someone reading the adopting algorithm that adopting is actually more involved.

If you encounter patching, please file a bug. If you are writing a specification and temporarily want to patch a base specification to help implementations along, file a bug on the base specification so the community is informed of what you are trying to do.

One year at Mozilla

I love figuring out the web platform and making it better.

Last year (since I started February 4) I worked on Fetch, URLs, DOM, XMLHttpRequest, Fullscreen, Notifications, and Encoding, all published through the WHATWG under CC0. Apart from that I focused on bringing JavaScript and the web platform closer together by trying to foster better mutual understanding. The intersection of DOM, HTML, IDL, and JavaScript around the details of script execution, tasks, microtasks, and multiple globals has also been a recurring theme. This year the plan is to solve offline.

Countries in 2013

I moved to the United Kingdom to work for Mozilla last year and it has been excellent so far. Getting close to a full year now. Since I have listed countries in 2008, 2009, 2010, 2011, and 2012, I thought I would do it again:

State of promises

A brief overview of the current state of promises as people have been asking me all over. We thought we were fully done, but decided on a course correction. Based on discussions with Mark and Tab, Domenic has been doing great work writing up the new algorithms in Promise Unwrapping Algorithm. This defines a subset of the envisioned model (it does not support promises-for-promises, but can in due course). This design will be integrated in DOM until it can move to JavaScript proper. By the next TC39 meeting in a month we hope to declare consensus. After that we can start shipping in browsers.

To be clear, the fundamental aspects of promises remain unchanged. And we should continue using them for all new features that require asynchronous values.


Previously, in reverse chronological order: IDNA Hell, URL: IDNA2008, and URL: domain names.

IDNA2003 consists of two important algorithms: ToASCII and ToUnicode. Both operate on a single domain label (i.e. not a whole domain name). To obtain one or more domain labels from a domain name it needs to be split on dots (U+002E, U+3002, U+FF0E, and U+FF61).
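As a minimal sketch of that splitting step, assuming Python and a hypothetical helper name `split_labels` (not something IDNA2003 or the URL Standard defines), the four dot code points can be handled with one regular expression:

```python
import re

# The four code points IDNA2003 treats as label separators:
# U+002E, U+3002, U+FF0E, and U+FF61.
DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")

def split_labels(domain):
    """Split a domain name into domain labels (hypothetical helper)."""
    return DOTS.split(domain)

# Ideographic and fullwidth dots split just like an ASCII dot.
labels = split_labels("www\u3002example\uFF0Ecom")
```

Note that a trailing dot yields an empty final label; what to do with it is up to the caller.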

Apart from doing a range check and checks for certain code points, ToASCII encompasses two major algorithms: Nameprep and Punycode (see Wikipedia’s Punycode). Nameprep is a specific profile of Stringprep. Stringprep, in turn, does a number of things: mapping code points, Unicode normalization (NFKC — “Die, heretic scum!”), checking for forbidden code points, checking proper use of bidirectional code points, and checking for unassigned code points (although this last one will not happen in browsers).
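To make the two stages concrete, here is a rough sketch using Python's standard library, whose `encodings.idna` module implements IDNA2003 with `nameprep()` and `ToASCII()` functions. It assumes the Unicode 3.2 tables Python's Stringprep uses, so it is not what the URL Standard requires, and it skips the range checks a full ToASCII performs. The label itself is just an illustrative choice.

```python
import encodings.idna

label = "B\u00fccher"  # "Bücher", one domain label

# Stage 1: Nameprep maps code points (here, case folding) and applies
# NFKC, then checks for forbidden and bidirectional code points.
prepared = encodings.idna.nameprep(label)

# Stage 2: Punycode encodes the prepared label into ASCII, and the
# "xn--" ACE prefix marks the result as an encoded label.
ace = b"xn--" + prepared.encode("punycode")

# The library's own ToASCII does both stages plus the extra checks.
full = encodings.idna.ToASCII(label)
```

Both `ace` and `full` end up as `b"xn--bcher-kva"` for this label.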

ToUnicode does the reverse, with the caveat that it cannot fail. If it fails at any point the original input is returned instead.
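The "cannot fail" behavior can be sketched like this. It is a deliberately simplified stand-in, assuming Python and a hypothetical function name `to_unicode_sketch`: it only undoes the Punycode step, whereas the real ToUnicode also runs Nameprep and verifies the result by round-tripping through ToASCII.

```python
ACE_PREFIX = "xn--"

def to_unicode_sketch(label):
    """Simplified ToUnicode: decode if possible, else return the input."""
    # Labels without the ACE prefix are returned unchanged.
    if not label.lower().startswith(ACE_PREFIX):
        return label
    try:
        encoded = label[len(ACE_PREFIX):].encode("ascii")
        return encoded.decode("punycode")
    except UnicodeError:
        # Any failure falls back to the original input.
        return label

decoded = to_unicode_sketch("xn--bcher-kva")
```

The important part is the `except` branch: callers never see an error, only the original label.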

The URL Standard standardizes on IDNA2003 as that is what the most widely deployed clients implement. It does override one requirement, namely to use the latest version of Unicode rather than Unicode 3.2.

The IDNA section of the URL Standard references IDNA2003’s ToASCII and ToUnicode and makes appropriate requirements around them. The status quo now has better documentation than before. It seems unlikely clients will update to IDNA2008 as it’s not a straightforward replacement (it has nothing equivalent to ToASCII and ToUnicode) and is not backwards compatible.


Ten years now since First Item!. And five months since I started at Mozilla. Pretty sweet.

Currently working on URLs again. In particular, file URLs. Does file: relative to file:///test/path yield file:/// or file:///test/path? I don’t even…

London TAG F2F

Last week was the second reformed TAG meeting, this time with new chairs, and hosted by me at Mozilla in London. I felt that overall it went well, though there was quite a bit of repetition too. Getting a shared understanding takes more time than desired. Takeaways:

Also, the W3C TAG is now on GitHub. It took some arguing internally, but this will make us more approachable to the community. We also plan to have a developer meetup of sorts around our meetings (a little more structured than the first one in London) to talk these things through in person. Feel free to drop me a line if something is unclear.