
registerProtocolHandler() & registerContentHandler()

At Mozilla we are looking at making more parts of the web platform pluggable by web applications, so I had a look at the current state of registerProtocolHandler() and friends. The goal behind this feature is to make mailto URLs play nice with Yahoo! Mail, Gmail, etc. In other words, it makes navigation extensible.
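
As a sketch, this is how a webmail application might register itself (mail.example.com and the handler path are made up; the URL argument must contain “%s”, which the browser substitutes with the escaped mailto URL):

    // Ask the browser to route mailto: navigations to this application.
    navigator.registerProtocolHandler(
      "mailto",
      "https://mail.example.com/compose?to=%s",
      "Example Mail");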

The state of this feature is rather poor. It is only supported by Chromium and Gecko. Chromium also supports unregisterProtocolHandler(). Neither Gecko nor Chromium support isProtocolHandlerRegistered().

Perhaps if we implement the missing methods and improve the user interface around it, this will see somewhat wider adoption. However, to make the interface really work we would have to have built-in knowledge about each URL scheme, as users really have no concept of schemes. Firefox uses “Add title (domain) as an application for scheme links?”, which even for mailto is probably confusing. Chrome seems a little better, using “Allow domain to open all type links?” For the mailto scheme it uses email as the type.

(There is also registerContentHandler(), which makes the bit pluggable where your browser has no idea what to do with the resource it just retrieved. It is only supported by Gecko, and only for feed-related MIME types.)
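
(For comparison, the call shape looks like this; reader.example.com is made up and the MIME type shown is merely illustrative:)

    // Ask the browser to route retrieved feeds to this application.
    navigator.registerContentHandler(
      "application/atom+xml",
      "https://reader.example.com/subscribe?feed=%s",
      "Example Reader");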

If you have any great user interface ideas let me know! I thought I would share the above since I could not find a decent summary anywhere else.

DOM: attributes sadness

I have been reinstating “features” related to attribute handling in DOM. We thought we could get rid of them, but usage counters from Chrome and compatibility data from Gecko showed we could not. This is very sad so I thought I would share the pain.

A simple design for attributes would consist of each having a name and a value (both strings), and a simple map-like API on every element would be sufficient to deal with them: the getAttribute(name), setAttribute(name, value), and removeAttribute(name) methods, as well as a way to iterate through the names and values.
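
That minimal API in use:

    const element = document.createElement("div");
    element.setAttribute("title", "hello");
    element.getAttribute("title"); // "hello"
    element.removeAttribute("title");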

However, back in the day getAttribute(name) was required to return the empty string rather than null for a missing attribute, so hasAttribute(name) also exists. Fixing the specification to make getAttribute() return null was highly controversial back then. I even misguidedly ranted against developers who were making use of this behavior, as it prevented Opera from becoming standards compliant. “Please leave your sense of logic at the door, thanks!” was not a popular phrase back then.
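
With that fixed, the two now behave like this:

    // Given an element without a "missing" attribute:
    element.getAttribute("missing"); // null (DOM Level 1 said "")
    element.hasAttribute("missing"); // false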

Unfortunately, namespaced attributes are a thing. And instead of simply adding a namespace field to our existing name and value, namespace, namespace prefix, and local name fields were added. Indeed, the local name is not necessarily equal to the name of an attribute. The idea was to have some kind of modality where pre-namespace and post-namespace attributes would not really interact. That never happened, of course. To deal with namespaces we have getAttributeNS(namespace, localName), setAttributeNS(namespace, name, value) (indeed, name, not localName; so bad), removeAttributeNS(namespace, localName), and hasAttributeNS(namespace, localName).
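
A sketch with SVG's xlink:href shows the name/local name split in action:

    const XLINK = "http://www.w3.org/1999/xlink";
    const use = document.createElementNS("http://www.w3.org/2000/svg", "use");

    // Qualified name "xlink:href": prefix "xlink", local name "href".
    use.setAttributeNS(XLINK, "xlink:href", "#icon");

    use.getAttributeNS(XLINK, "href"); // "#icon" (namespace and local name)
    use.getAttribute("xlink:href");    // "#icon" (the full name)
    use.getAttribute("href");          // null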

The real kicker is that the first four methods ignore the namespace fields, but can create attributes you cannot access with the *NS methods. There is no universal attribute API, though if you steer clear of namespaces everywhere you are probably mostly fine (except perhaps with SVG and such).

This was still too simple. There is also attributes, which returns a NamedNodeMap (only used for attributes these days), and hasAttributes(), which can tell you whether that map is empty or not. These two used to be on all nodes (to limit the amount of casting in Java), but we are moving them to element since that is where they make sense. NamedNodeMap contains a collection of zero or more Attr objects so you can inspect their individual fields. The map has a length property, an item(index) method, and is implemented with some kind of JavaScript proxy so attributes.name works, as well as attributes[0]. Good times. Attr objects also allow manipulation of an attribute's value. Due to mutation observers this requires an element field on attributes to point back to the element the attribute belongs to. The namespace prefix also used to be a mutable field, but fortunately this was poorly implemented and recently killed.
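
All of which looks roughly like this:

    const input = document.createElement("input");
    input.setAttribute("type", "text");

    input.hasAttributes();   // true
    input.attributes.length; // 1
    input.attributes[0];     // the type Attr (indexed, via the proxy)
    input.attributes.type;   // the same Attr (named, via the proxy)

    // Attr objects expose the individual fields and a mutable value,
    // plus a pointer back to their element.
    input.attributes.type.localName;              // "type"
    input.attributes.type.ownerElement === input; // true
    input.attributes.type.value = "password";
    input.getAttribute("type"); // "password"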

The real reason attributes are so complicated, and used to be more complicated still, ignoring namespaces for the moment, is DTDs. The SGML crowd was not brave enough to cut the lifeline when they did XML. Then XML got popular enough to end up in browsers and the DOM. This meant that attributes could not contain just text, but also entity references. And therefore attributes became a type of node. Entity references were never really implemented and fortunately we managed to remove that cruft from the platform. However, attributes are still a type of node.

The last thing we are investigating is whether attributes can stop having child nodes and perhaps stop being nodes altogether. Meanwhile, we had to add back createAttribute(localName) on document; getAttributeNode(name), setAttributeNode(attr), and removeAttributeNode(attr) on element; and getNamedItem(name), setNamedItem(attr), and removeNamedItem(name) on NamedNodeMap, as sites use these. Oh wait, and all their *NS counterparts of course, bar removeAttributeNodeNS().
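
The reinstated node-based methods in action:

    const attr = document.createAttribute("class");
    attr.value = "legacy";

    const element = document.createElement("div");
    element.setAttributeNode(attr);
    element.getAttributeNode("class") === attr;        // true
    element.attributes.getNamedItem("class") === attr; // true
    element.removeAttributeNode(attr);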

Added together, we have twenty-five methods to deal with attributes rather than three. And attributes require six internal fields rather than two. And this is assuming we can get rid of child nodes and attributes being nodes, both semi-implemented today.

Asynchronicity

There is ever more asynchronicity within the web platform. “Asynchronous” here means some set of steps that could be performed in parallel with JavaScript running in a given environment, such as a window or worker. Fetching networked resources, computing crypto, and audio processing are examples of things that can be done asynchronously. The JavaScript language does not really know about threading or background processing. The platform, however, has had this for a long time, synchronizing with JavaScript using events and, these days, also through resolving promises.

The way any environment works, simplified, is by going through a queue of tasks. Whenever the user moves the mouse, or XMLHttpRequest fetches, new tasks are queued to eventually dispatch events and then run event handlers and listeners. Asynchronous steps run in parallel with this.

When new standards are written, this is often done wrong. A set of asynchronous steps cannot refer to global state that might change, such as a document's base URL. They also cannot change state, such as properties on an object. Remember, these steps run in parallel, so if you change obj.prop, obj.prop === obj.prop would no longer be guaranteed. Bad. Instead you queue a task, effectively scheduling some code to run in the environment at some point in the future, when it has the bandwidth. The Fetch layer queues tasks whenever new network chunks arrive. The UI layer queues tasks whenever the user moves the mouse. Etc.
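
From script, this task queuing is what makes a pattern like the following safe:

    const request = new XMLHttpRequest();
    request.open("GET", "/resource");

    // The fetch runs in parallel with this environment. Progress is
    // reported by queuing tasks that dispatch events, so this listener
    // runs on the environment's own task queue and can touch state.
    request.onload = () => {
      console.log(request.responseText);
    };
    request.send();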

In summary, you have the environments, where JavaScript is executed based on a sequence of tasks. Then there is the background processing, known as asynchronous steps in standards, which queues new tasks to the various environments to stay synchronized over time.

(Not all of this is properly defined as of yet; please fill in the gaps as you run across them. “Note Asynchronous Steps Explicitly” has advice for how to go about that.)

28

When I was a kid we went to Switzerland sometimes to enjoy the mountains in the summer and my mom used to tell me that all of Switzerland was celebrating my birthday. I live in Switzerland now 😊.

URL: Unicode IDNA Compatibility Processing (Unicode Technical Standard #46)

Previously, in reverse chronological order: URL: IDNA2003, IDNA Hell, URL: IDNA2008, and URL: domain names.

The change to the URL Standard to use Unicode IDNA Compatibility Processing was made a while back. The reasoning is that it provides an interface compatible with IDNA2003, including almost identical processing, but is based on the IDNA2008 dataset. That means that lowercasing still happens. Mapping code points still happens. URLs will remain cool. The URL Standard has the integration details.
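
A sketch of what that means in practice, assuming a browser with the URL constructor:

    // Lowercasing and compatibility mapping still happen, as with IDNA2003.
    new URL("http://EXAMPLE.test/").hostname; // "example.test"
    new URL("http://√.example/").hostname;    // "xn--19g.example"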

Vats

Objects live in Vats. Realms are a legacy accident due to poor browser engineering and do not control the object addressing space. Vats can only communicate through message passing. Messages clone objects, sometimes transferring their underlying data. Ever-so-slowly ECMAScript is growing more powerful to describe the web platform. This kind of modeling is what excites me these days.
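
Workers show the model from script; a minimal sketch (worker.js is assumed to exist):

    const worker = new Worker("worker.js");
    const buffer = new ArrayBuffer(1024);

    // The message is cloned; listing buffer as a transferable moves its
    // underlying data to the other vat instead of copying it.
    worker.postMessage({ buffer }, [buffer]);
    buffer.byteLength; // 0, the data now lives in the worker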

Thanks to Allen Wirfs-Brock for correcting my errors in writing this and providing these cool analogies:

(Mark Miller did this long ago in E and the web has a somewhat sloppier version of it, through workers, structured cloning, and having multiple globals, but on the upside is deployed widely.)

Contributing to standards

I was asked how one contributes to standards. Before anything else, it is worth watching Domenic Denicola’s presentation on making friends and influencing standards bodies. It is awesome and will teach you a great deal.

I think the core thing to understand when considering contributing to web standards is that they are created by communities. Typically there are a few people leading the charge and many people contributing critique, research, and tests. Usually there is a combination of mailing lists, IRC, and the occasional face-to-face meeting to keep everyone roughly synchronized.

A lot of discussion still happens through email, and given the volume you need to filter it to some extent. An effective way of doing this is by paying more attention to the peers you know and trust, and checking from time to time whether that list of peers needs adjusting. E.g. if you follow the development of JavaScript you want to read email from Allen Wirfs-Brock and Brendan Eich. If you follow the development of HTML you want to read email from Ian Hickson. You’ll quickly find out Boris Zbarsky is insightful irrespective of the mailing list involved. If the people are not immediately obvious to you, you can always ask on IRC. These people will often reply to the key points within a thread and make it immediately obvious what it is about and why it might be worth paying attention to. That way you save yourself the time of reading the whole thing. Of course you will need to judge for yourself how to filter, but some amount of filtering will be required if you want to keep up with the community and also do some work.

You want to figure out what community to participate in:

Unfortunately there’s a myriad of other smaller lists for particular APIs. Usually the standard you care about has relevant pointers. If it doesn’t, please file a bug or let someone else know as it definitely should.

Studying the output of the community (the standard and tests) and its ongoing progress (the mailing list) is a good way to get a feel for how things work and what you should pay attention to. It can help to read the WHATWG FAQ too, as it documents answers to many common questions. Having familiarized yourself with the material and the environment, you should feel more than ready to start participating more actively, in particular if you see something worthy of improvement.

Monkey patch

There appears to be a trend where specifications monkey patch a base specification. A monkey patch is a subtle change to an existing algorithm that is only observable if you have seen both the new and the base specification. Some examples: Custom Elements attempts to redefine the createElement() method; Resource Timing adds a hook into each fetching end-point within a document without actually defining this in any amount of detail; Content Security Policy hijacks JavaScript’s eval(). (Using dated TR/ URLs here as an exception so these examples remain useful going forward.)

Apparently it is not clear that this is bad design. We should avoid monkey patching (hereafter patching). It has at least these problems:

Note that it is fine to have extension points. Both adopting and cloning of nodes can be hooked into by other specifications (and soon JavaScript, for Custom Elements). Explicit extension points make the model clear. If adopting were instead merely patched from HTML's img element definition, it would not be clear to someone reading the adopting algorithm that adopting is actually more involved.
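
A JavaScript analogy of the difference (hypothetical hook names; not how any of the above specifications are actually structured):

    // Patching: invisible to anyone reading the original definition.
    const original = Document.prototype.createElement;
    Document.prototype.createElement = function (name) {
      const element = original.call(this, name);
      // ...extra behavior bolted on from elsewhere...
      return element;
    };

    // Explicit extension point: the base declares where extensions run.
    const elementCreatedHooks = [];
    function createElementWithHooks(document, name) {
      const element = document.createElement(name);
      for (const hook of elementCreatedHooks) hook(element);
      return element;
    }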

If you encounter patching, please file a bug. If you are writing a specification and temporarily want to patch a base specification to help implementations along, file a bug on the base specification so the community is informed of what you are trying to do.

One year at Mozilla

I love figuring out the web platform and making it better.

Last year (since I started February 4) I worked on Fetch, URLs, DOM, XMLHttpRequest, Fullscreen, Notifications, and Encoding, all published through the WHATWG under CC0. Apart from that I focused on bringing JavaScript and the web platform closer together by trying to foster better mutual understanding. The intersection of DOM, HTML, IDL, and JavaScript around the details of script execution, tasks, microtasks, and multiple globals has also been a recurring theme. This year the plan is to solve offline.

Countries in 2013

I moved to the United Kingdom to work for Mozilla last year and it has been excellent so far. Getting close to a full year now. Since I have listed countries in 2008, 2009, 2010, 2011, and 2012, I thought I would do it again: