Anne van Kesteren

Making the DOM faster

There is a pretty good comment on Hacker News (of all places) by nostrademons explaining why the DOM is not slow. Basically, all the DOM mutation operations are pretty trivial. Inserting and removing nodes, moving them around, setting an attribute, etc. It’s layout that takes a while and layout can be synchronously triggered through APIs such as offsetWidth and getComputedStyle(). So you have to be careful to group all your DOM mutations and only start asking layout questions afterwards.
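
To make the pattern concrete, here is a minimal sketch of the difference; list, items, and render() are stand-ins, not anything from a real code base:

// Layout thrashing: every iteration mutates the DOM and then asks a
// layout question, forcing a synchronous layout each time.
for (const item of items) {
  list.appendChild(render(item));
  console.log(list.offsetHeight); // reads layout on every pass
}

// Grouped: do all the mutations first, then ask the layout question once.
for (const item of items) {
  list.appendChild(render(item));
}
console.log(list.offsetHeight); // a single layout afterwards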

(This has been a known pattern in some circles for a long time; tip of my imaginary hat to my former Q42 colleagues. However, I don’t think it’s generally well-known, especially given some of the “DOM is slow” advocacy I have seen of late.)

Still, whenever you invoke insertBefore() or remove(), there is some cost as these JavaScript functions are ultimately backed by C++ (nay Rust) with an IDL translation layer in between that makes sure the C++ doesn’t get to see anything funky. This happens because it’s important that the C++-backed DOM remains in a consistent state, so the other algorithms that run in the browser do not get confused. Research into doing the DOM entirely in JavaScript has halted and in fact would hinder efforts to do layout in parallel, which is being spearheaded by Servo.

Yehuda came up with an idea, based on writing code for Ember.js, which in turn has been inspired by React, to represent these mutation operations somehow and apply them to a DOM in one go. That way, you only do the IDL-dance once and the browser then manipulates the tree in C++ with the many operations you fed it. Basically the inverse of mutation records used by mutation observers. With such a DOM mutation representation, you can make thousands of mutations and only pay the IDL cost once.
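
To make that a bit more tangible, here is a purely hypothetical sketch; DOMChangeList, its methods, and applyChanges() do not exist anywhere, they only illustrate the shape of “record many mutations, pay the IDL cost once”:

const changes = new DOMChangeList();      // hypothetical constructor
for (const item of items) {
  const li = changes.createElement("li"); // hypothetical: records a "create"
  changes.setTextContent(li, item.label); // hypothetical: records a "set text"
  changes.append(list, li);               // hypothetical: records an "append"
}
// One call crosses the IDL boundary; the engine applies everything in C++.
document.applyChanges(changes);           // hypothetical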

Having such an API would:

  1. Encourage good practice. By providing a single entry point for mutating the DOM, developers will be encouraged to group DOM updates together before doing any kind of layout. If more sites are structured this way, their performance will improve.
  2. Improve engine performance. This requires further testing to make sure there is indeed a somewhat non-trivial IDL cost today and that we can reduce it by passing the necessary instructions more efficiently than through method calls.
  3. Potentially enable more parallelism, by supporting this DOM mutation representation in workers and making it transferable so the updates can be prepared off the main thread. That reduces the amount of DOM work done on the thread where user interaction takes place.

Looking forward to hearing what folks think!

Update: Boris Zbarsky weighs in with some great implementer perspective on the small cost of IDL and the various tradeoffs to consider for a browser-provided API.

Users, clients, and servers

Some thoughts and lessons learned on the web’s client-server model, interoperability, and standards:

In other words, the robustness principle most certainly applies to clients, whereas servers have more freedom to be strict (or broken, if you will) both ways. This model also applies if you look at the individual subsystems, e.g., JavaScript language implementations versus deployed JavaScript code and HTML implementations versus deployed HTML. We have found time and again that minor divergence between client implementations leads to broken behavior for a given server, whereas the reverse does not apply.

Users win, software scrambles.

HTML components

Hayato left a rather flattering review comment to my pull request for integrating shadow tree event dispatch into the DOM Standard. It made me reflect upon all the effort that came before us with regards to adding components to DOM and HTML. It has been a nearly two-decade journey to get to a point where all browsers are willing to implement, and then ship. It is not quite a glacial pace, but you can see why folks say that about standards.

What I think was the first proposal was simply titled HTML Components, better known as HTC, a technology by the Microsoft Internet Explorer team. Then in 2000, published in early 2001, came XBL, a technology developed at Netscape by Dave Hyatt (now at Apple). In some form that variant of XBL still lives on in Firefox today, although at this point it is considered technical debt.

In 2004 we got sXBL and in 2006 XBL 2.0, the latter largely driven by Ian Hickson with design input from Dave Hyatt. sXBL had various design disputes that could not be resolved among the participants. Selectors versus XPath was a big one. Though even with XBL 2.0 the lesson that namespaces are an unnecessary evil for rather tightly coupled languages was not yet learned. A late half-hearted revision of XBL 2.0 did drop most of the XML aspects, but by that time interest had waned.

There was another multi-year gap and then from 2011 onwards the Google Chrome team put effort into a different, more API-y approach towards HTML components. This was rather contentious initially, but after recent compromises with regards to encapsulation, constructors for custom elements, and moving from selectors to an even more simplistic model (basically strings), this seems to be the winning formula. A lot of it is now part of the DOM Standard and we also started updating the HTML Standard to account for shadow trees, e.g., making sure script elements execute.

Hopefully implementations will follow soon, and then widespread usage, cementing it for a long time to come.

Network effects affecting standards

In the context of another WebKit issue around URL parsing, Alexey pointed me to WebKit bug 116887. It handily demonstrates why the web needs the URL Standard. The server reacts differently to %7B than it does to {. It expects the latter, despite the IETF STD not even mentioning that code point.
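
For context, %7B is simply the percent-encoded spelling of the left curly bracket (code point U+007B), so the server is treating two spellings of the same code point differently:

encodeURIComponent("{");   // "%7B"
decodeURIComponent("%7B"); // "{"
// Whether an unencoded "{" survives serialization depends on the URL
// component it appears in and on the parser, which is exactly the sort of
// thing the URL Standard pins down.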

Partly to blame here are the browsers. In the early days code shipped without much quality assurance and many features got added in a short period of time. While standards evolved there was not much of a feedback loop going on with the browsers. There was no organized testing effort either, so the mismatch grew.

On the other side, you have the standards folks ignoring the browsers. While back then browsers did not necessarily take much part in the standards debate, they have had an enormous influence on the web. They are the single most important piece of client software out there. They are even kind of a meta-client: they are used to fetch clients that in turn talk to the internet. As an example, the Firefox meta-client can be used to get and use the FastMail email client.

And this kind of dominance means that it does not matter much what standards say; it matters what the most-used clients ship. After all, when you are putting together some server software and have deadlines to meet, you typically do not start by reading standards. You figure out what bits you get from the client and operate on that. And that is typically rather simplistic. You would not use a URL parser, but rather various kinds of string manipulation. Dare I say, regular expressions.
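
A sketch of the kind of ad-hoc parsing I mean; the regular expression and the input variable are invented for illustration, and the URL constructor is the standards-backed alternative:

// Fragile: this lumps userinfo and port into the "host" and breaks on
// exactly the edge cases the URL Standard has to define.
const fragileHost = (/^https?:\/\/([^\/?#]+)/.exec(input) || [])[1];

// Robust: defer to a parser that implements the URL Standard.
const hostname = new URL(input).hostname;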

This might ring some bells. If it does, that is because this story also applies to HTML parsing, text encodings, cookies, and basically anything that browsers have deployed at scale and developers have made use of. This is why standards are hardly ever finished. Most of them require decades of iteration to get the details right, but as you know that does not mean you cannot start using any of it right now. And getting the details right is important. We need interoperable URL parsing for security, so that developers can build upon URLs without tons of cross-browser workarounds, and to elevate the overall abstraction level at which engineering needs to happen.

Upstreaming custom elements and shadow DOM

We have reached the point where custom elements and shadow DOM slowly make their way into other standards. The Custom Elements draft has been reformatted as a series of patches by Domenic and I have been migrating parts of the Shadow DOM draft into the DOM Standard and HTML Standard. Once done this should remove a whole bunch of implementation questions that have come up and essentially turn these features into new fundamental primitives all other features will have to cope with.

There are still a number of open issues for which we need to reach rough consensus and input on those would certainly be appreciated. And there is still quite a bit of work to be done once those issues are resolved. Many features in HTML and CSS will need to be adjusted to work correctly within shadow trees. E.g., as things stand today in the HTML Standard a script element does not execute, an img element does not load, et cetera.
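
As a minimal sketch of the kind of case that needs adjusting (the behavior described is how the HTML Standard read at the time, before this upstreaming work):

const host = document.createElement("div");
document.body.appendChild(host);
const shadow = host.attachShadow({ mode: "open" });

const script = document.createElement("script");
script.textContent = "console.log('hello from a shadow tree');";
// Per the HTML Standard as it stood, inserting the script into a shadow
// tree did not cause it to execute, because the relevant checks only
// considered the document tree.
shadow.appendChild(script);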

Enabling HTTPS and HSTS on DreamHost

DreamHost recently enabled Let’s Encrypt support. This is great and makes HTTPS accessible to a great many people. For new domains there is a simple HTTPS checkbox; it could not be easier. For existing domains you need to make sure the domain’s “Web Hosting” is set to “Fully Hosted” and there are no funny redirects. If you have an Internationalized Domain Name it appears you are out of luck. If you have a great many subdomains (for which you should also enable HTTPS), beware of the rate limits and of wildcard certificates being unsupported.

The way DreamHost manages the rate limits is by rescheduling requests that do not succeed for a week later. Coupled with the fact that Let’s Encrypt certificates are relatively short-lived, this places an upper bound on the number of subdomains you can have (likely around sixty). If you manage certificate requests from Let’s Encrypt yourself you could of course share a certificate across several subdomains, thereby increasing the theoretical limit to six thousand subdomains, but there is no way that I know of to do this through DreamHost.
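
For what it is worth, the arithmetic behind those rough numbers, under the rate limits as I understand them (the per-week certificate limit in particular is an assumption on my part):

~5 new certificates per registered domain per week × ~12 weeks of certificate lifetime ≈ 60 certificates live at once, i.e., roughly sixty subdomains at one certificate each
60 certificates × 100 names per certificate = 6,000 subdomains in theory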

To make sure visitors actually get on HTTPS, use this in your .htaccess for each domain (assuming you use shared hosting):

# Redirect every request that did not arrive over HTTPS to the same host
# and path over HTTPS, using a permanent (301) redirect.
RewriteEngine On
RewriteCond %{HTTPS} !=on
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

(As long as domains are not using rewrite rules you can in fact share this across many domains by placing it in a directory above the domains, but you will need to copy it for each domain that does use rewrite rules. RewriteOptions InheritDownBefore requires Apache 2.4.8 and DreamHost sits on Apache 2.2.22, although they claim they will update this in the near future. (Very much unclear why the DreamHost wiki is still without HTTPS.))

The next thing you want to do is enable HSTS by adding this to your .htaccess (first make sure all your subdomains are on HTTPS too):

# env=HTTPS: only add the header when the HTTPS environment variable is set,
# i.e., on responses served over TLS, as HSTS requires.
Header set Strict-Transport-Security "max-age=31415926; includeSubDomains; preload" env=HTTPS

The preload directive is non-standard, but important, since once this is all up and running you want to submit your domain for HSTS preloading. You can remove the preload directive after submitting your domain, if you care for the bytes (or standards). With that done, and thanks to DreamHost’s upgraded HTTPS story, you will get an A+ on the SSL [sic] Server Test.

Custom elements no longer contentious

I was in San Francisco two weeks ago. Always fun to see friends and complain about how poorly Caltrain compares to trains in most civilized countries. Custom elements day was at Apple, which you cannot reasonably get to via public transport from San Francisco. The “express” Caltrain to Mountain View and a surged Uber are your best bet. On the way back you can count on Ryosuke, who knows exactly how much gas is in his car well after the meter indicates it’s depleted.

While there are still details to be sorted out with both custom elements and shadow DOM, we have made major headway since last time by getting cross-browser agreement on the contentious issues.

For those paying attention, none of this provides a consistent world view throughout. We gave up on that and hope that the combination of the parser doing things synchronously and other contexts not doing that will be enough to get folks to write their components in a way that is resilient to different developer practices.

Three years at Mozilla

I started in London during a “work week” of the Platform group. I had also just moved to London the weekend prior so everything was rather new. I don’t remember much from that week, but it was a nice way to get to know the people I had not met yet through standards and figure out what things I could be contributing to.

Fifteen months later I moved to Switzerland to prepare for the arrival of Oscar, and Mozilla has been hugely supportive of that move. That was so awesome. Oscar is too, of course, and might I add he is a little bigger now and able to walk around the house.

Over the years I have helped out with many different features that ended up in Gecko and Servo (the web engines Mozilla develops), all through a common theme: I standardize the way the web works to the best of my ability, in the form of answering questions, working out fixes to standards such as the security model of the Location and Window objects, and helping out with the development of new features such as “foreign fetch”. I hope to continue doing this at Mozilla for many years to come.

W3C forks HTML yet again

The W3C has forked the HTML Standard for the nth time. As always, it is pretty disastrous:

So far this fork has been soundly ignored by the HTML community, which is as expected and desired. We hesitated to post this since we did not want to bring undeserved attention to the fork. But we wanted to make the situation clear to the web standards community, which might otherwise be getting the wrong message. Thus, proceed as before: the standards with green stylesheets are the up-to-date ones that should be used by implementers and developers, and referred to by other standards. They are where work on crucial bugfixes such as setting the correct flags for <img> fetches and exciting new features such as <script type=module> will take place.

If there are blockers preventing your organization from working with the WHATWG, feel free to reach out to us for help in resolving the matter. Deficient forks are not the answer.

— The editors of the HTML Standard

Firefox OS is not helping the web

Mozilla has been working on Firefox OS for quite a while now and ever since I joined I have not been comfortable with it. Not the high-level goal of turning the web into an OS, that seems great, but the misguided approach we are taking to get there.

The problem with Firefox OS is that it started from an ecosystem parallel to the web. Packaged applications written using HTML, JavaScript, and CSS. Distributed through an app store, rather than a URL. And because Mozilla can vet what goes through the store, these applications have access to APIs we could never ship on the web due to the same-origin policy.

This approach was chosen in part because the web does offline poorly, and in part because certain native APIs could not be made to work for the web and alternatives were not duly considered. The latest thinking on Firefox OS does include URLs for applications, but the approach still necessitates a parallel security model to that of the web, implemented through a second certificate authority system, for code, with Mozilla as the sole authority and a “plan” to decentralize that over time.

As stated, the reason is APIs that violate the same-origin policy, or more generally, go against the assumed browser sandbox. E.g., if Mozilla decides your code is trustworthy, you get access to TCP and can poke around the user’s local network. This is quite similar to app stores, where typically a single authority decides what is trustworthy and what is not. With app stores the user has to install the application, but can expect that the authority (e.g., Apple) only distributes trustworthy software.

I think it is wishful thinking that we could get the wider web community to adopt a parallel certificate authority system for code. The implications for the assumed browser sandbox are huge. Cross-site scripting vulnerabilities in sites with extra authority suddenly result in the user’s local network being compromised. If an authority made a mistake during code review, the user will be at far more risk than usual.

The certificate authority system the web uses today basically verifies that when you connect to example.com, it actually is example.com, and all the bits come from there. And that is already massively complicated and highly political. Scaling that system, or introducing a parallel one as Firefox OS proposes, to work for arbitrary code seems incredibly farfetched.

What we should do instead is double down on the browser. Leverage the assumed browser sandbox. Use all the engineering power this frees up to introduce new APIs that do not require the introduction and adoption of a parallel ecosystem. If we want web email clients to be able to connect to arbitrary email servers, let’s back JMAP. If we want to connect to nearby devices, FlyWeb. If we want to do telephony, let’s solidify and enhance the WebRTC, Push, and Service Worker APIs to make that happen.

There are many great things we could do if we put everyone behind the browser. And we would have the support of the wider web community. In the sense that our competitors would feel compelled to also implement these APIs, thereby furthering the growth of the web. As we have learned time and again, the way to change the web is through evolution, not revolution. Small incremental steps that make the web better.