Anne van Kesteren

Browser differences in IDNA ToASCII processing between ASCII and non-ASCII input

At the moment the URL Standard passes the domain of certain schemes through the ToASCII operation for further processing. I believe this to be in line with how the ToASCII operation is defined. It expects a domain, whether ASCII or non-ASCII, and either returns it normalized or errors out.

Unfortunately, it seems like the web depends on ToASCII effectively being a no-op when applied to ASCII-only input (at least in some cases), judging from the way browsers behave in these tests:

| Input | Description | ToASCII Expected | Chrome 58 dev | Edge 14.14393 | Firefox 54.0a1 | Safari TP 23 |
| --- | --- | --- | --- | --- | --- | --- |
| x01234567890123456789012345678901234567890123456789012345678901x | A domain that is longer than 63 code points. | Error, unless VerifyDnsLength is passed. | No error. | No error. | No error. | No error. |
| x01234567890123456789012345678901234567890123456789012345678901† | A domain that is longer than 63 code points. | Error, unless VerifyDnsLength is passed. | Error. | Error. | Error. | Error. |
| aa-- | A domain that contains hyphens at the third and fourth position. | Error. | No error. | No error. | No error. | No error. |
| a†-- | A domain that contains hyphens at the third and fourth position. | Error. | Error. | No error, returns input. | No error, returns xn--a---kp0a. | Error. |
| -x | A domain that begins with a hyphen. | Error. | No error. | No error. | No error. | No error. |
| -† | A domain that begins with a hyphen. | Error. | Error. | No error, returns input. | No error, returns xn----xhn. | Error. |

There is also a slight difference in error handling: rather than returning the input, Chrome returns the input percent-encoded.

(I used the Live URL Viewer and Live DOM Viewer to get these results, typically prefixing the input with https://.)
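A rough way to reproduce these checks outside a browser is to feed the same inputs to a WHATWG URL parser, such as the URL API available in browsers and Node.js. The helper name below is mine, and results depend on which IDNA parameters the parser applies; the URL Standard now disables CheckHyphens and VerifyDnsLength, so ASCII inputs like these pass through unchanged:

```typescript
// Sketch: probe how a URL implementation's host processing treats an input
// domain. Returns the normalized host, or null if the parser errored.
// (toASCIIHost is a hypothetical helper; URL is the standard WHATWG URL API.)
function toASCIIHost(input: string): string | null {
  try {
    return new URL("https://" + input + "/").hostname;
  } catch {
    return null; // the parser rejected the domain
  }
}

console.log(toASCIIHost("aa--")); // "aa--"
console.log(toASCIIHost("-x")); // "-x"
```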

Using SSH securely

We have been moving WHATWG standards to be deployed through GitHub and Travis CI. This way we can generate snapshots for each commit, which in turn makes it easier to read older, obsolete copies of the standard. The final step in our build process moves the resources to the server using SSH.

Unfortunately we have been doing this in a bad way. The documentation from Travis suggests using ssh_known_hosts, and lots of other documentation suggests passing -o StrictHostKeyChecking=no as an argument. Unfortunately, the risks of these approaches, and their secure alternatives, are not always outlined. Both of these open you up to network attackers: you effectively do not know what server you end up connecting to. It could be the one you know, or it could be an attacker's. Note also that in the case of Travis's ssh_known_hosts it is not even trust-on-first-use; it is trust-on-each-use (i.e., trust-always). You can be attacked each time Travis runs. I filed issue 472, since what we need is trust-never, as the network is unsafe.

As far as I can tell this is not a big deal for WHATWG standards, since they are completely public and the worst that could happen is that an attacker stops publication of the standard, which they could do even if we had a proper setup (by terminating the network connection). However, it does set a bad example and we would not want folks to copy our code and have to know the limitations of it. It should just be good.

The easiest way I have found to do Travis deployments securely is to create a known_hosts resource and pass -o UserKnownHostsFile=known_hosts as an argument (thanks Tim). That ensures ssh (and scp, and rsync -rsh="ssh") will not prompt. However, rather than not prompting because you told it to bypass a security check, it is not prompting because everything is in order. Of course, this does require that the contents of known_hosts be obtained out-of-band from a secure location, but you need to be doing that anyway.
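Concretely, the setup looks something like this (the host name and paths are placeholders; the ssh-keyscan output must be verified against a fingerprint obtained out-of-band before committing it):

```shell
# One-time, on a trusted machine and network: record the server's host key
# and commit the resulting known_hosts file to the repository, after
# verifying the key fingerprint through a secure, out-of-band channel.
ssh-keyscan example.org > known_hosts

# In the deploy step: point ssh at that file instead of disabling host key
# checking. There is no prompt, because the key is already trusted.
scp -o UserKnownHostsFile=known_hosts build/* deploy@example.org:/var/www/standard/
# or equivalently with rsync:
rsync -e "ssh -o UserKnownHostsFile=known_hosts" -r build/ deploy@example.org:/var/www/standard/
```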

The XMLHttpRequest Standard now makes use of that secure deployment process and the remainder of WHATWG standards will soon follow.

With that, if any of the following is true, you probably need to fix your setup:

Standards on GitHub

A couple years ago I wrote Contributing to standards and it is worth noting how everything has gotten so much better since then. Basically all due to GitHub and standards groups such as TC39, WHATWG, and W3C embracing it. You can more easily engage with only those standards you are interested in. You can even subscribe to particular issues that interest you and disregard everything else. If you contrast that with mailing lists where you likely get email about dozens of standards and many issues across them, it’s not hard to see how the move to GitHub has democratized standards development. You will get much further with a lot less lost time.

Thanks to pull requests, changing standards is easier too. Drive-by grammar fixes are a thing now, and “good first bug” issue labels help you get started with contributing. Not all groups have adopted one-repository-per-standard yet, which can make it a little trickier to contribute to CSS for instance, but hopefully they’ll get there too.

(See also: my reminder on the WHATWG blog that WHATWG standards are developed on GitHub.)


Andrew pointed out WebRender yesterday: a new rendering technology for CSS from the folks that are reinventing C++ with Rust and browsers with Servo. There is a great talk about this technology by Patrick Walton. It is worth watching in its entirety, but twenty-six minutes in are the examples. The key insight is that using a retained mode approach to rendering CSS is much more efficient than an immediate mode approach. The latter is what browsers have been using thus far; it makes sense for the canvas element (which offers an immediate mode rendering API), but is apparently suboptimal when talking to the GPU. Patrick mentioned this was pointed out back in 2012 by Mark J. Kilgard and Jeff Bolz from NVIDIA in a paper titled “GPU-accelerated Path Rendering”: “We believe web browsers should behave more like video games in this respect to exploit the GPU.”

The reason this is extremely exciting is that, if this pans out, layout will finally get the huge boost in speed that JavaScript got quite a while ago now. Going from not-even-sixty frames-per-second to hundreds of frames-per-second is just fantastic and also somewhat hard to believe. Always bet on the web?

Fetch Standard 101

The WHATWG Fetch Standard is an essential part of the browser networking subsystem. Basically any API that involves networking (e.g., <img src>, <a href> (through navigation), XMLHttpRequest, @font-face, WebSocket) goes through Fetch. The exception is WebRTC’s RTCDataChannel, and perhaps not surprisingly it has a security issue. The fetch() API is also defined in terms of Fetch, and the similar naming has led to some confusion. Fetch is basically the subsystem, and fetch() is one of the many APIs that expose (part of) the capabilities of Fetch.

The basic setup is that an API prepares a request, which consists of a URL and a number of variables, feeds that to Fetch, and at some point gets a response, which consists of a body and a number of variables. Fetch takes care of content security policies, referrer policies, invoking service workers, credentials, cache modes, CORS, HSTS, port blocking, default headers (and whether they get exposed to service workers), X-Content-Type-Options: nosniff, and more. In part Fetch defines essential infrastructure such as CORS, redirect handling, port blocking, and overall terminology, and in part it serves as glue between the now numerous standards that together define the browser networking subsystem.

E.g., for redirects, Fetch defines which headers are preserved, whether a request body gets cloned and reused (it usually does), how the referrer policy gets updated, and what happens with redirects to non-HTTP schemes (they fail, except sometimes when navigating), but the actual connection opening and request transmission are largely left to TLS and HTTP. And as a consequence of all APIs using Fetch, redirects behave the same throughout. There are exceptions to the rule of course, but redirects are no longer a problem we need to solve on a per-API basis. And when you extrapolate this redirects example to content security policies, referrer policies, service workers, and all the other little things Fetch takes care of, it should be clear why it is essential.
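As an illustration, several of these Fetch-level knobs surface directly on a request in the fetch() API. This sketch only constructs the request (the URL is a placeholder); passing it to fetch() would run the full Fetch algorithm, including CSP, service workers, and CORS:

```typescript
// Sketch: a Request bundles Fetch-level parameters for a single fetch.
const req = new Request("https://example.org/data", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  redirect: "manual", // don't follow redirects; surface them to the caller
  mode: "cors", // enforce CORS rather than failing cross-origin outright
});

console.log(req.method); // "POST"
console.log(req.redirect); // "manual"
```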

(See Fetching URLs for an earlier introduction.)

Web computing

There are two computing models today that have mass-market appeal, are safe-by-default, are app-driven (no OS access), and provide some degree of sandboxing for their apps: Web and Store. The major difference is that Web computing has decentralized publishing (it would be distributed if not for domain registrars and certificate authorities) and Store computing is by definition centralized. Decentralizing Store computing is unlikely to ever succeed and I have argued before that such a system cannot reasonably exist as part of Web computing. (Arguably Web computing is a form of centralized computing. Certificate authorities are ultimately grounded in a list managed by the browser or the OS the browser runs in.)

Web and Store computing both rely on the end user for a number of permission decisions that control powerful APIs. Can this app use geolocation? Can this app use notifications? Can this app use the camera? User-controlled permissions have been a great innovation in computing.

As discussed previously, Web computing does not offer HTTP/TCP/UDP access. Web computing might do Bluetooth, but what is on offer is less capable than in Store computing and sits behind a permission decision. USB is a similar story, and there are undoubtedly more such APIs.

Another way of looking at this is that Store computing is vulnerable to exfiltration of intranet data and other “local” attacks. Web computing protects the intranet and the “local network” through the same-origin policy and simply not providing certain APIs. Store computing relies on an initial installation/trust decision by the user, review by the Store owner, and app revocation by the Store owner. Store computing does not require permission decisions for these APIs. And Web computing does not offer permission decisions for these APIs as they are deemed too powerful.

Developers looking to solve problems in the HTTP/TCP/UDP/Bluetooth space will likely become Store computing developers as Web computing cannot address their needs. In turn they might convince their colleagues that Store computing is “better” and slowly grow that ecosystem at the expense of Web computing. The question is then whether there is a mismatch in the security requirements between Web and Store computing or whether this disparity of functionality is intrinsic in their respective security architectures.

The track record of reviewing apps has not been perfect. Google now performs manual reviews for Play Store submissions after getting into ActiveX-level badness, and Apple had to battle a malicious version of Xcode. Assuming such mistakes continue to happen, users will continue to be vulnerable, as they can easily be guided towards installing a Store computing app through directions offered on Web computing (access to which is typically offered as part of Store computing). Of course, were Web computing to offer such APIs, users would be vulnerable too. The only recourse would be using the anti-phishing and malware infrastructure, which is not too dissimilar from app revocation. The question is whether users would be more vulnerable.

Assume that Web computing got a trust decision that goes further than just trusting the lock-domain combination in the address bar. The next problem is Web computing apps lacking isolation, i.e., they are vulnerable to XSRF and XSS. Those are a direct result of a shared cookie jar among all Web computing apps for a given user, the ability to manipulate URLs, and the ability to inject code through forms that might end up executing in the app. Store computing apps on iOS have been attacked through URL schemes and on Android through intents, but not to the same extent as Web computing I believe. So apart from a trust decision and revocation, Web computing apps might need new isolation primitives before even being allowed to ask for more trust.

None of that addresses the aspect of app review and the Store having some kind of relationship with the app developer, other than perhaps dismissing that as not being a crucial part of the security story. The question is whether app review can reasonably protect against intranet data exfiltration.

The goal of this post is to frame the problem space and encourage exploration towards solutions that make Web computing more powerful, without ceding the aspects we hold dear. That is, if Web computing provides a new trust decision, isolation, and revocation, can it expose HTTP/TCP/UDP/Bluetooth and more?

Translation from PR-speak to English of selected portions of “Perspectives on security research, consensus and W3C Process”

From “Perspectives on security research, consensus and W3C Process”:

There have been a number of articles and blog posts about the W3C EME work but we’ve not been able to offer counterpoints to every public post…

There has been enough of a shitstorm about W3C and DRM that we had to write something.

First, the W3C is concerned about risks for security researchers.

We are concerned with the PR-optics of the EFF rallying against us.

W3C TAG statements have policy weight.

The W3C TAG has no place in the W3C Process.

This TAG statement was reiterated in an EME Factsheet, published before the W3C Advisory Committee meeting in March 2016 as well as in the W3C blog post in April 2016 published when the EME work was allowed to continue.

The W3C TAG gets some publicity, but has no place in the W3C Process.

Second, EME is not a DRM standard.

We are actively enabling DRM systems to integrate with the web.

The W3C took the EFF covenant proposal extremely seriously.

We proposed it to the four-hundred-and-some conservative Member companies and let them do the dirty work, per usual. We will only lead the web to its full potential when there is agreement among the four-hundred-and-some conservative Member companies.

One criteria for Recommendation is that the ideas in the technical report are appropriate for widespread deployment and EME is already deployed in almost all browsers.

We will continue to ignore the actual ramifications of browsers shipping DRM systems.

Web platform security boundaries

What are the various security boundaries the platform offers? I have an idea, but I’m not completely sure whether it is exhaustive:

There is also the HTTP cache, which leaks everywhere, but is far less reliable.

Making the DOM faster

There is a pretty good comment on Hacker News (of all places) by nostrademons explaining why the DOM is not slow. Basically, all the DOM mutation operations are pretty trivial: inserting and removing nodes, moving them around, setting an attribute, etc. It’s layout that takes a while, and layout can be triggered synchronously through APIs such as offsetWidth and getComputedStyle(). So you have to be careful to group all your DOM mutations and only start asking layout questions afterwards.
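The pattern looks like this. ElementLike is a stand-in interface so the sketch is self-contained; in a browser these would be real DOM elements:

```typescript
// Sketch of the write-then-read pattern. Each style write is a cheap DOM
// mutation; the first offsetWidth read is what can trigger a layout pass.
interface ElementLike {
  style: { width?: string };
  offsetWidth: number;
}

function resizeAndMeasure(boxes: ElementLike[]): number[] {
  // 1. Write phase: perform every mutation first.
  for (const box of boxes) {
    box.style.width = "100px";
  }
  // 2. Read phase: only now ask layout questions, so layout runs once
  //    instead of once per interleaved write/read pair.
  return boxes.map((box) => box.offsetWidth);
}
```

Interleaving the write and the read inside one loop would instead force a fresh layout on every iteration.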

(This has been a known pattern in some circles for a long time; tip of my imaginary hat to my former Q42 colleagues. However, I don’t think it’s generally well-known, especially given some of the “DOM is slow” advocacy I have seen of late.)

Still, whenever you invoke insertBefore() or remove(), there is some cost, as these JavaScript functions are ultimately backed by C++ (nay Rust) with an IDL translation layer in between that makes sure the C++ doesn’t get to see anything funky. This matters because the C++-backed DOM must remain in a consistent state for all the other algorithms that run in the browser to not get confused. Research into implementing the DOM entirely in JavaScript has halted, and such an approach would in fact hinder efforts to do layout in parallel, which is being spearheaded by Servo.

Yehuda came up with an idea, based on writing code for Ember.js (which in turn was inspired by React), to represent these mutation operations as data and apply them to the DOM in one go. That way, you only do the IDL dance once, and the browser then manipulates the tree in C++ with the many operations you fed it: basically the inverse of the mutation records used by mutation observers. With such a DOM mutation representation, you can make thousands of mutations and only pay the IDL cost once.
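To make the shape of the idea concrete, here is a user-land sketch (all names here are hypothetical; no such API exists): mutations are described as plain data and applied in a single call, so a real engine would cross the IDL boundary once per batch rather than once per mutation.

```typescript
// Hypothetical instruction format, roughly the inverse of a MutationRecord.
type Instruction =
  | { op: "append"; parent: string; child: string }
  | { op: "setAttribute"; target: string; name: string; value: string }
  | { op: "remove"; target: string };

// A toy tree keyed by node id, standing in for the engine's C++-side DOM.
interface ToyNode {
  attrs: { [name: string]: string };
  children: string[];
}
type ToyTree = { [id: string]: ToyNode };

// The single entry point: one call applies many mutations.
function applyBatch(tree: ToyTree, batch: Instruction[]): void {
  for (const ins of batch) {
    switch (ins.op) {
      case "append":
        tree[ins.parent].children.push(ins.child);
        break;
      case "setAttribute":
        tree[ins.target].attrs[ins.name] = ins.value;
        break;
      case "remove":
        for (const id in tree) {
          tree[id].children = tree[id].children.filter((c) => c !== ins.target);
        }
        break;
    }
  }
}
```

In an engine, applyBatch is where the one-time IDL conversion would happen, after which the mutations run entirely on the native side.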

Having such an API would:

  1. Encourage good practice. By providing a single entry-point to mutating the DOM, developers will be encouraged to group DOM updates together before doing any kind of layout. If more sites are structured this way that will increase their performance.
  2. Improve engine performance. This requires further testing to make sure there is indeed a somewhat non-trivial IDL cost today and that we can reduce it by passing the necessary instructions more efficiently than through method calls.
  3. Potentially enable more parallelism by preparing these DOM updates in a worker, via supporting this DOM mutation representation in workers and making it transferable. That reduces the amount of DOM work done where user interaction needs to take place.

Looking forward to hearing what folks think!

Update: Boris Zbarsky weighs in with some great implementer perspective on the small cost of IDL and the various tradeoffs to consider for a browser-provided API.

Users, clients, and servers

Some thoughts and lessons learned on the web’s client-server model, interoperability, and standards:

In other words, the robustness principle most certainly applies to clients, whereas servers have more freedom to be strict (or broken, if you will) both ways. This model also applies if you look at the individual subsystems, e.g., JavaScript language implementations versus deployed JavaScript code and HTML implementations versus deployed HTML. We have found time and again that minor divergence between client implementations leads to broken behavior for a given server, whereas the reverse does not apply.

Users win, software scrambles.