Anne van Kesteren

Testing standards

At a high level, standards organizations operate in similar ways. A standard is produced and implementations follow. Taking a cue from software engineering, WHATWG added active maintenance to the mix by producing Living Standards. The idea being that just like unmaintained software, unmaintained standards lead to security issues and shaky foundations.

The W3C worked on test suites, but never drove it to the point of test-driven development or ensuring the test suites fully covered the standards. The WHATWG community produced some tests, e.g., for the HTML parser and the canvas API, but there was never a concerted effort. The idea being that as long as you have a detailed enough standard, interoperable implementations will follow.

Those with a background in quality assurance, and those who might have read Mark Pilgrim’s Why specs matter, probably know this to be false, yet it has taken a long time for tests to be considered an essential part of the standardization process. We’re getting there in terms of acceptance, which is great as crucial parts of the web platform, such as CSS, HTML, HTTP, and smaller things like MIME types and URLs, all have the same kind of long-standing interoperability issues.

These interoperability issues are detrimental to all constituencies:

Therefore I’d like everyone to take this far more seriously than they have been. Always ask about the testing story for a standard. If it doesn’t have one, consider that a red flag. If you’re working on a standard, figure out how you can test it (hint: web-platform-tests). If you work on a standard that can be implemented by lots of different software, ensure the test suite is generic enough to accommodate that (shared JSON resources with software-specific wrappers have been a favorite of mine).

Effectively, this is another cue standards needs to take from modern software development practices. Serious software will require tests to accompany changes, standards should too. Ensuring standards, tests, and implementations are developed in tandem results in a virtuous cycle of interoperability goodness.

(It would be wrong not to acknowledge Ecma’s TC39 here, who produced a standard for JavaScript that is industry-leading with everything derived from first principles, and also produced a corresponding comprehensive test suite shared among all implementations. It’s a complex standard to read, but the resulting robust implementations are hard to argue with.)

Using GitHub

I’ve been asked a few times how I stay on top of GitHub:

This works well for me, it may work for you.

What I miss is Bugzilla’s needinfo. I could see this as a persistent notification that cannot be dismissed until you go into the thread and perform the action asked of you. What I also miss on /notifications is the ability to see if someone mentioned me in a thread. I often want to unsubscribe based on the title, but I don’t always do it out of fear of neglecting someone.


Dara was born.

MIME type interoperability

In order to figure out data: URL processing requirements I have been studying MIME types (also known as media types) lately. I thought I would share some examples that yield different results across user agents, mostly to demonstrate that even simple things are far from interoperable:

These are the relatively simple issues to deal with, though it would have been nice if they had been sorted by now. The MIME type parsing issue also looks at parsing for the Content-Type header, which is even messier, with different requirements for its request and response variants.

Browser differences in IDNA ToASCII processing between ASCII and non-ASCII input

At the moment the URL Standard passes the domain of certain schemes through the ToASCII operation for further processing. I believe this to be in line with how the ToASCII operation is defined. It expects a domain, whether ASCII or non-ASCII, and either returns it normalized or errors out.

Unfortunately, it seems like the web depends on ToASCII effectively being a no-op when applied to ASCII-only input (at least for some cases), as is the way browsers seem to behave from these tests:

Input Description ToASCII Expected Chrome 58 dev Edge 14.14393 Firefox 54.0a1 Safari TP 23
x01234567890123456789012345678901234567890123456789012345678901x A domain that is longer than 63 code points. Error, unless VerifyDnsLength is passed. No error. No error. No error. No error.
x01234567890123456789012345678901234567890123456789012345678901† Error. Error. Error. Error.
aa-- A domain that contains hyphens at the third and fourth position. Error. No error. No error. No error. No error.
a†-- Error. No error, returns input. No error, returns xn--a---kp0a. Error.
-x A domain that begins with a hyphen. Error. No error. No error. No error. No error.
-† Error. No error, returns input. No error, returns xn----xhn. Error.

There is also a slight difference in error handling as rather than returning input, Chrome returns the input percent-encoded.

(I used the Live URL Viewer and Live DOM Viewer to get these results, typically prefixing the input with https://.)

Using SSH securely

We have been moving WHATWG standards to be deployed through GitHub and Travis CI. This way we can generate snapshots for each commit which in turn makes it easier to read older obsolete copies of the standard. The final step in our build process moves the resources to the server using SSH.

Unfortunately we have been doing this in a bad way. The documentation from Travis suggests to use ssh_known_hosts and lots of other documentation suggests passing -o StrictHostKeyChecking=no as argument. The risks of these approaches and their secure alternatives are not (always) outlined unfortunately. Both of these open you up to network attackers. You effectively do not know what server you end up connecting to. Could be the one you know, could be that of an attacker. Note also that in case of Travis’s ssh_known_hosts it is not even trust-on-first-use. It is trust-on-each-use (i.e., trust-always). You can be attacked each time Travis runs. I filed issue 472 since what we need is trust-never, as the network is unsafe.

As far as I can tell this is not a big deal for WHATWG standards, since they are completely public and the worst that could happen is that an attacker stops publication of the standard, which they could do even if we had a proper setup (by terminating the network connection). However, it does set a bad example and we would not want folks to copy our code and have to know the limitations of it. It should just be good.

The easiest way to do Travis deployments securely that I have found is to create a known_hosts resource and pass -o UserKnownHostsFile=known_hosts as argument (thanks Tim). That ensures the ssh/scp/rsync -rsh="ssh" program will not prompt. However, rather than not prompting because you told it to bypass a security check, it is not prompting because everything is in order. Of course, this does require that the contents of known_hosts are obtained out-of-band from a secure location, but you need to be doing that anyway.

The XMLHttpRequest Standard now makes use of that secure deployment process and the remainder of WHATWG standards will soon follow.

With that, if any of the following is true, you probably need to fix your setup:

Standards on GitHub

A couple years ago I wrote Contributing to standards and it is worth noting how everything has gotten so much better since then. Basically all due to GitHub and standards groups such as TC39, WHATWG, and W3C embracing it. You can more easily engage with only those standards you are interested in. You can even subscribe to particular issues that interest you and disregard everything else. If you contrast that with mailing lists where you likely get email about dozens of standards and many issues across them, it’s not hard to see how the move to GitHub has democratized standards development. You will get much further with a lot less lost time.

Thanks to pull requests changing standards is easier too. Drive-by-grammar-fixes are a thing now and “good first bug” issue labels help you get started with contributing. Not all groups have adopted one-repository-per-standard yet which can make it a little trickier to contribute to CSS for instance, but hopefully they’ll get there too.

(See also: my reminder on the WHATWG blog that WHATWG standards are developed on GitHub.)


Andrew pointed out webrender yesterday. A new rendering technology for CSS from the folks that are reinventing C++ with Rust and browsers with Servo. There is a great talk about this technology by Patrick Walton. It is worth watching in its entirety, but 26 minutes in has the examples. The key insight is that using a retained mode approach to rendering CSS is much more efficient than an immediate mode approach. The latter is what browsers have been using thus far and makes sense for the canvas element (which offers an immediate mode rendering API), but is apparently suboptimal when talking to the GPU. Patrick mentioned this was pointed out back in 2012 by Mark J. Kilgard and Jeff Bolz from NVIDIA in a paper titled GPU-accelerated Path Rendering: We believe web browsers should behave more like video games in this respect to exploit the GPU.

The reason this is extremely exciting is that if this pans out layout will finally get the huge boost in speed that JavaScript got quite a while ago now. Going from not-even-sixty frames-per-second to hundreds of frames-per-second is just fantastic and also somewhat hard to believe. Always bet on the web?

Fetch Standard 101

The WHATWG Fetch Standard is an essential part of the browser networking subsystem. Basically any API that involves networking (e.g., <img src>, <a href> (through navigation), XMLHttpRequest, @font-face, WebSocket) goes through Fetch. The exception is WebRTC’s RTCDataChannel and perhaps not surprisingly it has a security issue. The fetch() API is also defined in terms of Fetch and the similar naming has led to some confusion. Fetch is basically the subsystem and fetch() is one of the many APIs that exposes (part of) the capabilities of Fetch.

The basic setup is that an API prepares a request, which consists of a URL and a number of variables, feeds that to Fetch, and at some point gets a response, which consists of a body and a number of variables. Fetch takes care of content security policies, referrer policies, invoking service workers, credentials, cache modes, CORS, HSTS, port blocking, default headers (and whether they get exposed to service workers), X-Content-Type-Options: nosniff, and more. In part Fetch defines essential infrastructure such as CORS, redirect handling, port blocking, and overall terminology, and in part it serves as glue between the now numerous standards that together define the browser networking subsystem.

E.g., for redirects, Fetch defines which headers are preserved, whether a request body gets cloned and reused (it usually does), how the referrer policy gets updated, what happens with redirects to non-HTTP schemes (fail, except when navigating sometimes), but the actual connection opening and request transmission is largely left to TLS and HTTP. And as a consequence of all APIs using Fetch, redirects behave the same throughout. There are exceptions to the rule of course, but redirects are no longer a problem we need to solve on a per-API basis. And when you extrapolate this redirects example to content security policies, referrer policies, service workers, and all the other little things Fetch takes care of, it should be clear why it is essential.

(See Fetching URLs for an earlier introduction.)

Web computing

There are two computing models today that have mass-market appeal, are safe-by-default, are app-driven (no OS access), and provide some degree of sandboxing for their apps: Web and Store. The major difference is that Web computing has decentralized publishing (it would be distributed if not for domain registrars and certificate authorities) and Store computing is by definition centralized. Decentralizing Store computing is unlikely to ever succeed and I have argued before that such a system cannot reasonably exist as part of Web computing. (Arguably Web computing is a form of centralized computing. Certificate authorities are ultimately grounded in a list managed by the browser or the OS the browser runs in.)

Web and Store computing both rely on the end user for a number of permission decisions that control powerful APIs. Can this app use geolocation? Can this app use notifications? Can this app use the camera? User-controlled permissions has been a great innovation in computing.

As discussed previously Web computing does not offer HTTP/TCP/UDP access. Web computing might do Bluetooth, but what is on offer is less capable than Store computing and sits behind a permission decision. USB is a similar story and there are undoubtedly more APIs.

Another way of looking at this is that Store computing is vulnerable to exfiltration of intranet data and other “local” attacks. Web computing protects the intranet and the “local network” through the same-origin policy and simply not providing certain APIs. Store computing relies on an initial installation/trust decision by the user, review by the Store owner, and app revocation by the Store owner. Store computing does not require permission decisions for these APIs. And Web computing does not offer permission decisions for these APIs as they are deemed too powerful.

Developers looking to solve problems in the HTTP/TCP/UDP/Bluetooth space will likely become Store computing developers as Web computing cannot address their needs. In turn they might convince their colleagues that Store computing is “better” and slowly grow that ecosystem at the expense of Web computing. The question is then whether there is a mismatch in the security requirements between Web and Store computing or whether this disparity of functionality is intrinsic in their respective security architectures.

The track record of reviewing apps has not been perfect. Google now performs manual reviews for Play Store submissions after getting into ActiveX-level badness and Apple had to battle a malicious version of Xcode. Assuming such mistakes continue to happen users will continue to be vulnerable as they can easily be guided towards installing a Store computing app through directions offered on Web computing (which is typically offered access to as part of Store computing). Of course, were Web computing to offer such APIs users would be vulnerable too. The only recourse would be using the anti-phishing and malware infrastructure, which is not too dissimilar from app revocation. The question is whether users would be more vulnerable.

Assume that Web computing got a trust decision that goes further than just trusting the lock-domain combination in the address bar. The next problem is Web computing apps lacking isolation, i.e., they are vulnerable to XSRF and XSS. Those are a direct result of a shared cookie jar among all Web computing apps for a given user, the ability to manipulate URLs, and the ability to inject code through forms that might end up executing in the app. Store computing apps on iOS have been attacked through URL schemes and on Android through intents, but not to the same extent as Web computing I believe. So apart from a trust decision and revocation, Web computing apps might need new isolation primitives before even being allowed to ask for more trust.

None of that addresses the aspect of app review and the Store having some kind of relationship with the app developer, other than perhaps dismissing that as not being a crucial part of the security story. The question is whether app review can reasonably protect against intranet data exfiltration.

The goal of this post is to frame the problem space and encourage exploration towards solutions that make Web computing more powerful, without ceding the aspects we hold dear. That is, if Web computing provides a new trust decision, isolation, and revocation, can it expose HTTP/TCP/UDP/Bluetooth and more?