Anne van Kesteren

webrender

Andrew pointed out webrender yesterday. A new rendering technology for CSS from the folks that are reinventing C++ with Rust and browsers with Servo. There is a great talk about this technology by Patrick Walton. It is worth watching in its entirety, but the examples start 26 minutes in. The key insight is that using a retained mode approach to rendering CSS is much more efficient than an immediate mode approach. The latter is what browsers have been using thus far and makes sense for the canvas element (which offers an immediate mode rendering API), but is apparently suboptimal when talking to the GPU. Patrick mentioned this was pointed out back in 2012 by Mark J. Kilgard and Jeff Bolz from NVIDIA in a paper titled GPU-accelerated Path Rendering: “We believe web browsers should behave more like video games in this respect to exploit the GPU.”
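
To make the immediate/retained distinction concrete, here is a rough JavaScript sketch; it is not how WebRender itself works, and the display list structure is invented purely for illustration:

    // Immediate mode: the caller re-issues drawing commands every frame,
    // as the canvas element's 2D API does. (Assumes a <canvas> is present.)
    const ctx = document.querySelector("canvas").getContext("2d");
    function frame() {
      ctx.clearRect(0, 0, 300, 150);
      ctx.fillStyle = "rebeccapurple";
      ctx.fillRect(10, 10, 100, 100); // redrawn from scratch, every frame
      requestAnimationFrame(frame);
    }
    requestAnimationFrame(frame);

    // Retained mode: describe the scene once as data that the engine keeps
    // around and can hand to the GPU wholesale; only changes are re-submitted.
    const displayList = [
      { op: "rect", x: 10, y: 10, width: 100, height: 100, color: "rebeccapurple" },
    ];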

The reason this is extremely exciting is that if this pans out layout will finally get the huge boost in speed that JavaScript got quite a while ago now. Going from not-even-sixty frames-per-second to hundreds of frames-per-second is just fantastic and also somewhat hard to believe. Always bet on the web?

Fetch Standard 101

The WHATWG Fetch Standard is an essential part of the browser networking subsystem. Basically any API that involves networking (e.g., <img src>, <a href> (through navigation), XMLHttpRequest, @font-face, WebSocket) goes through Fetch. The exception is WebRTC’s RTCDataChannel and, perhaps not surprisingly, it has a security issue. The fetch() API is also defined in terms of Fetch and the similar naming has led to some confusion. Fetch is basically the subsystem and fetch() is one of the many APIs that expose (part of) the capabilities of Fetch.
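
A small illustration of that distinction: both of the following go through Fetch, but only the second uses the fetch() API (the URL is made up):

    // <img src> goes through Fetch without any script-facing fetch() call…
    const img = document.createElement("img");
    img.src = "/logo.png";
    document.body.append(img);

    // …and fetch() exposes (part of) the same machinery directly.
    fetch("/logo.png")
      .then(response => response.blob())
      .then(blob => console.log("got", blob.size, "bytes"));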

The basic setup is that an API prepares a request, which consists of a URL and a number of variables, feeds that to Fetch, and at some point gets a response, which consists of a body and a number of variables. Fetch takes care of content security policies, referrer policies, invoking service workers, credentials, cache modes, CORS, HSTS, port blocking, default headers (and whether they get exposed to service workers), X-Content-Type-Options: nosniff, and more. In part Fetch defines essential infrastructure such as CORS, redirect handling, port blocking, and overall terminology, and in part it serves as glue between the now numerous standards that together define the browser networking subsystem.
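
Many of those request and response variables are visible directly in the fetch() API. A hedged sketch of what that looks like, inside an async context, with a made-up URL, header, and body:

    // A request: a URL plus a number of variables.
    const response = await fetch("https://example.com/api", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ hello: "world" }),
      mode: "cors",               // CORS behavior
      credentials: "same-origin", // whether credentials are included
      cache: "no-store",          // cache mode
      referrerPolicy: "no-referrer",
    });

    // A response: a body plus a number of variables.
    console.log(response.status, response.type, response.headers.get("Content-Type"));
    console.log(await response.text());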

E.g., for redirects, Fetch defines which headers are preserved, whether a request body gets cloned and reused (it usually does), how the referrer policy gets updated, and what happens with redirects to non-HTTP schemes (they fail, except sometimes when navigating); the actual connection opening and request transmission is largely left to TLS and HTTP. And as a consequence of all APIs using Fetch, redirects behave the same throughout. There are exceptions to the rule of course, but redirects are no longer a problem we need to solve on a per-API basis. And when you extrapolate this redirects example to content security policies, referrer policies, service workers, and all the other little things Fetch takes care of, it should be clear why it is essential.
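
That redirect handling surfaces in fetch() through the redirect option and a couple of response variables; again a sketch with a made-up URL, inside an async context:

    // "follow" is the default; "error" and "manual" are the alternatives.
    const response = await fetch("https://example.com/old-location", {
      redirect: "follow",
    });
    console.log(response.redirected); // true if at least one redirect was followed
    console.log(response.url);        // the final URL after redirects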

(See Fetching URLs for an earlier introduction.)

Web computing

There are two computing models today that have mass-market appeal, are safe-by-default, are app-driven (no OS access), and provide some degree of sandboxing for their apps: Web and Store. The major difference is that Web computing has decentralized publishing (it would be distributed if not for domain registrars and certificate authorities) and Store computing is by definition centralized. Decentralizing Store computing is unlikely to ever succeed and I have argued before that such a system cannot reasonably exist as part of Web computing. (Arguably Web computing is a form of centralized computing. Certificate authorities are ultimately grounded in a list managed by the browser or the OS the browser runs in.)

Web and Store computing both rely on the end user for a number of permission decisions that control powerful APIs. Can this app use geolocation? Can this app use notifications? Can this app use the camera? User-controlled permissions have been a great innovation in computing.
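
From the app’s point of view such a decision looks roughly like this; a sketch inside an async context, and which names permissions.query() accepts varies by browser:

    // Ask about the current permission state…
    const status = await navigator.permissions.query({ name: "geolocation" });
    console.log(status.state); // "granted", "denied", or "prompt"

    // …and actually using the API is what triggers the user's decision.
    navigator.geolocation.getCurrentPosition(
      position => console.log(position.coords.latitude, position.coords.longitude),
      error => console.log("not granted or failed:", error.message)
    );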

As discussed previously, Web computing does not offer HTTP/TCP/UDP access. Web computing might do Bluetooth, but what is on offer is less capable than what Store computing offers and sits behind a permission decision. USB is a similar story and there are undoubtedly more APIs.

Another way of looking at this is that Store computing is vulnerable to exfiltration of intranet data and other “local” attacks. Web computing protects the intranet and the “local network” through the same-origin policy and simply not providing certain APIs. Store computing relies on an initial installation/trust decision by the user, review by the Store owner, and app revocation by the Store owner. Store computing does not require permission decisions for these APIs. And Web computing does not offer permission decisions for these APIs as they are deemed too powerful.

Developers looking to solve problems in the HTTP/TCP/UDP/Bluetooth space will likely become Store computing developers as Web computing cannot address their needs. In turn they might convince their colleagues that Store computing is “better” and slowly grow that ecosystem at the expense of Web computing. The question is then whether there is a mismatch in the security requirements between Web and Store computing or whether this disparity of functionality is intrinsic in their respective security architectures.

The track record of reviewing apps has not been perfect. Google now performs manual reviews for Play Store submissions after getting into ActiveX-level badness and Apple had to battle a malicious version of Xcode. Assuming such mistakes continue to happen, users will continue to be vulnerable, as they can easily be guided towards installing a Store computing app through directions offered on Web computing (to which Store computing typically offers access). Of course, were Web computing to offer such APIs, users would be vulnerable too. The only recourse would be the anti-phishing and malware infrastructure, which is not too dissimilar from app revocation. The question is whether users would be more vulnerable.

Assume that Web computing gained a trust decision that goes further than just trusting the lock-domain combination in the address bar. The next problem is that Web computing apps lack isolation, i.e., they are vulnerable to XSRF and XSS. Those are a direct result of a shared cookie jar among all Web computing apps for a given user, the ability to manipulate URLs, and the ability to inject code through forms that might end up executing in the app. Store computing apps on iOS have been attacked through URL schemes and on Android through intents, but not, I believe, to the same extent as Web computing. So apart from a trust decision and revocation, Web computing apps might need new isolation primitives before even being allowed to ask for more trust.

None of that addresses the aspect of app review and the Store having some kind of relationship with the app developer, other than perhaps dismissing that as not being a crucial part of the security story. The question is whether app review can reasonably protect against intranet data exfiltration.

The goal of this post is to frame the problem space and encourage exploration towards solutions that make Web computing more powerful, without ceding the aspects we hold dear. That is, if Web computing provides a new trust decision, isolation, and revocation, can it expose HTTP/TCP/UDP/Bluetooth and more?

Translation from PR-speak to English of selected portions of “Perspectives on security research, consensus and W3C Process”

From “Perspectives on security research, consensus and W3C Process”:

There have been a number of articles and blog posts about the W3C EME work but we’ve not been able to offer counterpoints to every public post…

There has been enough of a shitstorm about W3C and DRM that we had to write something.

First, the W3C is concerned about risks for security researchers.

We are concerned with the PR-optics of the EFF rallying against us.

W3C TAG statements have policy weight.

The W3C TAG has no place in the W3C Process.

This TAG statement was reiterated in an EME Factsheet, published before the W3C Advisory Committee meeting in March 2016 as well as in the W3C blog post in April 2016 published when the EME work was allowed to continue.

The W3C TAG gets some publicity, but has no place in the W3C Process.

Second, EME is not a DRM standard.

We are actively enabling DRM systems to integrate with the web.

The W3C took the EFF covenant proposal extremely seriously.

We proposed it to the four-hundred-and-some conservative Member companies and let them do the dirty work, per usual. We will only lead the web to its full potential when there is agreement among the four-hundred-and-some conservative Member companies.

One criterion for Recommendation is that the ideas in the technical report are appropriate for widespread deployment and EME is already deployed in almost all browsers.

We will continue to ignore the actual ramifications of browsers shipping DRM systems.

Web platform security boundaries

What are the various security boundaries the platform offers? I have an idea, but I’m not completely sure whether it is exhaustive:

There is also the HTTP cache, which leaks everywhere, but is far less reliable.

Making the DOM faster

There is a pretty good comment on Hacker News (of all places) by nostrademons explaining why the DOM is not slow. Basically, all the DOM mutation operations are pretty trivial. Inserting and removing nodes, moving them around, setting an attribute, etc. It’s layout that takes a while and layout can be synchronously triggered through APIs such as offsetWidth and getComputedStyle(). So you have to be careful to group all your DOM mutations and only start asking layout questions afterwards.
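
A sketch of the difference (the .box selector is made up):

    const boxes = document.querySelectorAll(".box");

    // Interleaved: every offsetWidth read right after a mutation forces a
    // synchronous layout, over and over.
    boxes.forEach(box => {
      box.style.width = "100px";    // mutation (cheap)
      console.log(box.offsetWidth); // layout question (expensive here)
    });

    // Grouped: all mutations first, all layout questions afterwards; layout
    // only has to run once before the first read.
    boxes.forEach(box => { box.style.width = "100px"; });
    boxes.forEach(box => { console.log(box.offsetWidth); });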

(This has been a known pattern in some circles for a long time; tip of my imaginary hat to my former Q42 colleagues. However, I don’t think it’s generally well-known, especially given some of the “DOM is slow” advocacy I have seen of late.)

Still, whenever you invoke insertBefore() or remove(), there is some cost, as these JavaScript functions are ultimately backed by C++ (nay Rust) with an IDL translation layer in between that makes sure the C++ doesn’t get to see anything funky. That layer matters because the C++-backed DOM has to remain in a consistent state so that all the other algorithms that run in the browser do not get confused. Research into doing the DOM entirely in JavaScript has halted and would in fact hinder efforts to do layout in parallel, which is being spearheaded by Servo.

Yehuda came up with an idea, based on writing code for Ember.js, which in turn was inspired by React, to represent these mutation operations somehow and apply them to the DOM in one go. That way, you only do the IDL-dance once and the browser then manipulates the tree in C++ with the many operations you fed it. Basically the inverse of the mutation records used by mutation observers. With such a DOM mutation representation, you can make thousands of mutations and only pay the IDL cost once.
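
No such API exists today; purely to illustrate the shape of the idea, it could look something like this (every name below is hypothetical):

    // Hypothetical API, for illustration only.
    const list = document.querySelector("ul");
    const changes = new DOMChangeList();             // hypothetical
    for (const label of ["one", "two", "three"]) {
      const li = changes.createElement("li");        // records an instruction
      changes.append(li, changes.createText(label)); // records another
      changes.append(list, li);
    }
    // One crossing of the IDL boundary applies all recorded operations.
    document.applyChanges(changes);                  // hypothetical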

Having such an API would:

  1. Encourage good practice. By providing a single entry point for mutating the DOM, developers will be encouraged to group DOM updates together before doing any kind of layout. If more sites are structured this way, that will increase their performance.
  2. Improve engine performance. This requires further testing to make sure there is indeed a somewhat non-trivial IDL cost today and that we can reduce it by passing the necessary instructions more efficiently than through individual method calls.
  3. Potentially enable more parallelism, by preparing these DOM updates in a worker, via supporting this DOM mutation representation in workers and making it transferable. That reduces the amount of DOM work done on the thread where user interaction needs to take place.

Looking forward to hearing what folks think!

Update: Boris Zbarsky weighs in with some great implementer perspective on the small cost of IDL and the various tradeoffs to consider for a browser-provided API.

Users, clients, and servers

Some thoughts and lessons learned on the web’s client-server model, interoperability, and standards:

In other words, the robustness principle most certainly applies to clients, whereas servers have more freedom to be strict (or broken, if you will) both ways. This model also applies if you look at the individual subsystems, e.g., JavaScript language implementations versus deployed JavaScript code and HTML implementations versus deployed HTML. We have found time and again that minor divergence between client implementations leads to broken behavior for a given server, whereas the reverse does not apply.

Users win, software scrambles.

HTML components

Hayato left a rather flattering review comment on my pull request for integrating shadow tree event dispatch into the DOM Standard. It made me reflect upon all the effort that came before us with regard to adding components to DOM and HTML. It has been a nearly two-decade journey to get to a point where all browsers are willing to implement, and then ship. It is not quite a glacial pace, but you can see why folks say that about standards.

What I think was the first proposal was simply titled HTML Components, better known as HTC, a technology by the Microsoft Internet Explorer team. Then in 2000, published in early 2001, came XBL, a technology developed at Netscape by Dave Hyatt (now at Apple). In some form that variant of XBL still lives on in Firefox today, although at this point it is considered technical debt.

In 2004 we got sXBL and in 2006 XBL 2.0, the latter largely driven by Ian Hickson with design input from Dave Hyatt. sXBL had various design disputes that could not be resolved among the participants. Selectors versus XPath was a big one. Though even with XBL 2.0 the lesson that namespaces are an unnecessary evil for rather tightly coupled languages was not yet learned. A late half-hearted revision of XBL 2.0 did drop most of the XML aspects, but by that time interest had waned.

There was another multi-year gap and then from 2011 onwards the Google Chrome team put effort into a different, more API-y approach towards HTML components. This was rather contentious initially, but after recent compromises with regards to encapsulation, constructors for custom elements, and moving from selectors to an even more simplistic model (basically strings), this seems to be the winning formula. A lot of it is now part of the DOM Standard and we also started updating the HTML Standard to account for shadow trees, e.g., making sure script elements execute.

Hopefully implementations will follow soon, and then widespread usage, cementing it for a long time to come.

Network effects affecting standards

In the context of another WebKit issue around URL parsing, Alexey pointed me to WebKit bug 116887. It handily demonstrates why the web needs the URL Standard. The server reacts differently to %7B than it does to {. It expects the latter, despite the IETF STD not even mentioning that code point.
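
A quick illustration of what is at stake; the URL is made up, and exactly what the parser escapes here is the kind of detail the URL Standard has to pin down:

    // "%7B" and "{" ought to name the same code point…
    console.log(encodeURIComponent("{")); // "%7B"

    // …but the server in the bug above only accepts the literal "{", so it
    // becomes observable exactly which code points a client escapes when it
    // parses and reserializes a URL.
    const url = new URL("https://example.com/?q={x}");
    console.log(url.search); // whether "{" survives unescaped here is
                             // precisely what has to be interoperable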

Partly to blame here are the browsers. In the early days, code shipped without much quality assurance and many features got added in a short period of time. While standards evolved, there was not much of a feedback loop with the browsers. There was no organized testing effort either, so the mismatch grew.

On the other side, you have the standards folks ignoring the browsers. While browsers did not necessarily partake that much in the standards debate back then, they have had an enormous influence on the web. They are the single most important piece of client software out there. They are even kind of a meta-client: they are used to fetch clients that in turn talk to the internet. As an example, the Firefox meta-client can be used to get and use the FastMail email client.

And this kind of dominance means that it does not matter much what standards say; it matters what the most-used clients ship. When you are putting together some server software and have deadlines to meet, you typically do not start by reading standards. You figure out what bits you get from the client and operate on that. And that is typically rather simplistic. You would not use a URL parser, but rather various kinds of string manipulation. Dare I say, regular expressions.
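
The kind of shortcut described above, sketched in JavaScript (the request target and the regular expression are made up):

    // String manipulation: grab a query parameter with a regular expression.
    const requestTarget = "/search?q={test}&page=2";
    const quick = /[?&]q=([^&]*)/.exec(requestTarget)[1];
    console.log(quick); // "{test}" — whatever the dominant clients send, raw

    // An actual parser handles escaping and edge cases consistently.
    const params = new URL(requestTarget, "https://example.com").searchParams;
    console.log(params.get("q"));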

This might ring some bells. If it does, that is because this story also applies to HTML parsing, text encodings, cookies, and basically anything that browsers have deployed at scale and developers have made use of. This is why standards are hardly ever finished. Most of them require decades of iteration to get the details right, but as you know, that does not mean you cannot start using any of it right now. And getting the details right is important. We need interoperable URL parsing for security, for developers to build upon URLs without tons of cross-browser workarounds, and to elevate the overall abstraction level at which engineering needs to happen.

Upstreaming custom elements and shadow DOM

We have reached the point where custom elements and shadow DOM slowly make their way into other standards. The Custom Elements draft has been reformatted as a series of patches by Domenic, and I have been migrating parts of the Shadow DOM draft into the DOM Standard and HTML Standard. Once done, this should remove a whole bunch of implementation questions that have come up and essentially turn these features into new fundamental primitives that all other features will have to cope with.

There are still a number of open issues for which we need to reach rough consensus, and input on those would certainly be appreciated. And there is still quite a bit of work to be done once those issues are resolved. Many features in HTML and CSS will need to be adjusted to work correctly within shadow trees. E.g., as things stand today in the HTML Standard, a script element inside a shadow tree does not execute, an img element does not load, et cetera.
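
For reference, the two primitives being upstreamed look roughly like this from the developer side (the element name is made up):

    // A custom element that creates its own shadow tree.
    class UserCard extends HTMLElement {
      constructor() {
        super();
        const shadow = this.attachShadow({ mode: "open" });
        shadow.innerHTML = "<style>p { font-weight: bold }</style><p>hi</p>";
      }
    }
    customElements.define("user-card", UserCard);

    // <user-card> now renders an encapsulated shadow tree; making features
    // such as script and img behave correctly inside such trees is exactly
    // the remaining work described above.
    document.body.append(document.createElement("user-card"));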