Anne van Kesteren

Prefixes

As you might have heard, at Opera we decided to support some vendor-prefixed properties from WebKit. For reasons unclear to me, we explained this as a problem with developers and received criticism from, among others, Faruk and Wilfred. I do not think this is a problem with developers. Like everyone, developers are constrained on time and will ship what makes them and their clients happy.

There are two problems here: slow standardization and impending software monoculture. Whether software monoculture on mobile is impending or has already happened we can tell in the future, but it is clear to everyone that WebKit is a dominating force. And the longer that continues to be the case, the more competitors will have to copy proprietary aspects, including bugs and misfeatures. Netscape 4 and Internet Explorer 5 & 6 have demonstrated that in the past.

Standardization in general is quicker these days, mostly thanks to the WHATWG, which pioneered and changed many aspects of how standards are developed now. CSS, however, is still quite slow. At the end of 2007 Dave Hyatt announced CSS Animation on the WebKit blog. The first CSS Transitions Module Level 3 draft was not published until March 2009. And now, over three years after that first draft and over four and a half years since it was initially proposed, it’s still a draft with a theoretical ban on implementors supporting it without prefixes. For a wildly popular feature proposed by the market leader on mobile, keeping it prefixed for four and a half years is just too long.

(See also Vendor Prefixes Are Hurting the Web by Henri.)

Encodings: presentation and more big5

The other day I gave a presentation at a Fronteers meetup here in the Netherlands on bytes and code points, primarily to repeat a point I have made for the past eight years now: Use utf-8! (Thanks to Sam Ruby and Peter Krefting for indirectly helping with this presentation.) But with some new information on what will go wrong if you don’t and which new features will be unavailable to you. I also briefly explained how JavaScript and APIs are the only place where you will be exposed to surrogate pairs and isolated surrogates. As explained in detail by Norbert Lindenberg, this might change, but the additional convenience you get by working with code points is not quite it, as you really want to operate on grapheme clusters.
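The difference between code units, code points, and grapheme clusters can be made concrete with a small sketch. It assumes an ES2015-capable engine (for the `\u{...}` escape and string iteration); the variable names are illustrative only.

```javascript
// U+1F4A9 lies outside the Basic Multilingual Plane, so a JavaScript
// string stores it as two 16-bit code units (a surrogate pair).
const pile = "\u{1F4A9}";

const codeUnits = pile.length;               // counts UTF-16 code units
const codePoints = Array.from(pile).length;  // string iteration walks code points

// Slicing by code unit can cut a pair in half, leaving an isolated surrogate.
const isolated = pile.slice(0, 1);
const isHighSurrogate =
  isolated.charCodeAt(0) >= 0xD800 && isolated.charCodeAt(0) <= 0xDBFF;

// "e" followed by U+0301 (combining acute accent) is two code points,
// yet a reader perceives one character (a single grapheme cluster).
const accented = "e\u0301";
const accentedCodePoints = Array.from(accented).length;

console.log(codeUnits, codePoints, isHighSurrogate, accentedCodePoints);
```

Counting code points fixes the surrogate problem but still reports two units for the accented letter, which is why code points alone are not the final answer.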

Today big5 finally got fully defined in the Encoding Standard. Philip Jägenstedt emailed some more analysis on the big5-uao and big5-2003 extensions of big5. It seems the only one we want to support is big5-hkscs which is what the standard defines today, with a slightly restricted encoder.

Apart from some minor bugs, the encoding part of the Encoding Standard is complete now, with work on limited/broad encoding sniffing lined up. We just got a little bit closer to a fully predictable and understandable platform.

Same code points

I have always wanted to file something away for future claim chowder. Shawn Steele of Microsoft gave me the opportunity: “Our implementation of encodings WILL NOT change. Ever.”

Encodings: status update and big5

The Encoding Standard now defines nearly all encodings user agents have to support to work with the platform, including all the idiosyncrasies of the decoders and encoders: their indexes, end-of-file handling, and handling of errors. To recap, a decoder maps bytes to Unicode code points; an encoder does the reverse. Either might make use of an index if no algorithmic conversion to Unicode is possible. Some indexes are huge. Over 9000!

In the end each encoding consists of a decoder and an encoder, but the complexity differs. Single-byte encodings are rather trivial: each byte either maps to a code point via an index, or is in error. iso-8859-3, macintosh, and windows-874 are examples of single-byte encodings. The two algorithmic encodings (utf-8 and utf-16) were not too hard to figure out as they are reasonably well documented (minus some error details). In the remaining encodings one or more bytes map to a code point (either directly or via an index or expression). These are the legacy encodings from China, Japan, and Korea. Determining the exact boundaries and testing/reverse engineering them across browsers was a rather involved process. Examples would be euc-kr, shift_jis, and hz-gb-2312.
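As a rough illustration of the single-byte case, here is a minimal sketch of an index-driven decoder and its reverse. Everything in it is an assumption made for the example: `toyIndex` is a hypothetical three-entry index keyed directly by byte value (a real index covers all 128 bytes from 0x80 to 0xFF), the function names are made up, and errors are handled by emitting U+FFFD when decoding and throwing when encoding.

```javascript
// Hypothetical toy index, for illustration only.
const toyIndex = new Map([
  [0x80, 0x20AC], // EURO SIGN
  [0xE9, 0x00E9], // LATIN SMALL LETTER E WITH ACUTE
  [0xFC, 0x00FC], // LATIN SMALL LETTER U WITH DIAERESIS
]);

// Decoder: bytes in, code points out.
function decodeSingleByte(bytes, index) {
  const codePoints = [];
  for (const byte of bytes) {
    if (byte < 0x80) {
      codePoints.push(byte);            // ASCII range maps to itself
    } else if (index.has(byte)) {
      codePoints.push(index.get(byte)); // index lookup
    } else {
      codePoints.push(0xFFFD);          // byte is in error: emit U+FFFD
    }
  }
  return String.fromCodePoint(...codePoints);
}

// Encoder: the reverse lookup, code points in, bytes out.
function encodeSingleByte(str, index) {
  const reverse = new Map([...index].map(([byte, cp]) => [cp, byte]));
  const bytes = [];
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (reverse.has(cp)) {
      bytes.push(reverse.get(cp));
    } else {
      throw new RangeError("code point not in index: U+" + cp.toString(16).toUpperCase());
    }
  }
  return Uint8Array.from(bytes);
}

// 0x90 is not in the toy index, so it decodes to U+FFFD.
const decoded = decodeSingleByte([0x63, 0x61, 0x66, 0xE9, 0x90], toyIndex);
const roundTrip = encodeSingleByte("caf\u00E9", toyIndex);
```

The multi-byte legacy encodings follow the same shape, except that the decoder has to carry state between bytes before it can consult its index.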

These complex encodings have proprietary extensions that gained widespread use due to the dominance of Internet Explorer. Other browsers copied the extensions over time, reaching a somewhat stable equilibrium. There is one encoding, however, that is worse off. The interpretation of proprietary extensions to big5 is a regional affair. The same byte sequence can have a different meaning to a Taiwanese user and a user from Hong Kong. In Taiwan an extension called "big5-uao" (Unicode-at-on) got traction whereas in Hong Kong "big5-hkscs" (Hong Kong Supplementary Character Set) is used. Sites however typically use "big5" as the label (presumably because only "big5-hkscs" exists as a distinct label and Internet Explorer handles that as "big5").

In Hong Kong Microsoft has provided a patch for Windows for a while that changed system fonts and may or may not have changed the "big5" index in order to support "big5-hkscs" under the "big5" label. In Taiwan something similar happened for "big5-uao".

For the Encoding Standard I went through the dotnetdotcom.org data with help from Simon (part 1 and part 2). Being more fluent in Chinese, Philip Jägenstedt happily took over, analysed my data and even gathered more. 呂康豪 (Kang-Hao Lu) is doing the same and both are reporting progress on public-html-ig-zh@w3.org. I hope the conclusion will be that we can define a single "big5" at the cost of breaking a few pages, as regional differences for decoding royally suck, but time will tell.

Encoding API — naming

The WHATWG is looking into defining an API around encodings. Bytes in, string out, string in, bytes out, the works. The non-streaming encoder design we are looking at now is as follows (the decoder works analogously):

enc = new Encoder("utf-8")
bytes = enc.encode(str)

Streaming would look like:

enc = new Encoder("utf-8")
bytes = enc.encode(str, {continues:true})

Anyone with a better name for the continues dictionary member?
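Why a streaming flag is needed at all is easiest to see on the decoder side, where a multi-byte sequence can be split across two chunks. The sketch below does not use the Encoder design above; it assumes a runtime that provides the `TextDecoder` interface and its `stream` option, which is a different name for the same idea.

```javascript
// The euro sign U+20AC is three bytes in utf-8: E2 82 AC.
// Split it across two chunks, as a network read might.
const chunk1 = Uint8Array.of(0xE2, 0x82);
const chunk2 = Uint8Array.of(0xAC);

// Without a streaming flag each call stands alone, so the incomplete
// sequence in each chunk is treated as an error and replaced by U+FFFD.
const oneShot =
  new TextDecoder("utf-8").decode(chunk1) +
  new TextDecoder("utf-8").decode(chunk2);

// With { stream: true } the decoder holds on to the partial sequence
// until a later call completes it.
const streaming = new TextDecoder("utf-8");
const streamed =
  streaming.decode(chunk1, { stream: true }) +
  streaming.decode(chunk2); // last call omits the flag, ending the stream

console.log(JSON.stringify(oneShot), JSON.stringify(streamed));
```

Whatever the dictionary member ends up being called, it has to tell the object "more input follows, keep your state".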

Standards red pill

Similarly to what was said about HTML many years ago, this is where the standards rabbit hole goes:

Visited in 2011

In rough chronological order (see also 2008, 2009, and 2010):

Fronteers and EFF

I forgot about it yesterday, but I also wrote an article for Fronteers’ advent calendar: Het platform bouwen. Reflecting on how we are building the platform and explaining how developers can influence it. As described on the site, Fronteers is the non-profit trade organization of Dutch front-end developers. And it is pretty awesome. As part of the advent calendar, Fronteers donates money to a good cause. I chose the EFF and matched Fronteers’ donation as fighting SOPA and protecting our rights on the internet is important.

Loose thoughts

There have been a few things I wanted to write about, but I manage to never get around to actually doing it. So here it goes, before I postpone it yet again. While in Norway for Opera’s annual Christmas party (fun was had) I was able to buy the album Destination by trio42, an ensemble my brother is part of. Via my phone, on iTunes. So easy these days. My brother is probably the sole reason I go to classical music concerts and listen to classical music in general. And I enjoy it greatly. Currently anticipating his solo debut album.

A little before that, I was in Tokyo and worked on the Fullscreen Standard. The API is largely done, but rendering specifics might still change drastically. There I also wrote a parser and validator for WebVTT in ECMAScript. The result is the Live WebVTT Validator. I had never written something semi-serious in ECMAScript before, so that was cool. Together with Mike I climbed Mount Maehotaka, Japan’s eleventh highest mountain, as the elderly man told us at the top. Without either food or water this was rather rough, but the view was pretty spectacular. Took a few days to walk sensibly again after that though.

Between those two events there was TPAC (the big yearly W3C meeting), where I had the pleasure of getting a personal thank you from Jeff Jaffe. There are many things wrong at the W3C, but he inspires confidence that they will overcome them. Open minded, constantly evaluating, and committed to change. TPAC was close to San Francisco this year by the way, and just before, I went from Tokyo to the Netherlands for a few days. Oops. Other than meetings, going through my inbox, and cycling over the Golden Gate Bridge not much was published, but I had a great time after the sleepiness was over.

Back to Norway, last week, Dominique (better known as dom) created a repository so I could publish the Encoding Standard, based on research I did last year. It is still very much work in progress, but has already helped Opera and Mozilla to make adjustments to their browsers to become more interoperable. Currently encoding documentation is all over the place and misses important details, which makes it harder for people who write software that consumes (legacy) content on the internet to interoperate. I want to change that. If you are a web developer, this matters less to you, just use UTF-8.

There is more, but I should really get some food.

Lucky

“A cat met up with a big male rat in the attic and chased him into a corner. The rat, trembling, said, ‘Please don’t eat me, Mr. Cat. I have to go back to my family. I have hungry children waiting for me. Please let me go.’ The cat said, ‘Don’t worry, I won’t eat you. To tell you the truth, I can’t say this too loudly, but I’m a vegetarian. I don’t eat any meat. You were lucky to run into me.’ The rat said, ‘Oh, what a wonderful day! What a lucky rat I am to meet up with a vegetarian cat!’ But the very next second, the cat pounced on the rat, held him down with his claws, and sank his sharp teeth into the rat’s throat. With his last, painful breath, the rat asked him, ‘But Mr. Cat, didn’t you say you’re a vegetarian and don’t eat any meat? Were you lying to me?’ The cat licked his chops and said, ‘True, I don’t eat meat. That was no lie. I’m going to take you home in my mouth and trade you for lettuce.’”

From 1Q84 by Haruki Murakami.