Anne van Kesteren


When working on new features for the web platform we try to be careful not to add too much all at once. And when the need for more functionality arises we generally wait until implementations are reasonably stable before adding it. E.g. the 2D graphics API on top of the canvas element did not support transformations right from the start. And when it eventually did, the text API was yet to be added. Evolving platform features incrementally is important for at least these two reasons:

To some extent this is what we did with the media (video and audio) elements. Support for separate timed tracks (e.g. captions or subtitles) was not there from the beginning. At least not in terms of an API web developers could use. The idea was that if media streams came with such tracks the browser would expose them in its user interface. This idea did not work too well because there was no good timed track format browsers were willing to support. Authors had to hack support for captioning using the timeupdate event. Not ideal, so based on feedback received in the last couple of years Ian started his usual process of gathering requirements and use cases. Out of this work came the following new HTML5 features that are currently being drafted:

As is normal with new features, some are controversial. As anticipated, there is a bit of an outcry over the new WebSRT format. Unfortunately, so far the outcry seems to revolve around it being part of HTML5 (rather than a separate document) and it not being TTML. No technical debate. I personally do not really care where the format is specified. Both inside and outside HTML5 make sense to me. Either way, specifying it will allow for independent implementations and that is what matters most.

Of course, the main question is why WebSRT is the format we should go with here. The W3C has been working on TTML for quite a while and there are over fifty other formats in existence. I.e. timed tracks are a gigantic mess. The answer is a combination of simplicity, implementor interest, existing content, and extensibility.

SRT is a really simple format. Not only is this a great way to foster tool development and support in non-browser clients, it is also good for authors: timed tracks are often tweaked in a simple text editor. Implementors at Mozilla, Apple, and Opera have all said they would like to support the SRT format at a minimum. None of them have interest in TTML (see also captioning markup). SRT content is quite widespread already and the format is supported by many players.
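To give a feel for how little there is to the format, here is a minimal sketch of an SRT reader. It assumes the commonly seen layout (an optional counter line, a timing line of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`, then one or more text lines, with blank lines between cues). The names `Cue`, `parseTimestamp`, and `parseSrt` are made up for this illustration and are not part of any specification or browser API.

```typescript
// One caption cue: start and end in seconds, plus the raw text.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Convert "HH:MM:SS,mmm" into seconds. A dot is tolerated in place
// of the comma (an assumption, since files in the wild vary).
function parseTimestamp(stamp: string): number {
  const match = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (match === null) {
    throw new Error(`Bad timestamp: ${stamp}`);
  }
  const [, h, m, s, ms] = match;
  return Number(h) * 3600 + Number(m) * 60 + Number(s) + Number(ms) / 1000;
}

// Split the file into blank-line-separated blocks and read each one.
function parseSrt(source: string): Cue[] {
  const cues: Cue[] = [];
  const blocks = source.replace(/\r\n?/g, "\n").trim().split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split("\n");
    // The timing line is the first one containing "-->"; anything
    // before it (normally the numeric counter) is ignored.
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) {
      continue; // Not a cue, skip it.
    }
    const parts = lines[timingIndex].split("-->");
    if (parts.length !== 2) {
      continue; // Malformed timing line, skip it.
    }
    cues.push({
      start: parseTimestamp(parts[0]),
      end: parseTimestamp(parts[1]),
      text: lines.slice(timingIndex + 1).join("\n"),
    });
  }
  return cues;
}
```

This sketch is deliberately lenient about malformed blocks and strict about timestamps. A real player would likely need to be more forgiving, which is part of why formally defining the format matters.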

SRT by itself, however, does not meet all the requirements. For the Japanese market some kind of ruby markup support is needed. Fortunately, SRT can be easily extended with support for this. It already has been for some basic formatting. Most players support <i> and <b>. WebSRT will add <ruby> and <rt>. WebSRT will also formally define the format. What it comes down to is that WebSRT has implementor interest, is extremely simple, can be further extended in the future, and builds on a successful existing format. It cannot really get any better than this.
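As a rough illustration of how small that set of inline markup is, the sketch below keeps only the four tags named above (<i>, <b>, <ruby>, <rt>) in a cue's text and escapes everything else. This is not the WebSRT parsing algorithm, which is still being drafted. `ALLOWED_TAGS`, `escapeText`, and `sanitizeCueText` are hypothetical names, and the handling of unknown tags is an assumption made for the example.

```typescript
// The inline tags mentioned in the text; everything else is escaped.
const ALLOWED_TAGS = new Set(["i", "b", "ruby", "rt"]);

// Escape characters that would otherwise be read as markup.
function escapeText(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Walk over simple attribute-less tags such as <i> or </rt>.
// Allowed tags pass through (lowercased); all other text is escaped.
function sanitizeCueText(text: string): string {
  const tagPattern = /<(\/?)([a-zA-Z]+)>/g;
  let result = "";
  let lastIndex = 0;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(text)) !== null) {
    // Escape the plain text between the previous tag and this one.
    result += escapeText(text.slice(lastIndex, match.index));
    const whole = match[0];
    const slash = match[1];
    const name = match[2].toLowerCase();
    result += ALLOWED_TAGS.has(name) ? `<${slash}${name}>` : escapeText(whole);
    lastIndex = match.index + whole.length;
  }
  // Escape whatever follows the last tag.
  return result + escapeText(text.slice(lastIndex));
}
```

For example, a cue such as `<ruby>漢字<rt>かんじ</rt></ruby>` passes through unchanged, while a `<script>` tag in the same cue would come out escaped as text.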


  1. WebSRT may be the best suggested approach so far, but certainly it could be better?

    Yes, I'll send feedback to the list, just saying that it ain't perfect, so let's keep our feet on the ground.

    P.S. Comments must be well-formed XML? Didn't expect to see that here :)

    Posted by Philip Jägenstedt at

  2. @Philip: Multilanguage captions do happen in countries with multiple languages such as Malaysia (Chinese, Malay, Hindi) or Belgium or Canada, etc.

    In Japan, there are also many strange ways of displaying subtitles with a combination of vertical and horizontal in the same feature.

    Posted by karl at