Anne van Kesteren

SVG and text/html

The W3C HTML WG is a lot less active these days than in the beginning. I’m not really sure why that is. Maybe because everyone has now requested their <burger> tag and is waiting for it to be integrated into the HTML5 (or HTML 5 if you will) draft. Publication of anything noteworthy also doesn’t seem to be near, although maybe things will move a bit during this year’s W3C Technical Plenary, where the HTML WG will be meeting in unconference style. (Not all 489 participants will show up. No technical decisions will be made during this meeting or any other meeting for that matter; they are always made asynchronously: through surveys or on the mailing list.)

What has been discussed on the mailing list is some more of that ARIA-goodness. I think people more or less agree with how we are going to tackle this problem in HTML5 and XHTML5 (well, some decent subset of the people involved), but there is some debate now on how to do it best in SVG.

The other interesting thread is about an HTML compatible syntax for SVG. In other words, integrating SVG in text/html. Now that an HTML parser that is compatible with deployed content and Web browsers has been defined, it is possible to create small extensions on top of it that enable new tricks. As SVG is essentially a DOM-based language, the only thing the parser will have to ensure is that the elements end up in the right namespace in the DOM. There are a number of issues to be solved first though:

We want SVG in text/html because deployed content is text/html and software is written for text/html. It has a much higher chance of adoption if it’s easy to integrate in existing content, because people are only required to paste in some string somewhere in their HTML (or XHTML as text/html), as opposed to rethinking and rebuilding their software. I think this is also a strong argument for keeping the new syntax simple. There’s a unique opportunity here to simplify the SVG syntax. We can remove the need for namespaces, prefixes, and maybe even case-sensitivity. Keeping the /> versus > distinction is likely to be useful though, as SVG has quite a few constructs with optional child elements. Please note that we’re talking about the text/html-syntax here. In the DOM the nodes will be namespaced, have their canonicalized node name, et cetera; SVG renderers won’t have to be updated.
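
To make the split between syntax and DOM concrete, here is a minimal Python sketch of how a parser might map a case-insensitively matched tag name to a namespaced, canonically cased node name. The lookup table and the function name `canonical_svg_node` are assumptions made for illustration; nothing in the draft specifies this mechanism. The element names in the table (`foreignObject`, `linearGradient`, and so on) are real SVG elements.

```python
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical fix-up table: lowercased tag name -> canonical SVG name.
# Only mixed-case names need an entry; this list is not exhaustive.
CANONICAL_SVG_NAMES = {
    "foreignobject": "foreignObject",
    "lineargradient": "linearGradient",
    "radialgradient": "radialGradient",
    "clippath": "clipPath",
    "textpath": "textPath",
}


def canonical_svg_node(tag_name):
    """Return (namespace, local name) for a tag seen inside <svg>."""
    lowered = tag_name.lower()
    # Names without an entry are assumed to be all-lowercase in SVG.
    local_name = CANONICAL_SVG_NAMES.get(lowered, lowered)
    return (SVG_NS, local_name)


print(canonical_svg_node("LINEARGRADIENT"))
print(canonical_svg_node("circle"))
```

With a table like this, `<LINEARGRADIENT>` and `<lineargradient>` in text/html would both become a `linearGradient` node in the SVG namespace, which is the name existing renderers look for. The cost is that the table has to be kept in sync with any mixed-case names SVG adds later.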

I would expect SVG in HTML5 to look something like the following:

<!doctype html>
<title>SVG in text/html</title>
<p>
 A green circle:
 <svg> <circle r="50" cx="50" cy="50" fill="green"/> </svg>
</p>

(Long ago I wrote a test for SVG in XHTML that works in most browsers. It’s invalid though as a number of required elements are missing.)
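
For comparison, the XML form of the same fragment has to declare the SVG namespace itself. The following Python sketch uses the standard library's `xml.etree.ElementTree` to show the namespaced names an XML parser produces; the idea discussed above is that a text/html parser would produce equivalent nodes without the `xmlns` declaration. This only illustrates the XML side and is not an HTML parser.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# The circle from the example above, in its XML serialization.
xml_fragment = (
    f'<svg xmlns="{SVG_NS}">'
    '<circle r="50" cx="50" cy="50" fill="green"/>'
    "</svg>"
)

root = ET.fromstring(xml_fragment)
circle = root[0]

# ElementTree reports element names as "{namespace}localname".
print(root.tag)
print(circle.tag)
print(circle.get("fill"))
```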

It makes sense if MathML is done in a similar way I think. As for other languages, dunno. I suppose it’s nice if the HTML parser has as little knowledge of SVG and MathML as possible, but I’m all for giving up theoretical purity for authoring convenience. (If you’re thinking of Ruby at this point, don’t worry. It will be integrated in due course.)

Comments

  1. How and where will SVG and MathML's text/html serializations be defined? In the HTML5 specification or in specifications of their own? Wouldn't it make most sense to define a general extension mechanism along with a set of rules for serialization to text/html as well as application/xhtml+xml and then move the specifics of serializing SVG and MathML to text/html to their own specifications?

    Posted by Asbjørn Ulsberg at

  2. Well, some part of it definitely has to be in the HTML parser. In theory that part doesn’t have to be SVG or MathML specific, but you can gain a lot if it is (as illustrated by my simple example). I’m also not convinced there’s a real need to allow arbitrary DOM trees to be embedded in some form within the text/html-syntax.

    Posted by Anne van Kesteren at

  3. The problems that are applicable to SVG are also applicable to MathML

    MathML is a little easier than SVG:

    And then you said

    There’s a unique opportunity here to simplify the SVG syntax.

    Since most SVG is tool-generated, there's no compelling reason to "simplify" the syntax in a way that would hurt interoperability between the XML and text/html serializations. It should be a design goal that the same SVG fragment that works in a standalone SVG document should also work when embedded in XHTML and when embedded in HTML5. This may require an SVG profile to prescribe what constructs are "safe" to embed in HTML5 (tools can be rewritten to adhere to that profile). But, whatever that profile is, it should be namespace well-formed SVG. In particular, I wouldn't start downcasing element and attribute names.

    Posted by Jacques Distler at

  4. I'm not quite following why SVG/MathML has to be *inlined* into HTML - why not just use the object tag, and ask browsers to support these formats natively?

    The opposite extreme would be: If you can inline SVG/MathML, why not also inline RDF? ooh! how about inlining images or Flash?

    Posted by roberthahn at

  5. Why do we need SVG embedded into web pages?

    An external file linked in via <img> or background-image is more consistent with the way authors currently add images to their websites. This way lets it be cached, used on several pages (such as in a template), updated independently from the markup, etc. External files can be shared easily, like clipart or stock photography or icon sets.

    I can understand embedding MathML into web pages. But if you do that, CellML and a billion others would need to be let in for fairness?

    So long as unknown elements and attributes end up in the DOM sensibly, are scriptable and stylable, that’s all people actually need?

    If people want to use custom elements, maybe just let them? SVG, MathML, CellML, custom attributes for Microformats and so on could define their own text/html compatible version. HTMLWG could produce a tutorial to help them do that, which would be less work than integrating each and every markup language into text/html ourselves. Or so I’d guess?

    Posted by Ben 'Cerbera' Millard at

  6. Why do we need SVG embedded into web pages?

    Here's a use-case: I allow SVG in comments on my blog. Do you think I should, instead, allow people to embed <object> elements? If so, how do you propose that I sanitize such inclusions?

    Posted by Jacques Distler at

  7. Jacques:

    I noticed that your comment form does not allow image tags. Why not? What harm could an image bring to your site that an SVG file couldn't?

    Perhaps this constraint (using <object /> tags to include SVG) will help people decide between using XHTML and HTML?

    Posted by roberthahn at

  8. Jacques Distler wrote:

    Since most SVG is tool-generated, there's no compelling reason to "simplify" the syntax in a way that would hurt interoperability between the XML and text/html serializations. It should be a design goal that the same SVG fragment that works in a standalone SVG document should also work when embedded in XHTML and when embedded in HTML5. This may require an SVG profile to prescribe what constructs are "safe" to embed in HTML5 (tools can be rewritten to adhere to that profile). But, whatever that profile is, it should be namespace well-formed SVG. In particular, I wouldn't start downcasing element and attribute names.

    For SVG, I think we should tokenize element and attribute names (except perhaps the tag name in the start <svg> tag) case-sensitively, because using existing DOM-reading browser code requires the right case and hard-coding element tables in the parser seems like a bad idea for forward-compatibility. I’m on the fence about what to do with CDATA sections and tokenization of the content of SVG script and style elements. Given your comment above, I take it that you’d prefer the tokenizer to support CDATA sections in SVG in text/html, right?

    Ben 'Cerbera' Millard wrote:

    Why do we need SVG embedded into web pages?

    An external file linked in via <img> or background-image is more consistent with the way authors currently add images to their websites. This way lets it be cached, used on several pages (such as in a template), updated independently from the markup, etc. External files can be shared easily, like clipart or stock photography or icon sets.

    Ben 'Cerbera' Millard wrote:

    I can understand embedding MathML into web pages. But if you do that, CellML and a billion others would need to be let in for fairness?

    This is not about fairness. If we can get big wins by giving preferential treatment to the most-wanted cases, we should go for it instead of letting a syntactically uglier generic solution drag everything down.

    Posted by Henri Sivonen at

  9. roberthahn wrote:

    I noticed that your comment form does not allow image tags. Why not? What harm could an image bring to your site that an SVG file couldn't?

    I don't allow SVG files either. So I'm not sure why you think I am being inconsistent. (Note that I carefully sanitize the inline SVG that I do allow.)

    Henri wrote:

    Given your comment above, I take it that you’d prefer the tokenizer to support CDATA sections in SVG in text/html, right?

    I'm afraid I don't have an opinion (informed or otherwise). As has been noted, not a lot of SVG authoring tools produce CDATA sections (do any?). So, from the point of view of copy/pasteability, it's not a big deal. If the objective is to be able to include unescaped scripts or CSS that would work in text/html and XML contexts, that's already possible:

    <script>
    <!--//--><![CDATA[//><!--
       ...
    //--><!]]>
    </script>

    Not pretty, but it doesn't involve changing anyone's parsing model, either.

    Posted by Jacques Distler at

  10. Thank you for the clarification, Jacques. I'm not sure how popular this opinion is, but I still think that SVG (or any other XML dialect) shouldn't be inlined with HTML 5. If such a feature is valuable, then it's a good thing that we have XHTML for the job. I do like the idea that any XML content that is linked in (via an object tag) should be automatically exposed in the DOM though.

    Posted by roberthahn at

  11. I still think that SVG (or any other XML dialect) shouldn't be inlined with HTML 5. If such a feature is valuable, then it's a good thing that we have XHTML for the job.

    I'm still undecided about how I feel about inline SVG. I presented you with a use-case. Dunno how compelling that use-case is. In the case of MathML, however, I happen to know a lot of people who would like to use MathML, but who are not remotely capable of producing XHTML. (Just for starters, look at these people, or this person.)

    Indeed, I'll go one step further and state categorically that I believe the number of people who would benefit from MathML-in-HTML5 vastly exceeds the number of people who are capable of reliably producing XHTML. (And, to make matters worse, these are largely non-overlapping sets.)

    Posted by Jacques Distler at

  12. I have to agree with roberthahn.

    It's increasingly striking me that HTML5 is slowly reimplementing all of the things that XHTML gives you, with very little in the way of rationale. There's nothing wrong with saying “hey, if you want to embed bits of SVG or MathML or WhateverML in your document, you either use <object>, or you use XHTML.”.

    This isn't about backwards-compatibility or popular use-cases, it's about fudging the mark-up to support things XHTML has supported forever (because that's what it's for), as if somebody somewhere has some kind of grudge against XHTML (and if so, why not just fix XHTML?). Browser makers will have to do work to support it in either case: with XHTML, it's part and parcel, but with this it's a bunch of special-cases.

    There's absolutely nothing to suggest that embedded SVG or MathML would become far more widespread as a result of these kludges as compared to the support they already have via XML, as far as I can see (for the next couple of years, it's all down to Microsoft in any case). What am I missing? There are obviously very smart people involved in all of this, so there must be something.

    Posted by Mo at

  13. We produce a WYSIWYG editor and as tool vendor, supporting SVG through the IMG element is by far the best option for vendors and users. The reasons are:

    Opera 9.5 beta supports rendering SVG images through the img element, although the implementation of width and height attributes may be incorrect. Here is a test page:

    http://xhtml.com/misc/svg-img.htm

    Posted by Vlad Alexander at

  14. There's nothing wrong with saying “hey, if you want to embed bits of SVG or MathML or WhateverML in your document, you either use <object>, or you use XHTML.”.

    In the case of SVG, I can see arguments both ways. But, in the case of MathML, using <object> is a complete non-starter. In both cases, the 'alternative' of using XHTML is more-or-less completely out of reach of anyone wishing to do so. Perhaps that will change in the next decade (the estimated ETA of HTML5); more likely, it won't.

    why not just fix XHTML?

    Do you have a proposal for "fixing" XHTML? A non-Draconian parsing model and a new MIME-type to go with it? Something else? And on what timescale could this "fixed" version of XHTML make it to market?

    with XHTML, it's part and parcel, but with this it's a bunch of special-cases.

    You mean because of the lack of namespace support in the parser? Whether or not the browser can do anything useful with some given foreign namespace comes down to a bunch of "special cases" anyway. Sam Ruby proposed a more general mechanism for extending HTML5. There's much to be said for that sort of approach, even though the payoff for tailoring something to the handful of namespaces with actual browser support is more immediate.

    Posted by Jacques Distler at

  15. Actually, this raises an interesting question for Henri and Sam and others (which would also apply to a hypothetical non-Draconian formulation of XHTML+XML). What error-recovery algorithm is to be applied within a MathML or SVG fragment? Do we demand that it satisfy XML well-formedness constraints? (And, if it fails, do what?) It seems to me that the HTML5 error-recovery algorithm is rather tied to the vagaries of HTML, and would not necessarily make sense, applied to a foreign XML dialect.

    Posted by Jacques Distler at

  16. The HTML5 error handling for custom elements is pretty sane.

    Posted by Anne van Kesteren at

  17. The HTML5 error handling for custom elements is pretty sane.

    In Henri's proposal (if not Sam's), the <svg> element is special. It's not just a generic "Phrase Element". To pick an example of the sort of thing I'm wondering about, consider how the current algorithm for handling the <title> element will have to change.

    Posted by Jacques Distler at

  18. Since the parser has to know at some level that the <svg> and <math> elements are special to make any of this work, I don't think it's too difficult to add an InForeignDialect phase to the parser which treats all tags like unknown tags in the InBody phase (with the obvious changes to case sensitivity, void element handling, etc.). The problem is slightly more interesting if you want to allow multiple levels of embedding, e.g. MathML in HTML in SVG <foreignObject> in HTML, but I guess not insurmountably so.

    Posted by jgraham at

  19. From a practical standpoint, svg in HTML img elements gets you most of what you need.

    But on principle, I don't support inline svg elements, because they're not semantic, they're presentational. Colors, gradients, filters, crops, shapes, and even geometry (e.g. rotation), should be put in CSS instead, so they can be applied to ANY element.

    If you can do something like this in CSS:
    p {rotation:50; shape:oval; gradient(10%, red: 90%, green);}
    then you keep the separation of content from presentation, and you don't need any SVG elements at all!

    Posted by Chris Jay at

  20. If SVG was a binary format instead of a text format, you would not consider putting it inline. So it does not make sense to put it inline just because it's a text format.

    And SVG files can also get quite big, some are over 100KB. That's way too much to put inline.

    Putting SVG inline entices Web page writers to mess with SVG manually and screw things up. Keeping SVG in outside files reduces the risk of manual edits and increases the chances that SVG authoring programs will be used.

    If you put SVG inline, you will get incorrect use of SVG. Web page writers will confuse SVG "a", "title", "script", "textArea" and "font" with the same HTML elements. The element names may be similar but they work differently.

    MathML is different than SVG. So it may not be right to have one solution that works for both SVG and MathML. MathML may work inline but SVG will not.

    I say use the IMG element and keep SVG in outside files.

    Posted by David at

  21. My own take on the discussion.

    "I say use the IMG element and keep SVG in outside files."

    And we incorporate the elements that make up the SVG into the DOM how...?

    SVG is a vector markup. If we want to treat SVG as a raster image, we can use a tool to convert an SVG document to a PNG and use IMG.

    As for large SVG files, there are SVG specific approaches one can use to incorporate larger SVG files by reference, including SVG fragments and the SVG image element. Neither approach includes redefining the semantics of the img element.

    Posted by Shelley at

  22. Chris:

    If you can do something like this in CSS: p {rotation:50; shape:oval; gradient(10%, red: 90%, green);} then you keep the separation of content from presentation, and you don't need any SVG elements at all!

    That's an interesting idea: instead of embedding presentation in markup, let's embed markup in the presentation.

    Eventually we'll have a page with nothing but one paragraph element, and a class attribute with 453 entries, and our job will be done.

    Posted by Shelley at

  23. Inlining SVG could make sense for cases like explaining things with arrows or organigrams, although unfortunately it does not add any semantics. It would have to be closely integrated into HTML to allow <p> in boxes (like diamonds), for example.

    However, it definitely doesn't make any sense to me to inline SVG images like the icons you have on your desktop.

    For inline SVG, how about alternative contents in case the browser doesn't support it?

    Posted by Grégoire Cachet at

  24. I'll just repeat Sam's idea here, since I think it's a good one:

    <script type="image/svg+xml"> <circle r="50" cx="50" cy="50" fill="green"/> </script>

    Posted by Asbjørn Ulsberg at

  25. Actually, this raises an interesting question for Henri and Sam and others (which would also apply to a hypothetical non-Draconian formulation of XHTML+XML). What error-recovery algorithm is to be applied within a MathML or SVG fragment?

    I suggest: Establish a new scope (in the HTML5 tree constructor sense) for svg. Upon end tag, search for a matching start in scope. If there is one, pop up to and including it. If there isn’t, ignore the end tag.

    To pick an example of the sort of thing I'm wondering about, consider how the current algorithm for handling the <title> element will have to change.

    SVG title should probably be parsed as PCDATA.

    Posted by Henri Sivonen at
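
The end-tag rule Henri Sivonen suggests in comment 25 can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the HTML5 tree constructor: `open_elements` is a hypothetical list holding only the elements opened since (and including) the `<svg>` start tag, which stands in for the new scope, and `handle_end_tag` is an invented name.

```python
def handle_end_tag(open_elements, name):
    """Apply the suggested recovery rule to one end tag.

    open_elements: names of elements currently open inside the svg
    scope, outermost first (assumed representation).
    Returns True if the end tag closed something, False if ignored.
    """
    if name not in open_elements:
        # No matching start tag in scope: ignore the end tag.
        return False
    # Pop up to and including the nearest matching start tag.
    while open_elements.pop() != name:
        pass
    return True


stack = ["svg", "g", "circle"]
print(handle_end_tag(stack, "g"), stack)     # closes circle and g
print(handle_end_tag(stack, "rect"), stack)  # stray </rect> is ignored
```

Because elements outside the svg scope never appear in `open_elements`, a stray end tag inside the fragment cannot close an HTML ancestor such as the surrounding `<p>`.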

  26. Short SVG content can be in-line as data URL. Long SVG content should not be in-line at all.

    Where arrows are needed to join labels in diagrams, the labels themselves will feel much more at home alongside the arrows inside the SVG.

    Posted by Christopher Yeleighton at
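
As a rough illustration of the data URL approach mentioned in comment 26, the following Python sketch percent-encodes a short SVG string into a `data:` URL that could be used as an image source. The helper name `svg_data_url` is made up for this example, and percent-encoding (rather than base64) is just one possible choice.

```python
from urllib.parse import quote, unquote

PREFIX = "data:image/svg+xml,"


def svg_data_url(svg_markup):
    """Build a data URL by percent-encoding the SVG markup."""
    return PREFIX + quote(svg_markup)


svg = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<circle r="50" cx="50" cy="50" fill="green"/>'
    "</svg>"
)
url = svg_data_url(svg)
print(url)

# Decoding the part after the prefix recovers the original markup.
assert unquote(url[len(PREFIX):]) == svg
```

The encoded form grows quickly with the size of the markup, which fits the comment's point that only short SVG content is a good candidate for this.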