Anne van Kesteren

MathML in HTML5

I read MathML in HTML5 from Jacques Distler today. My understanding of the proposal is as follows:

Given that, I don’t think there’s much need for a profile or anything. The language is defined by MathML. HTML5 would just define a different serialization for it, making it more accessible to use in web content. (In the end all that matters is the DOM. And that would be the same here.)
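As a rough illustration of what "the DOM" means here, the sketch below parses the XML serialization of a small fraction with Python's standard xml.dom.minidom and lists the namespace and local name of every element. The sample formula and the helper name `describe` are illustrative only and not part of any proposal. A text/html parser following the proposal would be expected to build the same tree from its own serialization; that parser is not implemented here.

```python
from xml.dom import minidom

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# XML serialization of a simple fraction (illustrative sample, not from the proposal).
SOURCE = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mfrac><mi>a</mi><mn>2</mn></mfrac>"
    "</math>"
)


def describe(node):
    """Return (namespaceURI, localName) for each element under node, in document order."""
    found = []
    if node.nodeType == node.ELEMENT_NODE:
        found.append((node.namespaceURI, node.localName))
    for child in node.childNodes:
        found.extend(describe(child))
    return found


elements = describe(minidom.parseString(SOURCE))
for ns, name in elements:
    print(ns, name)
```

Every element ends up in the MathML namespace because of the single default namespace declaration on the root; the tree, not the angle brackets, is what a renderer works from.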

Comments

  1. Ian's proposal was to have a list of elements that are put in the MathML namespace when parsing. I don't think this is flexible enough; what if MathML 3.0 adds new element types? Why would we want to restrict the list of MathML elements one can use? I think it makes more sense to put the math element and all of its descendants in the MathML namespace.

    However, I think some other parsing things need to be altered anyway; for instance, elements that are always empty in MathML shouldn't have an end tag in the text/html serialization (IMHO). (I agree though that we shouldn't mess with namespace declarations or prefixes.)

    There's also an issue with entities. I think it makes little sense to support only some of them, and supporting all of them would break compat with the Web, so I'd say don't add any of them. MathML is generally generated by tools anyway so let them emit NCRs or real characters instead.

    Posted by zcorpan at

  2. A different serialization means that tools that create the MathML will have to offer two output modes (which makes things harder for authors). It also means that the produced code will only be useful for a handful of web browsers, because tools that take MathML input will not be able to handle the different serialization. No simple copy & paste of a nice formula from the web to your MathML tool. And we can count on authors producing more-or-less invalid MathML in time, making it much harder for MathML tools to ever create an input filter for this serialization.

    But using the XML serialization inside HTML syntax is something that makes me cringe. It is bad enough to see <br /> in plain HTML documents, making it mandatory to produce invalid HTML when you include MathML is just wrong.

    No easy answers here.

    Posted by Rijk at

  3. A different serialization means that tools that create the MathML will have to offer two output modes (which makes things harder for authors).

    Exactly so. Authors are bound to screw up (supplying the wrong command-line flag or whatever) and you will surely see lots of HTML5 documents with the XML serialization of MathML and vice versa. For that, and for the other reasons you cite, it would be best if the HTML5 serialization of MathML were the same as (or, more precisely, a special case of) the XML serialization specified in the MathML Spec.

    It is bad enough to see <br /> in plain HTML documents, making it mandatory to produce invalid HTML when you include MathML is just wrong.

    As I understand it, HTML5 is not an application of SGML. And, even if it were, there's no reason in the world that the empty none element of MathML could not simply be declared to use tag-minimization: <none />.

    Since there's no legacy MathML-in-HTML markup to be backwards-compatible with, the best solution is to choose a serialization with the fewest interop headaches moving forward. Call it "Appendix C" MathML.

    Posted by Jacques Distler at

  4. Copying back shouldn’t be a problem I suppose. Given that you copy the DOM and not the source code.

    Posted by Anne van Kesteren at

  5. I'd say copying the source code would be the normal way - that's also how I copy interesting JavaScript snippets, for example. I doubt my mathematics program (if I were a mathematician) would know what to do with DOM. In fact, how do you copy 'the DOM' of a part of a compound document to another application?

    Posted by Rijk at

  6. In fact, how do you copy 'the DOM' of a part of a compound document to another application?

    By serializing as tag soup in between. So in practice with a compound document, you don’t. It is a can of worms.

    (FWIW, the implementation referred to in the referenced document ended up in trash, so please ignore whatever I claimed about the behavior of Cocoa Gecko. The document still says what others do.)

    Posted by Henri Sivonen at

  7. Unfortunately, this proposal is a clear return to the old <irony>good</irony> days of the web, when browsers introduced specific extensions.

    This proposal is not a proposal for MathML in HTML5 (WhatWG) but a Mozilla-only proposal. An Opera math developer has already rejected it, and I doubt that Microsoft is interested, because they already support more math in HTML 4 than Ian is promoting now. Moreover, many MathML 3 WG folks have rejected it as well.

    Please do not call it MathML, because it is not. It is a new language with a new syntax and different parsing rules, and it is not backward compatible with MathML. Fortunately, after some debate, Ian now accepts that it is not MathML. I jokingly call it MatHTML.

    Early analysis of some of the problems with the Mozilla proposal. Not complete, not updated.

    Moreover, Mozilla can already render MathML in HTML pages with the .html extension served as text/html, so I do not see the reason for this proposal, except maybe as an anti-Microsoft movement to open XML formats (including maths).

    Posted by Juan R. at

  8. From a DOM point of view, it’s MathML. I don’t really care about things beside that, except that if we introduce it into text/html the serialization format should be simple. The current proposal looks simple to me.

    And regarding Opera, we do have a person on the MathML Working Group, but he isn’t a developer. Also, there are always conflicting views within a company (I hope!). My manager, for instance, likes RDF.

    Posted by Anne van Kesteren at

  9. Great initiative, but the essentially new format needs a new name (if it gets its own spec), or perhaps an appendix to HTML5 is sufficient. Either way, I do not oppose the idea of embedding MathML in HTML documents, although I prefer the XHTML+MathML combination.

    Will there be (or are there) any similar initiatives with regard to XForms in HTML, by the way, or would that be too much in conflict with WebForms 2.0?

    Posted by Asbjørn Ulsberg at

  10. "From a DOM point of view, it’s MathML." Also, <math type='LateX'>\frac{a}{2}</math> is MathML from a DOM point of view once it is parsed and converted on the client side (e.g. via JS or XSLT). But the source is not MathML; you cannot copy and paste it, for instance. Not all MathML tools work via the DOM; in fact, MathML is not 'a DOM' in a Mozilla UA, MathML is a well-defined XML DTD.

    "I don’t really care about things beside that, except that if we introduce it into text/html the serialization format should be simple." Basically, none of the hundred or so current MathML tools would generate valid code for the HTML 5 proposal, and no HTML 5-specific tool of the future (if any) would generate valid MathML 2 code. Do you not care about that?

    "The current proposal looks simple to me." On the surface? Maybe, but it will generate many problems in the developers' arena: incompatibility with current tools, MSIE+MathPlayer problems, and incompatibility with Docbook, the Elsevier DTD and other standards currently in use. Moreover, the proposal is more difficult to implement in browsers than pure MathML; e.g. Ian's reduced syntax (also discussed at WhatWG) adds an extra parsing layer.

    Regarding Opera, it would seem strange for a guy on the Core team to join the MathML 3 WG to merge MathML into an XML + CSS framework while Opera people at WhatWG move in the opposite direction.

    Since Mozilla can render MathML in text/html, and since almost everyone in the MathML community is rejecting this proposal for HTML 5, it has no rationale.

    Posted by Juan R. at

  11. In the end all that matters is the DOM.

    Most of MathML tools (excluding browsers, more precisely one browser) have nothing to do with DOM.

    Given that you copy the DOM and not the source code.

    Yep, we will edit DOM not source, DOM will travel over the wire, DOM will be stored in databases, DOM will be handled by converters, DOM will be published, peer reviewed, proofread, etc.

    And that would be the same here.

    By the way, if the DOM is "all that matters", note that there is a condition ensuring that the DOM is isomorphic to the actual input. In XML it is well-formedness that ensures this. In other words, the well-formed markup <mfrac linethickness="1"><mi>N</mi><mi>D</mi></mfrac> produces the same DOM in all browsers; if you remove this condition and admit <MFRAC linethickness="1><mi>N<Mi>D</mi></mfrac></MI>, then what kind of DOM you get is a matter of error handling and will NEVER work interoperably.

    I don't want to discuss the proposal in detail here [1], but basically it is damaging and I don't see any fundamental reason for moving in that direction. We have more than enough problems with MathML; adding the tag soup problem is not in our interests.

    [1] there are plenty of discussions around, see for instance

    	http://lists.w3.org/Archives/Public/www-math/2006Oct/
    	http://groups.google.com/group/mozilla.dev.tech.layout
    	http://groups.google.com/group/mozilla.dev.tech.mathml
    

    By the way, even your own blog software does not seem to like the WhatWG "proposal":

    Your comment does not follow the guidelines, here are the mistakes: 
    Invalid element: MFRAC (line 12) 
    XHTML is not well-formed (line 12)
    

    Posted by White Lynx at

  12. Asbjørn Ulsberg said: "Either way, I do not oppose the idea of embedding MathML in HTML documents, although I prefer the XHTML+MathML combination." Yes, MathML in HTML can be useful for people with no control over server MIME types, extensions and so on (e.g. web pages on free domains), but that is not the average MathML user, so there is no need for a new spec.

    MathML is an XML application, therefore its natural environment is XML (MathML is not limited to XHTML); in fact, the usage of MathML in XHTML is minimal compared with its usage in other DTDs.

    If you need MathML in HTML, I recommend a look at my recent message on the W3C MathML list, MathML in HTML 4, 5,... See also today's notes on Docbook mml:prefixes and final DOM. There you can see how you can render MathML islands in HTML today thanks to a JS script from Peter Jipsen. No need to reinvent the wheel by making it square!

    Sorry Anne van Kesteren but "MathML in HTML 4 (5, 6...)" is better than Roger/Ian's proposal for MathML in HTML 5. Some benefits are:

    Regards

    Posted by Juan R. at

  13. I'm surprised to read about all this work going into bringing MathML into legacy HTML. As others have pointed out, MathML is an XML application and works quite well in that context. The obstacles to deploying XML to browsers seem quite small compared to the obstacles and headaches in trying to make XML work within HTML. I was just recently reading an old post from Anne, from two years ago, that discusses delivering full-fledged XHTML and XML to IE. It’s amazing to me that this hasn’t gotten the attention it deserves. If it had, I imagine we'd be finished talking about finding hacks to deploy XML into legacy HTML.

    After all, so we get MathML working. Then someone brings up XForms. Then another ?ML, and so on. We can put tremendous efforts into trying to make one XML application work in legacy HTML or we can just demand that browsers adequately support XML (and XHTML) and be done with it.

    Posted by Rob Burns at

  14. White Lynx, I’m not sure what you mean. HTML5 is exactly intended to make sure every input will produce a reliable DOM. And please don’t clutter the comments with remarks on how to comment. That’s not very useful (and known).

    Rob, I don’t really see MathML as an XML application. More as a definition of how to render a certain DOM subtree. The way to get to that subtree has been XML so far, but it can be done by other means, as indicated. Regarding other markup languages, the plan is not to let it go that far. For instance, many browser vendors don’t really want XForms. Languages such as SVG are probably better bound to HTML using XBL, et cetera.

    Posted by Anne van Kesteren at
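
The well-formedness argument from comment 11 can be made concrete with a small sketch, assuming Python's standard xml.dom.minidom as the XML parser. It feeds both of that comment's snippets to the parser: the well-formed one yields a tree, while the tag-soup one is rejected outright, so an XML parser never has to guess at a DOM for it. The helper name `try_parse` is hypothetical. What an HTML5 parser would build from the second snippet depends on its error-handling rules, which this sketch does not model.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Both snippets are copied from comment 11.
WELL_FORMED = '<mfrac linethickness="1"><mi>N</mi><mi>D</mi></mfrac>'
TAG_SOUP = '<MFRAC linethickness="1><mi>N<Mi>D</mi></mfrac></MI>'


def try_parse(markup):
    """Return the root element name if markup is well-formed XML, else None."""
    try:
        return minidom.parseString(markup).documentElement.tagName
    except ExpatError:
        # Not well-formed: an XML parser refuses to build any tree at all.
        return None


for sample in (WELL_FORMED, TAG_SOUP):
    print(repr(sample), "->", try_parse(sample))
```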