Anne van Kesteren

XML entities

There are several solutions to the XML entities problem. We just have to pick one:

Comments

  1. Or (d) do nothing.

    If you want exotic entities in an XML document, either include them as literals (preferably as UTF-8, but that's a matter of preference and known support footprint) or use numeric character references (the latter being slightly more useful for things like non-breaking spaces where spotting them might be tricky). I don't really see why it's so immensely important to support &nbsp; et al, other than “because HTML does”.

    Any change to the level of support will take a very long time before it's mainstream enough to be useful in any case.

    Posted by Mo

  2. More math entity pain coming up. DTDs don’t work on the Web.

    Posted by Henri Sivonen

  3. Mo, doing nothing equals going with “magic” public identifiers, which is the situation we are in now. Not really optimal, in my opinion. I suppose another option is to remove that feature, but stuff will break, I’m sure.

    Posted by Anne van Kesteren

  4. Anne: Okay, “nothing” wasn't too well-phrased. I'm of the firm opinion that the DOCTYPE magic shouldn't have ever been present in the first place, and shouldn't be there now. Browsers that do properly support XHTML should deprecate the “feature” with immediate effect and remove it in a few months' time. Nobody actually churning out XHTML is supposed to be using it, and XHTML support is arguably new enough that there aren't enough documents in the wild for a reversion to the specified behaviour to cause widespread panic, riots in the streets, and so on.

    Mind you, people will complain, and an awful lot of people seem remarkably scared of that.

    In more detail: I'm opposed to (a) because it strikes me as unnecessary and won't work reliably for a long time (plenty of browsers installed on users' machines will do DOCTYPE sniffing for a while to come irrespective of what people decide); I'm opposed to (b) because it's dirty and messy and, quite frankly, sucks; and I'm opposed to (c) because it's a solution looking for a problem. Most operating systems work perfectly well with UTF-8 (and other Unicode encodings) when told to: they have to parse the damn things anyway.

    Posted by Mo

  5. If it weren't for the math people, we could ignore the problem and just live with it. Irritating but not fatal. But the math people really don't have a good alternative to name all their glyphic weirdness, that I know of.

    Posted by Tim

  6. “If it weren't for the math people, we could ignore the problem and just live with it. Irritating but not fatal. But the math people really don't have a good alternative to name all their glyphic weirdness, that I know of.”

    MathML isn’t written by hand in practice. Instead, it is generated from something else. The generator should output unescaped UTF-8 and not leak mnemonic names to the browser.

    Posted by Henri Sivonen
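
Two suggestions from the thread (use numeric character references, or have the generator emit literal characters rather than mnemonic names) can be illustrated with a minimal Python sketch. It assumes a non-validating XML parser that reads no DTD, which is how Python's `xml.etree.ElementTree` behaves here. The helper names `parses` and `expand_named_entities` are hypothetical, and the `html.entities.html5` table only stands in for whatever name-to-character mapping a real MathML generator would carry.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import html5  # maps names such as "nbsp;" to characters

# The five entities every XML parser knows without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def parses(markup):
    """Return True if the markup is accepted by a DTD-less XML parser."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def expand_named_entities(markup):
    """Replace mnemonic entity names with the literal characters they name.

    XML's predefined entities and unknown names are left untouched, so the
    result stays well-formed wherever the input was.
    """
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        return html5.get(name + ";", match.group(0))

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, markup)


named = "<p>a&nbsp;b &alpha; &amp; c</p>"
numeric = "<p>a&#160;b</p>"
fixed = expand_named_entities(named)

print(parses(named))    # the mnemonic names are undefined without a DTD
print(parses(numeric))  # numeric character references need no DTD
print(parses(fixed))    # literal characters need no DTD either

# What a generator would actually send over the wire: plain UTF-8 bytes.
payload = fixed.encode("utf-8")
```

The substitution is done on the generator side, before the document reaches any browser, so it does not depend on DOCTYPE sniffing or on which entity sets a given parser happens to ship.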