Anne van Kesteren

What’s wrong with `text/xml`?

27 November 2006

Sam made the following remark in his rather nice response to my flamebait:

I will also note that the same reasons that text/xml should be deprecated apply equally well to text/html.

I think draft-murata-kohn-lilley-xml-02.txt is the last edition of the draft that tries to deprecate text/xml. Section 3.1, paragraph 3 of XML media types is the reason (section 8.5 gives an example) some people want to deprecate it in favor of application/xml. Browsers seem to ignore that particular rule though and treat it the same as application/xml. (Do I hear someone saying silly browser vendors in the background?)

Any chance we can get that in the specification? It seems pretty compatible to me. And more pragmatic.
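
A minimal Python sketch of the rule under discussion, assuming the usual reading of XML media types (RFC 3023): when the charset parameter is omitted, text/xml falls back to us-ascii regardless of the XML declaration, while application/xml defers to the document itself. The helper names `declared_xml_encoding` and `effective_encoding` are hypothetical, and byte order mark and UTF-16 detection are deliberately left out.

```python
import re
from typing import Optional

# Hypothetical helper pattern: the encoding pseudo-attribute of an XML declaration.
# Only ASCII-compatible encodings are handled; BOM/UTF-16 sniffing is omitted.
XML_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def declared_xml_encoding(body: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, if there is one."""
    match = XML_DECL.match(body)
    return match.group(1).decode("ascii").lower() if match else None

def effective_encoding(content_type: str, body: bytes) -> str:
    """Pick the encoding a strictly conforming processor would use."""
    media_type, _, params = content_type.partition(";")
    charset = None
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            charset = value.strip().strip('"').lower()
    if charset:
        # An explicit charset parameter wins for both media types.
        return charset
    if media_type.strip().lower() == "text/xml":
        # The contested rule: the XML declaration is ignored entirely.
        return "us-ascii"
    # application/xml: trust the declaration, defaulting to UTF-8.
    return declared_xml_encoding(body) or "utf-8"

body = b'<?xml version="1.0" encoding="iso-8859-1"?><doc/>'
as_text = effective_encoding("text/xml", body)                # "us-ascii"
as_application = effective_encoding("application/xml", body)  # "iso-8859-1"
with_charset = effective_encoding("text/xml; charset=utf-8", body)  # "utf-8"
```

The browser behaviour described above amounts to skipping the text/xml branch and always taking the application/xml path.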

Comments

There exists (existed?) a class of software called transcoders. Their purpose in life was to convert from one character set to another.
Such software does not understand every possible data format; in fact, the text/* class of MIME types was designed for it. Transcoders merely needed to look for this pattern and possibly a charset parameter. By the rules, this charset was supposed to override everything you might find inside the document.
The reality is that some data formats embed their own charset information. XML does so in the prolog. HTML permits meta http-equiv. The knowledge of XML required to parse the prolog is minimal. The knowledge of HTML required to parse the meta tag is somewhat deeper. Furthermore, the popular usage of the meta tag, namely to specify the charset, is not supported by the specification (see references below).
Needless to say, a transcoder that converts the bytes to another character set and updates only the external HTTP header produces broken results for these formats.
This leads to some tension. Should one follow the specifications, or follow actual usage? Should one make exceptions only for popular formats, but not for unpopular ones?
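
The breakage can be illustrated with a small Python sketch. It assumes a hypothetical `naive_transcode` function that re-encodes the bytes and reports the new charset for the HTTP header, but never looks inside the document. A consumer that no longer has the header (a file saved to disk, for example) and trusts the XML declaration then reads the wrong characters.

```python
def naive_transcode(body: bytes, source: str, target: str):
    """Re-encode the bytes and return them with the charset for the new header.

    Hypothetical transcoder: it is format-agnostic, so the encoding named in
    the XML declaration is left untouched.
    """
    return body.decode(source).encode(target), target

original = (
    '<?xml version="1.0" encoding="iso-8859-1"?><name>Søren</name>'
).encode("iso-8859-1")

new_body, header_charset = naive_transcode(original, "iso-8859-1", "utf-8")

# A consumer that obeys the external header reads the text correctly.
by_header = new_body.decode(header_charset)

# A consumer that trusts the stale internal declaration gets mojibake:
# the two UTF-8 bytes of "ø" are read as two Latin-1 characters ("SÃ¸ren").
by_declaration = new_body.decode("iso-8859-1")
```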
References
- Martin J. Duerst / Chris Lilley / MURATA Makoto
- Lachlan Hunt
Posted by Sam Ruby at 8:08PM
I know that Outlook does ‘transcoding’. I sent an attachment with the MIME type correctly specified as text/plain; charset=UTF-8, and the damn thing actually converted the file to Latin-1 when saving it (breaking all the non-English names in the file). Transcoding bad :).
~Grauw
Posted by Laurens Holst at 8:35PM
There exists (existed?) a class of software called transcoders.

Until someone proves otherwise, I continue assuming that transcoding proxies (that are not tightly coupled with a mobile browser to form a distributed UA) are a myth on the Web today. Most often people who point to the transcoder problem in the context of HTTP have heard about transcoding proxies but have not seen one in the wild recently. (Russian Apache does not count. It is a transcoding origin server.)
Besides, considering how form submissions work, I highly doubt that a transcoding proxy could be deployed without breaking form submissions spectacularly.
Posted by Henri Sivonen at 11:04PM
So this problem exists for text/html, text/css, text/javascript and text/xml. Look at that, all the popular web formats!
Posted by Anne van Kesteren at 11:55PM
So this problem exists for text/html, text/css, text/javascript and text/xml. Look at that, all the popular web formats!

I am not aware of a mechanism to specify a character set inside of text/css or text/javascript. If no such mechanism exists, one could transcode text/css from iso-8859-1 to utf-8, for example, and do so safely by simply adjusting the charset parameter on the Content-Type HTTP header to match.
Posted by Sam Ruby at 12:48AM
For CSS there is @charset.
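
As a rough Python sketch of what that implies for a transcoder: to stay consistent it would have to rewrite the @charset rule as well as the Content-Type header. The function `transcode_css` is hypothetical, and the pattern assumes the rule appears at the very start of the style sheet in the exact form `@charset "name";`.

```python
import re

# Assumed form of the rule: first thing in the file, double-quoted name.
CHARSET_RULE = re.compile(r'\A@charset "([^"]*)";')

def transcode_css(body: bytes, source: str, target: str):
    """Re-encode a style sheet and keep its @charset rule in sync.

    Returns the new bytes and a matching Content-Type header value.
    """
    text = body.decode(source)
    # A callable replacement avoids backslash interpretation in the target name.
    text = CHARSET_RULE.sub(lambda m: f'@charset "{target}";', text, count=1)
    return text.encode(target), f"text/css; charset={target}"

css = '@charset "iso-8859-1";\nq::before { content: "«"; }'.encode("iso-8859-1")
new_css, content_type = transcode_css(css, "iso-8859-1", "utf-8")
```

A style sheet without an @charset rule passes through with only its bytes and header changed, which is the safe case described in the previous comment.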
Files with the .js extension are generally served as application/x-javascript by servers and thus aren't affected by transcoding problems. Now that there are registered MIME types for JavaScript one can still switch to application/javascript or application/ecmascript since browsers ignore the media type for files referenced from <script src>.
Posted by zcorpan at 6:35AM
Until someone proves otherwise, I continue assuming that transcoding proxies ... are a myth on the Web today.

I bet Google Mobile transcodes. It's not tightly-coupled with a specific UA, either.
Posted by Robert Sayre at 12:58AM
Until someone proves otherwise, I continue assuming that transcoding proxies ... are a myth on the Web today.

I bet Google Mobile transcodes. It's not tightly-coupled with a specific UA, either.

Isn’t it a URL rewriter and not a proxy in the HTTP sense?
Posted by Henri Sivonen at 7:11AM
It does a variety of things, but you're correct that it's not a transparent proxy (unless Google uses the same system in a proxy for its own apps).
Posted by Robert Sayre at 11:59PM
