Anne van Kesteren

Encodings: ISO-8859-1 and Internet Explorer

A little over a year ago I investigated legacy encodings. Today I created a very basic PHP script that outputs every octet in the range %01 to %FF. I quickly found that %00 is best excluded, as Internet Explorer then drops the whole resource. The script takes a parameter for setting the encoding label (e.g. "us-ascii").
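The original script is PHP and is not shown here. What follows is a rough Python (WSGI) analogue of such a test resource, for illustration only. The `app` name, the `label` query parameter and its "us-ascii" default are assumptions, not the original script's interface. The X-Content-Type-Options header discussed below is included as well.

```python
import re
from urllib.parse import parse_qs

def app(environ, start_response):
    """Hypothetical WSGI analogue of the test script.

    Serves every octet from 0x01 to 0xFF, labelled with the encoding
    given in the (assumed) "label" query parameter.
    """
    query = parse_qs(environ.get("QUERY_STRING", ""))
    label = query.get("label", ["us-ascii"])[0]
    # Only accept token-like labels so nothing odd ends up in the header.
    if not re.fullmatch(r"[A-Za-z0-9._:-]+", label):
        label = "us-ascii"
    # 0x00 is left out: Internet Explorer drops the whole resource otherwise.
    body = bytes(range(0x01, 0x100))
    headers = [
        ("Content-Type", "text/plain; charset=" + label),
        # Opts out of Internet Explorer's media type sniffing.
        ("X-Content-Type-Options", "nosniff"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it at e.g. `http://localhost:8000/?label=iso-8859-1`.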

Fetching this resource directly in Internet Explorer caused a download unless I changed the HTTP media type from text/plain to text/html. The proprietary X-Content-Type-Options header (with the value nosniff) works around this, and doing so revealed that the media type makes no difference to how the encoding is handled. (There is a difference in media type sniffing.) When fetching the resource using XMLHttpRequest, however, there was a difference. For a long time I thought the label "iso-8859-1" always mapped to Windows-1252, but apparently in Internet Explorer this is not the case for files fetched through XMLHttpRequest. No other browser exhibits this behavior.
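Windows-1252 and ISO-8859-1 only disagree on the octets %80 to %9F, so those are the octets where this difference shows up. A small Python sketch using Python's own codecs (not any browser's decoder) lists them. The function name is made up for illustration. Note that Python's cp1252 codec leaves five octets in that range undefined, which the sketch reports as `None`.

```python
def differing_octets():
    """Map each octet in 0x01..0xFF where ISO-8859-1 and Windows-1252
    decode differently to (latin-1 result, cp1252 result)."""
    diffs = {}
    for b in range(0x01, 0x100):
        latin1 = bytes([b]).decode("latin-1")
        try:
            cp1252 = bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            # Python leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined in cp1252.
            cp1252 = None
        if cp1252 != latin1:
            diffs[b] = (latin1, cp1252)
    return diffs

for octet, (latin1, cp1252) in differing_octets().items():
    print(f"%{octet:02X}: {latin1!r} vs {cp1252!r}")
```

For example, %80 decodes to U+0080 under ISO-8859-1 but to "€" under Windows-1252.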

I was hoping XMLHttpRequest could be used for encoding tests, as loading files in an iframe subjects them to media type sniffing and might cause download dialogs to appear. I guess I will have to make sure both work. On the positive side, the eventual specification for legacy encodings on the web will probably not need different requirements based on context, since browsers other than Internet Explorer are consistent.
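Testing both contexts means comparing what each one decoded. Here is a rough Python sketch of a checker, assuming the test page sends back the decoded text it got (from XMLHttpRequest's responseText or from the iframe's document). The function name and its return values are hypothetical. It only looks at the character produced for %80, which is enough to tell the two mappings apart.

```python
def classify_decoding(decoded: str) -> str:
    """Guess how the %01..%FF test resource was decoded."""
    if len(decoded) != 255:
        return "unexpected length"
    # The body starts at 0x01, so octet 0x80 ends up at index 0x7F.
    ch = decoded[0x80 - 1]
    if ch == "\u20ac":  # "€": Windows-1252 mapping
        return "windows-1252"
    if ch == "\x80":    # U+0080: ISO-8859-1 mapping
        return "iso-8859-1"
    return "other"
```

Running it on the result from each context makes any mismatch between them visible.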

(I realize there is some kind of specification today, consisting in part of an IANA registry and a bunch of references from there. However, the IANA registry is apparently not just for the web but also for a bunch of applications that are not planning to converge with the legacy requirements of the web, and it therefore does not go beyond vague hand waving for certain issues. As such, a suitable replacement is needed.)