Anne van Kesteren

HTTP interoperability

8 October 2007

I’ve been wondering about HTTP interoperability for some time now. What if a response has two Content-Type or Location headers? What if newlines are done using 0x0A instead of 0x0D followed by 0x0A? The Python Web server I have made this easy to test once I modified it to support “.asis files”, which are simply read from disk and fed directly to the client without any modification by the server code. I made the following file and tested it in various browsers:

HTTP/1.1 200 FOOBAR
Content-Type: text/plain
Content-Type: text/html

<!DOCTYPE html><title>HTML or text?</title><p>...
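
For anyone who wants to reproduce this, the “.asis” idea can be sketched roughly as follows. This is not the server used for the tests above, only a minimal approximation using Python 3’s standard http.server module; the names DOC_ROOT, read_asis, AsIsHandler and serve are made up for illustration, as is the directory name.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path
from typing import Optional

# Hypothetical directory holding the hand-written response files.
DOC_ROOT = Path("asis")


def read_asis(root: Path, url_path: str) -> Optional[bytes]:
    """Return the raw bytes of the requested .asis file, or None."""
    name = url_path.split("?", 1)[0].lstrip("/")
    candidate = (root / name).resolve()
    # Only serve existing *.asis files that live underneath the root.
    if candidate.suffix != ".asis":
        return None
    if root.resolve() not in candidate.parents:
        return None
    if not candidate.is_file():
        return None
    return candidate.read_bytes()


class AsIsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        raw = read_asis(DOC_ROOT, self.path)
        if raw is None:
            self.send_error(404)
            return
        # No send_response() or send_header() here: the file already
        # contains the status line, the headers and the body, and it is
        # written to the socket byte for byte.
        self.wfile.write(raw)
        self.close_connection = True


def serve(port: int = 8000) -> None:
    """Serve DOC_ROOT on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), AsIsHandler).serve_forever()
```

Calling serve() and requesting a path such as /dup.asis (a hypothetical file name) would send the file exactly as stored, including bare 0x0A line endings and duplicate headers.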

Lines in my file are separated by a single 0x0A octet. That appears to be no problem. Firefox and Opera render the entity body as HTML. IE shows it as text. Well, there’s issue one. Next I made a similar test using the Location header:

HTTP/1.1 302 FOOBAR
Location: /test
Location: /foobar

Although RFC 2616 does not allow relative references in Location, everyone supports them and has to. More interesting, however, is that Opera and IE go to /test and Firefox goes to /foobar. Make that interoperability issue two. This is pretty much all I have actually tested so far. It seems that HTTP gives you a new layer of browser interoperability holes to exploit.
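
To make the two outcomes concrete, here is a small Python sketch of a parser that accepts both 0x0D 0x0A and bare 0x0A line endings and then resolves a duplicated header with either a “first wins” or a “last wins” policy. The function names parse_response and pick_header are hypothetical, and the two policies are only guesses that match the redirect targets seen above; the tests do not show what any browser actually does internally.

```python
import re


def parse_response(raw: bytes):
    """Split a raw HTTP response into (status_line, headers, body).

    Accepts CRLF as well as bare LF line endings. Headers are returned
    as a list of (lowercased name, value) pairs so duplicates survive.
    """
    # The header block ends at the first empty line.
    match = re.search(rb"\r?\n\r?\n", raw)
    if match is None:
        head, body = raw, b""
    else:
        head, body = raw[:match.start()], raw[match.end():]

    lines = re.split(rb"\r?\n", head)
    status_line = lines[0].decode("latin-1")

    headers = []
    for line in lines[1:]:
        name, sep, value = line.partition(b":")
        if not sep:
            # Skipping malformed lines is one possible choice, not a rule.
            continue
        headers.append(
            (name.decode("latin-1").strip().lower(),
             value.decode("latin-1").strip())
        )
    return status_line, headers, body


def pick_header(headers, name, policy="first"):
    """Return one value for a possibly duplicated header, or None."""
    values = [v for n, v in headers if n == name.lower()]
    if not values:
        return None
    return values[0] if policy == "first" else values[-1]


redirect = b"HTTP/1.1 302 FOOBAR\nLocation: /test\nLocation: /foobar\n\n"
_, redirect_headers, _ = parse_response(redirect)
print(pick_header(redirect_headers, "Location", "first"))  # /test
print(pick_header(redirect_headers, "Location", "last"))   # /foobar
```

Keeping the headers as a list of pairs rather than a dictionary is deliberate: a dictionary would silently apply one of the two policies before anyone had decided which one is wanted.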

From what I heard so far RFC 2616bis is not going to address these issues (error handling, thorough interoperability testing, et cetera). gsnedders is working on testing HTTP parsing interoperability and plans to write an HTTP parsing specification which would at least address some of the issues, but in the end we either need the HTTP WG to get their act together or find a lot of free time for HTTP5.

Comments

RFC2616 has a section entitled "Tolerant Applications" that recommends supporting things like US-ASCII 0x0A (as it isn't Unicode in any shape or form) as a valid linebreak separator.
As for Location, I know that older Safari versions (≤1.2, off the top of my head) did not redirect when it was a relative URI (technically it doesn't allow IRIs — another interesting thing to reverse engineer).
To quote the introduction to the current draft of the aforementioned spec I'm working on (disclaimer: I really don't like this text, though it brings across the basic message):
Ever since HTTP's conception, there have never been any standards regarding its parsing in the real world. [RFC2616] tried to improve this situation with a section (19.3) entitled "Tolerant Applications", providing advice about parsing requests and responses. However, it did not go into the specific details that are needed for interoperability with current (non-conformant) servers and clients. The lack of any current specification defining such specifics makes it hard for any new UA to be created without first spending large amounts of time reverse engineering what is in some cases purely bizarre behaviour; unless you know about it beforehand, you may not write enough test cases to find some of the oddest behaviour.

Posted by Geoffrey Sneddon at 11:53PM
Necko's choice of the last of multiple Location headers is bug 309668. I'm inclined to say that the lack of duplicates makes it rarely-trodden ground, but then I don't actually watch bugs in network, much.
Posted by Phil Ringnalda at 1:25AM
I'd say that every specification that doesn't clearly define error correction or error handling for non-conformant cases will be subject to ambiguities and vendor-specific behaviour, causing interoperability problems. HTTP is no stranger here...
Posted by Tino Zijdel at 6:42AM
I tried these out in Konqueror (using PHP to supply the headers), and both the second Content-type was used (same as Firefox, Opera) and the second Location was used (same as Firefox). Just for the curious. (New lines were Unix newlines – I've had so many 0x0A vs 0x0D vs 0x1F0D problems with files received at work that I can't recall which is the linefeed and which is the carriage return, or which "Unix" is.)
Posted by Christopher Fritz at 9:04AM
How about having the cited browsers properly implement the handling of Content-Location? :)
And you can of course propose the handling of two Content-Type headers in an HTTP message to the HTTP WG; it may be seen as a clarification of the spec. Working on test suites and interop has also been discussed. Whether that will be part of the WG effort is not yet known, but there is an effort in that direction.
If you want to work on HTTP5 as not interoperable with HTTP/1.1 (as the rules will be different), please tackle ipv5 and tcp5 first :)
Posted by Yves at 3:43PM
Content-Location can’t be implemented by browsers because of deployed server software. I mentioned this to the HTTP WG in not so many words eleven months ago. What ensued was a thread with lots of opinions and little actual debugging or testing. Given the HTTP WG’s stance of just waiting until the situation improves, it is unlikely that getting involved in the RFC 2616bis process will help us browser vendors much, apart from maybe minor clarifications such as what to do with duplicate headers, et cetera. I think I’d rather wait until someone proposes actual improvements before getting involved.
Posted by Anne van Kesteren at 4:21PM
Interoperable in what way, Yves? With the current spec? With current implementations? They mean drastically different things.
Posted by Geoffrey Sneddon at 10:03PM
The CSS philosophy:
- color: #f00;
- color: #fff;
The color is white, not red.
This is most likely how it would be handled.
In short, good documentation leaves no doubt. Most "high level language" documentation is obviously not written by someone who has any insight beyond the base 16 number system. They don't understand the rigid nature of computer information protocols.
Posted by Raymond at 8:05PM