Anne van Kesteren

XML 2.0: XML with graceful error handling?

26 January 2007

Is XML 2.0 Under Development? So in that article Micah Dubinko mentions mobile browsers living up to their premise and all that. What he says, however, isn’t really true. Mobile browsers handling XHTML is tag soup parsing all the way, even for content labeled application/xhtml+xml! Don’t take my word for it. Because of this we at Opera are being blamed for doing things “correctly.” Correctly here means showing a silly error to the end user, for instance that the page he’s trying to view didn’t escape an &. Draconian XML failed for feeds (RSS, if you wish) as well. The time has probably arrived to define graceful error handling for XML, put some IETF, W3C or WHATWG sticker on it, label it XML 2.0 and ship it. Perhaps we can drop this internal subset thing in the process.
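
To make the unescaped & case concrete, here is a minimal sketch in Python. It is only an illustration, not how any browser mentioned here is implemented: the sample markup and the helper names (`parse_strict`, `parse_lenient`, `TextCollector`) are made up, and the standard library's `html.parser` stands in for a tag soup parser.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical page fragment: the "&" should have been written as "&amp;".
PAGE = "<p>Fish & chips</p>"


def parse_strict(markup: str) -> str:
    """Draconian handling: any well-formedness error is fatal."""
    try:
        return ET.fromstring(markup).text or ""
    except ET.ParseError as exc:
        # This is the "silly error" the end user gets to see.
        return f"error: {exc}"


class TextCollector(HTMLParser):
    """Tag soup style handling: keep whatever text can be found."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def parse_lenient(markup: str) -> str:
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


print(parse_strict(PAGE))   # an error message instead of the content
print(parse_lenient(PAGE))  # the text, bare "&" and all
```

The strict path gives the reader nothing but the error, while the lenient path shows the content even though the author made a mistake.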

Comments

As I say to myself most days: "Let's hope so..."
Posted by J. King at 5:20AM
Well, I applaud Opera for doing the right thing. :-)
Posted by Ethan Poole at 5:52AM
I don't think tag-soup parsing for XML is a long-term solution for anything. The real problem is all the software that accepts invalid code in a variety of random ways, so creating specs that define how to turn some invalid code into valid code will be helpful. The random handling of invalid code has created a race to the bottom that wastes the time and money of all involved. I wish more people had the chutzpah to reject invalid code.
Posted by Andrew Gregory at 3:35PM
Sounds like a pretty good idea. I don't think the problem can be fixed any other way. Browsers can't stop supporting poorly formatted sites since there are so many of them. And amateur developers (and sadly enough professionals as well) won't stop making them unless the sites they make (note that this isn't the same as those they have finished) stop working.
Posted by Wilco at 4:27PM
"I wish more people had the chutzpah to reject invalid code."

...says the man serving XHTML 1.1 as text/html (and no, trying to change it with a META element doesn't work).
Posted by Mark at 11:09PM
No one cares about error handling or the lack thereof. People care about features. Had there actually been effort to add useful capabilities to XHTML above and beyond HTML4, then there would have been adoption – draconian error handling and all. Only eggheads like you and I care about markup, and as it is, XHTML doesn’t offer anything to ordinary people.
I don’t know about draconian error handling failing for feeds either. I subscribe to 400 of them with a feed reader that uses a strict XML parser and the crushing majority of them work consistently. Now, 5-10% of feeds or so being malformed some of the time may still be a huge total number of feeds, but it’s a whole different story from HTML where the fraction is more like 99.5%.
Now consider that far and away the majority of these well-formedness errors are charset issues. If you define some error handling for that, you are left with 1% or so of malformed feeds.
I don’t know about you, but I think that’s not how failure smells.
Posted by Aristotle Pagaltzis at 1:05AM
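The charset point above can be sketched as a single recovery rule on top of a strict parser. This is a minimal sketch under stated assumptions, not what any particular feed reader does: the sample feed bytes, the `parse_feed` helper, and the choice of windows-1252 as the fallback encoding are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: declared as UTF-8, but the "é" is a single
# windows-1252 byte (0xE9), so the document is not well-formed.
FEED = b'<?xml version="1.0" encoding="utf-8"?><title>Caf\xe9</title>'


def parse_feed(raw: bytes, fallback: str = "windows-1252") -> ET.Element:
    """Strict parse first; on failure, retry once with a fallback charset."""
    try:
        return ET.fromstring(raw)
    except ET.ParseError:
        # Assumed recovery rule: reinterpret the bytes in the fallback
        # encoding and re-encode them as UTF-8 to match the declaration.
        repaired = raw.decode(fallback, errors="replace").encode("utf-8")
        # Any other well-formedness error still raises ParseError here.
        return ET.fromstring(repaired)


print(parse_feed(FEED).text)
```

Everything that is not a charset problem still fails loudly, which is the split the comment describes: define handling for the common error and stay strict about the rest.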
"Had there actually been effort to add useful capabilities to XHTML above and beyond HTML4, then there would have been adoption – draconian error handling and all."
In that case, Internet Explorer wouldn't understand new features in XHTML that weren't already in HTML 4. If XHTML (as text/html) didn't work in Internet Explorer as it does now, no mainstream website would pick it up.
Posted by Blaise Kal at 2:57AM
That presupposes that IE would have followed the same history of half-decade torpor. I don’t see that as a given.
Posted by Aristotle Pagaltzis at 7:46AM
Anne, can you define what 'graceful error handling' is?
Posted by Don Ulrich at 11:49AM
Imo "graceful error handling" is a set of rules that define how to deal with unexpected (invalid according to syntax) input for each possible situation. Although this may not always represent what the author intended, it at least does make sure the content is (partly) accessible and is consistently so on various UAs.
Posted by Tino Zijdel at 6:57AM
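One such rule, written down as code, might look like the sketch below. The rule itself (treat every bare "&" as "&amp;") and the names `BARE_AMP`, `recover`, and `parse_gracefully` are assumptions made for illustration; no specification mentioned in this thread defines them.

```python
import re
import xml.etree.ElementTree as ET

# An "&" that does not start a character reference ("&#38;", "&#x26;")
# or something shaped like an entity reference ("&amp;").
BARE_AMP = re.compile(r"&(?!#\d+;|#x[0-9A-Fa-f]+;|[A-Za-z_][\w.-]*;)")


def recover(markup: str) -> str:
    """Hypothetical recovery rule: every bare "&" becomes "&amp;"."""
    return BARE_AMP.sub("&amp;", markup)


def parse_gracefully(markup: str) -> ET.Element:
    """Parse strictly; on failure, apply the rule once and parse again."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        # Errors the rule does not cover still raise ParseError.
        return ET.fromstring(recover(markup))


print(parse_gracefully("<p>Fish & chips</p>").text)
```

Because the rule is deterministic, two implementations that follow it produce the same tree from the same broken input, which is the consistency across UAs described above.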
So the browser wars in the mobile area did the same thing the Netscapian wars of the 90's did. What a surprise. Kudos to Opera. I respect them for not "jumping off the bridge" just because everyone else is doing it. Although, even I, an XMLite deep down, am slowly becoming convinced that XML needs error handling instead of "error based demolition".
Posted by Devon at 10:19AM
@wilco, who said stop supporting poorly formatted sites? I say, if a dev specified a modern doctype, they should be prepared to put in the work to make it at least 97% valid and somehow hide any minor errors that make up the missing 3%.
As for the millions of current sites with no doctype or an html4 doctype...well they obviously weren't made with handhelds in mind, were they? The only way to handle and display them 'perfectly' would be to put an entire desktop quality browser in the handheld (iPhone?)
Posted by slim at 2:47AM
Opera, Opera Mini? You obviously are quite misinformed about the capabilities of mobile phone browsers.
Posted by Anne van Kesteren at 4:11AM
By not defining error handling for HTML, UAs just began interpreting broken HTML in different ways. Although the WHATWG is now trying to define how a UA should handle broken HTML, it definitely would be much nicer to have had a standardized consistent way of doing it a long time ago.
Even if it isn't a part of the official XML standard, the people who are accepting broken XML should get together and define a standard way of doing it. At least then they'll interpret the same tag soup in the same way.
Posted by HeroreV at 5:43AM
I think it would be a very bad idea to implement any kind of error recovery in XML. If people can’t get it right, then perhaps they shouldn’t be using it in the first place.
Would it make sense if a compiler started to correct errors in your C++ code? No, and it could quite possibly have negative side effects if it guessed incorrectly what you were trying to do.
It really isn’t difficult to check for errors in hand-coded XML. If your XML is automatically generated, then as long as any human input is sanitised, you should never get XML errors in the first place…
Posted by Martin Payne at 8:07PM
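The sanitising step for generated XML can be sketched as follows. This is a minimal illustration using Python's standard library; the `comment` element and the three helper names are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def comment_by_concatenation(text: str) -> str:
    """Unsanitised: human input is pasted straight into the markup."""
    return "<comment>" + text + "</comment>"


def comment_escaped(text: str) -> str:
    """Sanitised by hand: &, < and > are escaped before concatenation."""
    return "<comment>" + escape(text) + "</comment>"


def comment_via_tree(text: str) -> str:
    """Sanitised by construction: the serialiser does the escaping."""
    element = ET.Element("comment")
    element.text = text
    return ET.tostring(element, encoding="unicode")


user_input = "AT&T <rocks>"
print(comment_escaped(user_input))
print(comment_via_tree(user_input))
```

Both sanitised variants round-trip through a strict parser, while the concatenated one is rejected. This only covers the producing side, though: a consumer still has to decide what to do with documents generated the first way.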
This is not about “guessing” at all. This is also not about authoring XML. This is about years of experience in consuming XML. You are free to ignore the facts of course.
Posted by Anne van Kesteren at 10:52PM