Anne van Kesteren

OpenID, HTML parsing, SVG, and the `Link` header

27 January 2008

Michael encouraged me to submit an idea for XTech 2008. As it happens, that site lets you use OpenID. So I tried setting up my own identity provider on id.annevankesteren.nl, using some script I found on the web. Before actually testing the identity provider on some site, I modified various things, such as the HTML it outputs. It turns out that doing that is a bad idea. My markup looked like this:

<!doctype html>
<title>id.annevankesteren.nl</title>
<link rel="openid.server" href="http://id.annevankesteren.nl/">
<link rel="openid.delegate" href="http://id.annevankesteren.nl/">

The problem here is that OpenID libraries do not use a proper HTML parser (not even one that follows the HTML 4 specification). <head> et cetera are actually required, as Sam Ruby pointed out after I tried using it on his site. You’d think they would make those things more robust.
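To make the difference concrete, here is a minimal Python sketch that runs two discovery strategies over the markup above. The names `OpenIDLinkCollector`, `discover_with_tokenizer`, and `discover_with_head_regex` are made up for illustration, and the regex is only a guess at the kind of shortcut that insists on an explicit <head>. It is not taken from any real OpenID library. Python's `html.parser` is a tokenizer rather than a full tree-building HTML parser, but it is enough to show that the link elements are found without any <head> tag.

```python
import re
from html.parser import HTMLParser

# The markup from the post: valid HTML with <html>, <head> and <body> omitted.
MARKUP = """<!doctype html>
<title>id.annevankesteren.nl</title>
<link rel="openid.server" href="http://id.annevankesteren.nl/">
<link rel="openid.delegate" href="http://id.annevankesteren.nl/">
"""


class OpenIDLinkCollector(HTMLParser):
    """Collects href values of <link> elements whose rel starts with 'openid.'."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # rel is a space-separated list of tokens; compare case-insensitively.
        for token in (attributes.get("rel") or "").lower().split():
            if token.startswith("openid.") and href:
                self.links.setdefault(token, href)


def discover_with_tokenizer(markup):
    """Find OpenID links anywhere in the token stream."""
    collector = OpenIDLinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.links


def discover_with_head_regex(markup):
    """Hypothetical brittle approach: only look inside a literal <head>...</head>."""
    head = re.search(r"<head[^>]*>(.*?)</head>", markup, re.I | re.S)
    if head is None:
        return {}
    return discover_with_tokenizer(head.group(1))


print(discover_with_tokenizer(MARKUP))
print(discover_with_head_regex(MARKUP))
```

With the markup from this post, the tokenizer-based function finds both links, while the variant that demands a literal <head> finds nothing.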

Trying to give the page a little bit of style, I found a 14 by 14 pixel PNG of the OpenID logo. However, this looked tiny at the font size I was using, and scaling it up using img { height:1em } didn’t make it look better. So I searched on Google for “sam ruby openid” and copied and pasted some SVG. It looks pretty good, though at this point no browser can render that page without bugs…

Comments

I believe I am missing something painfully obvious, but it is going to bother me until I know the answer. Where are the style rules for http://id.annevankesteren.nl coming from?
Posted by Jeremy Smith at 7:23AM
Jeremy,
view HTTP headers. :-)
Posted by Eric at 7:55AM
Jeremy, try this command:
curl -i http://id.annevankesteren.nl/
The stylesheet is pulled in by an HTTP header:
Link: <fancy.css>; rel=stylesheet
Posted by Simon Willison at 8:00AM
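For readers who want to see how such a header maps to a style sheet URL, here is a rough Python sketch that splits a `Link` header value into a target and its parameters and resolves the target against the page URL. The function name `parse_link_header` is made up for illustration, and the parsing is deliberately simplified: it assumes no commas or semicolons inside quoted parameter values.

```python
import re
from urllib.parse import urljoin


def parse_link_header(value, base_url):
    """Return a list of (absolute_url, params) pairs from a Link header value.

    Simplified sketch: quoted parameter values containing ',' or ';'
    are not handled.
    """
    links = []
    # Each link-value is <target> followed by ;name=value parameters.
    for match in re.finditer(r"<([^>]*)>([^,]*)", value):
        target, raw_params = match.groups()
        params = {}
        for part in raw_params.split(";"):
            name, sep, val = part.strip().partition("=")
            name = name.strip().lower()
            if name:
                params[name] = val.strip().strip('"') if sep else ""
        links.append((urljoin(base_url, target), params))
    return links


header = "<fancy.css>; rel=stylesheet"
for url, params in parse_link_header(header, "http://id.annevankesteren.nl/"):
    print(url, params)
```

The relative reference fancy.css is resolved against the URL of the page, the same way a relative href on a link element would be.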
at this point no browser can render that page without bugs…

Anne, instead of using an <img> tag, why don't you include it inline (like Sam) or use <object>?
Posted by Jeff Schiller at 8:31AM
A while back I was serving up HTML 4.01 with as many tags omitted as the spec would allow (among other quirks), and noticed the same thing. I managed to get OpenID discovery to work, but the trick was pretty ugly: I had my home page's server-side code start looking for application/xrds+xml in the incoming Accept header, and served up a redirect to a YADIS file if that was the best match.
It'd be nice if more things played nice with "quirky" but still perfectly valid HTML...
Posted by James Bennett at 8:32AM
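The Accept-header trick James describes could look roughly like the Python sketch below. The function names `parse_accept` and `wants_xrds` are made up for illustration, and this is not his actual server code. "Best match" is simplified to "highest q value, first one listed wins ties", and wildcards such as */* are not given any special treatment.

```python
def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    entries = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when not given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def wants_xrds(accept_header):
    """True when application/xrds+xml is the top-ranked acceptable type."""
    ranked = parse_accept(accept_header)
    if not ranked:
        return False
    media_type, q = ranked[0]
    return media_type == "application/xrds+xml" and q > 0


print(wants_xrds("application/xrds+xml"))
print(wants_xrds("text/html, application/xrds+xml;q=0.5"))
```

A server using this check would redirect to the YADIS file when `wants_xrds` returns true and serve the normal home page otherwise.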
view HTTP headers. :-)

I had no idea the Link header existed (it's not in the spec, though I did find some drafts from 1999 describing it). I learned something today.
Posted by Jeremy Smith at 9:48AM
Jeremy you should look at links to style with http headers. ;)
Posted by Karl Dubost at 10:14AM
If you don't mind messing up your markup with a little javascript, here's a method I came up with to display that SVG image as another type of image for browsers that don't support SVG as images yet.
Posted by Fyrd at 11:16AM
The LINK header is defined in RFC 2068 though.
Section 19.6.8.1
Posted by Callek at 12:36PM
(erm: 19.6.1.2) [Note to anne, feel free to edit my earlier comment to fix my typo]
Posted by Callek at 12:37PM
Jeff, that would require changes to my style sheet. It doesn’t really matter as this is an “academic” web page anyway. For instance, I could have used a link element to point to a style sheet.
Posted by Anne van Kesteren at 6:04PM
Anne, yeah I figured this was an "academic" type page - fair enough ;)
Posted by Jeff Schiller at 12:54AM
It does kinda suck that none of the deployed OpenID libraries use a real HTML parser. If it makes you feel any better, there was some debate about this, but the library implementors won that one. The main argument was that they wanted to avoid having too many dependencies, and at the time the state of the art in HTML parsers for most languages wasn't very good anyway.
These libraries could potentially use an HTML5 parser library now, but if I remember correctly the provisions to allow OpenID libraries to parse pages using regexes are now baked into the 2.0 spec and into all of the main implementations.
Posted by Martin Atkins at 8:46AM

Yes, this kind of head parsing is a frequent issue, for instance to discover a FOAF file, or whatever. So my current algorithm is as follows:

if (content is application/xhtml+xml)
{
 use_good_old_xml_parser_and_precise_xpath();
}
elseif (lucky_to_have_an_html_parser())
{
 use_html_parser_and_relaxed_xpath();
}
else
{
 use_regex();
}

Posted by Alkarex at 9:35AM

Ideally you would set up your OpenID as annevankesteren.nl/about, right? ...and this is just a kind of crammed testcase?
Anyway I am really annoyed that most OpenID URLs seem to be set up on a separate, empty page. ...that's really not how it was meant to be used!
Posted by Gee at 1:30AM
Gee's username above is linked to "http://data:text/html,Sissiefuss" which, when interpreted by Firefox, redirects you to www.lightreading.com, a highly confusing result. On closer inspection it is simply caused by Firefox's autocompletion of URLs: it appends www. and .com to the part that looks like a hostname, treats anything after a colon as the port (which in this case is discarded), and treats anything after a forward slash as the path of the GET request. And it turns out that data.com redirects to lightreading.com. Weird... Sorry for the off-topic comment.
Posted by Dylan at 11:42AM
