Anne van Kesteren

OpenID, HTML parsing, SVG, and the Link header

Michael encouraged me to submit an idea for XTech 2008. As it happens, that site lets you use OpenID, so I tried setting up my own identity provider on id.annevankesteren.nl using a script I found on the web. Before actually testing the identity provider on some site, I modified various things, such as the HTML it outputs. It turns out that doing that is a bad idea. My markup looked like this:

<!doctype html>
<title>id.annevankesteren.nl</title>
<link rel="openid.server" href="http://id.annevankesteren.nl/">
<link rel="openid.delegate" href="http://id.annevankesteren.nl/">

The problem here is that OpenID implementations do not use a proper HTML parser (not even one that follows the HTML 4 specification). <head> et cetera are actually required, as Sam Ruby pointed out after I tried using it on his site. You’d think they would make those things more robust.
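To make the failure concrete, here is a minimal Python sketch, not the code of any actual OpenID library. It runs the markup above through the standard library's html.parser (a tolerant tokenizer, not a full HTML parser) and through a hypothetical, deliberately naive regex that insists on an explicit <head>. The names OpenIDLinkFinder, discover_with_parser and discover_with_head_regex are invented for this illustration.

```python
import re
from html.parser import HTMLParser

# The markup from the post: valid HTML, but with <html>, <head> and <body> omitted.
MARKUP = """<!doctype html>
<title>id.annevankesteren.nl</title>
<link rel="openid.server" href="http://id.annevankesteren.nl/">
<link rel="openid.delegate" href="http://id.annevankesteren.nl/">
"""


class OpenIDLinkFinder(HTMLParser):
    """Collects href values of <link> elements whose rel starts with 'openid.'."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        href = attributes.get("href")
        if rel.startswith("openid.") and href:
            self.links[rel] = href


def discover_with_parser(markup):
    """Tolerant discovery: does not care whether <head> is written out."""
    finder = OpenIDLinkFinder()
    finder.feed(markup)
    return finder.links


# Assumption: a naive consumer that only looks between explicit <head> tags.
HEAD_RE = re.compile(r"<head[^>]*>(.*?)</head>", re.IGNORECASE | re.DOTALL)


def discover_with_head_regex(markup):
    """Naive discovery: gives up when no explicit <head>...</head> is present."""
    match = HEAD_RE.search(markup)
    if match is None:
        return {}
    return discover_with_parser(match.group(1))


print(discover_with_parser(MARKUP))      # both openid.* links are found
print(discover_with_head_regex(MARKUP))  # nothing is found
```

With the tags omitted, the tolerant version still finds both links, while the version that requires an explicit <head> finds nothing. That matches the behaviour described above.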

Trying to give the page a little bit of style, I found a 14 by 14 pixel PNG of the OpenID logo. However, this looked tiny with the font size I was using, and trying to scale it up using img { height:1em } didn’t make it look better. So I searched on Google for “sam ruby openid” and copied and pasted some SVG. It looks pretty good, though at this point no browser can render that page without bugs…

Comments

  1. I believe I am missing something painfully obvious, but it is going to bother me until I know the answer. Where are the style rules for http://id.annevankesteren.nl coming from?

    Posted by Jeremy Smith at

  2. Jeremy,

    view HTTP headers. :-)

    Posted by Eric at

  3. Jeremy, try this command:

    curl -i http://id.annevankesteren.nl/

    The stylesheet is pulled in by an HTTP header:

    Link: <fancy.css>; rel=stylesheet

    Posted by Simon Willison at

  4. at this point no browser can render that page without bugs…

    Anne, instead of as an <img> tag, why don't you do it inline (like Sam) or using <object>?

    Posted by Jeff Schiller at

  5. A while back I was serving up HTML 4.01 with as many tags omitted as the spec would allow (among other quirks), and noticed the same thing. I managed to get OpenID discovery to work, but the trick was pretty ugly: I had my home page's server-side code start looking for application/xrds+xml in the incoming Accept header, and served up a redirect to a YADIS file if that was the best match.

    It'd be nice if more things played nice with "quirky" but still perfectly valid HTML...

    Posted by James Bennett at

  6. view HTTP headers. :-)

    I had no idea the Link header existed (it's not in the spec, though I did find some drafts from 1999 describing it). I learned something today.

    Posted by Jeremy Smith at

  7. Jeremy you should look at links to style with http headers. ;)

    Posted by Karl Dubost at

  8. If you don't mind messing up your markup with a little javascript, here's a method I came up with to display that SVG image as another type of image for browsers that don't support SVG as images yet.

    Posted by Fyrd at

  9. The LINK header is defined in RFC 2068, though.

    Section 19.6.8.1

    Posted by Callek at

  10. (erm: 19.6.1.2) [Note to anne, feel free to edit my earlier comment to fix my typo]

    Posted by Callek at

  11. Jeff, that would require changes to my style sheet. It doesn’t really matter as this is an “academic” web page anyway. For instance, I could have used a link element to point to a style sheet.

    Posted by Anne van Kesteren at

  12. Anne, yeah I figured this was an "academic" type page - fair enough ;)

    Posted by Jeff Schiller at

  13. It does kinda suck that none of the deployed OpenID libraries use a real HTML parser. If it makes you feel any better, there was some debate about this, but the library implementors won that one. The main argument was that they wanted to avoid having too many dependencies, and at the time the state of the art in HTML parsers for most languages wasn't very good anyway.

    These libraries could potentially use an HTML5 parser library now, but if I remember correctly the provisions to allow OpenID libraries to parse pages using regexes are now baked into the 2.0 spec and into all of the main implementations.

    Posted by Martin Atkins at

  14. Yes, this kind of head parsing is a frequent issue, for instance to discover a FOAF file, or whatever. So my current algorithm is as follows:

    if (content is application/xhtml+xml)
    {
     use_good_old_xml_parser_and_precise_xpath();
    }
    elseif (lucky_to_have_an_html_parser())
    {
     use_html_parser_and_relaxed_xpath();
    }
    else
    {
     use_regex();
    }
    

    Posted by Alkarex at

  15. Ideally you would set up your OpenID as annevankesteren.nl/about, right? And this is just a kind of crammed test case?

    Anyway, I am really annoyed that most OpenID URLs seem to be set up on a separate, empty page. That's really not how it was meant to be used!

    Posted by Gee at

  16. Gee's username above is linked to "http://data:text/html,Sissiefuss" which, when interpreted by Firefox, redirects you to www.lightreading.com, a highly confusing result. On closer inspection it is simply caused by Firefox's autocompletion of URLs: it appends www. and .com to the part that looks like a hostname, uses anything after a colon as the port (which in this case is discarded), and uses anything after a forward slash as the path of the GET request. And it turns out that data.com redirects to lightreading.com. Weird... Sorry for the off-topic comment.

    Posted by Dylan at
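
As comment 3 points out, the page's style sheet arrives through an HTTP Link header rather than through markup. The sketch below is a rough illustration of how a single value such as the one quoted there could be split into a target URL and its parameters. The function name parse_link_value is invented here, and the code deliberately ignores multiple comma-separated links and quoted strings containing semicolons, which a complete parser would need to handle.

```python
from urllib.parse import urljoin


def parse_link_value(value, base_url):
    """Split one Link header value into (absolute_url, params).

    Simplified sketch: handles a single '<target>; name=value; ...' entry only.
    """
    target, _, rest = value.strip().partition(">")
    if not target.startswith("<"):
        raise ValueError("link value must start with <URI-reference>")

    params = {}
    for part in rest.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, param_value = part.partition("=")
        params[name.strip().lower()] = param_value.strip().strip('"')

    # The target may be relative, so resolve it against the page's own URL.
    return urljoin(base_url, target[1:]), params


url, params = parse_link_value(
    "<fancy.css>; rel=stylesheet", "http://id.annevankesteren.nl/"
)
print(url)     # http://id.annevankesteren.nl/fancy.css
print(params)  # {'rel': 'stylesheet'}
```

The relative target fancy.css is resolved against the URL of the page that carried the header, much as a relative href on a link element would be.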