Anne van Kesteren

More changes ahead

When I launched my new weblog system there were not many changes in functionality. In fact, everything visible to the end user (you) kept working. Perhaps I shouldn’t say “launched” though, as I gradually replaced WordPress functionality with my own over a few days. For a while some templates were converted while others were still driven by WordPress. When all was done I emptied my WordPress folder and removed the WordPress-specific fields and tables I no longer needed from the database. It wasn’t a “take it all down and upload the new stuff” approach, which had the advantage that nobody really noticed the change. Uploading everything at once mostly fails, and there are almost always some notable regressions.

Although I did add some functionality, most notably the storage of URI identifiers in the database for Atom and the four different date fields, I also removed pingback support. I had the feeling it never worked very well: pinging other weblogs failed more often than it worked, though I did get some incoming pings from other WordPress weblogs. Trackback was something I disabled long ago because trackbacks contained invalid HTML and their encoding wasn’t discoverable. That is still a problem, but it can be worked around by marking incoming trackbacks as non-approved in the comment system and fixing them later: when they arrive you edit them so they are well-formed and then approve them. You can obviously automate parts of this process using the character encoding functions of your server-side scripting language. For PHP that would be iconv, I believe. Nevertheless, trackbacks were incompatible with my “it should be perfect” way of thinking back then, and therefore I disabled them completely.
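To make the automation idea a bit more concrete, here is a minimal sketch of normalising an incoming excerpt whose encoding is unknown. It is written in Python rather than PHP, and the function name and the order of candidate encodings are assumptions made for illustration, not something any trackback implementation prescribes.

```python
def to_text(raw: bytes) -> str:
    """Decode bytes of unknown encoding by trying a few candidates in order.

    The candidate order is an assumption: UTF-8 first because it rejects
    most non-UTF-8 input, then Windows-1252, which is common on the web.
    """
    for encoding in ("utf-8", "cp1252"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a character, so this never raises.
    # The result may be wrong, which is why manual approval still helps.
    return raw.decode("latin-1")


if __name__ == "__main__":
    print(to_text("café".encode("utf-8")))
    print(to_text("café".encode("cp1252")))
```

A guess like this can still pick the wrong encoding, so keeping such entries non-approved until someone has looked at them remains a sensible default.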

I left pingback enabled because I didn’t understand it all that well, I guess. I thought pingback was somehow superior to trackback and solved its problems. It does, but the pingback specification doesn’t specify how an excerpt should be transferred from the pinging weblog to the pinged weblog. WordPress had its own implementation of this and just took something from the linking page; more specifically, the part of the text that contained the link to your weblog entry. This gave the same trouble as with trackbacks, namely that invalid encoding appeared on your page and that elements could become incorrectly nested. (The latter problem was mostly solved.) To work around this I set all incoming pingbacks to non-approved until I had modified them to my liking. The same as I could have done for trackbacks, you say, and you’re right. But then you missed the point: I misunderstood pingback.
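The excerpt step can be sketched roughly as follows. This is not WordPress’s code; it is a small Python illustration, with made-up names (LinkContextParser, excerpt_around_link, radius), of taking the text around the link and dropping all markup so nothing can end up incorrectly nested.

```python
from html.parser import HTMLParser


class LinkContextParser(HTMLParser):
    """Collect plain text and remember where the target link's text sits."""

    def __init__(self, target_url):
        super().__init__(convert_charrefs=True)
        self.target_url = target_url
        self.chunks = []        # text pieces in document order
        self.length = 0         # total characters collected so far
        self.link_start = None  # offset where the link text begins
        self.link_end = None    # offset where the link text ends
        self._inside_target = False

    def handle_starttag(self, tag, attrs):
        is_target = tag == "a" and dict(attrs).get("href") == self.target_url
        if is_target and self.link_start is None:
            self._inside_target = True
            self.link_start = self.length

    def handle_endtag(self, tag):
        if tag == "a" and self._inside_target:
            self._inside_target = False
            self.link_end = self.length

    def handle_data(self, data):
        self.chunks.append(data)
        self.length += len(data)


def excerpt_around_link(html, target_url, radius=40):
    """Return markup-free text around the first link to target_url, or None."""
    parser = LinkContextParser(target_url)
    parser.feed(html)
    parser.close()
    if parser.link_start is None:
        return None
    text = "".join(parser.chunks)
    end = parser.link_end if parser.link_end is not None else parser.link_start
    start = max(0, parser.link_start - radius)
    # Collapse runs of whitespace so the excerpt reads as one line.
    return " ".join(text[start:end + radius].split())
```

The slice is counted in characters, so an excerpt may start or end in the middle of a word; a real implementation would probably trim to word boundaries.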

Now that most of my weblog is pretty stable I’m planning to add back some functionality for trackback and pingback. I’m not sure how I want to implement pingback, but I think in a way similar to what WordPress does. I searched the internet for some implementations and found a PHP class for trackback with documentation and two pingback implementations (pingbackClient and Pingback Server). One was written by J. King and I e-mailed him about it to see if it could be easily adapted to my weblog. I also e-mailed him about Simon Willison’s implementation, which might be easier to implement. Well, it would at least require less code. To quote J. King:

To summarise Simon’s implementation, it uses referrers and user-agent sniffing for a server to detect incoming pings. This effectively cuts out the need for XML-RPC and server auto-discovery. Certainly it makes Pingback much more simple.

As for trackback, it really is a mess. But as it is this mess which everyone seems to be using, I want to support it. To make an analogy: it would be nonsense for a web browser to accept only XML documents or, for that matter, only well-formed XHTML documents sent with an XML media type as defined by RFC 3023. Such a thing would kill the web browser before it was even released. Hence, I need support for trackback.

(Oh yes, that ten of you would love such a browser and would totally use it doesn’t change the point. It also really isn’t worth discussing here, but you may e-mail me about it.)

Sam Ruby has a weblog called ‘intertwingly’ which supports all kinds of comments. I haven’t been able to locate the posts where he announced these features, but if you read a few posts you can see the things he added. I’m not entirely sure, but I believe he has referrer support. So when someone links to him from a weblog without pingback or trackback support, and that same someone (or someone else) follows the link from that post to Sam’s post, Sam catches the referrer, does some magic, and displays an excerpt of the linking post in his comments, along with a link to it and its title. I believe he uses the feed that is linked from the referring post for that. (Yes, I’m jealous and I like this feature.)

Besides that, I noticed in his code section that you can also comment by e-mail, and perhaps in other ways as well. I want to add these possibilities eventually. Perhaps not e-mail, but commenting by referrer, trackback and pingback should certainly be made possible. I guess I want them to be non-approved at first. Eventually, though, I should have some strong filters in place that scan all the incoming content, so these ‘comments’ can be published instantly.

Actually, as I’m using HTML I could publish these invalid characters and markup instantly so people can benefit from the content. It has of course been argued that having invalid markup sucks, but what should come first? Should I withhold arguably important content from the user because I haven’t checked whether it’s entirely valid? People would laugh at you for such statements. I wouldn’t, but your boss would.

(More on the HTML move later; don’t worry and don’t ask.)

As for the implementation: I’m not sure how to do it. I guess it’ll take a lot of reading and asking for help from some PHP gurus. For comments I’m currently storing a unique URI as identifier. I’m also storing the publish date (filled in on publish and never modified again) and the last modified date (for when I modify comments). I figured storing separate updated and created dates didn’t make any sense here, unless of course I implement a way for people to edit their comments within ten minutes after posting; then an updated date would make sense. Of course there are also fields for the author’s URI, name and e-mail address, but such information is almost always needed in a comment environment without registration.

A few days back I added a type field, which currently takes either ‘comment’ or ‘pingback’ as value. WordPress stored some empty (for strange reasons Appendix C, but not XHTML, compatible) XML element in the content field for this, which looks like: <pingback />. Or <trackback />, but I didn’t enable that, as stated above. So I removed that and instead used a new field to store this information. I’m going to add at least ‘trackback’ and ‘referrer’ to that list, and perhaps ‘e-mail’, once I get around to implementing them or have found someone to do it.
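As a rough illustration of the fields described above, a comment record could look something like the following Python sketch. The class name, attribute names and the rule that only plain comments start out approved are assumptions for the example; the real schema lives in a database and is not shown in this post.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Current values plus the ones planned in the text above.
ALLOWED_TYPES = {"comment", "pingback", "trackback", "referrer", "e-mail"}


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Comment:
    uri: str                              # unique identifier
    content: str
    author_name: str
    type: str = "comment"
    author_uri: Optional[str] = None
    author_email: Optional[str] = None
    published: datetime = field(default_factory=_now)  # set once, never changed
    modified: Optional[datetime] = None   # set when the comment is edited
    approved: Optional[bool] = None

    def __post_init__(self):
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"unknown comment type: {self.type!r}")
        if self.approved is None:
            # Assumption: automated types wait for manual approval.
            self.approved = self.type == "comment"

    def edit(self, new_content: str) -> None:
        """Replace the content and record the modification time."""
        self.content = new_content
        self.modified = _now()
```

Keeping the type in its own field means the content field holds only content, so adding a new kind of incoming link is a matter of adding one more allowed value.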

Communication between weblogs is important; I certainly don’t want to stop it.

Comments

  1. Sounds cool :) We still need to fix trackback/pingback, you know?

    Posted by Mark Wubben at

  2. Will your weblog software be open source when it is finished? It's really interesting :)

    Posted by Jort at

  3. At some point I had this referrer stuff too. It's broken now, but it worked really well. If you build it, I'd drop pingback and trackback. It's ugly technology.

    Another idea: you could use a Technorati watchlist to do all this.

    Posted by Sjoerd Visscher at

  4. Just for your information, everything did not keep working. I discovered your blog a few weeks back and subscribed to the feed in Bloglines. I had marked a dozen or so posts to be Kept New as I wanted to go back and study them later when I had time as I am trying to learn to write good XHTML. When you made your change everything disappeared! Yes, I could go back through your archives and search them out but that isn't very likely. Like everyone else I am quite busy and I don't have a lot of time to just browse. Maybe from your point of view and with the blog locally nothing changed for the user. But be aware that indeed things did change.

    Posted by Mark at

  5. Mark, ouch that sucks. Could it be that you subscribed to my RSS feed which is about the only thing that changed (in output) and might have messed this up?

    Posted by Anne at

  6. WordPress stored some empty (for strange reasons Appendix C, but not XHTML, compatible) XML element in the content field for this, which looks like: <pingback />. Or <trackback />

    WordPress doesn't do this anymore. It has a comment_type field in your comments table to handle this, which can take values of 'comment', 'pingback' or 'trackback' I believe.

    Posted by David House at

  7. Anne - my approach to making my comments, emails, etc well formed is to first escape everything safely. Then I selectively unescape safe things.

    That means that if you attempt to do something bad, we will all get to see your markup instead.

    Perhaps this is best explained by example.

    Here's the code. Look at sanitize.

    Posted by Sam Ruby at

  8. I have found that Sam has a lot of gems in his code section.

    Posted by Scott Johnson at

  9. Just for the record (and for the reader), PingBack support was rewritten in WP 1.5 (I'm the one to blame), to stick closer to the specification and also to get rid of the ugly mess of bad code we had in xmlrpc.php.
    And I reckon we always stripped all markup but the link in PingBack excerpts...
    Type field was added in WP 1.5.

    As for trackback encoding discovery... a pox on blogware vendors that didn't agree on a single way to do it, a pox on blogware vendors that assumed everyone would just use ASCII too.

    Posted by michel v at