Anne van Kesteren

The perfect weblog system

If I'm ever going to write a weblog, or someone else is going to write one I'm going to use, here is an outline of what it should (or must) have. Inspiration comes from an article by Henri Sivonen, Outlining the “Ultimate” Blogging Server, and from various people: Asbjørn Ulsberg, Mark Wubben and Robbert Broersma. I was going to use a definition list, but dropped that option in favor of the unordered list, since that seems to be used for this kind of thing. I hope your screen is wide enough.

I will update this document if it needs any changes. (For real.)

Comments

  1. [...] XHTML/CSS caretaker I've encountered recently is Anne van Kesteren, whose recent Perfect Weblog System includes some interesting suggestions [...]

    Posted by Danny Ayers, Raw Blog at

  2. Anne, your ideas of URI structure remind me heavily of the proposed URI structure of the (unwritten) Inklog Content Management System. See Inklog: what is it? and Inklog theory: the storage system for more information. Maybe some of Reverend Jim's insights might help you clarify or explore your thoughts, which I agree on. In particular, the content negotiation model with regards to languages and file types is a very good one, but difficult to implement unless the weblogging software works like Inklog, providing "nodes" and elements. Should this be a component of the perfect CMS?

    Posted by Basil Crow at

  3. Great work!

    Posted by Randy Charles Morin at

  4. As a web application developer I'd like to cast a vote for custom extensions and plug-ins. After all, what good is a blogging engine for if you can't extend it or add fancy gadgets?

    Posted by Milan Negovan at

  5. Wiki?

    Posted by Robbert Broersma at

  6. Normalisation in its prime.

    Posted by flump at

  7. Very well. I've tried some blogging/CMS tools and all have shortcomings, and not minor ones (yes, WP too). This is a good list of points to consider; although I do not agree with all you've said, it gave me some useful thoughts for a start.

    There are some points about which I never thought before - thanks for sharing.

    Posted by Rimantas at

    • Comments
      • Threaded comments should be possible. One comment can be a reply to multiple others.

    How is this displayed? What happens with a reply to this comment? Can the (second) reply be to only certain grandparents? I can see how this would cause a lot of problems, and I think that's why you don't see it anywhere. (Or at least, I never have. If you have seen a good implementation, I'd love to hear about it.) As a postscript, would this be called "woven comments" rather than threaded? ;-)

    • Post metadata
      • Summaries
        • Each post must have a summary.

    What about one line posts? Would the summary be empty or identical to the post?

    • Comments
      • Both name and email must be required, but the email address must never be displayed.

    What if the user wants their email displayed? They'd have to resort to something like I have right now as my url. (How's it work? Email me with suggestions :-))

    • URI structure
      • General
        • The URI doesn't need a 'www.' subdomain, remove redundancy.

    Just making sure this applies only to the blog itself and not to the database of links. Some of those links may be to poorly configured servers that allow access only to the www subdomain (or properly configured servers that have a valid use for it).

    Posted by dolphinling at
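A minimal sketch of the distinction drawn in the last point above: strip the 'www.' prefix only from the weblog's own host, and leave links to other sites alone. The function name `canonical_redirect` and the `OWN_HOST` value are hypothetical, chosen for illustration only.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

OWN_HOST = "example.org"  # assumption: the weblog's own canonical host


def canonical_redirect(url: str, own_host: str = OWN_HOST) -> Optional[str]:
    """Return the URL to redirect to, or None if no redirect is needed.

    Only 'www.' + own_host is rewritten; any other host, including a
    'www.' host on someone else's server, is left untouched.
    Ports and user info are ignored in this sketch.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == "www." + own_host:
        return urlunsplit(
            (parts.scheme, own_host, parts.path or "/", parts.query, parts.fragment)
        )
    return None
```

A server would answer a request for the 'www.' form with a permanent redirect to the returned URL; links stored in the link database never pass through this function.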

  8. »Use HTTP in favor of the META element with the HTTP-EQUIV attribute. Actually, that attribute must not be used.«

    Mozilla »forgets« the encoding when one uses »save as« if you don't use the http-equiv ersatz.
    Sad, but true.

    I'd like to see metadata for images and code to make this better searchable.

    Posted by Thomas Scholz at

  9. Okay, so my data url got mangled, which only proves my point more: if I want people to be able to contact me, I should be able to provide an email address for them, and in a way other than the post body.

    One more nitpick: the root of the blog isn't always the root of the domain. (The weblogs.mozillazine.org blogs are a good example.) That's easy to adapt around, though.

    Other than those, though, this looks wonderful. Be sure to tell me when it gets made. :-)

    Posted by dolphinling at

  10. Why shouldn't Last-Modified be a date in the past? I thought that was the whole point.

    Posted by J. King at

  11. Acts of Volition - Ultimate Weblog System : May 2003

    Posted by Abhi at

  12. Ok, I updated some parts that were unclear, especially the thing J. King pointed out was quite a stupid mistake :-)

    I also added some extra parts: caching, compression, images and markup.

    Posted by Anne at

  13. In these instances I think you want 'where' in place of 'were'.

    Posted by Chris Neale at

  14. I have one suggestion. I have a beef with your demand to provide feeds for everything. It does not make sense in some situations, and is actively harmful. Think of a log which does not have forced moderation turned on. Anyone can comment, and even if you delete, ban, or blacklist later, the damage is done via the comments feed. To speak plainly, it defeats the purpose.

    Therefore, my opinion is that it should not be an unconditional request to provide feeds for everything. You might just as well tell people to publish their email addresses in plain text. In other words, veto :)

    M.

    Posted by Moose at

    • Comments
      • Threaded comments should be possible. One comment can be a reply to multiple others.

    How is this displayed? What happens with a reply to this comment? Can the (second) reply be to only certain grandparents? I can see how this would cause a lot of problems, and I think that's why you don't see it anywhere. (Or at least, I never have. If you have seen a good implementation, I'd love to hear about it.) As a postscript, would this be called "woven comments" rather than threaded? ;-)

    I would call it a relational comment system (damn you Anne, now the word is out!). Actually this is what's used at Dunstan's blog. It could be visualized using Javascript, for example.

    Anne, I agree with almost everything you wrote here. Some things are highly subjective, such as your URI and syndication proposals. The ultimate weblog system should allow these things and alternatives.

    Posted by Mark Wubben at
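One way to picture a "relational" comment system is as a mapping from each comment to the comments it replies to, so a comment may have several parents. The sketch below is only an illustration under that assumption; the sample data and the names `reply_to`, `replies_index` and `ancestors` are hypothetical and do not describe how any existing weblog implements it.

```python
from collections import defaultdict
from typing import Dict, List, Set

# comment id -> ids of the comments it replies to (hypothetical sample data)
reply_to: Dict[int, List[int]] = {
    1: [],
    2: [],
    3: [1, 2],  # one comment replying to two others
    4: [3],
}


def replies_index(reply_to: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Invert the parent lists: comment id -> ids of the comments replying to it."""
    index: Dict[int, List[int]] = defaultdict(list)
    for comment_id in sorted(reply_to):
        for parent_id in reply_to[comment_id]:
            index[parent_id].append(comment_id)
    return dict(index)


def ancestors(comment_id: int, reply_to: Dict[int, List[int]]) -> Set[int]:
    """All comments reachable by following 'reply to' links upward."""
    seen: Set[int] = set()
    stack = list(reply_to.get(comment_id, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(reply_to.get(parent, []))
    return seen
```

With the sample data, comment 4 replies only to comment 3 but inherits both 1 and 2 as grandparents, which is exactly the display question raised in the quoted comment.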

  15. Does anyone actually use /archives?

    Posted by Redund at

  16. ..."And alternatives" is one of the most important comments of all. Thanks Mark!

    Posted by Robbert Broersma at

  17. I use /archives -- but I must say I did toy with the idea of getting rid of it.

    A comment: I didn't see "language" anywhere in the post attributes. I think language should be an attribute of a post, even if the post is not available in alternate languages. It could of course be inherited from the weblog language.

    Just for history, some time ago: Requirements for a perfect weblogging tool.

    Posted by Steph at

  18. Good call, I added things about languages (for both comments and posts).

    Posted by Anne at

  19. All links should be stored in a separate database table. Referenced to from within the post.

    Does this mean to display the post the software has to first look up the link? Or is the link merely copied into the database at time of posting?

    A link database is an idea I had ages ago. It would mean I could keep track of all my links, and search for one I needed.

    The title of the image that is going to be set in the TITLE attribute should really be a title and not a description. The weblogging software should make the author aware of this in some way. For example, the title should be "Mona Lisa" and not: "A woman with a very nice smile... et cetera", which would be more appropriate for the ALT attribute.

    I've been guilty of doing it the other way round - short ALT text and long TITLE text. The reason is that a page shows the short text when the image is missing.

    Dates should be completely transparent to the end user.

    What does this mean?

    An additional /archives/ before the year isn't necessary. It makes the URI complex and long.

    Fine if you want to fill your root directory with folders. It makes sense to have an archive folder. Plus it tells users what the URI is.

    The actual day isn't important for most weblogs. However, if your weblog or news system publishes at least 5 items a day it might be worth considering adding it.

    Did you mean "The actual hour"? Of course the day is important!

    Homepage: http://example.org/ The trailing slash here must not be considered optional. It is required, in other words.

    Why so?

    You must not use RSS 2.0, since it has big problems.

    Really? I see it on most major sites with feeds. What are the problems? (Lack of an agreed precise format? People using CDATA?)

    Posted by Chris Hester at

  20. Finally, I agree with Steph about this commenting system. It's not a user's job to keep a site valid. No flamebait, honestly, but if I had to type any more XHTML here, I'd expect a paycheck. :-)

    Posted by Jan! at

  21. First, Chris's comments. Links are probably stored both in the database and in the post, but the two should somehow reference each other, so the post can be updated with the new link if a link changes (a 301, for example) or is gone (410).

    The long alternate text is really better, you could of course include "Mona Lisa" in the alternate text, but a description of what the picture looks like is important. The TITLE attribute just does what it says, giving the title. The W3C should of course have released a lot more attributes like HINT and DESCRIPTION to address our needs, but they did not. (Actually, having elements would probably be better.)

    The user of the system shouldn't have to care about something like dates. Timestamps are relevant to the user, but he shouldn't have to enter them himself. The software should generate them as explained in the post.

    Having something like /2004/ also tells what the URI is. Something that has been archived, since it has a date in the URI. It doesn't necessarily have to pollute the root folder by the way. Ever heard of Apache? ;-)

    Day is not an important part of the URI in my opinion. (This post is heavily biased, as is all of the internet.)

    RSS 2.0 has problems, don't ask Dave. Ask Mark.

    Jan, my commenting system has nothing to do with this post. Actually, this post is a wishlist for a better weblog system. I hoped you figured that out ;-)

    I wonder what you mean with upsetting newbies. A weblog is not meant to be used offline, in my opinion (again).

    Posted by Anne at
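The link handling described above (a separate links table that posts reference, updated when a link moves or disappears) might look roughly like this. It is a sketch using Python's standard `sqlite3` module; the table and column names and the `record_check` helper are assumptions made for illustration, not part of any existing weblog software.

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a links table and a post/link join table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE links ("
        " id INTEGER PRIMARY KEY,"
        " url TEXT NOT NULL,"
        " gone INTEGER NOT NULL DEFAULT 0)"
    )
    db.execute(
        "CREATE TABLE post_links ("
        " post_id INTEGER NOT NULL,"
        " link_id INTEGER NOT NULL REFERENCES links(id))"
    )
    return db


def record_check(
    db: sqlite3.Connection, link_id: int, status: int, location: Optional[str] = None
) -> None:
    """Update a stored link after checking it: 301 stores the new URL, 410 marks it gone."""
    if status == 301 and location:
        db.execute("UPDATE links SET url = ? WHERE id = ?", (location, link_id))
    elif status == 410:
        db.execute("UPDATE links SET gone = 1 WHERE id = ?", (link_id,))
```

Because posts refer to a link by its id, every post that uses the link picks up the new URL, or can mark the link as dead, after a single update.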

  22. 'Current weblog software doesn't use this and rewrites everything to a single file that handles the request and shows some custom error message instead of returning a header.'

    Not exactly true: if you were to try to get: http://blog.raena.net/2004/08/06/nonexistent/ and I were using Movable Type or Blogger, you would most likely get a 404 response -- it creates actual files and directories, rather than fudging them, so your server will generally do what it would do for any other missing resource.

    I had Movable Type generate a really nice 404 errordocument for me, including a friendly message and a bunch of links to archives and the search engine.

    'Individual entries: http://example.org/2004/02/slug'

    And if your blog isn't 'like' that, what then? What if my weblog made more sense with URIs like http://www.cakesrock.org/recipes/cheesecakes/vanilla/ ?

    Aside from those nitpicks, I'm right with you.

    'The user must be able to enter the full alternate content, like a table for example or several paragraphs of text. Markup must be allowed for this information.'

    Markup allows already. I have long awaited the day where I can write a longdesc for a photo or other image and have a tool auto-generate it for me.

    Posted by Raena Armitage at

  23. I would call it a relational comment system (damn you Anne, now the word is out!). Actually this is what's used at Dunstan's blog. It could be visualized using Javascript, for example.

    It is called a threaded-comment system, and it is implemented in BLOG:CMS. Try looking at this sample article, and click on show in context to see what I mean.

    For Anne: brilliant work, was this inspired a bit by my email to you re: your feedback on BLOG:CMS? I guess so, as quite a lot of your items seem to follow BLOG:CMS features ;)

    Posted by rADo at

  24. rADo, I haven't had time to fully test BLOG:CMS yet. This post is inspired by features I miss in current weblog systems and have seen in others (Dunstan's relational comment system, for example). One thing that I really miss is good support for dates, which is not a part of Nucleus, not a part of WordPress, Movable Type doesn't do it, et cetera.

    Atom IDs that are actually stored in the database so you can switch your domain, but keep your Atom ID for older entries the same. And endless more things mentioned above.

    Posted by Anne at
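The point about stored Atom IDs can be sketched as follows: the ID is minted once, when the entry is published, and saved with the entry, while the permalink is built from whatever domain is current. The `tag:` URI form follows RFC 4151; the `Entry` class and its field names are hypothetical.

```python
from datetime import date


def mint_atom_id(domain: str, published: date, path: str) -> str:
    """Build a tag: URI (RFC 4151) from the domain in use at publication time."""
    return f"tag:{domain},{published.isoformat()}:{path}"


class Entry:
    def __init__(self, domain: str, published: date, path: str) -> None:
        self.path = path
        # Minted once and stored with the entry; never recomputed later.
        self.atom_id = mint_atom_id(domain, published, path)

    def permalink(self, current_domain: str) -> str:
        """The permalink follows the current domain; the Atom ID does not."""
        return f"http://{current_domain}{self.path}"
```

After a move from example.org to example.net, `permalink("example.net")` changes but `atom_id` still reads `tag:example.org,...`, so feed readers continue to recognise older entries.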

    One thing that I really miss is good support for dates

    Well, within about 5 minutes of trivial PHP coding, you can have in BLOG:CMS exactly the URLs you describe, not to mention that stuff like always valid comments, threaded comments, cached articles and sidebars, automatic rel links, etc. are already in there, in a default install package, plus a wonderful plugin interface with 100 events, unlike in other systems (like trivial, yes, I am biased, WordPress, or more advanced Movable Type...).

    Posted by rADo at

  26. I was not talking about the URIs, I was talking about the dates ;-). But let's take this to email if you think there is more to say.

    Posted by Anne at

  27. To Anne: OK, just drop me a few lines when you install BLOG:CMS. I think most of these requirements are easy to meet, if not done already (most of them).

    Posted by rADo at

  28. Some neat ideas here. I always enjoy reading these types of posts and mpt's original was an inspiration for the early releases of WordPress. The discussion from this and other posts is helpful in conceptualizing some things. Thanks. :)

    Posted by Matt at

  29. The W3C should of course have released a lot more attributes like HINT and DESCRIPTION to address our needs, but they did not.

    There's always the longdesc attribute.

    Having something like /2004/ also tells what the URI is. Something that has been archived, since it has a date in the URI.

    Disagree. "2004" could mean anything. Since that's the current year, it might suggest documents relevant to this year only, such as the latest news, or the Olympics. Ie: not archived material. Far better to tell the user they are in an archive by using "/archives/".

    Posted by Chris Hester at

  30. So then you use /archives/; everyone happy. I and others think it is redundant. It is not about the details, it is about better systems.

    Posted by Anne at

  31. Good work. Reminds me of Mark Pilgrim's "30 days to a more accessible weblog".

    Posted by Martin-Éric at

  32. If an entry can't be found, don't say the entry can't be found, but return an actual 404 header. (Current weblog software doesn't use this and rewrites everything to a single file that handles the request and shows some custom error message instead of returning a header.)

    404 is a status code, not a header, and you can use that status code with meaningful error pages, so what you are presenting as an either/or scenario is actually two unrelated topics.

    I'd say that friendly error messages are something that a good weblog system must have. In particular, 404 pages should offer explanatory text, an explicit suggestion of visiting the archives or search page, and perhaps a list of articles that might be of interest (you can pull keywords out of the slug to find articles of interest). The same applies to 410 and other error states.

    I'd also echo the questions about including the date in the URI. What purpose does this serve? It's metadata, not something that is needed for addressing a resource.

    I think the multilingual requirement of something like index.en.html could use a bit of work. End-users aren't familiar with language codes. It would be much friendlier to have a prefix, e.g. http://example.com/english/foo and only negotiate when the prefix is not given. If you use relative URLs, this also means you don't have to mess around with conditional logic when you just want to output a link.

    Also, why .htm and not .html?

    Posted by Jim Dabell at
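The point that a 404 status code and a friendly error page are not mutually exclusive can be sketched like this: the response carries status 404 and a body that points to the archives and search page and suggests entries whose slug shares a word with the requested one. The function names, the two-character word cutoff and the wording of the message are assumptions for illustration.

```python
import re
from typing import Dict, List, Tuple


def slug_words(path: str) -> set:
    """Lower-case words of the last path segment, ignoring very short ones."""
    slug = path.rstrip("/").rsplit("/", 1)[-1]
    return {w for w in re.split(r"[^a-z0-9]+", slug.lower()) if len(w) > 2}


def suggest(path: str, titles_by_path: Dict[str, str]) -> List[str]:
    """Paths of known entries whose slug shares at least one word with the request."""
    wanted = slug_words(path)
    return [known for known in sorted(titles_by_path) if wanted & slug_words(known)]


def respond(path: str, titles_by_path: Dict[str, str]) -> Tuple[int, str]:
    """Return (status code, body). Missing entries get a real 404 plus helpful text."""
    if path in titles_by_path:
        return 200, titles_by_path[path]
    lines = [
        "Sorry, that entry could not be found.",
        "Try the archives or the search page.",
    ]
    lines += [f"Perhaps you wanted: {p}" for p in suggest(path, titles_by_path)]
    return 404, "\n".join(lines)
```

The same shape would work for 410: only the status code and the explanatory text change.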

  33. One reason I can think of for using /archives/ is that many weblogs will want an archive page as a starting point for browsing them. It seems cleaner to have /archives/ with the index page showing the available years and having 2004/ etc as subdirectories than to have a separate /archives.html with 2004/ etc as siblings.

    I don't think anybody has come up with a decent way of visualising weaved/relational comments yet. The example linked to was just confusing. I would suggest that unless you expect this problem to be solved soon (I don't), the complexity and confusion outweighs the usefulness. Email, Usenet and existing comment systems work on a threaded basis and everybody is comfortable with them. Replies to two separate comments can be posted as two replies; having a system that can deal with multi-parent comments seems to me to be of extremely limited value.

    Posted by Jim Dabell at

  34. As far as the 'www.' prefix goes, I agree that it is redundant, but leaving it off is very unusual. It looks very odd to me.

    Plus, there is actually a software compatibility issue, I seem to remember that plenty of browsers would recognise a hostname if you left off http://, but only so long as there was a 'www.' prefix. I'm not sure if that's still the case now though.

    Finally, it could confuse people. If you say to somebody "go to www.example.nl", everybody will immediately realise that it is a website. If you say to somebody "go to example.nl" they may not, especially if they are not from that country or are not a techie.

    Posted by Jim Dabell at

  35. For language: what I'm going towards (and what I see as ideal) is the following:

    http://url-of-archive-month-or-category/en/ or http://url-of-archive-month-or-category/fr/

    In short, adding a language "tag" at the end of any url would (a) change the language of the "furniture" for the page (the comment link text, the headings, the date) and (b) show only posts in that language.

    Posted by Steph at
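A rough sketch of the scheme described above, combined with the earlier suggestion to negotiate only when no language is given in the URL: a trailing language segment wins, otherwise the Accept-Language header decides. The `SUPPORTED` tuple and the function names are hypothetical, and q-values in the header are deliberately ignored to keep the example short.

```python
from typing import Optional, Tuple

SUPPORTED = ("en", "fr")  # assumption: languages the weblog publishes in


def split_language(path: str) -> Tuple[str, Optional[str]]:
    """Split a trailing language segment off the path, if there is one."""
    segments = path.strip("/").split("/")
    if segments[-1] in SUPPORTED:
        base = "/" + "/".join(segments[:-1])
        return (base if base.endswith("/") else base + "/"), segments[-1]
    return path, None


def negotiate(accept_language: str, default: str = "en") -> str:
    """Pick the first supported language listed in the header (q-values ignored)."""
    for item in accept_language.split(","):
        code = item.split(";")[0].strip().lower()[:2]
        if code in SUPPORTED:
            return code
    return default


def choose_language(path: str, accept_language: str = "") -> Tuple[str, str]:
    """Return (path without language segment, language to use)."""
    base, lang = split_language(path)
    return base, lang or negotiate(accept_language)
```

The chosen language would then drive both the "furniture" of the page and the filter on which posts are shown.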

  36. Anne, I'm glad to see you agree with me that archives in permalinks are unnecessary. They have been bothering me for a long time now, and it so happens that I just posted about that on my weblog (amongst other things).

    However, I must be honest that I think calling anything perfect is a little too subjective. :-)

    Posted by Charl van Niekerk at

  37. Plus, there is actually a software compatibility issue, I seem to remember that plenty of browsers would recognise a hostname if you left off http://, but only so long as there was a 'www.' prefix. I'm not sure if that's still the case now though.

    I think I was a little unclear here. I'm talking about when you type an unadorned hostname into the address bar of a browser.

    Posted by Jim Dabell at

  38. In response to URI Structure/individual posts: To make categories more apparent and important, I think it's better to put categories into the URL, like: http://example.org/blog/general/slug. Especially, if the site has more than just a blog, it's good to say "blog" and categories after that are pretty logical.

    For example, my site is done like that: http://www.visiomode.com/

    Posted by Ilkka Huotari at

  39. To make categories more apparent and important, I think it's better to put categories into the URL

    That fails when you put an article into more than one category at once. There's nothing wrong with having browsable categories, but the permalink shouldn't be inside them.

    Posted by Jim Dabell at

  40. That fails when you put an article into more than one category at once.

    Yes... in case you tend or want to do that. I'm so much in favor of structuring the content hierarchically that I wouldn't want to do it.

    Of course, I might change my mind sometime, but not yet :-)

    Posted by Ilkka Huotari at

  41. Anne, I suggest you take a peek at what I'm working on over at http://backup.greywulf.net as it includes a lot of the thoughts you've voiced - and added a lot more into the pot which I will certainly think about too. Good work!

    I'm trying to create as small and complete a blogging system as possible. Right now it's focussed on the size of the core engine (using blosxom+photogallery+comments, etc) with an aim to steadily increasing its usability once I'm happy that the centre is working correctly. The only area you've touched on that I'm not sure about is that of compression. I'd rather have the posts remain as ascii-pure as possible - that way if everything else changes, the content itself will always be visible and accessible. Compression for me adds another unnecessary step.

    As an aside - it's just taken 7 attempts to post this one comment because I received numerous errors including "XHTML is not well-formed" and "You need to use block level elements in order to post a comment". It's a comment. I wouldn't expect everyone who posts a comment on a weblog to be conversant in XHTML, XML or whatever. I would want their thoughts, not proof of their technical competence. If something "isn't compliant", it should be silently fixed, not throw an error. That just leads to frustration.

    Posted by Robin Stacey at

  42. The perfect weblog system: I agree with most of this list.

    Posted by hitormiss.org: The perfect weblog system at