Anne van Kesteren

Changes

2 May 2005

Since last week this site has had some major changes in the backend. A few were made to the front end as well, but those are not particularly interesting. Probably the biggest change was moving from WordPress to my own weblog system. Other people, such as Hanni, have done the same and it was quite a fun experience. Before last week I was using a hacked WordPress 1.2 version. I had built in some custom functions and most of the WordPress filters were turned off or partly rewritten. For some reason editing comments from the admin panel became impossible. I’m still not sure what triggered that, but in the end I’m quite happy it happened, as it made me write my own comment backend. After that I tried to see whether it was possible to add some additional columns to the post table in the database without creating all kinds of errors.

For better dates I added created, published, modified and updated. These are implemented similarly to the outline I gave before in Atom dates. Every time I add a new entry, created is filled with a timestamp and can never be modified again from the admin panel. (It is possible through the database, but that would suck, as things rely on it.) When an entry is made publicly readable, published gets filled with a timestamp and can never be edited again. modified is updated every time something in the entry changes, and updated is only changed when I publish an entry or when I think a major revision has been made and I check the ‘updated checkbox’. As you can see, I don’t have to enter any date myself; the admin takes care of that.
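The rules for the four date fields can be sketched roughly as follows. This is only an illustration in Python, not the code running this site: the `Entry` class and the `create_entry`, `publish` and `edit` functions are hypothetical names, and treating publication as a change that also bumps modified is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


def _now() -> datetime:
    """Current time in UTC; the system, not the author, supplies every date."""
    return datetime.now(timezone.utc)


@dataclass
class Entry:
    body: str
    created: datetime                     # set once, when the entry is added
    modified: datetime                    # bumped on every change
    published: Optional[datetime] = None  # set once, on first publication
    updated: Optional[datetime] = None    # set on publication and major revisions


def create_entry(body: str, at: Optional[datetime] = None) -> Entry:
    at = at if at is not None else _now()
    return Entry(body=body, created=at, modified=at)


def publish(entry: Entry, at: Optional[datetime] = None) -> None:
    if entry.published is not None:
        return  # published is write-once, so a second publish changes nothing
    at = at if at is not None else _now()
    entry.published = at
    entry.updated = at
    entry.modified = at  # assumption: publishing counts as a change


def edit(entry: Entry, body: str, major: bool = False,
         at: Optional[datetime] = None) -> None:
    at = at if at is not None else _now()
    entry.body = body
    entry.modified = at
    if major:  # corresponds to ticking the 'updated checkbox'
        entry.updated = at
```

Nothing here ever assigns to created after construction, which is the point: the only way to change it would be to go around these functions.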

Every entry now also has an id_uri field. Here are two examples:

tag:annevankesteren.nl,2005-04-22:/388
tag:annevankesteren.nl,2005-04-27:/001125/svg-nightlies

… as you can see, I can easily change how they are generated over time, as they are now stored in the database. As long as existing identifiers stay the same, everything will be fine. (They are also needed for Atom, by the way, and perhaps other publishing formats.) I’m considering adding the string ‘weblog’ in a future version to make them more consistent with the identifiers for my link weblog and your comments:

tag:annevankesteren.nl,2005-04-29:/href/090345
tag:annevankesteren.nl,2005-05-01:/comments/202909
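To illustrate why storing the identifier matters, here is a minimal Python sketch of generating a tag URI (the scheme defined in RFC 4151) once and then always reading it back. The `make_tag_uri` function and the `IdStore` class are hypothetical; a plain dictionary stands in for the id_uri column.

```python
from datetime import date
from typing import Callable, Dict


def make_tag_uri(authority: str, day: date, specific: str) -> str:
    """Build a tag URI of the form tag:authority,YYYY-MM-DD:specific."""
    return f"tag:{authority},{day.isoformat()}:{specific}"


class IdStore:
    """Stand-in for the id_uri column: an identifier is generated once, then kept."""

    def __init__(self) -> None:
        self._ids: Dict[int, str] = {}

    def id_uri(self, entry_key: int, generate: Callable[[], str]) -> str:
        # Only call the generator when nothing has been stored yet, so a
        # later change to the generation scheme never touches old entries.
        if entry_key not in self._ids:
            self._ids[entry_key] = generate()
        return self._ids[entry_key]
```

With this arrangement an entry first identified as `tag:annevankesteren.nl,2005-04-22:/388` keeps that identifier even after new entries start using a scheme with ‘weblog’ in it.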

Although my weblog system is mostly hacked together, it uses far fewer files than WordPress did and carries far fewer things I don’t need. As outlined above, it offers some new features and I’m planning to add more. It also survived Slashdot yesterday, so it is capable of taking some hits.

Something else I solved with this new system was the quote problem I had with WordPress and other systems. My WordPress 1.2 install was actually so messed up that quotes were stored in the database like ‘\'’. After I removed those and made sure the database was completely well-formed I hacked my post mechanism a bit further to make sure ‘\’ is ‘\’ in the database and ‘\’ outside of the database and not ‘\\’ or ‘’ (empty string). I also added some server-side post validation similar to the comment validation that is enabled here at the moment to make sure everything stays well-formed XML.
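The two goals above (text comes out of the database exactly as it went in, and only well-formed XML gets stored) can be sketched like this. It is an illustration in Python with an in-memory SQLite database, not my actual post mechanism: the `posts` table and the function names are hypothetical, and bound query parameters are swapped in for the hand-written escaping.

```python
import sqlite3
import xml.etree.ElementTree as ET


def is_well_formed_fragment(fragment: str) -> bool:
    """Check a post body by parsing it inside a throwaway wrapper element."""
    try:
        ET.fromstring(f"<div>{fragment}</div>")
    except ET.ParseError:
        return False
    return True


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical posts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
    return conn


def store_post(conn: sqlite3.Connection, body: str) -> int:
    """Reject malformed bodies, then store the text without any manual escaping."""
    if not is_well_formed_fragment(body):
        raise ValueError("post is not well-formed XML")
    # Bound parameters keep quotes and backslashes exactly as typed.
    cur = conn.execute("INSERT INTO posts (body) VALUES (?)", (body,))
    return cur.lastrowid


def load_post(conn: sqlite3.Connection, post_id: int) -> str:
    row = conn.execute("SELECT body FROM posts WHERE id = ?", (post_id,)).fetchone()
    return row[0]
```

Because the driver handles quoting, there is no step at which a single backslash could turn into two or disappear. Note that a plain XML parser like this one rejects HTML named entities such as `&nbsp;` unless they are declared, so a real check would need to account for that.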

For the comments and the links I have taken similar steps and I believe everything is working quite well at the moment. Comments can now use PRE without having to worry that it will not look the same as it did on the preview page, because it does. I haven’t been able to find the bug in WordPress 1.2, but it is no longer important for me to do so. In fact, I removed the WordPress folder today along with the WordPress-specific database columns. (I still need to rename some.)

Instead of the search page I’m now using Google when you type something in above. I’ve stolen the idea and implementation from none other than Mark Pilgrim, so I expect it to be good. Once I have XPath-based searching enabled I might set up a demo page, but for now Google is mostly superior to what I could come up with. (Although I could do some pretty cool stuff using XMLHttpRequest, I guess.)

I’d like to thank Matt for the WordPress effort. It is the weblog system I have used for the longest time and it had almost everything I needed. I have used it longer than Blogger, Movable Type, NucleusCMS and another weblog system I built myself, combined. (For completeness it might be worth adding that I tried WordPress and Movable Type both before and after NucleusCMS.)

Anyway, time for moving on and getting the last bits done.

Comments

If you type /weblog on my site you can see what I had started on, currently based on the Wordpress DB structure (although I would most likely modify it, I wanted it to be a little more compatible than just writing my own DB structure, although it appears that in the end it wasn't so very different). My initial main problem with it was that I'd have to rewrite some parts of my old Watchzine code to ensure well-formed XML (which it currently doesn't, partly due to some changes I made in how Wordpress stores posts in the database), but I plan to continue on it in the future.
Wordpress is quite a nice thing to extend, but right now I am wondering if the time I spent on customizing WP shouldn't rather have been spent on improving and extending my own code. On the other hand, I learned a few tricks from the code of others (read: Compreval XHTML check part) which I can use to my advantage. My ideas are there. The actual completion of them could very well take a year... Besides, I didn't spend much time customizing Wordpress (deleting code isn't so hard if the code is well-written and easy to get), most of my time was in the design (and I'm still working on the markup), which is something I will be able to take to my own system immediately, so it's time well spent, after all.
Posted by Frenzie at 2:23AM
I forgot to compliment you on the fact that there aren't any notable differences and getting your own system up. Hereby.
Posted by Frenzie at 2:24AM
Awesome. I never knew you moved away from Wordpress. The same as Frenzie, kudos to you for not showing any differences between your old log and your new one.
I never fooled with Wordpress myself because I just prefer to do my own PHP coding. I didn't code the weblog system that I used for mine when I had it, but J. King did. I helped a bit on it, but not much. It was pretty much his project. We both learned a lot from it.
I always find that if you make things yourself you're more satisfied with them. I'm a graphic designer, and when I make an entire book by hand along with designing it I am more satisfied with it because it's personal. It's me. The same goes for web stuff. When I program stuff with PHP, HTML, etc. I am happier with the results because what I programmed is displaying something I designed as well. It's essentially the same thing. It's also part of a learning process, too. It's satisfying to learn something new and useful. I'm sure you feel similarly about it, Anne.
Posted by Dustin Wilson at 3:20AM
Is it just me or did your feed used to feature full posts instead of summaries?
Posted by Federico at 3:46AM
Just for other people considering upgrading, the quote problem is transparently fixed when you upgrade to 1.5. Texturize has had several improvements and no longer touches things inside of pre, code, kbd, etc. The new GUID field in the database ensures a global and permanent unique identifier for each post, and our RSS importer preserves GUIDs that come from other systems. 1.5 includes a new theme framework and a much improved plugin API to make sweeping changes easy without touching core files. It also abstracts out the content type so you can easily switch to alternative content types for your pages should you like to. Finally I recommend a plugin like X-Valid to ensure validity of comments and posts, it will be rolled into bbPress soonish and possibly WP in the future, after some integration points with KSES are worked out.
Posted by Matt at 5:01AM
So when are you making this available for other people to use? ;-)
It looks very nice. Like Dustin, I didn't even realize anything was different until you said so. :-)
How much of your perfect weblog system do you have implemented? Any idea on what's next?
Posted by dolphinling at 6:47AM
Well done! There's nothing like having it custom-written!
I like the fact that it's HTML instead of XHTML, because on weblogs there's nothing like having incremental loading on posts with many comments (in other words, large documents). I still hope they add that feature to the XML rendering in Gecko too, though.
And your minimalist HTML looks very neat! Again, congrats!
Posted by Charl van Niekerk at 2:20PM
Nice move. One certainly benefits from such a customization - me, I currently hack WordPress 1.5 to get a tailored blog system - and it's fun since the system really gets suited to one's own needs, the code becomes better, and one gains some new experience. But on the other hand, it's that time-consuming...
Posted by Jens Meiert at 3:18PM
Federico, I changed from RSS1.0 to RSS0.92 or 0.91 and I didn’t want to dive into the RSS mess to solve things, so I chose the excerpts, as they are text only. You can use the Atom feed for full posts.
Posted by Anne at 4:55PM
Shouldn’t this line: <link rel="author" href="/about" title="Anne van Kesteren"> actually be <link rev="author" href="/about" title="Anne van Kesteren">?
Posted by Doug W at 9:17PM
Interesting. So you still use XHTML for input. (I was wondering how you were going to validate HTML input.)
How are you serializing to HTML for output? Are you just using a bunch of regexp's or something more robust?
Posted by Jacques Distler at 9:20PM
Regexp's, apparently.
Which fails quite spectacularly.
Posted by Jacques Distler at 9:25PM
You've heard me say it before, but I'll repeat it again.
Parsing XHTML with regexp's is evil!
Far, far more evil than sending XHTML as text/html.
Posted by Jacques Distler at 9:30PM
No regular expressions. I should somehow forbid the content model you entered though.
Posted by Anne at 9:35PM
Regular expressions, string substitutions, whatever. If you're not taking the parsed XHTML and serializing that, then your setup is prone to failure. (If not because of comments then — one day, when you forget — because of something you type).
Posted by Jacques Distler at 9:48PM
Possibly, but as long as it is stored as well-formed XML in the database I don’t have any major problems with it.
Posted by Anne at 9:54PM
You don't have a problem with turning well-formed, valid XHTML into broken, invalid HTML, and sending that off to the client? Why would that ever be preferred over sending the original (well-formed, valid) XHTML to the client?

Posted by Jacques Distler at 10:50PM
I didn’t say that. However, I have now improved the comment validation so that, I hope, such things can’t happen anymore. Eventually I want to build up everything using a DOM tree, including feeds, et cetera, but such things take time. (Also, you are doing more or less the same thing. Sending XHTML to IE as text/html isn’t any less broken.)
Posted by Anne at 5:24PM
Testing comments:
- Astral character:
  Posted by Henri Sivonen at 2:20AM
- Looks like you are not normalizing to NFC as per charmod-norm. :-)
  Posted by Henri Sivonen at 2:23AM
- Seriously though, I think it is cool that you were able to swap the system with so little disturbance.
  Posted by Henri Sivonen at 2:33AM
- Henri, any chance you could pass me some PHP function to normalize characters to NFC? Would be a nice thing to do although I wonder how many people use UTF-8 the way you do :-)
  Posted by Anne at 3:48AM
- I am not aware of any such PHP function. The Unicode infrastructure of PHP is rather weak, or more to the point: there is no Unicode infrastructure, which is why app developers need to manage stuff like conversion between UTF-8 and Unicode code points.
  I am aware of normalization functions for C, Objective-C, Java, Python and Perl, though, so in general tools are available. I suppose the best bet for PHP would be writing wrappers for ICU or glib.
  As for real-world usefulness, I have seen a case in the wild where an author using Safari and WordPress had copied and pasted decomposed umlauts into his blog, because Preview.app exported decomposed Unicode to the clipboard. No problem was visible in Safari, but in Mozilla it looked ugly.
  Posted by Henri Sivonen at 9:59PM
- I had no idea there were so many people starting with WP and ending up with a custom blogging system. I did the same, but kept only the WP admin console and rewrote a frontend from scratch. Maybe we should join forces and start a new, collective project.
  Posted by ludo at 5:44AM
- try http://textpattern.com, very useful, get rc3 1.0
  Posted by Tim at 12:48PM
- Wordpress is good for what I need though.
  Posted by Tim at 7:29AM