Anne van Kesteren

Playing with flat files

Although I have tried to move away from files ever since I started on the web as a "developer", I somehow like that they always make sense when you have a well-configured server. Let me explain. To reduce complexity when updating your website, you will eventually start working with includes, or with a file containing functions that you include in every page, most likely using a server-side scripting language like PHP or ASP, or Apache's server-side includes. After that you might move to only a couple of files: index.php, .htaccess and perhaps some other files you include. (Something like WordPress.) Doing so makes creating new pages a lot less complex; however, it introduces new problems most people are not aware of.

For example, the following page could exist: http://example.org/2004/05/foo, but the following page does not: http://example.org/2004/05/bar. The problem is that your rewrite rules pass the request for the second page to your not so complex index.php, which will return some text like: Sorry, no posts matched your criteria. That doesn't have to be a problem, but most likely it is. The software should also, besides returning some text, send out a 404 - Not Found HTTP header. If it does not, the response goes out with a 200 OK status, and search engines and other bots, which can't read the text and rely on the HTTP status to know whether the page is there, will index the page. (It's like semantics, really.) And you don't want that page to be indexed, since it simply doesn't exist. Unfortunately, such details are never taken care of.
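
To make the point about the status concrete, here is a minimal sketch of a front controller that keeps the friendly text but sends the right status with it. It is written in Python rather than PHP purely for illustration; the `POSTS` dictionary and the `handle` function are hypothetical names standing in for whatever your software uses to look up posts.

```python
from http import HTTPStatus

# Assumed stand-in for the post database: path -> rendered page.
POSTS = {"/2004/05/foo": "<h1>Foo</h1>"}


def handle(path, posts=POSTS):
    """Return (status, body) for a request the rewrite rules handed to us."""
    body = posts.get(path)
    if body is None:
        # Keep the human-readable text, but also tell bots the page is not there.
        return HTTPStatus.NOT_FOUND, "Sorry, no posts matched your criteria."
    return HTTPStatus.OK, body


status, body = handle("/2004/05/bar")
print(int(status), status.phrase)  # 404 Not Found
```

The text a visitor sees stays the same; only the status line changes, and that is the part bots look at.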

On the other hand, a flat file system doesn't have such problems. For example, Movable Type generates flat files, and those files can therefore be controlled with Apache. If a page isn't there, it simply doesn't exist and no URI is rewritten. Apache returns a 404 and nobody is doomed. The reduced complexity mentioned in the first paragraph, however, is also one of the major reasons people drop MT and switch to WP. (Of course, those people probably don't need all the power of MT, since it goes a lot further than the default (and extended) WP installation.)

When you have Apache server software and the ability to change things in a .htaccess or httpd.conf file, using flat files can sometimes be very attractive because of the possibilities it provides. When you publish a website in multiple languages and don't want to make it complex, you could simply create files like contact.nl.html, contact.de.html and contact.fr.html and put them in your root directory. If you put the following, very simple line in either one of the Apache configuration files, the magic will start:

Options +MultiViews

When someone requests the following file: contact, they will get one of the three files mentioned above, depending on the user's browser configuration. Assuming a perfect world scenario, where someone who is French has a French browser that sends the following Accept-Language header: fr, the user will get back contact.fr.html. (If we also had contact.fr.xhtml, the Accept header would have to be looked up as well, compared, et cetera.) You could even add another extension to be able to switch between different character encodings, although the world would be much better off with UTF-8 as the only option available.
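
The selection can be sketched roughly as follows. This is a simplified illustration of language negotiation, not Apache's actual algorithm; the function names and the `default` fallback are assumptions made for the example.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a tag without a q parameter counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        weighted.append((quality, tag.strip().lower()))
    # The sort is stable, so tags with equal quality keep their header order.
    weighted.sort(key=lambda pair: pair[0], reverse=True)
    return [tag for quality, tag in weighted if quality > 0]


def pick_variant(basename, header, available, default):
    """Pick basename.<lang>.html for the best language we have a file for."""
    for tag in parse_accept_language(header):
        # Try the full tag first, then its primary part (fr-be -> fr).
        for candidate in (tag, tag.split("-")[0]):
            if candidate in available:
                return f"{basename}.{candidate}.html"
    # Hypothetical fallback when nothing matches.
    return f"{basename}.{default}.html"


print(pick_variant("contact", "fr", {"nl", "de", "fr"}, "nl"))  # contact.fr.html
```

With MultiViews you get this for free, which is exactly why the one-line configuration is attractive.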

Another advantage of flat files is that, when served with Apache, they are fast. Apache is designed for flat files. That is quite logical: a flat file is ready to be sent over the wire. It doesn't have to be generated with some server-side scripting language first, so it can be delivered instantly. Something else I thought of:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<title>OBJECT test</title>
<object data="/media/meeting"></object>

(It may be obvious that I left out the alternate content inside the element, since I assume that every browser supports it.) The file meeting has different versions, which continues the MultiViews story. We have meeting.png and meeting.html. When someone has disabled graphics, image types are probably no longer in their Accept header and they will get the HTML version. Other people will get the picture, which should of course tell exactly the same story, only visually instead of verbally. Being realistic, I'm not quite sure if this last example is achievable given the current Accept headers among browsers.
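
A rough sketch of how such a choice could be made from the Accept header is given below. Again, this is a simplified illustration and not what Apache does internally; `VARIANTS`, `accepted_types`, `quality_for` and `pick_meeting` are hypothetical names.

```python
def accepted_types(header):
    """Map each media range in an Accept header to its q value."""
    ranges = {}
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        quality = 1.0  # a range without a q parameter counts as q=1
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        ranges[media_range] = quality
    return ranges


def quality_for(media_type, ranges):
    """Most specific match wins: type/subtype, then type/*, then */*."""
    main = media_type.split("/")[0]
    for candidate in (media_type, main + "/*", "*/*"):
        if candidate in ranges:
            return ranges[candidate]
    return 0.0


# Assumed variants of /media/meeting and their media types.
VARIANTS = {"meeting.png": "image/png", "meeting.html": "text/html"}


def pick_meeting(header, variants=VARIANTS):
    """Return the best variant, or None when nothing is acceptable."""
    ranges = accepted_types(header)
    best = max(variants, key=lambda name: quality_for(variants[name], ranges))
    return best if quality_for(variants[best], ranges) > 0 else None


print(pick_meeting("text/html, application/xhtml+xml;q=0.9"))  # meeting.html
```

It also shows why I have my doubts: as soon as a browser sends a wildcard such as */* without a lower q value, the picture is just as acceptable as the HTML version, whether graphics are disabled or not.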

Less complexity makes things complicated, to get back to WP.

Comments

  1. The software should also, besides returning some text, send out a 404 - Not Found HTTP header [...] Unfortunately such details are never taken care of.

    weirdly enough, i made this exact type of change on my site the other day for the experiments and portfolio pages (which indeed are an index file fed by some mod_rewrite/.htaccess). some of us do care...

    flat files are of course optimised for fast delivery, but can be a bugger to maintain (if, for instance, you have dynamically generated elements, or you want to change your layout site-wide, etc). ok, depending on your needs, they may often be the right choice, but it's not all rosy...

    Posted by patrick h. lauke at

  2. Funky caching, baby! And a control mechanism which keeps track of which content exists where... it ain't hard.

    Posted by Mark Wubben at

  3. Just a little note: the language code for French isn't fa, it's fr.

    Posted by Nicolas at

  4. Unfortunately, such details are never taken care of.

    As far as I can remember, I have taken care of this on every site I have created so far, unless somewhere it slipped my mind. So obviously, never is a bit of an overstatement.

    About static versus dynamic pages.. On Blogger we use static pages. Every time I make one little change in my template (which I happen to do quite frequently), I need to republish my entire weblog, which can sometimes take quite long. And I don't even really have that many posts yet - what if you've been blogging frequently for years and you have literally thousands of posts...

    However, obviously this isn't a problem if you only need to change something to your template once in a (rather long) while. But in my case, I would rather switch to dynamic.

    Posted by Charl van Niekerk at

  5. I mentioned this on the WordPress developers list. In the current 1.3 development code, we do return a 404 on requests that don't match a rewrite rule. But we are currently talking about expanding this as you suggest.

    Pretty much any request that isn't in the form of a querystring (i.e. mod_rewrite or pathinfo derived URIs) should probably return a 404 when no posts match. The main exception I can think of is a search request, which is handled by query. So I think a 200 reply, with the "no matches" result in the response body, is appropriate there.

    Posted by Dougal Campbell at