Anne van Kesteren

HTTP 304: Not Modified

  1. Knock, knock.
  2. Who’s there?
  3. It’s one of the feed readers that download your feed to check if there are any updates.
  4. Ugh.
  5. You are lucky; some feeds are over 100 mebibytes and fetched every hour by at least a thousand different feed readers.
  6. Ugh.

Sounds familiar? Could be. There is an answer to this problem: HTTP. The answer is always HTTP. This is not about media types though.

This also doesn’t solely apply to feeds. This does mostly apply to people who generate their pages on the fly; not to people who create actual files on the server. Apache is good with that stuff. The day that Apache also supports status code 410 natively will be a great day. Not just for me, but for every content publisher on the internet who happens to like correct status codes. And also for the majority who don’t, because they are no longer bugged with it by those who do; et cetera.

There are a few headers I want to introduce to you. Bear with me, as they are important. Last-Modified returns the date when the retrieved page was, well, last modified. This is a trivial thing, but when I first implemented it I forgot comments. And after that I forgot to check if the post was perhaps modified after the last comment was made. Don’t you do the same! The whole purpose behind writing things up is that others don’t make the same mistake. Or at least, that’s the purpose of this entry.
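To make the comments pitfall concrete, here is a minimal sketch of how the Last-Modified value could be derived. The helper name `page_last_modified` and its two parameters are hypothetical; it assumes the post's modification time and the comment times are available as Unix timestamps.

```php
<?php
// Hypothetical helper: $post_modified and $comment_times are Unix timestamps
// that would come from wherever the weblog stores its entries.
function page_last_modified(int $post_modified, array $comment_times): string {
    // The page changes when the post is edited OR when a comment arrives,
    // so take the most recent of all the timestamps.
    $latest = max(array_merge([$post_modified], $comment_times));
    // HTTP dates are always expressed in GMT.
    return gmdate('D, d M Y H:i:s', $latest) . ' GMT';
}
```

Taking the maximum covers both mistakes described above: a new comment moves the date forward, and so does an edit to the post made after the last comment.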

Besides Last-Modified we need ETag. For reasons not entirely clear to me by the way, but as it seems to work and was simple to implement I decided not to care. The day I will pay for that probably comes sooner than expected so I hope one of you can enlighten me on this. (Does it have to do with HTTP 1.0 versus 1.1?)

Your weblog now returns some additional headers and all is fine. When the feed reader says knock, knock again, it will now send two additional headers to you. They are known as If-Modified-Since and If-None-Match. The former contains your Last-Modified value and the latter contains the ETag value. If both match the current values (now comes the point) you return a 304 response of approximately 200 bytes, without a body. Then the visitor’s user agent will know nothing has changed and it will pull the page out of its cache.

Besides that point, there’s a catch. If the client sends only one of the two headers, you have to compare only that header. I borrowed some code from WP and implemented it like:

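function http_modified($last_modified,$identifier){
 // The ETag changes whenever the Last-Modified value does.
 $etag = '"'.md5($last_modified.$identifier).'"';
 // Either request header may be missing, so check with isset() before reading.
 $client_etag = isset($_SERVER['HTTP_IF_NONE_MATCH']) ? trim($_SERVER['HTTP_IF_NONE_MATCH']) : false;
 $client_last_modified = isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) ? trim($_SERVER['HTTP_IF_MODIFIED_SINCE']) : false;
 $client_last_modified_timestamp = $client_last_modified ? strtotime($client_last_modified) : false;
 $last_modified_timestamp = strtotime($last_modified);

 // A validator only counts as a match if the client actually sent it.
 $time_matches = ($client_last_modified_timestamp !== false) && ($client_last_modified_timestamp === $last_modified_timestamp);
 $etag_matches = ($client_etag !== false) && ($client_etag === $etag);

 // Both headers sent: both must match. Only one sent: compare just that one.
 if(($client_last_modified && $client_etag) ? ($time_matches && $etag_matches) : ($time_matches || $etag_matches)){
  header('HTTP/1.1 304 Not Modified');
  exit();
 }else{
  header('Last-Modified: '.$last_modified);
  header('ETag: '.$etag);
 }
}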
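
Because the function above sends headers and exits, the single-header catch is hard to try out in isolation. The following is a minimal sketch of the same comparison as a function that only returns the status code. The name `conditional_status` and its `$server` parameter (a stand-in for `$_SERVER`) are hypothetical and not part of the code borrowed from WP.

```php
<?php
// Hypothetical pure variant of the check above: returns the status code
// instead of sending headers, so each case can be tried in isolation.
function conditional_status(array $server, string $last_modified, string $etag): int {
    $client_etag = isset($server['HTTP_IF_NONE_MATCH']) ? trim($server['HTTP_IF_NONE_MATCH']) : null;
    $client_time = isset($server['HTTP_IF_MODIFIED_SINCE']) ? strtotime(trim($server['HTTP_IF_MODIFIED_SINCE'])) : null;

    $etag_ok = $client_etag !== null && $client_etag === $etag;
    // strtotime() returns false for unparseable dates; never treat that as a match.
    $time_ok = is_int($client_time) && $client_time === strtotime($last_modified);

    if ($client_etag !== null && $client_time !== null) {
        // Both validators sent: both have to match.
        return ($etag_ok && $time_ok) ? 304 : 200;
    }
    // Zero or one validator sent: only the one that is present counts.
    return ($etag_ok || $time_ok) ? 304 : 200;
}

$lm = 'Tue, 24 May 2005 06:44:54 GMT';
echo conditional_status([], $lm, '"abc"'), "\n";                                // 200: first visit
echo conditional_status(['HTTP_IF_NONE_MATCH' => '"abc"'], $lm, '"abc"'), "\n"; // 304: only the ETag was sent
echo conditional_status(['HTTP_IF_NONE_MATCH' => '"abc"',
    'HTTP_IF_MODIFIED_SINCE' => 'Tue, 24 May 2005 05:00:00 GMT'], $lm, '"abc"'), "\n"; // 200: date differs
```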

Comments

  1. OT: Just a thought, but why did you choose an unordered list for the conversation since the order is actually definitive? I've done some thinking on this subject lately. Lots of people use unordered lists for menus and stuff while in fact the order is pretty important, and those people have really thought about the order. To me in such cases an ordered list would be much more semantic.

    Let me know what you think. Hopefully I don't make a complete fool of myself ;)

    Posted by Maarten at

  2. Nice article. I should probably make use of the Not Modified header for my message board system.

    Posted by Jero at

  3. Just a thought, but why did you choose an unordered list for the conversation since the order is actually definitive?

    The WA1 draft suggests OL, CITE and BLOCKQUOTE for dialogues.

    Posted by zcorpan at

  4. Besides Last-Modified we need ETag. For reasons not entirely clear to me by the way, but as it seems to work and was simple to implement I decided not to care. The day I will pay for that probably comes sooner than expected so I hope one of you can enlighten me on this.

    One reason I have heard is that Last-Modified isn't granular enough, but that sounds like over-engineering to me - who updates resources more than once a second?

    ETag is more flexible though, as it allows for strong and weak validators. This means that you can retransmit when substantive changes to the resource have been made (e.g. a retraction), but serve older copies when non-substantive changes to the resource have been made (e.g. spelling corrections). I don't know to what extent this has been implemented though.

    Posted by Jim Dabell at

  5. Nice to see 304's getting an airing. But I am curious to know what you want Apache to do with a 410 that it can't already do.

    Rather an irritating comment system, btw. Won't let me use id or class attributes...

    Posted by Danny at

  6. AFAIK, Last-Modified is an HTTP 1.0 header and ETag is an HTTP 1.1 one.

    I implemented these on my feeds some time ago and they help me to save lots of bandwidth.

    If you only want to send a 304 Not Modified header to the user agent when the feed was not modified, generating a static XML file with the contents of the feed seems better to me. When a user agent accesses a static file, Apache sends the Last-Modified header to it (using the modified date value of the file) and also sends an ETag. I think Apache generates ETags based on modified date, file contents and inode value (if applicable). What makes it better to generate feeds on the fly is that you can send only the latest entry of the feed to the user agent. If it's an aggregator, it'll work just as if you sent the full feed contents to it. I implemented this feature on my feeds and it's working very well.

    Unfortunately, browsers don't work as aggregators, so we can only choose between sending the full content or an HTTP 304.

    When I first implemented caching on my blog I made the same mistakes as you, forgot comments too.

    Posted by Bruno Torres at

  7. If I'm not mistaken then it's only necessary for either the ETag or the timestamp to indicate that it's the same version and you may already send a 304 not modified. In any case; effectively you're getting a timestamp and a hash code from the browser (if you sent those in the first place) so it's up to the application to determine whether a 304 suffices, based on those two facts.

    Posted by Eamon Nerbonne at

  8. I believe it's also standard practice to resend the ETag when returning a 304. ~d

    Posted by Douglas Clifton at

  9. I believe that you don't need to do BOTH Last-Modified and ETag. You can choose to do either one. The big difference is that a resource can have one, and only one, last-modified time. But a file can have an arbitrary number of entities. And more importantly, your browser should send all the entities that it has. You can read this at http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.26.

    Let's say that you've got some page and you can be either authenticated or unauthenticated on that page. And for convenience, let's say that the page has somewhat different, but still relatively static information, for when you're logged in and when you're not. This is a somewhat contrived case, but a website tied to a CRM might do the trick.

    So back to the example. Let's imagine the case where we load the page, log in, return to the page, log out, and return to the page. Normally, you will have to send a 200 for each instance because the first time the logged out version is new, the second time the logged in version is new and the third time the logged out version is still valid but the date passed in "If-Modified-Since" refers to the logged in version. We know that we have no choice but to resend the page.

    Using multiple ETags gives us the following sequence of exchanges (I'm doing this mostly off the top of my head but you should get the idea):

    1. Request:

      GET /FOO HTTP/1.1

      Response:

      HTTP/1.x 200 OK
      Date: Tue, 24 May 2005 06:44:54 GMT
      ....
      Etag: "unauthenticated"

    2. Log in
    3. Request:

      GET /FOO HTTP/1.1
      ....
      If-None-Match: "unauthenticated"

      Response:

      HTTP/1.x 200 OK
      Date: Tue, 24 May 2005 06:44:54 GMT
      ....
      Etag: "authenticated"

    4. Log out
    5. Request:

      GET /FOO HTTP/1.1
      ....
      If-None-Match: "unauthenticated", "authenticated"

      Response:

      HTTP/1.x 304 Not Modified
      Etag: "unauthenticated"

    You'll see that, where conventionally we'd have to send a 200 response and the content, we can send a 304 and avoid sending the content.

    This works because ETags and If-None-Match work not by asking "Is this still valid?" which is what If-Modified-Since asks but rather, "Which of these is still valid?" This is a very powerful notion that is underused. I say ditch Last-Modified altogether.

    Posted by Adam van den Hoven at

  10. I've posted an article in response to this here: On HTTP Last-Modified and ETag.

    Posted by Christopher Lenz at
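
Comments 4 and 9 touch on two details the code in the entry does not handle: an If-None-Match header can list several ETags, and an ETag can be marked weak with a `W/` prefix. As a rough illustration, here is a minimal sketch of a check that copes with both. The helper name `if_none_match_hit` is hypothetical, and the sketch assumes the weak comparison (ignoring `W/`) that the HTTP specification prescribes for If-None-Match.

```php
<?php
// Hypothetical helper, not part of the entry's code: checks whether the
// current ETag appears in an If-None-Match header that may list several.
function if_none_match_hit(string $header, string $current_etag): bool {
    $header = trim($header);
    if ($header === '*') {
        return true;  // "*" matches any existing representation
    }
    // Weak comparison: ignore a leading W/ on either side.
    $strip = function (string $tag): string {
        $tag = trim($tag);
        return strncmp($tag, 'W/', 2) === 0 ? substr($tag, 2) : $tag;
    };
    // Pull out each quoted tag (with optional W/ prefix); splitting on commas
    // would break on ETags that contain a comma inside the quotes.
    preg_match_all('/(?:W\/)?"[^"]*"/', $header, $matches);
    foreach ($matches[0] as $candidate) {
        if ($strip($candidate) === $strip($current_etag)) {
            return true;
        }
    }
    return false;
}
```

With the header from step 5 in comment 9, `if_none_match_hit('"unauthenticated", "authenticated"', '"unauthenticated"')` returns true, which is the case where a 304 can be sent.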