Anne van Kesteren

Optimizing Optimizing HTML

This started as a joke, but I thought I would share the additional thoughts I have anyway in an attempt to make my blog more "boeiend" (ask Krijn). A few of these tips are only valid under HTML5, but will work everywhere else too (e.g. in HTML4 or XHTML1). Because well, as you know, HTML5 actually reflects reality. Blah blah. (In the event you have no idea what I am talking about, this is somewhat of a response to Optimizing HTML.)

  1. <link rel="stylesheet" href="…" type="text/css"> — Nobody is using that type attribute. It is just sitting there making your link element less readable and making it consume slightly more memory due to the additional attribute. You can drop it right now as this is valid in all versions of HTML. Apart from the validator, versions are a dead concept as far as HTML is concerned, but people seem to care about them. (And some even care enough about them to battle for introducing them in HTML5. Weird world.)

  2. <style type="text/css"> — Same trick, different element. There is only one styling language for the Web in existence and that is CSS. Whenever we introduce a new one (if ever) we can start to use the type attribute. It is a waste of bandwidth and source code reading time to include it now. Then again, including the style element itself is hardly ever needed. Think again when you do it.

  3. <script type="text/javascript"> — The only reason you ever want to include this attribute is if you want to specify a different JavaScript version. Indeed, you do not use the language attribute for this. However, you probably do not want to do that either since it is very much vendor-specific. You know, dragons and all. Having said that, to enable e.g. JavaScript 1.8 in Gecko you would use type="text/javascript;version=1.8".

    The type attribute can very often also be nuked from other elements, such as a and object (if your server configuration is correct). The cases mentioned above are just the most common (well, there is input of course, but it was already mentioned in the original post).

  4. <link rel="shortcut icon"> — While this should actually be just rel="icon" you can remove this element altogether by simply having a /favicon.ico resource. This will also reduce the amount of 404 traffic you get on your site; potentially substantially. Do not worry about the extension: you can put any file format there that your browser supports and configure the Content-Type header appropriately on your server. If you have no idea how to do that, let me tell you that it does not really matter anyway since browsers will sniff the MIME type of the resource regardless. Oooh, evil. Just avoid the HTTP gods and you should be safe. (Well-known locations are evil too, but we lost this battle.)

  5. Trashing Trailing Slashes — Including the space usually in front of them, to be complete! Lots of people bought into the XHTML fad and end their tags with  /> rather than >. Besides being useless, it takes up two more characters per element, and the only effect is that it increases the time (not noticeable though) that software will take to download and parse the document. The claim that everything is now consistently closed makes no sense since e.g. script does not accept this notation. You need to know exactly which elements do not have an end tag, and if you already do, why would you add cruft to their start tag? I know, I know, to please the XML gods. Pleasing the XML gods makes no sense in an HTML world. There.
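
Tips 1 to 3 and 5 are mechanical enough to automate. Below is a minimal sketch in Python, not a definitive implementation: the function name strip_cruft and the DEFAULT_TYPES table are made up for illustration, and a regular expression is only a rough stand-in for a real HTML parser (it would also touch matching text inside scripts or comments).

```python
import re

# Assumption for this sketch: only these exact default values are redundant.
DEFAULT_TYPES = {
    "link": "text/css",
    "style": "text/css",
    "script": "text/javascript",
}


def strip_cruft(html: str) -> str:
    """Drop default type attributes and XHTML-style trailing slashes."""
    for tag, mime in DEFAULT_TYPES.items():
        # Capture the start tag up to the type attribute, then drop the
        # attribute if its value is exactly the default for that element.
        pattern = re.compile(
            r'(<%s\b[^>]*?)\s+type=(["\']?)%s\2(?=[\s/>])' % (tag, re.escape(mime)),
            re.IGNORECASE,
        )
        html = pattern.sub(r"\1", html)
    # Turn " />" (or "/>") at the end of a tag into ">".
    return re.sub(r"\s*/>", ">", html)
```

For example, strip_cruft('<link rel="stylesheet" href="a.css" type="text/css" />') gives <link rel="stylesheet" href="a.css">, while a vendor-specific value such as text/javascript;version=1.8 is left untouched because it is not the plain default.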

The original post mentions anti-patterns as well. Let me give some contradictory advice. Please do replace em with i and strong with b. The resulting markup is likely more accurate. With WYSIWYG software in particular it is extremely unlikely that em and strong will be used correctly. On top of that, the notion spread that they are interchangeable with i and b, the sole difference being that strong and em are semantic, and a lot of misuse followed. When in doubt, use i and b. And in most other cases probably too. (Does anyone know if there is a plug-in for WordPress that fixes this?)
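
As a rough illustration of the replacement suggested above, here is a small Python sketch. The function name to_presentational and the REPLACEMENTS table are hypothetical, and the regular expression assumes reasonably tidy markup; it is not a WordPress plug-in.

```python
import re

# Hypothetical mapping for this sketch: semantic element -> plain equivalent.
REPLACEMENTS = {"em": "i", "strong": "b"}


def to_presentational(html: str) -> str:
    """Rewrite em/strong start and end tags as i/b, keeping attributes."""

    def swap(match):
        slash = match.group(1)          # "/" for an end tag, "" otherwise
        name = match.group(2).lower()   # "em" or "strong"
        rest = match.group(3)           # any attributes, left as they are
        return "<%s%s%s>" % (slash, REPLACEMENTS[name], rest)

    # \b keeps longer names such as <embed> from matching "em".
    return re.sub(r"<(/?)(em|strong)\b([^>]*)>", swap, html, flags=re.IGNORECASE)
```

Calling to_presentational('<em>a</em> <strong class="x">b</strong>') returns '<i>a</i> <b class="x">b</b>'.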

As a final "tip", if you do not care much for cross-browser compatibility you can get rid of <link rel="stylesheet"> entirely. Investigate id.annevankesteren.nl for details.
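
Judging by the comments below, the trick in question is the HTTP Link header: the style sheet is referenced from the response headers instead of from the markup. Here is a minimal sketch using Python's standard http.server module. The handler name, the helper stylesheet_link_header, the /style.css path, and the port are all assumptions for illustration, and browser support for this header is limited.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def stylesheet_link_header(url: str) -> tuple:
    """Build a (name, value) pair for a Link header pointing at a style sheet."""
    return ("Link", "<%s>; rel=stylesheet" % url)


class LinkHeaderHandler(BaseHTTPRequestHandler):
    """Serves one HTML page without any link element in its markup."""

    def do_GET(self):
        # Simplification: every path gets the same page. A real setup would
        # also serve the style sheet itself at /style.css (assumed path).
        body = b"<!doctype html><title>No link element</title><p>Styled via HTTP."
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header(*stylesheet_link_header("/style.css"))
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start the demo server; call this by hand, it blocks until interrupted."""
    HTTPServer(("localhost", port), LinkHeaderHandler).serve_forever()
```

The response headers then carry Link: </style.css>; rel=stylesheet, which a supporting browser treats like the equivalent link element.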

Comments

  1. Great post! +2 internets to you, sir.

    Another optimization you haven’t mentioned is using extensionless files for assets. You can forget about .css, .js, .jpg, .png, .gif, .html, and .php altogether – just add Options +MultiViews to your .htaccess and omit those extensions already.

    I tend to disagree on your anti-anti-patterns statement though. If you know how and when to use em and strong, replacing them with i and b undeniably is taking a step back. You’re right that a lot of people are misusing these elements, though.
    As for the WordPress plugin – I suppose you could just apply a simple str_replace()-based filter.

    Posted by Mathias Bynens at

  2. I feel a rant coming on, like it was still the noughties.

    *Takes a deep breath*

    Posted by Egor Kloos at

  3. What a shame! I just discovered Chrome does not seem to understand the "Link" HTTP header. Anybody out there who wants to file a bug? ;)

    And in the course of my investigations, I realized mozillaquestquest.com is now owned by grabbers? That's even worse news! Hixie, would you like to send me a copy? That was definitely one of my favorite sites ...

    PS: Anne, why does your blog software force me to write well-formed XML? This feels truly abasing!

    Posted by david at

  4. It is mozillaquest.com. Oh, and my blog software is teh evil. Wrote it myself.

    Posted by Anne at

  5. Thanks for pointing out the wrong use of em and strong. I often feel like the only one realising that. And Mathias Bynens, Anne surely does not say to replace em with i when you know that em is right; he means just replacing it when you don't know if it's right. And that would always be the case when a WYSIWYG editor has to make the choice.

    But Anne, about the type in objects: Don't you think the hint it provides to the user agent is of value? object does not have a default type like style or link with rel=stylesheet. And if you think there is no value, what about source elements and type?

    An answer would delight me.

    Posted by JoJo at

  6. Anne, have you noticed that your list of "allowed" elements includes the typo: KDB?

    Posted by Ben Joffe at

  7. Also, it seems suitable to add B and I to the "allowed" list now that you've done a 180 on the issue:

    The element b just means bold and what on earth is semantic on that word? Please tell me. It is exactly the same for the i element. You should drop it together with all your useless span elements.

    Posted by Ben Joffe at

  8. Nope, Anne, I did mean mozillaquestquest.com. :) Ask Hixie, I'm pretty sure it was his brilliant project. (Unfortunately, web.archive.org breaks it.)

    Posted by david at

  9. 100% in agreement with poster number 7.

    And also trashing trailing slashes, really?

    -

    To be honest, if I were going to spend time on optimisations I'd be looking first at images (optimisations and sprites), at rewriting and improving JavaScript, and then at how the server is configured, from gzip compression to expiry headers and then to the physical server setup. Saving some space by removing trailing slashes seems daft to me.

    /L

    Posted by Laurie at

  10. Ben, it should be all fixed, thanks! Test.

    Posted by Anne van Kesteren at

  11. JoJo, in the case of object the type would simply be known the moment the file is downloaded from the server. The same can be done for source, but in that case there is an advantage in letting the browser look at the type attribute. Namely that it will not have to inspect every resource that is linked.

    Posted by Anne van Kesteren at

  12. When I do not care much for cross-browser compatibility I serve XML and let the browser apply XSLT on the client side. It works well, though (aside from IE and Konqueror).

    Posted by Tenno Seremel at

  13. Thanks

    I have done a test on omitting the type on object and video in Safari 4.0.4.

    HTTP reports Content-Type: application/vnd.rn-realmedia for the video, which is correct, and it is an unsupported format.

    <object data=video
            type=application/vnd.rn-realmedia>
    </object>
    
    <video controls>
       <source src=video
               type=application/vnd.rn-realmedia>
    </video>

    object results in a broken QuickTime icon being displayed. video results in no loading indication, not even an empty progress bar in the video UI. In fact Safari does not even try to download the video, according to its Web Inspector.

    <object data=video></object>
    
    <video controls>
        <source src=video>
    </video>

    object results in the video being saved directly to my drive. video results in an endless "loading" state displayed in the video UI.

    It seems that Safari depends on hints given in html. One could argue that Safari sucks, and probably Chrome too.

    But displaying an eternally downloading video, which is in fact not loading at all, versus a greyed-out UI is a big difference in user experience. And downloading a file to the desktop versus not even trying to load it is an even bigger one.

    Maybe I am a little bit too chatty about this thing. But I would so much like to do as you say and I don't have a life - so delete me if this is spam.

    Posted by JoJo at

  14. I removed the shortcut element with the favicon.ico on the server root and the favicon stopped showing up in the browser (FF 3.5.6). FYI.......

    Posted by Aeron at

  15. "Removing cruft from HTML" is a better title. I'm not sure how serious this post is but I'll answer on one point anyway.

    The trailing slash is handy because I can use XML parsing tools on my markup. Just like your commenting system does.

    Posted by Dean Edwards at

  16. I liked the no css css "trick"! You had me puzzled for a few moments there :)

    XHTML is not well-formed (line 1)

    Heu? XHTML? Fix your blog software!

    Posted by Blaise Kal at

  17. I removed the shortcut element with the favicon.ico on the server root and the favicon stopped showing up in the browser (FF 3.5.6). FYI.......

    Firefox hard-caches favicons. You’ll need to close the page, delete your cache, restart the browser, then open the page again. Your favicon should be there.

    Posted by Mathias Bynens at

  18. Originally posted by Blaise Kal:

    I liked the no css css "trick"! You had me puzzled for a few moments there :)

    You have to love lynx -head -dump "http://id.annevankesteren.nl". Not that Opera and Fx don't display it if you ask nicely, but it seems like more effort. I only have to press i on my customized setup, but the problem is that Opera hints at the headers more than that it really displays any, and Firefox requires too much mouse interaction (yes, you can do it with your keyboard but it's clearly not made or optimized for it — besides it takes a lot longer to start).

    By the way, Anne, why is it sent twice: once with and once without a space? Does that have something to do with browser compatibility?

    Posted by Frans at

  19. One more thing: Safari treats object and embed without type as iframes, behavior-wise. And at least in the case of images it tries to get a hint from the file extension to switch between an image and an iframe-like state. With no extension it will stay in the iframe-like state, and no HTTP header or content sniffing can change that anymore. Looks funny.

    So, leaving type out does not work for WebKit. And it shows how much information WebKit is trying to wring out of the markup, even if it is wrong, like trying to wring a file type out of a URI.

    Shit, I would love to omit type, it looks so nice and minimal.

    Posted by JoJo at

  20. The Html functions I wrote for MediaWiki (in Html.php) do this kind of thing automatically. So in HTML5+non-well-formed XML mode, Html::element( 'input', array( 'type' => 'text', 'value' => '', 'autofocus' => 'autofocus' ) ) will return <input autofocus>. In XHTML1 mode, you'll get <input type="text" value="" autofocus="autofocus" />. (I could drop some of the attributes in XHTML too, but can't be bothered to maintain two sets of defaults.) Not really needed if you're manually typing the elements, but handy if they're programmatically generated.

    Posted by Aryeh Gregor at

  21. Frans, it was just a bug, fixed now. Thanks!

    Posted by Anne van Kesteren at

  22. Evil as always…

    Moving the CSS from HTML to HTTP headers probably won't affect network payload in any way. HTTP headers are still sent over the wires.

    BTW, how much network payload will you save by reducing page weight from 1460 chars including HTTP headers to 460 chars?

    Answer: Zero!

    The IP packet size is not affected.

    Going from 1600 to 1400 means going from 2 packets to 1. Reductions can happen, but the math is not simply about bytes.

    Posted by Lars Gunther at

  23. I wouldn't mind trashing the type attributes in link[rel=stylesheet] and script elements if text/javascript and text/css were the defaults in current and past specifications. I try to avoid validation errors, and I certainly want to avoid browsers going mad about my scripts and styles, so I'll take every precaution that's necessary.

    Since I'm not Google, where every byte is sacred because it adds up to terabytes in a fantastillion requests, I don't mind having a hundred bytes more on a page. I embrace the two bytes for the ending space and slash as it helps to avoid errors, especially when you work in a team with other or less experienced developers where the strictest rules are just good enough.

    Otherwise you're contradicting yourself: if nobody uses the type attribute anyway, why should you evangelize against its use?

    Posted by Martin Kliehm at

  24. I just noticed that without the "type" attribute on the link tag, the W3C validator will not find any style sheets on your page.

    Something to look out for if you're planning on having validation links.

    Posted by Josh L at

  25. Nobody ever browses using a validator.

    Posted by James John Malcolm at

  26. "Trashing Trailing Slashes — [...] useless [...] it increases the time (not noticeable though) that software will take to download and parse the document."

    As Lars Gunther said above, such micro-optimizations probably won't affect the loading time at all due to the size (MTU) of packets.

    As for the parse time argument, this is also wrong: what takes time in an HTML parser is checking and applying the various correction patterns for parsing tag soup and unclosed elements.

    After reading this article I feel I should quote Postel's law (also known as the first rule of interoperability):

    "be conservative in what you do, be liberal in what you accept from others"

    - RFC 793

    Just because browsers are lenient does not mean you should advocate trashing good practice.

    Posted by Nicolas R. at

  27. What about the optional tags?

    Posted by Tae at

  28. (Sorry for the slightly late comment.) I have to protest a little about your suggestion to serve up non-ico favicons. If you want to send a PNG as "favicon.ico", fine, and do send the right MIME type if you feel virtuous. However, it just makes no sense not to use the ico format anyway. (The point of your remark though to be fair is that link elements are superfluous, which is clear.)

    Posted by Nicholas Wilson at
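
The packet arithmetic from comment 22 can be sketched in a few lines of Python. The 1460-byte payload per packet is an assumption (a common TCP segment size on Ethernet with a 1500-byte MTU), the function name packets_needed is made up for this example, and the model ignores everything except the byte count of a single response.

```python
import math

# Assumed payload capacity of one packet, in bytes (1500-byte MTU minus
# 20 bytes of IP header and 20 bytes of TCP header).
MSS = 1460


def packets_needed(response_bytes: int, mss: int = MSS) -> int:
    """Number of packets needed to carry a response of the given size."""
    # Even a tiny response occupies one packet.
    return max(1, math.ceil(response_bytes / mss))
```

Under that assumption, packets_needed(1460) and packets_needed(460) are both 1, so trimming 1000 bytes saves no packets, while packets_needed(1600) is 2 and packets_needed(1400) is 1, which matches the figures quoted in the comment.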