This is one of the myths I always wanted to write a small post about, with some examples. It is actually just terminology confusion, but you still see it often: smaller bandwidth presented as an advantage of XHTML, while it is actually an advantage of standards-compliant or semantic markup. When we look more closely at the bytes and the required start and end tags, HTML is the language to use for bandwidth optimization. Even Google could have a smaller website. Two examples:
This is only a small difference, but when you take into account the optional end tag for LI, OPTION, P and other elements, HTML will get smaller and smaller. Note that you won't lose any semantics or the ability to style elements. That will stay the same. I believe that the only difference is that you can have a different DOM in the end, because a browser has to decide whether the SCRIPT element (as an example) needs to be in the HEAD or BODY element when none of those elements has a start or end tag.
See also tags versus elements if you don't know the difference.
Programmers have to make the same choice: go for small, quite unreadable constructions that nevertheless work, or for longer lines of code that are easier to adapt. And if you're programming a big project with a good code layout, good indenting and enough whitespace, you'll certainly experience the benefits of it.
It's the same for XHTML: I like it logical and with a nice code layout. If you can handle such screwed up HTML, good for you, but it's only usable for plain websites; just forget about MathML and other XML variants. The eXtensible part of XHTML is important sometimes :).
Why didn't you post an example of this misinterpretation? It's probably just me, but I have never heard somebody say that it's XHTML that makes the difference in file size.
Yes, I've heard many people say XHTML is smaller in file size, but essentially they probably mean things like removing multiple <font> elements and using CSS instead of tables. Even so, I agree with Anne about the common myths that circulate.
Most often I see people use the argument that "well-structured sites that use CSS for presentation and clean XHTML for structure only" are more bandwidth-friendly than table-based designs that use CSS only to style fonts.
That is true, of course, but I too see some people make claims as indicated above by Anne, and I think it's good that someone is addressing that issue now, at least.
In fact, you could argue the complete opposite. XHTML has certain required attributes, for example, that HTML wasn't saddled with. And a good thing too!
Compression, people!
"Redundant" information (whether prescribed by XHTML or otherwise) compresses very well. If you have enabled mod_deflate or mod_gzip, the raw file size is irrelevant.
This only redoubles the force of Anne's argument. Not only is the raw size of the XHTML file not shorter than the HTML file, but it wouldn't matter if it were, as they would both compress down to something about the same size.
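The compression point can be checked with a rough sketch. The list markup below is invented for illustration, and Python's gzip module stands in for what mod_deflate or mod_gzip would do on the server; real pages will give different numbers, so treat this as a way to measure rather than as a result.

```python
import gzip

# Hypothetical list content, invented purely for this comparison.
ITEMS = ["First item", "Second item", "Third item"] * 50


def build_html(items):
    """HTML 4 style: the optional </li> end tags are omitted."""
    rows = "\n".join(f"<li>{text}" for text in items)
    return f"<ul>\n{rows}\n</ul>\n".encode("utf-8")


def build_xhtml(items):
    """XHTML style: every li element is explicitly closed."""
    rows = "\n".join(f"<li>{text}</li>" for text in items)
    return f"<ul>\n{rows}\n</ul>\n".encode("utf-8")


html_raw = build_html(ITEMS)
xhtml_raw = build_xhtml(ITEMS)

# Compress both variants the way a deflate-capable server would.
html_gz = gzip.compress(html_raw)
xhtml_gz = gzip.compress(xhtml_raw)

print("raw bytes: ", len(html_raw), len(xhtml_raw))
print("gzip bytes:", len(html_gz), len(xhtml_gz))
```

The raw sizes differ by exactly the bytes spent on the end tags, while the compressed sizes should end up much closer together, because repeated end tags are cheap for the compressor to encode.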
It's interesting that your examples don't show HTML's shortfalls; TABLEs and FONTs are classic cases. It is true that anything XHTML can do, HTML can do smaller (by not using end tags), but HTML isn't correct. HTML is based on a loose form of SGML, XML is based on a strict SGML. XHTML is the next evolution of markup languages. By enforcing strict rules, everybody will be able to communicate knowing that the recipient will be able to correctly understand you. The web isn't about backward compatibility anymore, it's about forward compatibility.
In a short example like that, yes, XHTML is longer (mainly due to the doctype and xhtml declaration).
The benefit comes with the combination of XHTML and CSS, allowing you to eliminate font tags and tables, amongst other things.
This combination destroys file sizes in comparison to complex table layouts, not to mention being forwards compatible.
...but HTML isn't correct. HTML is based on a loose form of SGML, XML is based on a strict SGML.
I have no idea what that means. XML is a dialect of SGML. HTML has a syntax which is every bit as precisely defined as XHTML.
The difference is that the XML Spec defines the behaviour of XML parsers more strictly. In particular, it requires them to fail on ill-formed content.
No such requirement is imposed on SGML parsers, and HTML has, by tradition, been handled by ultra-liberal "tag-soup" parsers.
Unless your document is being handled by an XML parser, you achieve absolutely nothing with XHTML (certainly not forward-compatibility, as Hixie famously argued).
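To make the parser difference concrete, here is a small sketch using only Python's standard library. The fragment is made up, and html.parser is a lenient tokenizer rather than a browser's tag-soup parser, so it only approximates the liberal side of the comparison.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical ill-formed fragment: the p element is never closed.
FRAGMENT = "<div><p>Unclosed paragraph</div>"


class TagCollector(HTMLParser):
    """Records start tags; the lenient parser does not reject the input."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def parse_as_xml(text):
    """An XML parser must stop with an error on ill-formed content."""
    try:
        ET.fromstring(text)
        return "ok"
    except ET.ParseError as exc:
        return f"fatal: {exc}"


def parse_leniently(text):
    """The HTML tokenizer carries on and reports what it saw."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.start_tags


print(parse_as_xml(FRAGMENT))
print(parse_leniently(FRAGMENT))
```

The same bytes give a fatal error on one path and a usable list of tags on the other, which is why the choice of parser (in practice, the MIME type the document is served with) matters more than the syntax on the page.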
The benefit comes with the combination of XHTML and CSS, allowing you to eliminate font tags and tables, amongst other things.
So use HTML 4 with CSS and table-less layouts. What the heck does that have to do with XHTML?
Robert Wellock is entirely right about reasons for the "XHTML saves bandwidth" canard.
When the Web Standards movement started picking up steam, they chose to go with a very simple marketing message. A) HTML is crap, forget about it. B) XHTML + CSS2 is a package deal. After four years of hammering away on this, people have hopelessly conflated the concept of HTML 4.01 with tables, font elements, and other legacy nastiness. This comment on the new MSIE team blog and comment #7 above are just two examples of this mentality.
The upside of this marketing campaign is that there is now high awareness of modern web design techniques. The downside is that people are choosing markup languages based on faith and superstition rather than "what's the best tool for the job?"
Whoops, I should have said, "high awareness among people who read Zeldman on occasion," not high awareness in general.
You can actually reduce that HTML example further if you don't mind the page screwing up in almost every browser known to Man:
<title/Test/<h1/Test/
PS: when you say "Valid XHTML" for the comments, you might want to note that you don't accept CDATA :)
Another point: if you take care with your MIME types and send real HTML to HTML browsers and real XHTML to XHTML browsers, you have to add a Vary header. And every Vary header makes caching more difficult for proxies, which costs more bandwidth.
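A minimal sketch of that kind of negotiation, assuming the server decides purely on the request's Accept header. The function name negotiate is hypothetical and q-values are ignored for brevity, so this is not a complete implementation of HTTP content negotiation.

```python
def negotiate(accept_header):
    """Choose a media type from the Accept header (q-values ignored)."""
    # Split "type/subtype;q=0.9, ..." into bare media types.
    offered = [part.split(";")[0].strip() for part in accept_header.split(",")]

    if "application/xhtml+xml" in offered:
        content_type = "application/xhtml+xml"
    else:
        content_type = "text/html"

    # Because the body depends on Accept, caches must be told so.
    return {"Content-Type": content_type, "Vary": "Accept"}


print(negotiate("text/html,application/xhtml+xml,application/xml;q=0.9"))
print(negotiate("text/html, */*"))
```

Every response now carries Vary: Accept, so a proxy has to store and match a separate copy for each distinct Accept value it sees, which is the caching cost described above.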
Comment 7 and comment 8 are doing a good job on explaining the terminology confusion I mentioned in the post.
So use HTML 4 with CSS and table-less layouts. What the heck does that have to do with XHTML?
If all of your elements are not closed (i.e. missing </p>, </li>, etc.), you run the risk of CSS styles being applied to elements they are not supposed to. Consider it “style bleed” because the user agent does not know where to stop applying the style.
Aaron, it does. The user agent does know where it should apply the styles to. What you are referring to is another myth.
I like XHTML because it's neat, it's tidy and I can bung it through an XML parser. This can be really fricking handy; I used one when generating an index for a custom search engine.
It's also extensible, using XML namespaces. This isn't a myth.
It's also extensible, using XML namespaces. This isn't a myth.
And how many people make use of this?
For 99% of the internet population using standards as pushed by WaSP, HTML 4 is good enough. Just because an end tag is optional doesn't mean you must discard it. XHTML and CSS are not married to each other. Last I checked, nested tables can still be valid in XHTML and CSS is allowed for all aspects of HTML.
You can have, for all intents and purposes, a perfectly valid HTML + CSS website that closes all of its tags (p, li, etc.) and that looks exactly the same as an XHTML + CSS version... but without extra slashes everywhere in your br and img tags. The size difference between the two is probably barely noticeable for low-volume sites like yours and mine, but if you've got a site like ESPN or another extremely high-volume site, that could mean a big saving in bandwidth for something almost no one will notice.
P.S. It's really disappointing that this commenting system is unable to handle something as simple as auto-paragraphing while avoiding paragraphing block-level elements.
So in summary, if I get this right:
Conclusion: For your average site use HTML (or XHTML if you need it or want to) with CSS and compress.
Am I right?
ChrisR
I would say for every website. I agree with the rest.
I would say for every website.
Unless your name is Charl and you like to combine namespaces with SVG! ;-)
Your example is about as contrived as it can get. Try this instead: http://www.stopdesign.com/articles/throwing_tables/
I have thought about this, but my whole reason to use nice XHTML + CSS is because it's smaller. It's the combination that does it. It's nonsense to create a page, without any mark-up, but while preserving semantics in XHTML per se. HTML still does a great job at that. It's the combination of all this that makes it interesting, not unstyled content (on purpose, not because of your 4.0 browser).
Myeah, just my 2¢...