Anne van Kesteren

XHTML was good for the web

21 May 2005

XHTML has been a very good term to convince others of different — often better — methods to achieve goals. Thanks to its similarity to HTML and pedantic followers a lot of websites have been converted to what we now call ‘semantic markup’. Nobody really understood what they were doing, but it was clear for most people that HTML consisted of superfluous TABLE and FONT elements and that for XHTML you needed to use CSS which can be cached and therefore saves bandwidth. Now, we know better.

The initial switch might have been too difficult if everyone was told to use semantic markup and CSS instead of what they were doing now. After all, XHTML is a shorter term and most people considered it to contain the ‘entire package’. However, what we see nowadays is that telling people to use semantic markup isn’t all that hard for them to grasp. I’m participating in a couple of forums and when someone comes with some table-for-layout-related-question he mostly gets an answer like: “Search Google for ‘semantic markup’.” Another example answer: “Visit A List Apart, HTML Dog and come back after two weeks of studying those sites.”

And contrary to popular belief: this actually works. People have no problem with learning more interesting things, abandoning frames and Adobe GoLive (the horror); all without mentioning XHTML. And when it’s brought up, the XML media type issue is raised and people either ignore that or choose the smart route. The idea that XHTML is better for mobile devices and ‘forward compatibility’ has diminished. The former thanks to Opera on mobile devices, which renders even the ugliest tag soup site quite nicely because their customers demand support for it; the latter because the idea that XHTML sent as text/html is forward compatible has proven to be an unattainable pipe dream of the few who believe in it.

As a result, I switched to HTML4 with the launch of my new weblog system.

Because I love playing with markup, I removed all the optional tags — not elements — in the process. This means I don’t have to write <html>, <head>, <body> and their end tags. In the DOM — and that’s what it’s all about — the elements are still there of course so I can style them if desired. I also omit </p>, </li> et cetera and all is still valid and works.

The backend is well-formed XML, but not entirely valid against my schema. Perhaps I should start to allow data URIs in posts… Because my backend data looks remarkably like HTML (it’s a restricted XHTML variant), I use some simple replacement functions. In the end it’s probably safer to use some DOM functions or even XSLT; however, PHP is a hack and so is my weblog.
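
To give an idea of what such a replacement can look like, here is a minimal Python sketch. The weblog itself runs on PHP, so the function name, the list of empty elements and the regular expression are all assumptions made for illustration, not the actual code.

```python
import re

# Hypothetical list of empty elements that a restricted XHTML variant might allow.
VOID_ELEMENTS = ("br", "hr", "img", "input", "link", "meta")

# Matches XML-style empty-element tags such as <br /> or <img src="a.png" alt="x" />.
_SELF_CLOSING = re.compile(
    r"<(%s)\b([^<>]*?)\s*/>" % "|".join(VOID_ELEMENTS),
    re.IGNORECASE,
)

def xhtml_fragment_to_html(fragment):
    """Rewrite <br /> style tags to the HTML 4 form <br>; everything else is left alone."""
    return _SELF_CLOSING.sub(r"<\1\2>", fragment)

print(xhtml_fragment_to_html('<p>one<br />two <img src="a.png" alt="x" /></p>'))
```

A substitution like this knows nothing about comments, CDATA sections or attribute values that happen to contain "/>", which is why DOM functions or XSLT remain the safer route.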

Disclaimer: there are no real benefits in dropping optional tags except for saving bandwidth, which could be done using other, better methods. There are of course benefits to using HTML in general, as stated above and explained in the archives here and on the rest of the web.
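
One such method is compressing the response. The following Python sketch compares the raw and gzipped size of the same list written with and without optional end tags; the sample markup and the helper name `sizes` are made up for the comparison, and real pages will give different numbers.

```python
import gzip

def sizes(markup):
    """Return (raw byte count, gzipped byte count) for a piece of markup."""
    raw = markup.encode("utf-8")
    return len(raw), len(gzip.compress(raw))

# Made-up sample: the same 200-item list with and without optional end tags.
with_end_tags = "<ul>" + "".join(
    "<li><p>Item %d</p></li>" % n for n in range(200)
) + "</ul>"
without_end_tags = "<ul>" + "".join(
    "<li><p>Item %d" % n for n in range(200)
) + "</ul>"

for label, markup in (("with end tags", with_end_tags),
                      ("without end tags", without_end_tags)):
    raw_size, gzip_size = sizes(markup)
    print("%s: %d bytes raw, %d bytes gzipped" % (label, raw_size, gzip_size))
```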

Comments

Can't you also change the tag names to uppercase, just for the sake of it? And also remove optional quotation marks around attribute values.
A little off topic: I think the post summaries for each post displayed at the archives could fit in a <META name=description>... just a thought.
Posted by zcorpan at 7:47PM
It could be me, but what you (zcorpan) are saying sounds as bad as things could be, in my opinion. First of all I find meta tags useless, second, uppercase tags are something I don't see the benefit of, and third, remove quotation marks??
What's next? going back to using tables for layout?
Posted by Yorian at 7:58PM
This is just wonderful. And pity that this is an advanced topic for too many people writing markup :(
I think you may have a follower here — I've got this after all (but dropping .html will improve it even further...).
Posted by Rimantas at 8:01PM
Use gzip for compression if bandwidth is so important. Removing open and/or end tag is ugly.
Posted by minghong at 8:18PM
What do we know better? We definitely know that CSS saves bandwidth compared to the same visualization using presentational markup - this need not hold true for a single page, but will hold true for a large domain.
Posted by Jens Meiert at 9:00PM
Actually I think using ÜBERCASE tags would give the perfect finish to that retro look
But I'm afraid Anne is not concerned with looks
Posted by ghola at 9:39PM
Too bad you cannot specify meta data profiles (like XFN™) without a <head> properly. ☺
Posted by ACJ at 11:50PM
Uppercase tags do indeed take a byte less than lowercase tags...
Posted by Frenzie at 2:02AM
From a playing around/testing the limits-perspective, I appreciate your approach: it's something different. However, what I'm afraid of is that, since you have many readers, some of them won't understand all the nuances of this, but rather react like:
```
Wow, can I get away with this code and it still works?
```
I think many developers that are less skilled or less interested in interface code will look at your source code and then remove lots of things in their HTML code (but, of course, won't understand the reason for doing so, why it's ok etc), and end up delivering code that won't be valid nor semantic.
The advantage with strict HTML/XHTML is that it's easier to teach those developers who aren't as experienced within that area; the stricter guidelines they get, the lesser chance they will mess something up.
Is this the reaction you were hoping to get from someone? :-)
Posted by Robert Nyman at 4:56AM
>Zcorpan: Look closely, you'll see that the tags allowed in the comments are all written in uppercase! :-)
But the one not backward thing that's bugging me is this: I've never seen any kdb tag... Anne, could that be for forward-compatibility's sake?
Posted by Korbo at 5:29AM
Robert, you're spot on, there. You pinpointed the exact problem that stems from HTML, and the solution lies, as we all should already know, with XHTML.
Key phrase is: the stricter guidelines they get, the lesser chance they will mess something up
Anne's site may still validate, but as far as the concept of "lead by example" goes, Anne has (imo) become a terrible source and a terrible example. People are lazy, and Anne is one of the few people in the world that actually will go the extra mile to figure out why things work the way they do, up to the finest detail. Most people, by far, are nothing like that, and seeing this site's source will only cause them to create more tag soup, not knowing what it is they're doing, and not having any strict guidelines to help them stay at least a bit on the right track.
Posted by Faruk Ates at 5:32AM
I wish the whole web was XMLized, 'cause then it would be so easy to waltz through it using XSLT or XQuery (which I'd love to learn but find XSLT useful enough already). I'd like to ask you — Wouldn't it be great if the whole net were usable with XML tools? Why don't you let your site be available to any ol' XML parser? Wouldn't it be better for the net, while not removing any benefits from you?
Posted by Devon at 5:54AM
[...] I find meta tags useless [...]

Search engines may display the contents of the META description in the search results.
The HEAD element does not appear in the DOM in Opera if the <head> start tag is missing. This is a bug with Opera. Not that it matters since you don't style it and you don't use it in any scripts.
Posted by zcorpan at 6:12AM
What's next? going back to using tables for layout?

This is already happening. I visit a web design forum frequented by professionals, a few of which have dumped their DIV-based code for tables. I was shocked, but the reasons they gave definitely made sense. (Mostly to do with backward compatibility.) They also argued that site-wide changes were still easy, because they were using templates. CSS was still used, but obviously no DIVs. And like Anne says, no need to worry about handhelds now we have Opera 8's brilliant small-screen resizing feature.
Next can only come the return of FONT tags! :-)
Posted by Chris Hester at 6:38AM
Robert, if they take over my source code, they are still valid. And if they don’t go to the validator, we won’t help them.
Posted by Anne at 2:23PM
I know meta tags are in fact still useful, though I'm pretty radical with those things, I must say. I try to avoid using most of the meta tags since quite a few important search engines don't use those anymore.
To give you an impression about the "radical" thing, if IE7 comes out it will probably have about replaced IE6 within quite short time for the biggest part. I myself plan to not be backward compatible for IE 6 and lower.
Getting back to the uppercase for tags, I wasn't aware of the fact that uppercase takes fewer bytes than lowercase. So my fault there...
About going back to tables, well not everyone is perfect ;)
Posted by Yorian at 4:33PM
Faruk, Anne is a perfect example that you have to know your stuff and make informed decisions.
It does no good if ignorant developers jump on the XHTML bandwagon unaware of MIME types, document.write, case sensitivity, comments and other stuff. It is much worse, because one will be doing absolutely the wrong thing with the impression of being right and superior. And in XHTML you have to go five extra miles to do it right.
If you know this, you are able to choose the best tool for the job.
What we need is to teach semantic thinking, not HTML vs. XHTML thinking.
Posted by Rimantas at 7:12PM
Google, MSN Search, AltaVista and Yahoo all use META descriptions.
Posted by zcorpan at 7:33PM
@Robert & @Faruk
I believe on the contrary that people will learn from what they see here. I don't think readers of this weblog are total noobs, they come here to learn and that's a very good opportunity to learn more about HTML.
I'm pretty sure many (like me) will have to run the new source code through the validator (yes, disregarding the claim at the top of the file) to make sure that validates, and when it does they'll want to learn.
Anyway people don't need an excuse to write bad code. And maybe if HTML had been explained more seriously from the start, the web wouldn't be so broken.
I'm not going to imitate Anne (anytime soon) but I'll gladly keep learning from his experience, and that's a good thing, ain't it?
Posted by ghola at 9:11PM
@Anne, @Rimantas, @ghola,
If someone rips your code, they will make changes to it (not your problem, I know). And I think many skilled people that regularly visit Anne are experienced, responsible people.
But I just imagine this scenario when some developer (it might be me) tells another developer, who doesn't know too much about interface code, to read Anne's log. Then he/she will look at the source and go: "Yay, I don't even need the HTML element, or close the P tags" etc, without knowing the reason why, and the resulting code will be terrible.
In the end, I definitely agree with Rimantas that the most important thing is valid semantic code, be it HTML or XHTML. If the scenario allows it, I prefer XHTML for being even more strict (and minimizing the possibility for my fellow developers to write sloppy code).
@Devon, I would really like an XMLized web, I love the different options to handle and re-use XML, and XSLT is one of my flings! :-)
Posted by Robert Nyman at 9:58PM
Anne’s Weblog about Markup & Style considered harmful? Really?
Posted by ghola at 10:39PM
Uppercase tags do indeed take a byte less than lowercase tags...

In which character encoding exactly? I just made a simple test case about this, which turned out the opposite of what you said. What did I do wrong? :)
Posted by Krijn Hoetmer at 11:04PM
@Rimantas,

Clearly you are ignoring the fact that not everybody cares to know their stuff and make informed decisions. In fact, I think most people actually know that they don't know their stuff in full detail. They just don't care.
The problem is that these people can run into Anne's log by any means, and will start to adopt this behaviour without caring to know why or how. And then their pages won't validate and they won't care anymore, too, thinking "oh, probably just some tag I forgot to close or whatever". And what do we see, then? Exactly what has been the very start of this all: people writing sloppy markup and not caring.
Semantics? They'll just as easily be lost along the way.
The whole XHTML-vs-HTML thing isn't about the actual X in XHTML, it's about strictness, proper coding habits and writing exemplary code. Anne is giving the wrong example to many unknown visitors, and in doing so he's become an excellent trademark for HTML itself: the wrong example.
Semantics are most important, and can be achieved just as well with HTML, but don't expect people to care about all the finer details. Stricter rules and guidelines are the only way to get them to care about fine details, and semantics are, for a large part, an issue of fine detail.
Posted by Faruk Ates at 11:21PM
Faruk, I lost you completely after the bit where copy and paste caused people to stop caring about validation. Actually, I didn’t get that either.
Posted by Anne at 11:28PM
Anne, that only shows that you're lost on the issue itself, something I'm not surprised about.
Just because people read your blog doesn't mean that they also care about validating. They can, however, steal your code as an example, and knowing that you're "doing it right" will give them the feeling that when using your code, they're doing it right as well. But not if, in the most obvious of all possibilities, they're actually making an XHTML site.
Should people know better than to copy HTML markup and use it as XHTML? Yes. Will they? No. Are you going to convince everyone? No. Is XHTML more reliable? Yes.
Posted by Faruk Ates at 11:39PM
Like people will know about XHTML, but not about validation. Your remarks are getting more hilarious. Doing it right involves media types. A layer almost nobody can cope with. Doing it right involves character encoding. A layer almost nobody can cope with. Doing it right involves semantics. A layer almost nobody can cope with. Doing it right also involves CSS. There are so many things to learn, validation will be on that path.
And my site being an XHTML site is not at all the most obvious of all possibilities. Most sites are written in HTML. Most sites’ markup sucks compared to mine.
And I hereby grant you the right to spam further, but shall refrain from further commentary as it’s getting an order of magnitude less interesting every time we have this discussion.
Posted by Anne at 12:09AM
Not to come between you guys, but to lighten up this conversation a little bit; some people (at my internship) actually copy <div id="container"> ... </div> and entire stylesheets, just 'because it centers the page' :) How's that for stupidity?
By the way, I wrecked my test page. I really don't understand what's going on here.. Anybody?
Posted by Krijn Hoetmer at 12:28AM
If ever we needed proof that web designers != programmers, we've found it with this comment: Uppercase tags do indeed take a byte less than lowercase tags…
I was hoping this fellow was joking, but it seems not. A character takes up 8 bits, one byte. Doesn't matter what the character is. (At least in ASCII. I'm sure Anne can talk your ear off about Unicode and whatnot.)
Posted by Michael Newton at 1:49AM
Oh come on. There is nothing wrong with using HTML 4.01. If you are serving text/html, using HTML 4.01 makes perfect sense.
Is my site harmful, too? There are even upper-case tags!
And no, XHTML as text/html is not any more reliable. Sorry.
What annoys me is pages like one of the recent linkroll entries giving me the yellow screen of death in Camino, because the author expects UAs to know about XHTML character entities.
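The entity problem is easy to reproduce with any XML parser that does not load the DTD. A minimal Python sketch follows; Python's bundled parser stands in for the browser here, and the helper name `parses_as_xml` is made up.

```python
import xml.etree.ElementTree as ET

def parses_as_xml(markup):
    """Return True if the markup is accepted by an XML parser that reads no DTD."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

print(parses_as_xml("<p>caf&#233;</p>"))  # numeric character reference: True
print(parses_as_xml("<p>a&nbsp;b</p>"))   # HTML named entity, undefined in plain XML: False
```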
Posted by Henri Sivonen at 1:56AM
I think the only situation where upper/lower case makes a difference in terms of bytes is if you uppercase ‘i’ according to the Turkish rules so that it becomes ‘İ’ and compare them as UTF-8 bytes (one byte and two bytes respectively).
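A quick way to check this, as a minimal Python sketch (the helper name `utf8_len` is made up, and the dotted capital is written out as U+0130 because Python's own `upper()` does not apply the Turkish rules):

```python
def utf8_len(text):
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

# ASCII tag names: case makes no difference to the byte count.
print(utf8_len("<p>"), utf8_len("<P>"))   # 3 3

# Turkish uppercasing maps 'i' to U+0130 (capital I with dot above).
print(utf8_len("i"), utf8_len("\u0130"))  # 1 2
```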
Posted by Henri Sivonen at 2:01AM
The whole XHTML-vs-HTML thing isn't about the actual X in XHTML, it's about strictness, proper coding habits and writing exemplary code.

I couldn't disagree more.

Progress means getting away from the limitations of the tag-soup parser. Forget, even, extending the language via namespaces (XForms, MathML, ...). Consider, say, XHTML 2's change in the syntax of the <p> element. This makes for a big improvement in the semantics of <p> (which are fairly broken in HTML).

But you can't implement a change like that in HTML (or faux-XHTML, which is meant to be parsed by the tag-soup parser). Changes like that require a new parser — like the XML parser. Which is to say, they require the "X" in XHTML.

Anne's coding habits are as "proper" as they come (though I do like to goof around with his primitive attempts to convert XHTML to HTML using a few string substitutions).

They can, however, steal your code as an example, and knowing that you're "doing it right" will give them the feeling that when using your code, they're doing it right as well.

Bull. If they're not validating their code, there's very little chance that they are doing it right, whether they are copying from you or from Anne. It's not who you steal from; it's what you do with what you've stolen.
Posted by Jacques Distler at 2:19AM
I don't want to take this too far now, but I just have to say this:
```
It's not who you steal from; 
it's what you do with what you've stolen.
```
Well, that was Faruk's and my whole point. People who don't care about validating steal code from someone that's better than them (in this scenario, Anne), mix around with it (and don't validate it) and then still think it's ok, because they've stolen it from an experienced source.
Have you never met web developers out there that don't care about doing it right as much as we do? In a semi-attempt to be correct, they usually steal code from someone well-known and think they're done with it. I see this every week (if not every day).
Most of these people won't understand the reasons why some end tags are optional. If we can show/tell them to close every tag, i.e. being consistent and the code more easy for them to grasp, I think they would have an easier time to understand it all.
Posted by Robert Nyman at 3:19AM
For those who are still trying, uppercase or lowercase makes no difference in size. It was actually a joke which some of us were (how do you say this?) cooperating with.
The exception Henri mentioned is something I'm not aware of.
Posted by Yorian at 4:48AM
Well, that was Faruk's and my whole point. People who don't care about validating steal code from someone that's better than them (in this scenario, Anne), mix around with it (and don't validate it) and then still think it's ok, because they've stolen it from an experienced source.

In that case, it doesn't matter a whit whether they steal HTML4 code from Anne, or XHTML from Faruk, what they produce with it will be crap in either case.

I don't see why crap XHTML is somehow "better" than crap HTML.

If you were going to argue that X(HT)ML, with its simplified syntax, is easier to teach to a 9 year old child than HTML, then I'd be halfway inclined to agree with you. I say halfway, because issues like MIME-type, character-encodings, namespaces, etc. can be pretty gnarly. And an XML parser is much less forgiving of mistakes.
But if you're going to talk about people learning by 'viewing-source,' they're much better off cutting and pasting Anne's HTML code than Faruk's XHTML and then wondering why, when it's served as application/xhtml+xml (as Faruk does), it blows up in their faces.
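The difference in forgiveness can be shown in a few lines of Python. This is only a sketch: `html.parser` is a lenient tokenizer rather than a browser's tag-soup parser, and the class name `TagCollector` is made up.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Records every start tag the lenient HTML tokenizer reports."""
    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

# Valid HTML 4: the </li> end tags are optional and left out.
markup = "<ul><li>one<li>two</ul>"

collector = TagCollector()
collector.feed(markup)
collector.close()
print(collector.start_tags)  # ['ul', 'li', 'li']

# The same bytes handed to an XML parser are rejected outright.
try:
    ET.fromstring(markup)
except ET.ParseError as error:
    print("XML parser gave up:", error)
```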
Posted by Jacques Distler at 6:29AM
I am amazed.
Robert, are you advocating that everybody dumb down everything just because newbies might get complex things wrong? Beginners are going to make mistakes and if they're ever to evolve they will have to do some learning. That's life I guess
This is somewhat of a cutting-edge weblog (which used to have a cutting-edge design, but let's not get into that, I'm still in mourning about it) and an experiment Anne is sharing with us (the world). Should nobody experiment? Should everybody use a defined set of safe templates?
Posted by ghola at 7:52AM
@Jacques, @ghola,
Yes, I was talking about it from a teaching point of view. The MIME type issue etc weren't part of the argument. The teaching also applies to introducing strict HTML with closing every tag there too like P and LI tags (IMG, BR and LINK being the exceptions then).
I know this is a cutting-edge weblog and I do appreciate the code from an experimenting aspect (as I wrote in my first comment). But what I wanted to stress is that there are many people out there, definitely not newbies in developing but people that don't care about the finer details of validating their code, and if you open up Pandora's box to them with optional end tags it will just go downwards from there.
I meet them in almost every project I work in, and rarely is the one who's writing the interface code an expert when it comes to it. From all the jobs I've had and all the projects I've been in, I've seen a shortage of interface developers, they usually end up with some system developer, who drew the shortest straw, writing the interface code.
I think I dare taking it so far as to saying that here in Sweden, very few company web sites and/or CM systems validate/generate validating code, and that is because they had no experienced interface developer, but just thought a system developer would do. These system developers would then steal code from someone else (someone "famous"), think it's ok and feel very content with themselves (without knowing why some things are optional and the inner workings of interface code).
Posted by Robert Nyman at 12:06PM
I mentioned this some months ago, but I think I can say it again: as someone who works mostly with XML and not with HTML or XHTML, these two standards seem to me to be only some kind of web frontend, a format for presentation. And the fact that Anne could change his blog's code almost immediately from XHTML to some weird but valid HTML shows that the frontend doesn't matter anyway.
As web developers get more and more used to working with template engines like PHP/Smarty or some kind of template-based content management system, the actual source code of web frontends gets less and less important, as we can a) change the source code for masses of content with a minimum of work and b) syndicate the content in other formats, like RSS, Atom, RDF, XTM or whatever comes around.
Soon we will see the rise of SVG websites, and in two or three years every blog will have an XHTML skin and an SVG skin, I'm sure of this. This will make it even clearer: HTML is a presentational format, no matter how much work you put into semantic markup. What's really important is how you store your data. The more precisely you structure your information in your backend, the more options you'll have to use this information in any form of presentation or syndication. And within your backend you will use some XML standard to represent your textual content.
Posted by ben at 4:14PM
Yorian wrote:
First of all I find meta tags useless

Most of them are. But META-description is not: Some search engines use it for page summaries in some cases. But more important: Opera uses it when creating bookmarks. This is a great feature for organizing and identifying bookmarks when a good description is used on a page.
Robert Nyman wrote:
I think many developers that are less skilled or less interested in interface code will look at your source code and then remove lots of things in their HTML code (but, of course, won't understand the reason for doing so, why it's ok etc), and end up delivering code that won't be valid nor semantic.

But that might also happen with the best semantically, validating (and well-formed) (X)HTML. It does not matter if someone breaks copied HTML or XHTML markup; well, breaking XHTML might be a tiny bit worse because of non-tagsoup parsers (speaking from a site visitor’s view, not an author’s view).
I do not mind Anne for using proper, “minimized” HTML (and a “minimal” design). There is nothing bad about “pushing the limits” with a technology, experimenting, showing its features, etc. In the end, Anne does not use SHORTTAG – which indeed would hurt – but valid, semantically rich HTML.
Chris Hester wrote:
I visit a web design forum frequented by professionals, a few of which have dumped their DIV-based code for tables. I was shocked, but the reasons they gave definitely made sense. (Mostly to do with backward compatibility.)

Backward compatibility to what? Netscape 3 and 4? Okay, there are two or three Netscape 4s still around in large companies or on old university computer systems. But nevertheless: LOL!
```
:-)
```
Yorian wrote:
if IE7 comes out it will probably have about replaced IE6 within quite short time for the biggest part.

I do not think so. After IE6 came out, IE5 was around a long time. With most people, the Internet Explorer updates when they buy a new computer (and therefore a new Windows version; Longwait^H^H^H^Hhorn [Anne, why no DEL and INS elements?]). Only a tiny fraction of people on the Web care about browsers and update more or less regularly (Web designers, for example, but still not all of them). Of course, Microsoft will push IE7 (when it is finally out, which might last still a long time, IMHO), but many people are concerned about Windows Update, because they heard it might be spying their hard disk for illegal software. (After all, most of the worms and trojans and viruses around are using security holes that were already fixed with some patch from there.)
And IE7 will not be available for all Windows versions. As far as I know, only the Windows XP version is sure. A Windows 2000 version might come. But there are also still lots of Windows 95 and 98 systems around.
Ah, and by the way: I prefer being rather pessimistic about the features and bug fixes of IE7. It is just a feeling, but Microsoft has proven its incompetence and its “own standards” too often. Perhaps they might surprise me and I will be happy. I hope so. But till then I prefer being pessimistic.
Anne: You need a “quote” function for your comments. I do not mind writing HTML, but quoting is quite laborious. (This is one of the many reasons, why I like Usenet discussions better than Weblogs [or Web forums], BTW.)
Posted by Lars Kasper at 1:42PM
Lars,
But that might also happen with the best semantically, validating (and well-formed) (X)HTML

Absolutely. There's no way one can have interface code that can be ripped and tweaked without breaking, if the person who does it doesn't have the skills or knowledge.
What I was merely going for is that it's easier to teach developers that every element has to be closed, since it's more consistent that way and makes it easier for them to grasp it, as opposed to introducing exceptions that might confuse them.
Posted by Robert Nyman at 3:21PM
Robert Nyman wrote:
What I was merely going for is that it's easier to teach developers that every element has to be closed, since it's more consistent that way and makes it easier for them to grasp it, as opposed to introducing exceptions that might confuse them.

Of course, you are right.
But: Even if you tell that every element has to be closed, the ways how to close elements differ, because of lots of empty elements around. So this is the point where I do not follow the arguments that XHTML is so much easier to teach than HTML: People must still learn what empty elements (like IMG, BR, INPUT) are and to close them differently (in the same tag) than the others.
(And – in real world – people have to learn XHTML and HTML. Simply because of the fact that they will encounter both frequently on the Web and will have to work with it.)
Posted by Lars Kasper at 8:31PM
Lars,
Absolutely, the ideal scenario would be if people would really learn it all, and the why's and how's.
Posted by Robert Nyman at 1:56AM
Your minimalistic approach to HTML is simply a mind-blowing experience. I had no idea you could strip it out by so much.
Posted by Paul Goscicki at 11:33PM