Anne van Kesteren

XML versus XHTML

Lots of people (and books) think that XHTML is just a step between now and the future. From their point of view, the future is XML. The first error here is that XHTML already is XML; the second has to do with semantics (if I'm not mistaken). In books all over the world you can read that your custom AUTHOR element helps Google index your site, because Google would 'know' what the element contains. (These books explain it quite badly, I think; I only began to understand how it works when I started experimenting myself.)

Of course that is a big mistake. Google doesn't know the difference between BOOK|AUTHOR (using CSS3 namespace notation in this post to tell the two apart) and WEB|AUTHOR. Actually, Google doesn't know the element at all. Google does know the HTML elements, like Hx or STRONG. Many people know these elements as well. That gives those elements some kind of 'value', and it implies that XHTML has more semantics than plain XML and will therefore not be replaced.
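
To make the point concrete, here is a minimal sketch in Python (my choice of language for the illustration). The two namespace URIs under example.org, the `describe` function and the tiny `KNOWN_SEMANTICS` table are all made up for this post; they only stand in for "what an indexer has been taught" and say nothing about how Google actually works. An XML parser can tell BOOK|AUTHOR and WEB|AUTHOR apart by namespace, but it attaches no meaning to either, while the XHTML element maps to something known.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, invented for this illustration.
BOOK_NS = "http://example.org/ns/book"
WEB_NS = "http://example.org/ns/web"
# The real XHTML namespace URI.
XHTML_NS = "http://www.w3.org/1999/xhtml"

DOCUMENT = f"""
<root xmlns:book="{BOOK_NS}" xmlns:web="{WEB_NS}" xmlns:h="{XHTML_NS}">
  <book:author>Jane Doe</book:author>
  <web:author>Jane Doe</web:author>
  <h:strong>Jane Doe</h:strong>
</root>
""".strip()

# The only vocabulary this toy "indexer" has been taught: a few XHTML elements.
KNOWN_SEMANTICS = {
    f"{{{XHTML_NS}}}strong": "strong emphasis",
    f"{{{XHTML_NS}}}h1": "top-level heading",
}


def describe(xml_text):
    """Return (qualified tag, meaning) pairs for each child of the root element."""
    root = ET.fromstring(xml_text)
    return [(child.tag, KNOWN_SEMANTICS.get(child.tag, "unknown")) for child in root]


if __name__ == "__main__":
    for tag, meaning in describe(DOCUMENT):
        print(tag, "->", meaning)
```

The two AUTHOR elements come out with different qualified names, yet both end up as "unknown"; only the element from the shared vocabulary carries a meaning the program can act on.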

Wrong? Correct? Comment!


  1. Correct. XHTML will _never_ be 'replaced' by XML. When you use XHTML, you add well-defined structural and semantic meaning to your document.

    Posted by Thijs van der Vossen at

  2. Hey Anne, I know we traded messages about this offline (my inbox since corrupted and deleted), and what I understand is that, as you say, xhtml is XML (provided it is delivered as such etc.).

    This I suppose is reflected in your blog here, which validates as XML but for all I can see looks like xhtml*, but for the addition in your doctype. (*Last time I checked)

    I'm very new to understanding XML (though strangely I have worked with it in the past), but after what you told me I started thinking of xhtml as a sort of set of accepted xml tags (rules?) defined by its DTD. It has great semantic value because it is a widely used (and abused!) standard, built on HTML.

    Is this an accurate representation?

    (by the way, your blog never 'remembers' me when I click remember in the 'Add a comment' page).

    Posted by Mike at

  3. Any XML can be just as semantic as XHTML. 'Semantic' means meaningful. So <paragraph> is just as semantic as, in fact more semantic than, <p>.

    XHTML won't go away because it's a commonly understood form of XML. Not only will Google be able to understand parts of your document, so will user agents.

    Obviously, screen readers depend on an understood meaning so that elements can be read properly.

    And in visual browsers, XHTML elements have default styles that browsers tend to stick to. If we all had our own form of XML in place of HTML, our CSS would have to be much more detailed (having to specify the display property for everything for example).

    While XML remains, a need for a common XML format remains.

    Posted by Patrick Griffiths at

  4. I think XML, in its barest form, is wholly unsuitable for web authoring. What made HTML such a clever idea is the fact that it consisted of a relatively small number of easy-to-learn elements. XHTML is an application of XML, designed to offer all the simplicity of HTML, but with the addition of XML's power.

    I do not see XHTML as a transitory thing. I am sure it will develop, so that it will resemble HTML less and less, but it will definitely endure.

    Note to Anne: I finally took the si-blog dynamic. As per your repeated request, you can now post comments! As a result, I will be posting more often (mainly because I can do it anywhere through the admin panel I built), so you can edit your Externals page description. Also, I concur that there is a problem with your Remember cookie.

    Posted by Simon Jessey at

  5. As Patrick Griffiths says, XML can be as semantic as you want it to be. The problem is: who defines the semantics? This is what makes XHTML a very special case of XML. It consists of elements whose semantic meaning, although vague at times, is well defined and understood by user agents and search engines both.

    A couple of years ago, I believed that the ultimate web site would be made with XML, XSLT, and CSS. I could define my own semantics and tailor tags to show exactly what things were. The only thing that saved me from making a complete fool of myself at the time was that my browser had such poor XSLT support.

    Now I realize that it isn't enough that the semantics of my markup makes sense to me; it must also make sense to the applications that will use it.

    Will XHTML be forever? Probably not. I think specialized XML-based element collections will evolve for some common areas. MathML and SVG are examples of this, but maybe in a few years there will be a LibraryML for books, an AutomotiveML for vehicle-related stuff and so on. If this happens, it will have to be limited to a few areas, because user agents will have to understand the underlying semantics. Maybe we'll have to invent a definition language that describes the semantics? :-)

    Posted by TOOLman at

  6. XHTML will endure, and not be replaced by some random XML vocabulary. This is because XHTML is an XML vocabulary with known semantics, and this knowledge is powerful, valuable and not to be sneered at.

    But I do think we will see more and more XML vocabularies rise to the surface, and get mixed into XHTML and also used externally, like MathML, RSS and Atom. The more a vocabulary is used, the greater the chance Google and other User Agents will understand it.

    XHTML 2.0 will be richer than all (X)HTML versions we have as of now, but it will still not cover all the semantics people want to express about their documents. Therefore, it will be more and more common to extend XHTML documents with custom and widely adopted namespaces, which might have meaning to some, and not to others. The beauty is that this can be done without harming anyone, as long as we stick to a base format such as XHTML.

    RSS and Atom (as XML vocabularies) try to exist alongside XHTML, but will not replace it. Neither will any other vocabularies, imho.

    Posted by Asbjørn Ulsberg at

  7. XHTML will last an extremely long time. As others have pointed out, its eXtensibility aspect will increase in the future, as will the language modularisation options and integration.

    Google is inherently geared towards understanding HTML, thus it won't really understand introduced XML "elements" at this moment in time.

    Again, we are still waiting until the XML Processor usage becomes more mainstream and legacy HTML-Only User Agents are condemned to the scrapheap before we can really move forward - in a mainstream sense.

    Posted by Robert Wellock at

  8. xhtml is the future for web pages. xml is the future for many other things such as web applications, backends, etc, but not for what is sent to browsers. xhtml is nice right now. there don't need to be any more revolutions.

    css, in its current implementation in modern browsers, leaves a bit to be desired, including a lot of stuff coming in css3 with no browser support. as css and css support get better, xhtml need not. xhtml is clean, flexible, and already has semantic meaning associated with it. plus it's backwards-compatible with html to a certain extent.

    it's good enough markup for web pages.

    Posted by vlad at

  9. I agree with most of you (especially Asbjørn Ulsberg). The way I view it is like this: XHTML is an implementation of XML. XHTML is just a standard in which to describe fairly simple document contents and their structure.

    More specialized standards will arise to describe other kinds of content: mathematics, forms, graphics, chess games, schedules, etc. Software will be made that understands these dialects of XML. It is my opinion that only documents written in standards like these can have 'semantic value'.

    What must be said, though, is that XHTML is far from perfect, even for simple documents. Some common document elements are missing, some nesting rules are crap. Maybe someday a better standard will be devised.

    Posted by Menno at

  10. <paragraph> is ... in fact more semantic than <p>


    Well, I fail to see this one.

    Would the word "paragraph" have less meaning if it were in French, or German, or Dutch? As long as the tag used for a purpose is used to "mean" what it "means" in the language being used, then it has appropriate "meaning". That it is designated with a "p" in HTML is useful for English speakers, but would it have less "meaning" if it were, say, tag c147? Of course not.

    XHTML is good because, as has been observed, it is a "lightweight language", simple in structure and with few tags, which are in common use.

    Posted by Michael at

  11. XHTML is the renewal of HTML. An XHTML document is both an HTML document and an XML document. The most important difference between XHTML and HTML is extensibility. I think the future document on the web will look like this: XHTML + MathML + SVG.

    Posted by OOO at