Anne van Kesteren

XML Schema

XML Schema is developed by the W3C, like most other XML technologies. When I first played with XML I thought XML Schema was a replacement for DTDs written in XML, and it is indeed something similar. It's not entirely a replacement for them, since you can create entities using a DTD, but XML Schema offers a similar solution (thanks Jurriaan). Of course, it's debatable whether using elements for this pollutes the markup and whether entities are actually necessary, but that might be something for another post or something to comment about.

One of the advantages of XML Schema is the ability to describe element and attribute contents in great detail. You could even have the element EXAMPLE occurring twice in a document, where the first time it's of the type xs:float and the second time it's xs:string. Basically, this means that XML Schema makes the element name and namespace combination even more complex by basing the semantics of the element on its location within the XML document.

Neat stuff, for sure.
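
To make the EXAMPLE case concrete, here is a minimal sketch of how that could look. The element names doc, measurement and note are made up for illustration; only example, xs:float and xs:string come from the text above. Each example element is declared locally inside a different parent, so each can have its own type:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <!-- Hypothetical root element; the names doc, measurement and note are assumptions. -->
 <xs:element name="doc">
  <xs:complexType>
   <xs:sequence>
    <xs:element name="measurement">
     <xs:complexType>
      <xs:sequence>
       <!-- Local declaration: inside measurement, example is a float. -->
       <xs:element name="example" type="xs:float"/>
      </xs:sequence>
     </xs:complexType>
    </xs:element>
    <xs:element name="note">
     <xs:complexType>
      <xs:sequence>
       <!-- Local declaration: inside note, example is a plain string. -->
       <xs:element name="example" type="xs:string"/>
      </xs:sequence>
     </xs:complexType>
    </xs:element>
   </xs:sequence>
  </xs:complexType>
 </xs:element>
 <!--
  An instance this sketch is meant to accept:
  <doc>
   <measurement><example>0.233E1</example></measurement>
   <note><example>0.233E1</example></note>
  </doc>
 -->
</xs:schema>
```

The same characters, 0.233E1, would be read as a number in the first example element and as text in the second, which is exactly the location-dependent meaning described above.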

XML Schema has several built-in datatypes, which are referenced with the prefix bound to the XML Schema namespace (conventionally xs:). If you have, for example, assigned the datatype xs:float to a certain element, all the following are equivalent: 2.33, 02.33, 0.233E1, et cetera. If that element were remapped to the xs:string datatype, all previously mentioned samples would be different. Not that difficult, really. Let's describe the following element:

<title lang="en">Example title.</title>

The following schema can describe it. Note that this isn't a full schema, just the part of it describing the above-mentioned TITLE element:

<xs:element name="title">
 <xs:complexType>
  <xs:simpleContent>
   <xs:extension base="xs:string">
    <xs:attribute ref="lang"/>
   </xs:extension>
  </xs:simpleContent>
 </xs:complexType>
</xs:element>

This element is a "complex type" because it has an attribute. Otherwise it would be a "simple type", since its content model is simple too. The xs:string datatype implies, obviously, that the content should be a string. As you can see, the attribute is defined elsewhere in the schema. That could be done quite simply using:

<xs:attribute name="lang" type="xs:language"/>

Put <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> before and </xs:schema> after, and you have made yourself a schema.
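
Assembled, the pieces above would look roughly like this. It is a sketch that assumes no target namespace is wanted, so both title and lang live in no namespace; the XML declaration and the comments are additions for readability:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

 <!-- Global attribute declaration, referenced from the element below. -->
 <xs:attribute name="lang" type="xs:language"/>

 <!-- title: string content plus the lang attribute. -->
 <xs:element name="title">
  <xs:complexType>
   <xs:simpleContent>
    <xs:extension base="xs:string">
     <xs:attribute ref="lang"/>
    </xs:extension>
   </xs:simpleContent>
  </xs:complexType>
 </xs:element>

</xs:schema>
```

Because the attribute is declared without a use value, it is optional: both the example title with lang="en" and a title without any lang attribute should validate against this.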

Of course, there is a lot more to explore in this 800 pound specification (actually consisting of two specifications and a non-normative document), but I'll tell you about that some other time.

Comments

  1. Anne, could you please explain things more simply for somebody who has never even looked at XML Schema? I think I understood, but it required some thought! Thanks :)

    Posted by Mark Wubben at

  2. Hmm, I tried to keep it as easy as possible. I'll look at it tomorrow or maybe later tonight.

    Posted by Anne at

  3. There are also some downsides to XML Schema. Mostly that it's too complex, and because of that there aren't many validators (which is true, I'm still looking for a way to validate against an .xsd file in PHP). Mr. expat Jim Clark sees more in RELAX NG in most situations (which doesn't mean XML Schema should be abandoned).

    Posted by Jurriaan at

  4. http://www.php.net/manual/en/function.dom-domdocument-schemavalidate.php and http://www.php.net/manual/en/function.dom-domdocument-relaxngvalidate.php

    thank you php5

    Posted by Meint at

  5. W3C Schemas are OK. I think they are easy to understand, although lots of people hate XML Schema Part 1. However, XML Schema Part 2's primitive and derived built-in datatypes have seen a lot of use. Many other successful specifications - like RELAX NG, RDF and XPath 2.0 (although the latter's spec has not yet been finalized) - commonly use XML Schema Part 2 since it is useful, easy to support, and based on the successful DTD types of SGML and XML (and primitives in languages like Java). However, for document validation, RELAX NG, Schematron, and even DTDs seem more useful and widespread. I think Part 2 should be split off into its own standard, since that's what the community seems to be gravitating toward.

    Posted by Jimmy Cerra at

  6. First of all: It was nice to meet you! I hope I can do a project with you at Q42 and to have someone to argue with about the stupid little details of the semantics of HTML. :)

    Now about this article. I think you should declare the title element like this:

    <xs:element name="title">  <xs:complexType mixed="true">   <xs:attribute ref="lang"/>  </xs:complexType> </xs:element>

    This makes your schema forwards compatible with f.e. ruby annotations. (Which was probably one of the reasons to use an element in the first place.)

    Posted by Sjoerd Visscher at

  7. Hey, where did those line-breaks go? It looked good in the preview.

    test hoi

    Posted by Sjoerd Visscher at

  8. Oh yeah, Wordpress eats them. Nice to meet you too, by the way :-)

    Posted by Anne at

  9. The last time I worked with XML Schema was months and months ago. However, it is a very nice specification. Personally I prefer it heavily to DTD. Schema is actually quite easy for basic stuff, but it can get very involved if you really want to utilise its power.

    I think we actually need a good online XML Schema tutorial though. Last I looked (maybe the situation has changed now) there weren't many good ones available on the Internet, and trying to figure everything out from the specification is not equally easy for everybody. (I happened to learn mainly from other people's schemas, with the W3C docs as a reference.)

    The XML Schema Tutorial at W3Schools is a very basic introduction, but doesn't go nearly far enough.

    Posted by Charl van Niekerk at

  10. Just to show that I am a fellow nitpicker: if you use lang="nl" in your example, you also should give a Dutch text...

    Posted by Jeroen at

  11. Heh, how does en sound?

    Posted by Anne at

  12. Sjoerd: replace your line breaks with <br />’s and it will be ok.

    Anne: about your question about the need for entities, in Mozilla &brandFullName; is used to write the product's name to cope with name changes (including e.g. a Netscape-branded version of Firefox).

    At work, a colleague of mine who is somewhat more senior in the whole documentation thing than I am expressed a need for a similar construct. I'm not yet entirely convinced that it has huge advantages over a global search/replace (because there's always a context which might need to be adapted as well, so you need to go over each instance of it anyway), but I guess it could be useful. At least it's easier to globally search for, because it will be pretty much a unique string, and if it is used consistently a name change will always be applied, even though the context might be broken by it.

    But I agree that people should just use Unicode for plain and simple character entities :). On the other hand, the keyboard input methods are currently still a bit lacking in the curly quotes department, so for those, character entities might be an easy solution, although then, too, I'd be in favour of Unicode (accented characters like á are no problem, fortunately, so as far as I'm concerned &aacute; should be banned from this planet :)).

    ~Grauw

    Posted by Laurens Holst at

  13. A little more on-topic: yes I like XML Schema too. I've never used DTD, actually, only schema.

    ~Grauw

    Posted by Laurens Holst at