Combining the power of W3C XML Schema and Schematron

By Eddie Robertsson
February 11, 2002

Abstract

This article shows how to combine W3C XML Schema and Schematron by inserting Schematron rules in the appinfo element of the W3C XML Schema.

Table of contents

Introduction
    Dependent attributes
    Interleaving of elements
    Co-occurrence constraints
    Dependency between XML documents
Introduction to Schematron
    Schematron hierarchy
        Assertions
        Rules
        Patterns
    Schematron processing
Embedded Schematron Rules
    Dependent attributes
    Interleaving of elements
    Co-occurrence constraints
    Dependency between XML documents
Processing
Summary
Acknowledgements

Introduction

Since the W3C ratified W3C XML Schema as a full Recommendation on May 2nd 2001, it has become clear that it is the most popular XML schema language among developers. Many believed that W3C XML Schema would solve all the problems that existed with validation of XML documents, but this was never its goal. The purpose section of the specification clearly states:

"However, the language defined by this specification does not attempt to provide all the facilities that might be needed by any application. Some applications may require constraint capabilities not expressible in this language, and so may need to perform their own additional validations."

When W3C XML Schema is not powerful enough there are other options for developers. One option is to find a different XML Schema language that can express all the needed constraints. Another option is to add extra code to your application to check the things not expressible in the W3C XML Schema language. A third option, made available through one of W3C XML Schema's extension mechanisms, is to combine W3C XML Schema with another XML Schema language.

This article will provide an explanation and several examples of how Schematron rules can easily be embedded within W3C XML Schemas. Schematron has its strengths where W3C XML Schema has its weaknesses (co-occurrence constraints) and its weaknesses where W3C XML Schema has its strengths (structure and data types). In the examples provided W3C XML Schema is used as far as possible and then the embedded Schematron rules are used to express what cannot be done with W3C XML Schema alone.

The following four areas, which W3C XML Schema does not fully address, will be covered: dependent attributes, interleaving of elements, co-occurrence constraints and relationships between different XML documents. A short introduction to Schematron is provided, but the reader will need a basic understanding of W3C XML Schema to benefit from the article.

Dependent attributes

W3C XML Schema allows attributes to be declared on elements, and each attribute can be declared as either optional or required. In some cases this is not enough: what is really needed is a way to define a dependency between the attributes, for example that one of two attributes must appear but not both.

Interleaving of elements

The introduction of the all group was a feature that many were waiting for. The idea of the all group is that it allows the child elements in the group to appear in any order. Unfortunately the all group is not as useful as many had hoped because of some restrictions put on the declaration to simplify validation.

Co-occurrence constraints

A co-occurrence constraint is a constraint between components in an XML instance document. W3C XML Schema has limited support for this through the identity constraint functionality, which can specify that one element or attribute's value should refer to another element or attribute. In many cases this is not enough and it would be useful to express constraints like, for example, that if an element State has the value of NSW then the element Country must be Australia. Another example would be that if attribute currentTime="3am" on element Calendar then attribute currentState on element Person should be 'Sleeping' unless element Calendar's attribute currentDay="Friday" in which case attribute currentState should be 'At party'.

Dependency between XML documents

Most XML Schema languages lack functions for applying constraints between XML instance documents. Such constraints are often useful: for example, one document may contain a database of specific items while other documents refer to these items. In this case it would be very useful to validate that each referenced item actually exists in the database document.

Introduction to Schematron

The Schematron schema language differs from most other XML schema languages in that it is a rule-based language that uses path-expressions instead of grammars. This means that instead of creating a grammar for an XML document a Schematron schema will make assertions applied to a specific context within the document. If the assertion fails, a diagnostic message that is supplied by the author of the schema can be displayed.

One of the advantages of taking this rule-based approach is that in many cases the Schematron rules can easily be created by rewording the desired constraint as written in plain English. For example, a simple content model can be written in plain English like this: "The Person element should in the XML instance document have an attribute Title and contain the elements Name and Sex and the Name element should appear before the Sex element. If the value of the Title attribute is 'Mr' then the value of the Sex element must be 'Male'".

In this sentence the context in which the assertions should be applied is clearly stated as the Person element, and there are four different assertions:

  1. The context element (Person) should have an attribute Title
  2. The context element should contain two child elements, Name and Sex
  3. The child element Name should appear before the child element Sex
  4. If attribute Title has the value 'Mr' then the element Sex must have the value 'Male'

To implement the path-expressions used in Schematron rules, the W3C XPath language (XPath) is used together with various extensions provided by XSLT (Extensible Stylesheet Language Transformations). Since the path-expressions are built on top of XPath and XSLT, it is also straightforward to implement Schematron using an XSLT stylesheet, which is shown in the section Schematron processing below.

It has already been mentioned that Schematron makes various assertions based on a specific context in a document. Both the assertions and the context make up two of the four layers in Schematron's fixed four-layer hierarchy that consists of phases (top-level), patterns, rules (defines the context) and assertions.

Schematron hierarchy

In this introduction only three of these layers (patterns, rules and assertions) will be covered since these are important for using embedded Schematron rules in W3C XML Schemas. A full specification of Schematron and its various use cases can be found at the Schematron website.

In short, the three layers covered in this section are constructed so that assertions are grouped into rules and each rule defines a context. Rules are then grouped into patterns, which are given a name that is displayed together with the error message (there is really more to patterns than just a grouping mechanism, but for this introduction this is sufficient).

The above example specified a very simple content model (see below) that will be used to explain the three layers in the hierarchy.

<Person Title="Mr">
   <Name>Eddie</Name>
   <Sex>Male</Sex>
</Person>

Assertions

The bottom layer in the hierarchy is the assertions, which specify the constraints that should be checked within a specific context in the XML instance document. In a Schematron schema the element typically used to define assertions is assert. The assert element has a test attribute, which is a modified XPath expression [1]. In the above example four assertions were made on the document in order to specify the content model, namely:

  1. The context element (Person) should have an attribute Title
  2. The context element should contain two child elements, Name and Sex
  3. The child element Name should appear before the child element Sex
  4. If attribute Title has the value 'Mr' then the element Sex must have the value 'Male'

Written using Schematron assertions this would be:

<assert test="@Title">The element Person must have a Title attribute.</assert>
<assert test="count(*) = 2 and count(Name) = 1 and count(Sex)= 1">The element Person should have the child elements Name and Sex.</assert>
<assert test="*[1] = Name">The element Name must appear before element Sex.</assert>
<assert test="(@Title = 'Mr' and Sex = 'Male') or @Title != 'Mr'">If the Title is "Mr" then the sex of the person must be "Male".</assert>

For people familiar with XPath these assertions are easy to understand, and even for people with limited XPath experience they are rather straightforward. The first assertion simply tests for the occurrence of an attribute Title. The second assertion tests that the total number of children is equal to two and that there is exactly one Name and one Sex child element. The third assertion tests that the first child element is Name, and the last assertion tests that if the Title is 'Mr' then the sex must be 'Male'.
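
One caveat about the third assertion may be worth a sketch. The test *[1] = Name compares the string values of the two node sets, so it would also hold for a Person whose first child is Sex if Sex and Name happened to contain the same text. A stricter variant, offered here as an alternative formulation and not as part of the original rule set, could test the first child's element name instead:

```xml
<!-- Alternative to test="*[1] = Name": the self axis checks the element
     name of the first child instead of comparing string values. -->
<assert test="*[1][self::Name]">The element Name must appear before element Sex.</assert>
```

For the sample document the two tests give the same result. They only differ in the corner case described above.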

If the condition in the test attribute is not fulfilled the content of the assertion element will be displayed to the user. So, for example, if the third condition was broken (*[1] = Name) then the following message would be displayed:

The element Name must appear before element Sex.

Each of the above assertions has a condition that is evaluated, but the assertion does not define where in the XML instance document this condition should be checked. For example, the first assertion tests for the occurrence of the attribute Title, but it does not specify on which element in the XML instance document this assertion should be applied. The next layer in the hierarchy, the rules, specifies this location (the context) in the XML instance document.

Rules

The rules in Schematron are declared by using the rule element, which has a context attribute. The value of the context attribute is the same kind of modified XPath expression as for the test attribute on the assertions. As the name suggests, the context attribute is used to specify the context in the XML instance document where the assertions should be applied. In the above example the context was specified as the Person element, so a Schematron rule with the Person element as context would simply be:

<rule context="Person"></rule>

Since the rules are used to group together all the assertions that share the same context the rules are designed so that the assertions are declared as children of the rule element. For the above example this means that the complete Schematron rule would be:

<rule context="Person">
   <assert test="@Title">The element Person must have a Title attribute.</assert>
   <assert test="count(*) = 2 and count(Name) = 1 and count(Sex) = 1">The element Person should have the child elements Name and Sex.</assert>
   <assert test="*[1] = Name">The element Name must appear before element Sex.</assert>
   <assert test="(@Title = 'Mr' and Sex = 'Male') or @Title != 'Mr'">If the Title is "Mr" then the sex of the person must be "Male".</assert>
</rule>

This means that all the assertions in the rule will be tested on every Person element in the XML instance document. If the context should not be all the Person elements, it is easy to change the XPath to define a more restricted context. The value Database/Person would, for example, set the context to be all the Person elements that have Database as their parent.
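
As a minimal sketch of such a restricted context, the rule below would apply its assertion only to Person elements that are direct children of a Database element. The Database element is the hypothetical parent named in the text, and the single assertion is reused from the rule above.

```xml
<!-- Only Person elements whose parent is Database are checked;
     Person elements elsewhere in the document are ignored by this rule. -->
<rule context="Database/Person">
   <assert test="@Title">The element Person must have a Title attribute.</assert>
</rule>
```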

Patterns

The third layer in the hierarchy is the pattern, declared using the pattern element, which is used to group together different rules. The pattern element also has a name attribute that will be displayed in the output when the pattern is checked. For the above assertions you could, for example, have two patterns, one for checking the structure and one for checking the co-occurrence constraint. Since patterns group together different rules, Schematron is designed so that rules are declared as children of the pattern element. This means that the above example, using the two patterns, would look like this:

<pattern name="Check structure">
   <rule context="Person">
      <assert test="@Title">The element Person must have a Title attribute.</assert>
      <assert test="count(*) = 2 and count(Name) = 1 and count(Sex) = 1">The element Person should have the child elements Name and Sex.</assert>
      <assert test="*[1] = Name">The element Name must appear before element Sex.</assert>
   </rule>
</pattern>
<pattern name="Check co-occurrence constraints">
   <rule context="Person">
      <assert test="(@Title = 'Mr' and Sex = 'Male') or @Title != 'Mr'">If the Title is "Mr" then the sex of the person must be "Male".</assert>
   </rule>
</pattern>

The name of the pattern will always be displayed in the output regardless of whether the assertions fail or succeed and if the assertion fails the output will also contain the content of the assertion element. However, there is also additional information displayed together with the assertion text to help the user locate the source of the failed assertion. For example, if the co-occurrence constraint above was violated by having Title='Mr' and Sex='Female' then the following error message would be generated by Schematron:

From pattern "Check structure":

From pattern "Check co-occurrence constraints":
   Assertion fails: "If the Title is "Mr" then the sex of the person must be "Male"." at
      /Person[1]
         <Person Title="Mr">...</>

So, the pattern names are always displayed while the assertion text is only displayed when the assertion fails (or succeeds for report assertions). The additional information starts with an XPath that shows the location of the context element in the instance document (in this case the first Person element) and then on a new line the start tag of the context element is displayed.
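
The report element mentioned above works like assert with the condition inverted: its text is displayed when the test succeeds. As a sketch, the co-occurrence constraint could be phrased as a report that describes the error condition directly. The wording of the message is an assumption, and the test relies on the structure pattern to catch a missing Sex element.

```xml
<pattern name="Check co-occurrence constraints">
   <rule context="Person">
      <!-- Fires when the error condition is true, the opposite of assert. -->
      <report test="@Title = 'Mr' and Sex != 'Male'">A person with the Title "Mr" has a Sex other than "Male".</report>
   </rule>
</pattern>
```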

The assertion to test the co-occurrence constraint is not trivial and in fact this rule could be written in a simpler way by using an XPath predicate when selecting the context. Instead of having the context set to all Person elements the co-occurrence constraint can be simplified by only specifying the context to be all the Person elements that also have the attribute Title='Mr'. If the rule was specified using this technique the co-occurrence constraint could be described like this:

<rule context="Person[@Title='Mr']">
   <assert test="Sex = 'Male'">If the Title is "Mr" then the sex of the person must be "Male".</assert>
</rule>

So, by moving some of the logic from the assertion to the actual context the complexity of the rule has been decreased. This is a technique that often is very useful when writing Schematron schemas.

This concludes this introduction about patterns and now all that is left to do is to wrap the patterns in the Schematron schema in a schema element and specify that all the Schematron elements used should be defined in the Schematron namespace, http://www.ascc.net/xml/schematron. This means that the complete Schematron schema for the example would be:

<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
   <sch:pattern name="Check structure">
      <sch:rule context="Person">
         <sch:assert test="@Title">The element Person must have a Title attribute</sch:assert>
         <sch:assert test="count(*) = 2 and count(Name) = 1 and count(Sex) = 1">The element Person should have the child elements Name and Sex.</sch:assert>
         <sch:assert test="*[1] = Name">The element Name must appear before element Sex.</sch:assert>
      </sch:rule>
   </sch:pattern>
   <sch:pattern name="Check co-occurrence constraints">
      <sch:rule context="Person">
         <sch:assert test="(@Title = 'Mr' and Sex = 'Male') or @Title != 'Mr'">If the Title is "Mr" then the sex of the person must be "Male".</sch:assert>
      </sch:rule>
   </sch:pattern>
</sch:schema>

The Schematron schema language can also validate XML instance documents that use namespaces. Each namespace used in the XML instance document should be declared in the Schematron schema. Namespaces are declared with the ns element, which should appear as a child of the schema element. The ns element has two attributes, uri and prefix, which define the namespace URI and the namespace prefix. So, if the XML instance document in the example had been defined in the namespace www.topologi.com/example then the Schematron schema would look like this:

<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
   <sch:ns uri="www.topologi.com/example" prefix="ex"/>
   <sch:pattern name="Check structure">
      <sch:rule context="ex:Person">
         <sch:assert test="@Title">The element Person must have a Title attribute</sch:assert>
         <sch:assert test="count(ex:*) = 2 and count(ex:Name) = 1 and count(ex:Sex) = 1">The element Person should have the child elements Name and Sex.</sch:assert>
         <sch:assert test="ex:*[1] = ex:Name">The element Name must appear before element Sex.</sch:assert>
      </sch:rule>
   </sch:pattern>
   <sch:pattern name="Check co-occurrence constraints">
      <sch:rule context="ex:Person">
         <sch:assert test="(@Title = 'Mr' and ex:Sex = 'Male') or @Title != 'Mr'">If the Title is "Mr" then the sex of the person must be "Male".</sch:assert>
      </sch:rule>
   </sch:pattern>
</sch:schema>

Note that all XPath expressions that test element values now include the namespace prefix ex.
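
For reference, an instance document matching this namespace-aware schema might look like the sketch below. The prefix used in the instance is arbitrary and does not have to be ex; only the namespace name has to match the uri attribute of the ns element. The Title attribute is unprefixed, which is why the tests still refer to it as @Title.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The elements are in the namespace; the attribute is in no namespace. -->
<ex:Person xmlns:ex="www.topologi.com/example" Title="Mr">
   <ex:Name>Eddie</ex:Name>
   <ex:Sex>Male</ex:Sex>
</ex:Person>
```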

Schematron processing

One of the major advantages of Schematron is that you do not need a specially written Schematron processor in order to validate XML instance documents. Since Schematron is built using XPath and XSLT functions, all you need is an XSLT processor. The Schematron processing then works in two steps (see Figure 1):

  1. The Schematron schema is first turned into a validating XSLT stylesheet by transforming it with an XSLT stylesheet provided by Academia Sinica Computing Centre. These stylesheets (schematron-basic.xsl, schematron-message.xsl and schematron-report.xsl) can be found at the Schematron website, and the different stylesheets generate different output. For example, schematron-basic.xsl is used to generate simple text output like in the example above.
  2. This validating stylesheet is then used on the XML instance document and the result will be a report that depends on the rules and assertions in the original Schematron schema.

This means that it is very easy to set up a simple Schematron processor, because the only thing needed is an XSLT processor together with one of the Schematron stylesheets. Here is an example of how to validate the example used above, where the XML instance document is called Person.xml and the Schematron schema is called Person.sch. The example uses Saxon as the XSLT processor:

C:\>saxon -o validate_person.xsl Person.sch schematron-basic.xsl

C:\>saxon Person.xml validate_person.xsl

From pattern "Check structure":

From pattern "Check co-occurrence constraints":
   Assertion fails: "If the Title is "Mr" then the sex of the person must be "Male"." at
      /Person[1]
         <Person Title="Mr">...</>
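
To give a feel for what the first step produces, here is a hand-written and heavily simplified sketch of a validating stylesheet for the co-occurrence rule, using the simpler context Person[@Title='Mr']. It is an assumption about the general shape only and not the actual output of schematron-basic.xsl, which does considerably more (pattern names and location reporting, for example).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>

   <!-- The rule's context attribute becomes a template match pattern. -->
   <xsl:template match="Person[@Title='Mr']">
      <!-- An assert prints its message when the test is NOT true. -->
      <xsl:if test="not(Sex = 'Male')">
         <xsl:text>Assertion fails: If the Title is "Mr" then the sex of the person must be "Male".&#10;</xsl:text>
      </xsl:if>
      <xsl:apply-templates/>
   </xsl:template>

   <!-- Suppress the built-in copying of text nodes to the output. -->
   <xsl:template match="text()"/>
</xsl:stylesheet>
```

The point of the sketch is the mapping: a rule context corresponds to a template match, and an assertion corresponds to a negated test inside that template.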

Embedded Schematron Rules

One of the really good things about W3C XML Schema is that it is very easy to extend, and one way to do so is to use the annotation functions. The annotation element can have two child elements, namely documentation and appinfo. The documentation element is mainly intended to provide humans with information about the schema, while the appinfo element is intended for applications. The appinfo element is defined so that it can have any well-formed XML content from any namespace. Since a Schematron schema uses XML syntax, this is the perfect place to embed rules from Schematron.

Almost all elements defined by the W3C XML Schema specification can have the annotation child element, and the most logical place to put a Schematron rule is on the element declaration to which the rule applies. This means that the W3C XML Schema element declaration and the Schematron rule that applies to the element are declared in the same place. However, since the Schematron rules add more code to the already verbose W3C XML Schema, you can just as easily include all the Schematron rules in, for example, the annotation element for the schema element itself. This may improve readability of the schema by concentrating the Schematron rules at the beginning of the W3C XML Schema.

Here is a very simple W3C XML Schema that defines only one element:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="Root" type="xs:string">
   </xs:element>
</xs:schema>

Now, if a Schematron rule should have the Root element as its context this rule could be added as an embedded Schematron rule within the annotation element of the declaration like this:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="Root" type="xs:string">
      <xs:annotation>
         <xs:appinfo>
            <sch:pattern name="Test constraints on the Root element" xmlns:sch="http://www.ascc.net/xml/schematron">
               <sch:rule context="Root">
                  <sch:assert test="test-condition">Error message when the assertion condition is broken...</sch:assert>
               </sch:rule>
            </sch:pattern>
         </xs:appinfo>
      </xs:annotation>
   </xs:element>
</xs:schema>

As can be seen from the example all embedded Schematron rules must be added on the pattern level and all Schematron elements must be declared in the Schematron namespace, http://www.ascc.net/xml/schematron. The rules are embedded on a pattern level because this way the pattern name will be included in the output which helps identify which rule was broken if there is a validation problem in the XML instance document.
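
The alternative placement mentioned earlier, with the rules collected in the annotation of the schema element itself, could be sketched as follows. This is the same placeholder rule as above, only moved; declaring the sch prefix once on xs:schema is a convenience chosen for this sketch.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sch="http://www.ascc.net/xml/schematron">
   <!-- All embedded patterns are gathered at the top of the schema. -->
   <xs:annotation>
      <xs:appinfo>
         <sch:pattern name="Test constraints on the Root element">
            <sch:rule context="Root">
               <sch:assert test="test-condition">Error message when the assertion condition is broken...</sch:assert>
            </sch:rule>
         </sch:pattern>
      </xs:appinfo>
   </xs:annotation>
   <xs:element name="Root" type="xs:string"/>
</xs:schema>
```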

Now that we know how to write Schematron schemas and we have seen an example of an embedded Schematron rule in a W3C XML Schema we can have a look at how to solve the different problems stated in the introduction.

Dependent attributes

To illustrate this, consider an example with a socket element that has the two attributes hostName and hostAddress. The requirement is that these two attributes are mutually exclusive, so that if one is present the other cannot be present. It is also required that at least one of the attributes must appear.

W3C XML Schema will be used to declare the socket element and also that the socket element can have two attributes, hostName and hostAddress. The closest we can get to the above constraint in W3C XML Schema is to declare both attributes as optional since neither hostName nor hostAddress is required. This schema could look like the following:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="socket">
      <xs:complexType>
         <xs:attribute name="hostName" type="xs:string" use="optional"/>
         <xs:attribute name="hostAddress" type="xs:string" use="optional"/>
      </xs:complexType>
   </xs:element>
</xs:schema>

A Schematron rule can now be embedded on the socket element to add the constraints that could not be expressed. In this case the constraint is divided into two assertions, which allows for a separate error message for each assertion:

  1. Both hostName and hostAddress cannot be present at the same time
  2. At least one of hostName and hostAddress must be present

The W3C XML Schema with embedded Schematron rules for this example would look like this:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="socket">
      <xs:annotation>
         <xs:appinfo>
            <sch:pattern name="Mutually exclusive attributes on the socket element" xmlns:sch="http://www.ascc.net/xml/schematron">
               <sch:rule context="socket">
                  <sch:report test="@hostName and @hostAddress">On a socket element only one of the attributes hostName and hostAddress is allowed, not both.</sch:report>
                  <sch:assert test="@hostName | @hostAddress">One of the attributes hostName or hostAddress must be present on the socket element.</sch:assert>
               </sch:rule>
            </sch:pattern>
         </xs:appinfo>
      </xs:annotation>
      <xs:complexType>
         <xs:attribute name="hostName" type="xs:string" use="optional"/>
         <xs:attribute name="hostAddress" type="xs:string" use="optional"/>
      </xs:complexType>
   </xs:element>
</xs:schema>

With this schema the following two instance documents are valid:

<?xml version="1.0" encoding="UTF-8"?>
<socket hostAddress="192.168.200.76"/>

<?xml version="1.0" encoding="UTF-8"?>
<socket hostName="pc100"/>

while the following two instance documents are invalid:

<?xml version="1.0" encoding="UTF-8"?>
<socket hostAddress="192.168.200.76" hostName="pc100"/>

<?xml version="1.0" encoding="UTF-8"?>
<socket/>
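
If separate error messages are not needed, the two checks could be folded into a single assertion. The following is a sketch of that alternative, not a replacement for the pattern above; the pattern name and message text are made up for illustration. The union of the two attributes contains exactly one node only when one of them is present and the other is not.

```xml
<sch:pattern name="Exactly one host attribute on the socket element" xmlns:sch="http://www.ascc.net/xml/schematron">
   <sch:rule context="socket">
      <!-- count() of the union is 0 if neither is present and 2 if both are. -->
      <sch:assert test="count(@hostName | @hostAddress) = 1">Exactly one of the attributes hostName and hostAddress must be present on the socket element.</sch:assert>
   </sch:rule>
</sch:pattern>
```

The trade-off is that the user no longer learns which of the two conditions was violated.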

Interleaving of elements

The restriction on the all group, that each element declared in its content must have its maxOccurs attribute fixed to 1, simplifies processing but limits the group's usefulness. For example, the following content model is not allowed:

<xs:element name="Root">
   <xs:complexType>
      <xs:all>
         <xs:element name="child1" type="xs:string" minOccurs="5" maxOccurs="5"/>
         <xs:element name="child2" type="xs:string" minOccurs="2" maxOccurs="2"/>
         <xs:element name="child3" type="xs:string" minOccurs="0"/>
         <xs:element name="child4" type="xs:string" minOccurs="3" maxOccurs="7"/>
      </xs:all>
   </xs:complexType>
</xs:element>

By changing the all group to a choice group and making the choice group itself optional and repeatable, a content model is created in which the different child elements can appear in any order. If, for example, all the child1 elements should be grouped together in the instance document, the minOccurs constraint can be kept as it is. If the child elements do not have to be grouped together, the minOccurs constraint can be set to 1 to allow a full mixture of the elements:

<xs:element name="Root">
   <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
         <xs:element name="child1" type="xs:string" minOccurs="1" maxOccurs="5"/>
         <xs:element name="child2" type="xs:string" minOccurs="1" maxOccurs="2"/>
         <xs:element name="child3" type="xs:string" minOccurs="0"/>
         <xs:element name="child4" type="xs:string" minOccurs="1" maxOccurs="7"/>
      </xs:choice>
   </xs:complexType>
</xs:element>

Unfortunately this also removes the occurrence constraints on the children. Sometimes this is not a very important requirement and in this case the above will probably be enough. If, however, the occurrence constraints are important it is trivial to add a Schematron rule to check this. The following schema will illustrate:

<xs:element name="Root">
   <xs:annotation>
      <xs:appinfo>
         <sch:pattern name="Extended_all" xmlns:sch="http://www.ascc.net/xml/schematron">
            <sch:rule context="Root">
               <sch:assert test="count(child1) = 5">You must have exactly 5 child1 elements.</sch:assert>
               <sch:assert test="count(child2) = 2">You must have exactly 2 child2 elements.</sch:assert>
               <sch:assert test="count(child3) &lt;= 1">You can only have one child3 element.</sch:assert>
               <sch:assert test="count(child4) &gt;= 3 and count(child4) &lt;= 7">You must have at least 3 child4 elements but you can't have more than 7.</sch:assert>
            </sch:rule>
         </sch:pattern>
      </xs:appinfo>
   </xs:annotation>
   <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
         <xs:element name="child1" type="xs:string" minOccurs="1" maxOccurs="5"/>
         <xs:element name="child2" type="xs:string" minOccurs="1" maxOccurs="2"/>
         <xs:element name="child3" type="xs:string" minOccurs="0"/>
         <xs:element name="child4" type="xs:string" minOccurs="1" maxOccurs="7"/>
      </xs:choice>
   </xs:complexType>
</xs:element>
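
As an illustration, an instance like the following sketch interleaves the children freely while still meeting every count checked by the embedded rule: five child1, two child2, one child3 and three child4 elements. The element content is placeholder text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Any order is accepted by the repeatable choice; the Schematron
     assertions only check how many of each child are present. -->
<Root>
   <child4>d</child4>
   <child1>a</child1>
   <child2>b</child2>
   <child1>a</child1>
   <child4>d</child4>
   <child1>a</child1>
   <child3>c</child3>
   <child1>a</child1>
   <child2>b</child2>
   <child4>d</child4>
   <child1>a</child1>
</Root>
```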

This schema would validate a true mixture of all the child elements. If the child elements should be grouped together the only change would be to preserve the minOccurs constraint on each child element (5 for child1, 2 for child2 and 3 for child4). However, since child4's occurrence is a range a new Schematron rule is needed to assert that all child4 elements are grouped together. The new schema would look like this:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="Root">
      <xs:annotation>
         <xs:appinfo>
            <sch:pattern name="Extended_all" xmlns:sch="http://www.ascc.net/xml/schematron">
               <sch:rule context="Root">
                  <sch:assert test="count(child1) = 5">You must have exactly 5 child1 elements.</sch:assert>
                  <sch:assert test="count(child2) = 2">You must have exactly 2 child2 elements.</sch:assert>
                  <sch:assert test="count(child3) &lt;= 1">You can only have one child3 element.</sch:assert>
                  <sch:assert test="count(child4) &gt;= 3 and count(child4) &lt;= 7">You must have at least 3 child4 elements but you can't have more than 7.</sch:assert>
               </sch:rule>
               <sch:rule context="Root/*">
                  <sch:assert test="not(preceding-sibling::*[1][name() != name(current())][preceding-sibling::*[name() = name(current())]])">All <sch:name/> elements must be grouped with the other <sch:name/> elements.</sch:assert>
               </sch:rule>
            </sch:pattern>
         </xs:appinfo>
      </xs:annotation>
      <xs:complexType>
         <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element name="child1" type="xs:string" minOccurs="5" maxOccurs="5"/>
            <xs:element name="child2" type="xs:string" minOccurs="2" maxOccurs="2"/>
            <xs:element name="child3" type="xs:string" minOccurs="0"/>
            <xs:element name="child4" type="xs:string" minOccurs="3" maxOccurs="7"/>
         </xs:choice>
      </xs:complexType>
   </xs:element>
</xs:schema>

The new rule has each of the child elements of Root as its context, so it applies to all the children of Root. The assertion in the rule uses the preceding-sibling axis to assert that all the child elements with the same name must be grouped together. In this case it would have been enough to apply this rule to child4 (since it is the only element that has an occurrence range), but it is just as easy to apply the same rule to all the children.
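
To see the grouping assertion at work, consider the following sketch of an instance that satisfies all the count assertions (five child1, two child2, six child4) but splits the child4 elements into two groups. The element content is placeholder text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Root>
   <child4>d</child4>
   <child4>d</child4>
   <child4>d</child4>
   <child1>a</child1>
   <child1>a</child1>
   <child1>a</child1>
   <child1>a</child1>
   <child1>a</child1>
   <child2>b</child2>
   <child2>b</child2>
   <!-- The next child4 starts a second group: its nearest preceding sibling
        is child2, and an earlier child4 exists before that child2. -->
   <child4>d</child4>
   <child4>d</child4>
   <child4>d</child4>
</Root>
```

Only the first child4 of the second group violates the assertion. For the two child4 elements after it, the nearest preceding sibling has the same name, so the first predicate selects nothing and the test holds.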

Co-occurrence constraints

The number of examples for co-occurrence constraints is more or less unlimited and one example was used in the Introduction to Schematron section above. In that example the co-occurrence constraint was that if the Title attribute on element Person had the value 'Mr' then the value of the Sex sub-element must be 'Male'. Instead of defining everything using a Schematron schema, this example will show how to do the structure in W3C XML Schema and the co-occurrence constraint with a Schematron rule. The W3C XML Schema for this simple example is straightforward:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="Person">
      <xs:complexType>
         <xs:sequence>
            <xs:element name="Name" type="xs:string"/>
            <xs:element name="Sex">
               <xs:simpleType>
                  <xs:restriction base="xs:string">
                     <xs:enumeration value="Male"/>
                     <xs:enumeration value="Female"/>
                  </xs:restriction>
               </xs:simpleType>
            </xs:element>
         </xs:sequence>
         <xs:attribute name="Title" type="xs:string" use="required"/>
      </xs:complexType>
   </xs:element>
</xs:schema>

This schema defines the structure of the XML instance document and the only thing the Schematron rule needs to define is the co-occurrence constraint. The complete schema with an embedded Schematron rule would look like this:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="Person">
      <xs:annotation>
         <xs:appinfo>
            <sch:pattern name="Co-occurrence constraint on attribute Title" xmlns:sch="http://www.ascc.net/xml/schematron">
               <sch:rule context="Person[@Title='Mr']">
                  <sch:assert test="Sex = 'Male'">If the Title is "Mr" then the sex of the person must be "Male".</sch:assert>
               </sch:rule>
            </sch:pattern>
         </xs:appinfo>
      </xs:annotation>
      <xs:complexType>
         <xs:sequence>
            <xs:element name="Name" type="xs:string"/>
            <xs:element name="Sex">
               <xs:simpleType>
                  <xs:restriction base="xs:string">
                     <xs:enumeration value="Male"/>
                     <xs:enumeration value="Female"/>
                  </xs:restriction>
               </xs:simpleType>
            </xs:element>
         </xs:sequence>
         <xs:attribute name="Title" type="xs:string" use="required"/>
      </xs:complexType>
   </xs:element>
</xs:schema>
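
As an illustration of what the embedded rule asserts, the following Python sketch performs the equivalent check on two small instance documents. It uses only the standard library; the function name check_person and the sample instances are assumptions for this example, and the sketch does not replace the Schematron processing described later.

```python
import xml.etree.ElementTree as ET

MESSAGE = 'If the Title is "Mr" then the sex of the person must be "Male".'


def check_person(person):
    """Return the failed-assertion messages for a Person element."""
    # Rule context: Person[@Title='Mr']
    if person.tag != "Person" or person.get("Title") != "Mr":
        return []
    # Assertion test: Sex = 'Male' (true if any Sex child has that value)
    if any((sex.text or "") == "Male" for sex in person.findall("Sex")):
        return []
    return [MESSAGE]


# Hypothetical instances: both are valid against the W3C XML Schema,
# but only the first satisfies the co-occurrence constraint.
ok = ET.fromstring('<Person Title="Mr"><Name>Eddie</Name><Sex>Male</Sex></Person>')
bad = ET.fromstring('<Person Title="Mr"><Name>Eddie</Name><Sex>Female</Sex></Person>')

print(check_person(ok))
print(check_person(bad))
```

The second instance is the interesting case: W3C XML Schema validation accepts it, and only the Schematron rule reports the mismatch between Title and Sex.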

Dependency between XML documents

By using the document() function in XSLT it is also possible to apply constraints between XML instance documents and not just within a single document. To illustrate this we use two simple XML instance documents: one contains a single Person element with a Name sub-element, and the other contains a single Car element with an Owner attribute. The W3C XML Schemas for these documents would be:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="Person">
      <xs:complexType>
         <xs:sequence>
            <xs:element name="Name" type="xs:string"/>
         </xs:sequence>
      </xs:complexType>
   </xs:element>
</xs:schema>

and

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="Car">
      <xs:complexType>
         <xs:attribute name="Owner" type="xs:string" use="required"/>
      </xs:complexType>
   </xs:element>
</xs:schema>

The instance documents would be:

<?xml version="1.0" encoding="UTF-8"?>
<Person>
   <Name>Eddie</Name>
</Person>

and

<?xml version="1.0" encoding="UTF-8"?>
<Car Owner="Eddie"/>

Now we want to make sure that the value of the Owner attribute in Car.xml matches the value of Person/Name in Person.xml. This can be done by inserting a Schematron rule in the W3C XML Schema that defines the Car document:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="Car">
      <xs:annotation>
         <xs:appinfo>
            <sch:pattern name="Car owner must link to a person" xmlns:sch="http://www.ascc.net/xml/schematron">
               <sch:rule context="Car">
                  <sch:assert test="document('Person.xml')/Person/Name = @Owner">The owner of the car must match the name of the person in Person.xml.</sch:assert>
               </sch:rule>
            </sch:pattern>
         </xs:appinfo>
      </xs:annotation>
      <xs:complexType>
         <xs:attribute name="Owner" type="xs:string" use="required"/>
      </xs:complexType>
   </xs:element>
</xs:schema>

The document() function will bring in the elements from the Person.xml file and the assertion will make sure that the value of the Owner attribute matches the value of the Person/Name element.
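
The comparison made by this assertion can be sketched in a few lines of Python. This is a rough equivalent for illustration only: the function name owner_matches_person is an assumption, and the two documents are parsed from strings here rather than loaded from Car.xml and Person.xml as document() would do.

```python
import xml.etree.ElementTree as ET


def owner_matches_person(car, person_root):
    """Mimic the test: document('Person.xml')/Person/Name = @Owner."""
    if person_root.tag != "Person":
        return False
    # An XPath node-set comparison is true if any Name value equals Owner.
    names = [(name.text or "") for name in person_root.findall("Name")]
    return car.get("Owner") in names


person = ET.fromstring("<Person><Name>Eddie</Name></Person>")
car = ET.fromstring('<Car Owner="Eddie"/>')
# Hypothetical second car whose owner is not in the Person document.
other_car = ET.fromstring('<Car Owner="Francis"/>')

print(owner_matches_person(car, person))
print(owner_matches_person(other_car, person))
```

The first call succeeds because the Owner value equals the Name in the Person document; the second represents the case in which the Schematron assertion would fail.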

Processing

A W3C XML Schema processor will not recognize and perform the validation constraints expressed by the embedded Schematron rules. In fact, since the Schematron rules are embedded within the appinfo element they will be completely ignored by the processor. This means that in order to use the Schematron rules for validation they need to be extracted from the W3C XML Schema document and concatenated into a Schematron schema. Since both the W3C XML Schema and the Schematron rules use XML syntax, XSLT is a very good tool for this.

Writing an XSLT stylesheet for extracting the Schematron rules and merging them into a complete Schematron schema is not very difficult. Based on the ideas in my initial stylesheet, Francis Norton developed the current XSD2Schtrn.xsl stylesheet that extracts Schematron rules not only within a single W3C XML Schema but also from imported and included schemas.
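
To give a feel for how small the extraction step is, here is a minimal Python sketch of the idea. It is not XSD2Schtrn.xsl and makes no claim about how that stylesheet works internally; in particular, and unlike the stylesheet, it only looks at a single schema document and does not follow imported or included schemas. The function name extract_schematron and the abbreviated schema string are assumptions for this example.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
SCH = "http://www.ascc.net/xml/schematron"

# Abbreviated version of the Person schema (type definition omitted).
PERSON_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="Person">
      <xs:annotation>
         <xs:appinfo>
            <sch:pattern name="Co-occurrence constraint on attribute Title" xmlns:sch="http://www.ascc.net/xml/schematron">
               <sch:rule context="Person[@Title='Mr']">
                  <sch:assert test="Sex = 'Male'">If the Title is "Mr" then the sex of the person must be "Male".</sch:assert>
               </sch:rule>
            </sch:pattern>
         </xs:appinfo>
      </xs:annotation>
   </xs:element>
</xs:schema>"""


def extract_schematron(xsd_text):
    """Collect Schematron elements found in xs:appinfo into one sch:schema."""
    ET.register_namespace("sch", SCH)  # serialize with the usual sch: prefix
    xsd_root = ET.fromstring(xsd_text)
    schema = ET.Element(f"{{{SCH}}}schema")
    for appinfo in xsd_root.iter(f"{{{XS}}}appinfo"):
        for child in appinfo:
            # Keep only children in the Schematron namespace.
            if child.tag.startswith(f"{{{SCH}}}"):
                schema.append(child)
    return schema


print(ET.tostring(extract_schematron(PERSON_XSD), encoding="unicode"))
```

The printed result is a stand-alone Schematron schema containing the pattern that was embedded in the W3C XML Schema, ready to be handed to a Schematron processor.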

The result from the script is a complete Schematron schema that can be validated using the two-step XSLT process described in the Introduction to Schematron section above. This means that validation results are available from both Schematron validation and W3C XML Schema validation and if needed the results can be merged into one report. The whole process is described in the following picture:

Most W3C XML Schema processor APIs also come with an XSLT processor (for example, MSXML4 and Xerces) so there is usually no need to include extra packages in the application. As can be seen in the picture, there are two distinct paths in the processing, which means that if timing is important the two paths could be implemented as separate processes and executed in parallel.

A batch file that would (using XSV and Saxon) validate against W3C XML Schema and embedded Schematron rules can look like this:

echo Running XSV validation on Person.xml...

   xsv Person.xml

echo Creating Schematron schema from appinfo in Person.xsd...

   saxon -o Person.sch Person.xsd XSD2Schtrn.xsl

echo Running Basic Schematron validation on file Person.xml...

   saxon -o validate.xsl Person.sch schematron-basic-eddie.xsl
   saxon Person.xml validate.xsl

So, the XML instance document is first validated against the W3C XML Schema with XSV and then validated against the embedded Schematron rules using Saxon. An output example could look like this:

Running XSV validation on Person.xml...

<?xml version='1.0'?>
<xsv docElt='{None}Person' instanceAssessed='true' instanceErrors='0' rootType='[Anonymous]' schemaErrors='0' schemaLocs='None -> Person.xsd' target='file:/E:/Work/XMLSchema/XML-DEV/Schtrn+W3C/Person.xml' validation='strict' version='XSV 1.203.2.16/1.106.2.8 of 2001/10/28 17:39:15' xmlns='http://www.w3.org/2000/05/xsv'>
<schemaDocAttempt URI='file://C:/Person.xsd' outcome='success' source='schemaLoc'/>
</xsv>

Done.

Creating Schematron schema from appinfo in Person.xsd...
Running Basic Schematron validation on file Person.xml...

   From pattern "Check structure":

   From pattern "Check co-occurrence constraints":
Assertion fails: "If the Title is "Mr" then the sex of the person must be "Male"." at
   /Person[1]
   <Person Title="Mr">...</>

The Topologi Schematron Validator is a graphical validator that can validate W3C XML Schemas with embedded Schematron rules. It uses MSXML4 as its W3C XML Schema API and uses the XSLT processor in MSXML4 to extract and validate the embedded rules.

Summary

W3C XML Schema and Schematron complement each other very well and there seems little that cannot be declared and validated by combining the two. This article shows that it is not hard to extract the embedded Schematron rules and validate the resulting Schematron schema using a three-step XSLT process. The examples shown can be downloaded in a zip-file that also contains Saxon and XSV so you can try them out yourself (for Windows).

As with most things there are both advantages and disadvantages when using this technique. Some of the advantages are:

Some of the disadvantages are:

It should also be noted that, at this stage, the Schematron rules can only be applied to specific elements in the XML instance document. It is not possible to apply a Schematron rule to a complex type definition in W3C XML Schema.

Embedded Schematron rules would be even more powerful if they could be applied to all elements of a specific type instead of just all elements with a certain name and context. Depending on how much of the PSVI (see note 2) is made available in the next version of XPath, this may become possible in the future.

If you do not mind adding two more XSLT processes to the processing chain this is in fact possible to do with the help of the typeTagger. The basic idea is that it annotates the XML instance document with extra attributes containing, among other things, the element type information from the W3C XML Schema.

The ability to embed Schematron rules is not unique to W3C XML Schema; in fact it should be possible in all schema languages that use XML syntax and have an extensibility mechanism. RELAX NG is an example of another schema language with this functionality, and the latest version of Sun's MSV has the ability to validate Schematron-like rules embedded in a RELAX NG schema.

Acknowledgements

I would like to thank Rick Jelliffe for taking the time to review this paper.


1. The test attribute allows XPath expressions to be combined in "or" groups (using the | operator and parentheses for grouping).

2. PSVI is shorthand for Post Schema Validation Infoset, which is a modified version of the original document's infoset that provides additional information such as default values, datatypes, etc.


Copyright © 2002, Eddie Robertsson
This is a draft paper that can be used privately but do not repost publicly.