Schematron 1.5 to Schematron 1.6 to ISO Schematron

Guide for Implementors

Rick Jelliffe, 21-06-03

This note summarizes the changes from Schematron 1.5 to Schematron 1.6, and the expected changes from Schematron 1.6 to draft ISO Schematron.

1.5 to 1.6

The changes are all those mooted in Appendix G of the Schematron 1.5 specification.

1. Assertions allow value-of

Schematron 1.5 did not allow value-of in assertions. This was to enforce a distinction between diagnostics and assertions (which are intended to make positive statements of expectation). Many users requested this change.

2. More flexibility with key

Schematron 1.5 only allowed the key element as part of rules. Schematron 1.6 will also allow key under the schema at the same position as phase elements. ISO Schematron will only allow the new form.

Also, the attribute names are changed to follow the names used by XSLT's xsl:key.

So the old @path attribute is deprecated in favour of @use, and the old @context attribute that was floating around is deprecated in favour of @match.

In ISO Schematron, the @path and @context attributes will not be available.

<!ATTLIST key
	context CDATA #IMPLIED
	match CDATA #IMPLIED
	name NMTOKEN #REQUIRED
	path %PATH; #IMPLIED
	use %PATH; #IMPLIED
	icon %URI; #IMPLIED
>
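For implementors upgrading existing 1.5 schemas, the attribute renaming can be done mechanically. The Python sketch below (standard library only) renames @path to @use and @context to @match on every key element in the Schematron namespace declared in the working DTD. The helper name upgrade_keys is hypothetical, and the sketch deliberately does not move rule-level keys up to the schema level, which ISO Schematron will also require.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the working DTD's xmlns:sch declaration.
SCH = "{http://www.ascc.net/xml/schematron}"
# Deprecated 1.5 attribute -> XSLT-style replacement (as on xsl:key).
RENAMED = {"path": "use", "context": "match"}


def upgrade_keys(schema: ET.Element) -> int:
    """Hypothetical helper: rename deprecated attributes on every sch:key in place.

    Returns the number of key elements that were changed. Rule-level keys are
    renamed but NOT moved to the schema level.
    """
    changed = 0
    for key in schema.iter(SCH + "key"):
        touched = False
        for old, new in RENAMED.items():
            if old not in key.attrib:
                continue
            if new in key.attrib:
                raise ValueError(f"key {key.get('name')!r} has both @{old} and @{new}")
            key.set(new, key.attrib.pop(old))
            touched = True
        changed += int(touched)
    return changed


doc = ET.fromstring(
    '<schema xmlns="http://www.ascc.net/xml/schematron">'
    '<pattern name="p"><rule context="a">'
    '<key name="k" path="@id" context="item"/>'
    '</rule></pattern></schema>'
)
print(upgrade_keys(doc))  # 1
```

Note that only key elements are visited, so the rule's own @context attribute is left alone.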

4. Variable statement let

Schematron 1.5 is cumbersome when expressing "datatype" kinds of constraints. It is powerful enough to parse a string into components, but frequently a string must be reparsed several times, causing very verbose and error-prone expressions.

Schematron 1.6 will include a let statement that allows binding of variables within the scope of a rule. The variable value will be available using a $ prefix, and can be implemented using XSLT variables. Presumably it would only be available when using XSLT or EXSLT as the query language.

This feature is adopted from XCSL (XML Constraint Specification Language), with the kind blessing of XCSL's developer José Carlos Leite Ramalho.

<!ELEMENT rule (assert | report | let | key | extends)+>

<!ELEMENT let EMPTY>
<!ATTLIST let
	name CDATA #REQUIRED
	value CDATA #REQUIRED
>

Let statements are allowed anywhere in a rule.

The result of the let statement is query-language dependent. In the default case of XSLT, the bound values are text values, and they can be referenced in assertion tests using the $ prefix.

Implementors not using XSLT directly can implement this by evaluating the value expression for each context, then substituting the result value (presumably delimited as a string) into the assertion tests of the rule. This is no more complicated than value-of elements, by the way, except for parsing the XPath for the $ and avoiding substitutions inside string literals in XPaths.
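A minimal Python sketch of that substitution approach follows. It assumes a hypothetical evaluate callable that runs an XPath expression against the current context and returns its string value, and it assumes a let may refer to lets bound before it in the same rule. String literals are matched first so that a $ inside quotes is left alone, and each value is re-quoted as an XPath 1.0 string literal (falling back to concat() when the value contains both kinds of quote, since XPath 1.0 has no escapes).

```python
import re

# Either an XPath string literal (skipped) or a $name variable reference.
_TOKEN = re.compile(r"""'[^']*'|"[^"]*"|\$([A-Za-z_][\w.\-]*)""")


def xpath_string(value: str) -> str:
    """Quote value as an XPath 1.0 string literal."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote kinds present: stitch the pieces together with concat().
    pieces = []
    for i, part in enumerate(value.split("'")):
        if i:
            pieces.append('"\'"')  # a literal single quote, written as "'"
        if part:
            pieces.append(f"'{part}'")
    return "concat(" + ", ".join(pieces) + ")"


def substitute(expr: str, bindings: dict[str, str]) -> str:
    """Replace $name references outside literals with quoted values."""
    def repl(m):
        name = m.group(1)
        if name is None:
            return m.group(0)  # a string literal: leave it untouched
        if name not in bindings:
            raise KeyError(f"unbound variable ${name}")
        return xpath_string(bindings[name])
    return _TOKEN.sub(repl, expr)


def instantiate_tests(lets, tests, evaluate):
    """Bind each (name, value-expression) let for the current context, then
    rewrite the rule's assertion tests. evaluate is a hypothetical callable."""
    bindings = {}
    for name, value_expr in lets:  # assumption: later lets may use earlier ones
        bindings[name] = evaluate(substitute(value_expr, bindings))
    return [substitute(test, bindings) for test in tests]


# Usage with a stand-in evaluator (a dict lookup) instead of a real XPath engine.
fake = {"substring-before(@date, '-')": "2003"}
print(instantiate_tests([("year", "substring-before(@date, '-')")],
                        ["number($year) > 1990"], fake.__getitem__))
# ["number('2003') > 1990"]
```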

5. Abstract Patterns

Schematron was invented to be able to declare and detect abstract patterns. This allows a document type to be declared in terms of rhetorical structures rather than physical structures: for example, to say "this is a table, and the row element name is tr", or "every paragraph has a heading to which it relates".

This would again be implemented using XSLT variables. So we could say (the syntax is not fixed yet):

<pattern isa="table">
	<param formal="row" actual="tr"/>
	<param formal="cell" actual="td"/>
</pattern>

<pattern abstract="true" name="table">
	<rule context="$row">
		<assert test="$cell">A <name/> should have at least one cell</assert>
	</rule>
</pattern>

So let statements allow clearer expressions in the test values, while abstract patterns allow clearer expression with fewer elements.

Also, with abstract patterns, it then becomes possible to do document-to-document mappings, because we can identify structures of related information items independently of their serialization and naming conventions. The role attribute can be used for this.

Abstract patterns can be implemented by a preprocessing stage, like macro substitution.

From 1.6 to ISO Schematron

Draft ISO Schematron will be almost the same as 1.6. However, it will have an inclusion facility to match other DSDL languages, and my intent is to couch it in terms of a framework.

It will be recast as five sections (some similar to WXS and RELAX NG):

1. Inclusions

ISO Schematron adds an inclusion facility, the same as RELAX NG's but in the Schematron namespace. No semantic understanding of the fragments is involved: you just replace each include element with the element being referenced.

Document validation must occur after the inclusion has been performed. This is a challenge, because it probably means DSDL cannot validate ISO Schematron until it gets an include processor.
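A sketch of such an include processor, in Python with xml.etree.ElementTree, is below. The include element name, its href attribute, and the load callable (which maps an href to a parsed element) are assumptions for illustration, and namespaces are ignored for brevity. Includes are resolved recursively with a depth limit, so a cyclic include fails instead of looping; validation would then run on the resolved tree.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Callable


def resolve_includes(elem: ET.Element, load: Callable[[str], ET.Element],
                     max_depth: int = 16) -> None:
    """Replace, in place, every <include href="..."/> below elem with the element
    that load(href) returns. Purely syntactic: nothing is interpreted."""
    if max_depth < 0:
        raise RecursionError("includes nested too deeply (cyclic include?)")
    for index, child in enumerate(list(elem)):
        if child.tag == "include":
            replacement = copy.deepcopy(load(child.get("href")))
            replacement.tail = child.tail  # keep the surrounding whitespace
            elem[index] = replacement
            # The included fragment may itself contain includes.
            resolve_includes(replacement, load, max_depth - 1)
        else:
            resolve_includes(child, load, max_depth)


# Usage with an in-memory stand-in for fetching fragments.
fragments = {"tables.sch": ET.fromstring('<pattern name="tables"><rule context="table"/></pattern>')}
schema = ET.fromstring('<schema><include href="tables.sch"/></schema>')
resolve_includes(schema, fragments.__getitem__)
print(ET.tostring(schema, encoding="unicode"))
# <schema><pattern name="tables"><rule context="table" /></pattern></schema>
```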

Because we have inclusions, I am tentatively getting rid of abstract rules (and hence the extends element). I tend to think that the combination of let, include and abstract patterns makes them unnecessary. Also, I am not happy about them analytically: a pattern is a real structure, whereas a rule is only a declarative convenience. So abstract patterns are real, but abstract rules fall into the trap of providing convenience for declarations rather than convenience for meaningful modeling. Hmm.

2. Key

I expect rule/key will be removed. Anyone using it, please let me know. I made it part of rules because I wanted to keep pattern modularity. However, now that we have keys at the top level, that is no longer a consideration.

3. Schematron as a Framework

The ISO Schematron standard will position Schematron as a framework (the elements) which potentially allows different query languages. This will probably be done by adding an attribute to the schema element:

    use  NMTOKEN "XSLT" 

which allows the query/expression language to be stated. Anticipated values are:

    XSLT     XSLT 1.n, as currently used; this is the default
    EXSLT    XSLT 1.n with the EXSLT extensions
    XPATH    for implementations just using a simple XPath library; the key element would not be available
    XPATH2   the schema uses the mooted XPath 2 spec
    XSLT2    the schema uses the mooted XSLT 2 spec
    XQUERY   the schema uses the mooted XQuery spec

This helps resolve or clarify several issues: first, that some implementers have just used an XPath library; second, that we have to cope with different versions of XPath, notably XPath 2; third, that there has been implementation experience using non-XPath query languages (Schemarama from Beckett and Miller); and fourth, that the Schematron idea is not just using XPaths but the particular configuration of assertions into rules into patterns into phases.

I expect there will be other schema languages which just add an assertion element to an element or attribute declaration (e.g. Eric van der Vlist's Examplotron), but this (though useful) is not Schematron. The key idea of Schematron is the pattern: an abstract structure which is expressed in terms of an element (the context) but may not actually have anything to do with that element.

This recasts Schematron as a general rule framework. Also, by removing the query language and just talking about the elements, it should be easy to specify Schematron formally, which is desirable.

So the binding of a particular query language to the Schematron framework must provide the following:

Implementors do not need to implement support for multiple query languages. The reason for the framework is to allow a graceful upgrade to XSLT 2 without having to go through the ISO process again. Also, specifying XSLT 2 might raise the spectre of PSVIs in people's minds and scare them off.
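One way an implementation that supports only some bindings might check the use attribute up front is sketched below in Python. The function name and the decision to reject values outside the anticipated list are assumptions of this sketch; it also enforces the point above that key is not available with the plain XPATH binding.

```python
import xml.etree.ElementTree as ET

# Values anticipated in this note; XSLT is the default when @use is absent.
ANTICIPATED = {"XSLT", "EXSLT", "XPATH", "XPATH2", "XSLT2", "XQUERY"}
DEFAULT_QUERY_LANGUAGE = "XSLT"


def query_language(schema: ET.Element, supported: set[str]) -> str:
    """Hypothetical check: return the schema's query language, or raise if this
    implementation cannot run it."""
    lang = schema.get("use", DEFAULT_QUERY_LANGUAGE)
    if lang not in ANTICIPATED:
        raise ValueError(f"unknown query language {lang!r}")
    if lang not in supported:
        raise NotImplementedError(f"query language {lang!r} is not supported here")
    # With a bare XPath library there is no equivalent of xsl:key.
    if lang == "XPATH" and any(e.tag.rsplit("}", 1)[-1] == "key" for e in schema.iter()):
        raise ValueError("the key element is not available with the XPATH binding")
    return lang


print(query_language(ET.fromstring("<schema/>"), {"XSLT"}))  # XSLT
```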

4. Schematron Results

By request, I will probably include a schema for the results of running Schematron. It will probably look like the conformance language, perhaps with some ideas from ZVON or Jing.

Implementors do not need to implement this. However, it will make a test suite fairly easy to build, which is handy.

5. Result Evaluation Function

The big reason for the results being explicitly specified is that then the different uses of Schematron can be specified as evaluation functions on the results.

In particular, simple validation is defined as a function that returns true if there are no failed assertions or successful reports.

But Schematron can also be used for screen-scraping (find me any examples of pattern X): this then becomes another result evaluation function. These other uses may not fit in as "validation" of course, so I won't explore them, but I don't want ISO Schematron to exclude these uses.
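The results schema is not defined yet, so the Python sketch below uses a hypothetical Result record (kind, test value, pattern id, context path) purely to show the idea: simple validation and a screen-scraping query are just two different functions over the same list of results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Hypothetical record of one assert or report whose rule fired on a context."""
    kind: str         # "assert" or "report"
    test_value: bool  # what the @test expression evaluated to
    pattern: str      # id of the pattern containing the rule
    context: str      # location of the context node, e.g. "/doc/table[1]"


def is_valid(results: list[Result]) -> bool:
    """Simple validation: no failed assertions and no successful reports."""
    return not any(
        (r.kind == "assert" and not r.test_value) or (r.kind == "report" and r.test_value)
        for r in results
    )


def instances_of(results: list[Result], pattern: str) -> list[str]:
    """A screen-scraping evaluation: the distinct contexts where a pattern's rules fired."""
    return list(dict.fromkeys(r.context for r in results if r.pattern == pattern))


results = [
    Result("assert", True, "table", "/doc/table[1]"),
    Result("report", False, "table", "/doc/table[1]"),
    Result("assert", True, "table", "/doc/table[2]"),
]
print(is_valid(results), instances_of(results, "table"))
# True ['/doc/table[1]', '/doc/table[2]']
```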

6. Top-level parameters

This allows access to parameters provided, for example, on the VSCL command line. These parameters are name/value pairs, as with XSLT top-level parameters. The value is typically a string.

The mechanism would simply be text substitution, in queries, of tokens that start with $ and match a parameter name.
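A sketch of that substitution in Python follows. The name=value argument form is a hypothetical command-line convention, and values are inserted as XPath string literals on the assumption that they are strings. Only whole $name tokens that match a parameter are replaced, so $language is not clobbered by a parameter named lang, and other $ tokens (such as let variables) are left for later stages.

```python
import re

# A string literal (skipped) or a whole $name token.
_TOKEN = re.compile(r"""'[^']*'|"[^"]*"|\$([A-Za-z_][\w.\-]*)""")


def parse_params(args: list[str]) -> dict[str, str]:
    """Turn hypothetical command-line arguments of the form name=value into a dict."""
    params = {}
    for arg in args:
        name, sep, value = arg.partition("=")
        if not sep or not name:
            raise ValueError(f"expected name=value, got {arg!r}")
        params[name] = value
    return params


def apply_params(query: str, params: dict[str, str]) -> str:
    """Substitute whole $name tokens that match a parameter; leave other $tokens alone."""
    def repl(m):
        name = m.group(1)
        if name is None or name not in params:
            return m.group(0)  # a literal, or a $token that is not a parameter
        value = params[name]
        if "'" in value and '"' in value:
            raise ValueError(f"parameter {name!r} contains both quote characters")
        return f'"{value}"' if "'" in value else f"'{value}'"
    return _TOKEN.sub(repl, query)


print(apply_params("@xml:lang = $lang and $language != ''", parse_params(["lang=en"])))
# @xml:lang = 'en' and $language != ''
```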

7. Schematron defined over all information items

Eric van der Vlist brought up a good point last year: does Schematron work on all information items as context or just the elements? I had been assuming that it only used elements as context: this was to keep non-XSLT implementations really straightforward (just iterate over the elements in one pass for the contexts).

So the XSLT implementations went further than I expected would be necessary. But Schematron 1.5 and 1.6 are really defined to use only elements as context nodes.

However, working with abstract patterns, it has become clear that this is too restrictive for good modelling. So ISO Schematron will, unless there are well-reasoned screams to the contrary, return to the pre-1.5 position (which reflects what the XSLT implementations actually do) of being silent on which information items can be used as contexts: this becomes merely a property of the query language used for rule/@context attributes. Consequently, the default XSLT patterns can potentially match any kind of information item in the XPath data model. I worry that this will blow out some kinds of implementations, or require smarter parsing of the rule/@context queries to know which kinds of information items to include in the tree traversal.
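To make that concern concrete, here is a deliberately rough Python sketch of such smarter parsing: it strips predicates from an XSLT match pattern and looks at the last step of each alternative to decide which node kinds the traversal must visit. It is a lexical heuristic, not an XPath parser (string literals containing / or | are not handled), and the kind names are this sketch's own.

```python
import re

ALL_KINDS = {"root", "element", "attribute", "text", "comment", "processing-instruction"}


def _strip_predicates(pattern: str) -> str:
    """Drop [...] predicates (nesting-aware) so only the location steps remain."""
    out, depth = [], 0
    for ch in pattern:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)


def node_kinds_for(pattern: str) -> set[str]:
    """Heuristic: guess which node kinds a match pattern can match, from each last step."""
    kinds = set()
    for alternative in _strip_predicates(pattern).split("|"):
        step = re.sub(r"\s+", "", alternative).rsplit("/", 1)[-1]
        step = step.removeprefix("child::")
        if step == "":
            kinds.add("root")  # the pattern "/" matches the root node
        elif step.startswith(("@", "attribute::")):
            kinds.add("attribute")
        elif step == "text()":
            kinds.add("text")
        elif step == "comment()":
            kinds.add("comment")
        elif step.startswith("processing-instruction("):
            kinds.add("processing-instruction")
        elif step == "node()":
            kinds |= {"element", "text", "comment", "processing-instruction"}
        elif "(" in step:
            kinds |= ALL_KINDS  # id() or key(): could return anything, so be conservative
        else:
            kinds.add("element")
    return kinds


print(sorted(node_kinds_for("table/tr[@class] | //@xml:lang")))
# ['attribute', 'element']
```

An element-only implementation could then refuse (or fall back to a full traversal for) any schema whose contexts need more than "element".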

8. Editorial

The new spec must emphasize more strongly that patterns are switches, with rules as exclusive cases. An explanation of why would probably be good too (to simplify the expressions, and to allow sieves). Also, the information on keys will be made clearer.
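The switch behaviour can be sketched in Python as below: for each element, the rules of a pattern are tried in order and only the first whose context matches fires, like an if/elif chain. Python predicates stand in for rule contexts here, which is an assumption for illustration; the example shows the sieve idea, where a specific rule placed before a general one catches its cases first.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Hypothetical stand-in for a rule: (rule id, test of whether an element is its context).
Rule = tuple[str, Callable[[ET.Element], bool]]


def fire_pattern(doc: ET.Element, rules: list[Rule]) -> dict[str, list[ET.Element]]:
    """Visit each element; fire at most one rule of the pattern, the first that matches."""
    fired: dict[str, list[ET.Element]] = {rule_id: [] for rule_id, _ in rules}
    for elem in doc.iter():
        for rule_id, is_context in rules:
            if is_context(elem):
                fired[rule_id].append(elem)
                break  # rules are exclusive cases: later rules never see this element
    return fired


doc = ET.fromstring('<doc><p id="n1" class="note"/><p id="p1"/></doc>')
rules: list[Rule] = [
    ("note-para", lambda e: e.tag == "p" and e.get("class") == "note"),  # specific case first
    ("para", lambda e: e.tag == "p"),  # general case catches the rest
]
fired = fire_pattern(doc, rules)
print({rule_id: [e.get("id") for e in elems] for rule_id, elems in fired.items()})
# {'note-para': ['n1'], 'para': ['p1']}
```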

Of course, the idea of the pattern needs clarifying. Many people have approached Schematron from the point of view that "this is simpler than doing it directly with XSLT", which is usually true, but that is not the point of (abstract) patterns: they are an analytical mechanism that I believe corresponds to real (though abstract) structures in documents, just as much as "datatypes" or "types" or "grammars" are real.

9. Namespace

In ISO Schematron, schemas must use the Schematron namespace; the reason is to keep things crystal-clear with respect to VCSL. I am not sure whether we will need a new namespace.

For 1.5, this was sort of voluntary, for backwards compatibility with earlier versions. The move to an ISO standard means that Schematron has to be much more solid on these kinds of issues.

There is some debate on whether the <ns> element represents best practice. I think some others involved in DSDL feel it is bad practice: all namespaces should be declared using the xmlns mechanism. Furthermore, the W3C TAG almost put out a finding saying that xmlns MUST be used; I protested and they withdrew, but they may be right, and the move to an ISO standard is the place to do it, if it should be done.

My reasons for using an explicit ns element are threefold:

"But won't this confuse people?" Yes, I have had a couple of reports of confusion about this over the years. However, XPointer now follows me in providing explicit inline declarations. And people are confused anyway: I think the total confusion is lessened by this. Furthermore, the Namespaces spec does not mandate that embedded items use the surrounding namespace declarations, so I believe this is not contrary to the rules. Actually, I believe it is best practice for each language to make up its own optimal rules for handling this case.

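For comparison with the xmlns approach, this Python sketch collects the explicit ns declarations into a prefix map and uses it with ElementTree's findall, which accepts such a map. The instance document binds XHTML as its default namespace, unlike the schema's "xh" prefix, which is the independence the ns element provides. The namespace URI is the one in the working DTD; the function name prefix_map is hypothetical.

```python
import xml.etree.ElementTree as ET

SCH_NS = "http://www.ascc.net/xml/schematron"  # namespace from the working DTD


def prefix_map(schema: ET.Element) -> dict[str, str]:
    """Hypothetical helper: collect the schema's explicit <sch:ns prefix=... uri=.../>."""
    mapping = {}
    for ns in schema.iter(f"{{{SCH_NS}}}ns"):
        prefix = ns.get("prefix")
        if prefix:  # @prefix is optional in the DTD; unprefixed entries are skipped here
            mapping[prefix] = ns.get("uri")
    return mapping


schema = ET.fromstring(
    '<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">'
    '<sch:ns prefix="xh" uri="http://www.w3.org/1999/xhtml"/>'
    '</sch:schema>'
)
# The instance binds XHTML as its default namespace; the schema's "xh" prefix still works.
instance = ET.fromstring('<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>')
print(len(instance.findall("xh:body", prefix_map(schema))))  # 1
```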
Unresolved for ISO Schematron

Things not planned for ISO Schematron

Working DTD for ISO Schematron

The RELAX NG Compact Syntax schema for this will be much more satisfactory. It can capture the interaction between pattern/@abstract="true" and the content model of pattern, for example.

This content model does not include <sch:include>.


<!-- Data types -->
<!ENTITY % URI "CDATA">
<!ENTITY % PATH "CDATA">
<!ENTITY % EXPR "CDATA">
<!ENTITY % FPI "CDATA">
<!-- Element declarations -->
<!ELEMENT schema ((title)?, (ns)*, (p)*, (phase | key)*, (pattern)+, (p)*, (diagnostics)?)>
<!ELEMENT active (#PCDATA | dir | emph | span)*>
<!ELEMENT assert (#PCDATA | name | value-of | emph | dir | span)*>
<!ELEMENT dir (#PCDATA)>
<!ELEMENT emph (#PCDATA)>
<!ELEMENT diagnostic (#PCDATA | value-of | emph | dir | span)*>
<!ELEMENT diagnostics (diagnostic)*>
<!ELEMENT key EMPTY>
<!ELEMENT let EMPTY>
<!ELEMENT name EMPTY>
<!ELEMENT ns EMPTY>
<!ELEMENT p (#PCDATA | dir | emph | span)*>
<!ELEMENT pattern (((p)*, (rule)*) | (param)*)>
<!ELEMENT param EMPTY>
<!ELEMENT phase ((p)*, (active)*)>
<!ELEMENT report (#PCDATA | name | value-of | emph | dir | span)*>
<!ELEMENT rule (assert | report | let )+>
<!ELEMENT span (#PCDATA)>
<!ELEMENT title (#PCDATA | dir)*>
<!ELEMENT value-of EMPTY>
<!-- Attribute declarations -->
<!ATTLIST schema
	xmlns %URI; #FIXED "http://www.ascc.net/xml/schematron"
	xmlns:sch %URI; #FIXED "http://www.ascc.net/xml/schematron"
	id ID #IMPLIED
	fpi %FPI; #IMPLIED
	ns %FPI; #IMPLIED
	schemaVersion CDATA #IMPLIED
	defaultPhase IDREF #IMPLIED
	icon %URI; #IMPLIED
	version CDATA "ISO FPI"
	xml:lang NMTOKEN #IMPLIED
	use NMTOKEN "XSLT"
>
<!ATTLIST active
	pattern IDREF #REQUIRED
>
<!ATTLIST assert
	test %EXPR; #REQUIRED
	role NMTOKEN #IMPLIED
	id ID #IMPLIED
	diagnostics IDREFS #IMPLIED
	icon %URI; #IMPLIED
	subject %PATH; #IMPLIED
	xml:lang NMTOKEN #IMPLIED
>
<!ATTLIST dir
	value (ltr | rtl) #IMPLIED
>
<!-- the implied value is the inherited value -->
<!ATTLIST diagnostic
	id ID #REQUIRED
	icon %URI; #IMPLIED
	xml:lang NMTOKEN #IMPLIED
>
<!ATTLIST key
	match CDATA #REQUIRED
	name NMTOKEN #REQUIRED
	use %PATH; #REQUIRED
	icon %URI; #IMPLIED
>
<!ATTLIST let
	name CDATA #REQUIRED
	value CDATA #REQUIRED
>
<!ATTLIST name
	path %PATH; "."
>
<!-- Schematrons should implement '.' 
               as the default value for path in sch:name -->
<!ATTLIST p
	xml:lang CDATA #IMPLIED
	id ID #IMPLIED
	class CDATA #IMPLIED
	icon %URI; #IMPLIED
>
<!ATTLIST param
	name NMTOKEN #REQUIRED
	value CDATA #REQUIRED
>
<!ATTLIST pattern
	isa NMTOKEN #IMPLIED
	abstract (true | false) "false"
	name CDATA #REQUIRED
	see %URI; #IMPLIED
	id ID #IMPLIED
	icon %URI; #IMPLIED
>
<!-- Schematrons should implement 'false' 
               as the default value for @abstract in sch:pattern -->
<!ATTLIST ns
	uri %URI; #REQUIRED
	prefix NMTOKEN #IMPLIED
>
<!ATTLIST phase
	id ID #REQUIRED
	fpi %FPI; #IMPLIED
	icon %URI; #IMPLIED
>
<!ATTLIST span
	class CDATA #IMPLIED
>
<!ATTLIST report
	test %EXPR; #REQUIRED
	role NMTOKEN #IMPLIED
	id ID #IMPLIED
	diagnostics IDREFS #IMPLIED
	icon %URI; #IMPLIED
	subject %PATH; #IMPLIED
	xml:lang CDATA #IMPLIED
>
<!ATTLIST rule
	context %PATH; #IMPLIED
	role NMTOKEN #IMPLIED
	id ID #IMPLIED
> 
<!ATTLIST value-of
	select %PATH; #REQUIRED
>