ISO Document Schema Definition Language (DSDL)
Rick Jelliffe, Topologi,
2002-04-06
ISO DSDL, ISO/IEC 19757,
is a proposed ISO standard to bring choice and power to users
of XML and SGML in validation and schema-based post-processing.
The steady innovation of the last few years in XML has resulted
in a group of technologies from several different sources
which each provide powerful and useful contributions. But
without a common framework, the pieces of the puzzle cannot
be combined into a satisfactory whole.
CONTRIBUTING TECHNOLOGIES
ISO DSDL is made from several parts:
- A framework for supporting different schema modules,
perhaps influenced by RELAX Namespaces (details only in
Japanese, as yet), XPipe, XML Pipeline Definition Language,
and my Connect.
I suppose that Namespaces in XML would be referenced as part of the framework.
Note that already some major software is being designed
along lines which will help DSDL implementation: notably
the Apache Xerces project's Xerces Native Interface (XNI), which
is a framework for communicating
a "streaming" document information set and constructing
generic parser configurations. It is quite likely that there
will be some discussions at W3C on this question: for their
family of technologies the framework question arises because
of issues such as "should validation occur before or after
XInclude inclusion?" which, unanswered, leave potential
users up in the air.
- Namespace-aware processing with DTD syntax, for
a discussion of which see these XML-DEV threads and CL-XML's approach;
- RELAX NG (called
Grammar-oriented schema languages);
- Schematron
(called Path-based integrity constraints);
- W3C XML Schemas
Datatypes (called Primitive data type semantics);
- W3C XML Schemas
Structures (called Object-oriented schema languages);
- Information item manipulation, which would presumably
provide as modules many of the features of SGML and SGML
Extended Facilities unbundled; the details have not emerged
yet, but candidate features to reconstruct could
include attribute defaulting and SGML's LINK (perhaps using
NIST's ATTS), architectural
forms (perhaps based on Dave Megginson's XAF, NIST's
APEX
and John Cowan's AFNG proposal), entity inclusions (perhaps
based on W3C XBase, XLink and XInclude), a SHORT REFERENCE-like facility (perhaps something
like Regular Fragmentations) and entity and other declarations
(perhaps something like Topologi's Named Information Item Declaration Language).
Presumably, the technologies that are already completed (RELAX
NG, Schematron, XML Schemas) could be standardized by ISO very
quickly. The other parts would involve more community interaction
and discussion, and so would take longer.
There are some potentially important use-cases which may
have to wait. XML Schemas Key and Uniqueness constraints are
limited; better constraints can be expressed in my Schematron
but not with the same declarative usefulness; progress on
this would probably be deferred until a W3C or OASIS technology
comes to the fore. Similarly, Schematron's phases mechanism
is very useful and powerful in reconstructing the conditional
marked sections of DTDs; but it is probably ahead of its time in public
perception, so I would not expect it to make standards-makers'
80/20 point (it will still be available in Schematron, however).
Merely factoring out functionality from DTDs is not "layering".
Layering happens when all the functions have real specifications
and the order of their application can be specified or defaulted.
Layering certainly does not happen by either factoring out
existing functionality to be taken care of by potential specifications
(= vapor-layers) or by treating processing order as irrelevant.
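To make the point about processing order concrete, here is a minimal sketch in Python, using only the standard library. It is not part of any DSDL draft: the two layers (default_attributes and require_status), the run_pipeline helper and the sample document are all hypothetical names invented for illustration. The sketch only shows that the same two layers accept or reject the same document depending on the order in which they are applied.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# A layer takes a document tree and returns a (possibly modified) tree,
# raising ValueError if it rejects the document.
Layer = Callable[[ET.Element], ET.Element]


def default_attributes(root: ET.Element) -> ET.Element:
    """Hypothetical defaulting layer: every <item> gets status='draft'
    unless it already carries a status attribute."""
    for item in root.iter("item"):
        item.attrib.setdefault("status", "draft")
    return root


def require_status(root: ET.Element) -> ET.Element:
    """Hypothetical validation layer: every <item> must have a status."""
    for item in root.iter("item"):
        if "status" not in item.attrib:
            raise ValueError("item is missing required attribute 'status'")
    return root


def run_pipeline(xml_text: str, layers: List[Layer]) -> ET.Element:
    """Parse the document and apply the layers in exactly the order given."""
    root = ET.fromstring(xml_text)
    for layer in layers:
        root = layer(root)
    return root


def is_valid(xml_text: str, layers: List[Layer]) -> bool:
    """Return True if no layer in the pipeline rejects the document."""
    try:
        run_pipeline(xml_text, layers)
        return True
    except ValueError:
        return False


DOC = "<list><item/><item status='final'/></list>"

# Defaulting first, then validation: the missing attribute is supplied.
print(is_valid(DOC, [default_attributes, require_status]))
# Validation first, then defaulting: the first <item> is rejected.
print(is_valid(DOC, [require_status, default_attributes]))
```

A framework that treats order as irrelevant cannot say which of these two results is the correct one; a layered framework has to let the order be specified or defaulted.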
WHAT IS WRONG WITH XML SCHEMAS?
Well, sometimes...nothing! XML Schemas started with the aim
of being a universal schema language. However, in the absence
of a modular architecture, there are simply too many domains-of-use
of XML for any single language to be universal.
While the developers of new schema languages obviously
create them because of some perceived deficiency in the available
schema languages (Schematron and the precursors of RELAX NG,
namely RELAX and TREX, were notably created in response
to XML Schemas drafts, for example), ISO DSDL is about supporting
plurality and allowing competition, not about promoting one technology
above another.
Different stakeholders, in particular the W3C, will naturally
be promoting the particular technologies they have created;
however, these technologies are designed with particular use-cases
in mind (notably the ubiquity of the WWW, an emphasis on XML
for messages and data, and XHTML as their supported language
for prose) which are not universal. So it is appropriate for
ISO to provide a framework to support a plurality of schema
modules, allowing rigorous description of the particular processes
a document goes through to be validated, and allowing smaller
and more reliable schema systems.
An interesting sidenote is that because modern schema languages
are specified in XML instance syntax, it is often a matter
of simple transformations for one schema language to implement
or simulate another. Thus Sun's Multi-Schema Validator (MSV) uses a RELAX-like abstract grammar
internally, and converts DTDs and XML Schemas into that internal
form. Already there is a trend for tools to support multiple
schemas, and this may continue. (Anyway, if you use each language
conservatively, you won't lock yourself in to a particular
schema technology; you could change schema languages while
keeping the same document structures.)
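As a rough illustration of how simple such a transformation can be, the following Python sketch translates a toy schema vocabulary into RELAX NG patterns. The toy vocabulary (an elem element with name and occurs attributes) is invented here purely for the example, as is the rule that a leaf declaration means text content; it is not how MSV or any other real tool works internally. Only the RELAX NG namespace and the element, text, optional, zeroOrMore and oneOrMore patterns are real.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"

# Toy occurrence indicators mapped to the RELAX NG patterns that wrap them.
WRAPPERS = {"?": "optional", "*": "zeroOrMore", "+": "oneOrMore"}


def to_relaxng(toy: ET.Element) -> ET.Element:
    """Translate one toy <elem> declaration, and its children, to RELAX NG."""
    pattern = ET.Element(f"{{{RNG_NS}}}element", {"name": toy.get("name")})
    children = toy.findall("elem")
    if not children:
        # Assumption of this sketch: a leaf declaration holds plain text.
        ET.SubElement(pattern, f"{{{RNG_NS}}}text")
    for child in children:
        translated = to_relaxng(child)
        wrapper = WRAPPERS.get(child.get("occurs", "1"))
        if wrapper:
            ET.SubElement(pattern, f"{{{RNG_NS}}}{wrapper}").append(translated)
        else:
            pattern.append(translated)
    return pattern


TOY = """<elem name="list">
  <elem name="title"/>
  <elem name="item" occurs="*"/>
</elem>"""

rng = to_relaxng(ET.fromstring(TOY))
ET.register_namespace("", RNG_NS)  # serialize with a default namespace
print(ET.tostring(rng, encoding="unicode"))
```

Because both the input and the output are ordinary XML trees, the whole conversion is a recursive tree walk of about a dozen lines.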
TOPOLOGI & DSDL
Of the technologies mentioned above, the Topologi Collaborative Markup Editor (in beta at time of writing)
supports
- XAR
- XML DTDs
- Schematron
- RELAX NG
- XML Schemas
- Named Information Item Declaration Language
Topologi's free Schematron Validator is a Windows application that supports
- XML DTDs
- Schematron
- XML Schemas
- Schematron embedded in XML Schemas
Topologi's approach is that users benefit from products which
do not lock them in to particular technologies: SGML versus
XML, Mac versus Linux versus Windows, database versus documents,
file systems versus repositories, DTDs versus RELAX versus
XML Schema versus Schematron.
If one piece of the jigsaw puzzle does not meet expectations,
you should not have to throw away the whole puzzle! And, similarly,
if one interface between components is not optimal, it is
best if the existing components can switch (e.g. from XML
Schemas to RELAX) without requiring new components, training,
and so on.
When looking at XML products, it is useful to consider "Am
I really getting the risk-minimization of loose coupling,
or does the product force me to completely buy in to schema languages
and technologies which have not proven themselves yet?" By
supporting plurality, we ensure that our customers are not forced to adopt
technology they do not need, and have the agility to change
when needed.
RELATED MATERIAL
For an excellent discussion on why XML processing should
be more modular, see Simon St. Laurent's Toward a Layered Model for XML and XPDL.
For a concrete proposal on combining this approach with UML
to make more readable and tight standards, see my DISARM
(Document Information Set Articulated Reference Model). For
a list of the 16 thin layers of XML see my Goldilocks and SML. For an off-the-cuff discussion on whether
software layers should correspond to implementation layers,
see this XML-DEV thread.
For a good introduction to document architectures, see Josh
Lubell's Architectures in an XML World.
For one approach to supporting multiple schemas in a distributable
package, see my proposal for an XAR (XML Application Archive)
format DZIP.
For an example of an approach which, if it gets a user base
and multiple implementations, I hope would also be considered
for inclusion in DSDL, see Examplotron.
Another schema language paradigm, based on ordering, is that
of my Hook.
There is another major strand of schema languages available,
which already has ISO standardization: ASN.1. Users who have requirements for ultra-compact data transmission
should consider ASN.1, at least for that part of their data
lifecycle.