Document-Type Interchange Package (DZIP2)
Rick Jelliffe, Topologi, 2003-01-15
DZIP2 is an implemented format used by Topologi to allow the packaging and interchange of metadata, schemas and scripts for XML document types. We are making its spec available for other software creators to use, and to propose it as the basis for an XAR (XML Application Archive format). (DZIP 2 improves on DZIP 1, which has been prototyped in a Topologi product, with the criterion that it should take an engineer a half day to implement a basic DZIP 2 system.)
What It Is
A simple format for bundling and distributing the various schemas, scripts and metadata needed by generic XML desktop applications.
- Brain-friendly: It is just ZIP with some standard naming conventions for particular files inside it. For example, if there is a file index.html, it is the human-readable documentation.
- User-friendly: Users can download and install a single file with all the different configuration and metadata files they need to work on a document type.
- Integrator-friendly: Integrators can readily create a DZIP file with any value-added files just using the existing utilities on most developers' PCs. Configuration files for multiple vendors' products can co-exist, so the same DZIP file can serve a complete system.
- Infrastructure-friendly: Compatible with various manifest and catalog formats.
- Vendor-friendly: We found it took less than a half day to add DZIP support.
What It Is Not
- For Data: A way to package XML documents: DZIP is only aimed at configuration files such as schema files. Imagine the kinds of things that an XML desktop application might require when you click on File>New....
- For Complex Relationships: Manifest formats (sometimes called packaging formats) allow very sophisticated relationships to be expressed. However, DZIP contents itself with merely providing very flexible and convenient ways to distribute basic configuration or schema files for a single document type.
- Super Efficient: ZIP is not a format with best-of-breed archiving, indexed access, signing, encryption, compression, localization, etc. DZIP is appropriate when the convenience factor outweighs the need for high-end performance. However, ZIP does provide adequate compression, checksums, random access, permissions.
- For Pay-by-Use: DZIP is targeted mostly at the needs of the publishing industry. Consequently, aspects such as digital rights are not at issue. An integrator could charge for providing or maintaining a DZIP file as part of a service, but there is no special mechanism built in for copy protection, billing, user tracking, licensing, etc.
- Anti-WWW: DZIP does not assume any model of how the user gets hold of the DZIP file. An application could dynamically source it over the WWW, a vendor might provide it as part of a shrink-wrapped application, or an integrator might deploy it over a corporate intranet.
There is no standard format for packaging and distributing the various schemas, scripts and metadata needed by generic XML applications.
This kind of format is required when online access to each component individually over the WWW is inappropriate: when the user may be offline, when the scripts are to be purchased and installed on the user's computer, when the resources over-ride the default resources retrievable using the WWW.
Scenario: My company, Topologi, is creating markup tools which need to be readily configurable and flexible. It is beyond the expertise and patience of most end-users to configure a large XML or SGML application: we needed to provide some facility where the end-user can make one menu selection and have all the appropriate configuration files loaded. DZIP provides a simple way to implement and deploy this.
Following is a screen shot of a mechanism in Topologi products which opens DZIP files for one-click configuration.
Scenario: An integrator, Allette Systems, has long experience in building SGML and XML systems for customers; yet they report it is still often tedious to do because a complete XML system often requires installation and configuration of several different components, each of which may require multiple files. Furthermore, current inflexible systems require more maintenance effort than clients should have to carry. A DZIP system would be simpler to deploy and potentially a single DZIP file could be made which includes the resources for each of the components in the system, despite being from different vendors.
1. A DZIP package is a ZIP file.
Rationale: ZIP is a common format, widely available on PCs and supported in Java. This is compatible with the OASIS Interchange Package rules, which do not specify the format.
2. A DZIP file is organized so that
- The root of the ZIP archive holds the basic files. Their roles can be determined from their standard names or extensions: for example, a file called index.xhtml (or index.html, index.htm, default.xhtml, default.html, or failing those *.txt) contains documentation about that DZIP file and document type. (Such a file could also be the RDDL directory, see below.) The index file should keep all its subsidiary resources in a subdirectory (this is how IE saves HTML pages). A catalog file can be put here too.
- Note that this means that there can only be one file with a .dtd (.sch, etc) extension in the XAR's top-level directory. There is no provision for multiple alternative schemas at the current time, except that Schematron does support phases.
- The subdirectory sgml/ is reserved to contain SGML-specific
versions of the configuration files.
- The subdirectory vendor/ is reserved to contain subdirectories named with domain names. Each is for use by the products of the vendor with that domain name: for example, vendor/topologi.com/ would be the reserved location in which Topologi systems would look for specific configuration files.
Rationale: Just zipping up a DTD and an HTML file and giving the archive the correct extension is enough to create a DZIP2 archive: straightforward. (DZIP mark 1 used a more complex directory system, but that was more than is needed.)
Simplicity demands that there should be no mapping tables if a resource only has a single name. By enforcing that resources should be in subdirectories, the root directory is kept clean, and DZIP packagers are not constrained to follow any naming or organization convention.
Plurality demands that configuration for different vendors' products can co-exist. This is not so much so that a single DZIP can support many different applications of the same class, but rather that an integrator can deploy the configuration files for all the components in a production chain for a particular client.
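The layout described in rule 2 can be sketched in a few lines of Python using the standard zipfile module. This is only an illustration under stated assumptions, not the Topologi implementation: the helper names (build_dzip, root_files, find_documentation, vendor_files) and the sample file names such as settings.xml are hypothetical, and picking the alphabetically first *.txt file as the fallback is a policy chosen here for the example.

```python
import io
import zipfile

# Documentation names in the order the text lists them; the *.txt
# fallback is handled separately in find_documentation().
DOC_NAMES = ["index.xhtml", "index.html", "index.htm",
             "default.xhtml", "default.html"]


def build_dzip(files):
    """Return the bytes of a ZIP archive holding the given {path: text} entries."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for path, text in files.items():
            zf.writestr(path, text)
    return buf.getvalue()


def root_files(zf):
    """Names of the files that sit directly in the archive root."""
    return [n for n in zf.namelist() if "/" not in n]


def find_documentation(zf):
    """Pick the documentation file from the root, falling back to a *.txt file."""
    roots = root_files(zf)
    for candidate in DOC_NAMES:
        if candidate in roots:
            return candidate
    # Assumption: when several *.txt files exist, take the alphabetically first.
    texts = sorted(n for n in roots if n.lower().endswith(".txt"))
    return texts[0] if texts else None


def vendor_files(zf, domain):
    """Files stored under vendor/<domain>/ for one vendor's products."""
    prefix = f"vendor/{domain}/"
    return [n for n in zf.namelist()
            if n.startswith(prefix) and not n.endswith("/")]


# Hypothetical sample content: a DTD, its documentation, and one vendor file.
data = build_dzip({
    "docbook.dtd": "<!ELEMENT article ANY>",
    "index.html": "<html><body>DocBook configuration</body></html>",
    "vendor/topologi.com/settings.xml": "<settings/>",
})
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    print(find_documentation(zf))
    print(vendor_files(zf, "topologi.com"))
```

Writing the bytes returned by build_dzip to a file named according to rule 7 would give an archive that any ordinary ZIP utility can also open.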
3. A DZIP package may have an OASIS catalog in its root directory. The catalog has the name *.soc or CATALOG. If present, the catalog should be used to map names.
Rationale: This is compatible with the OASIS Interchange Package rules. Note that a simple user-agent may ignore the OASIS catalog, and bear any consequent failure.
4. There should only be one DTD, one WXS schema, one RELAX schema, one Schematron schema, one CSS stylesheet, one XSLT stylesheet: i.e., not two DTDs, two WXS schemas, etc.
Rationale: A document type may have several different stylesheets or several different DTDs. However, providing more than one requires more vendor support, to show the user the choices. A DZIP file may of course have other DTDs or files in deeper, private levels.
5. The following prefixes have significance:
- .dtd: XML markup declarations
- XML parameter entity declarations
- A safe prefix for submodules, not available as the root of anything
- A CSS stylesheet
- .xsl or .xslt: An XSL stylesheet
- .sch: A Schematron schema
- .rng (was .rlx): A RELAX NG schema
- A RELAX NG Compact Syntax schema
- An Examplotron schema
- .xsd or .xsi: A W3C XML Schema schema
- .soc (or the file name CATALOG): An OASIS-Open Catalog
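As an illustration of rules 4 and 5, a user agent could classify the root-level files by extension and flag any role that appears more than once. The sketch below is hypothetical: the function names and role labels are invented for the example, and the mapping covers only the extensions that are spelled out in this document (.dtd, .sch, .xsl, .xslt, .rng, .rlx, .xsd, .xsi, .soc and the name CATALOG). Roles whose extension is not given here are deliberately left out rather than guessed.

```python
import os
from collections import defaultdict

# Only extensions that this document names explicitly.
ROLE_BY_EXTENSION = {
    ".dtd": "DTD",
    ".xsl": "XSL stylesheet",
    ".xslt": "XSL stylesheet",
    ".sch": "Schematron schema",
    ".rng": "RELAX NG schema",
    ".rlx": "RELAX NG schema",
    ".xsd": "W3C XML Schema",
    ".xsi": "W3C XML Schema",
    ".soc": "OASIS catalog",
}

# Rule 4 limits schemas and stylesheets to one each; the catalog is not in that list.
SINGLETON_ROLES = {"DTD", "XSL stylesheet", "Schematron schema",
                   "RELAX NG schema", "W3C XML Schema"}


def classify_root(names):
    """Group root-level archive names by role; deeper (private) paths are ignored."""
    roles = defaultdict(list)
    for name in names:
        if "/" in name:
            continue  # files in subdirectories are private to the packager
        if name == "CATALOG":
            roles["OASIS catalog"].append(name)
            continue
        extension = os.path.splitext(name)[1].lower()
        role = ROLE_BY_EXTENSION.get(extension)
        if role is not None:
            roles[role].append(name)
    return dict(roles)


def duplicate_roles(names):
    """Return the singleton roles that are filled by more than one root file."""
    return {role: files for role, files in classify_root(names).items()
            if role in SINGLETON_ROLES and len(files) > 1}


names = ["docbook.dtd", "docbook.sch", "print.xsl", "web.xslt",
         "index.html", "private/other.dtd", "CATALOG"]
print(classify_root(names))
print(duplicate_roles(names))
```

In the sample list the two stylesheets print.xsl and web.xslt would be reported, while private/other.dtd is ignored because it sits below the root.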
6. In the root directory, a file "dzip_icon16.gif" is a 16x16 icon useful for adding to GUIs by user agents: it is some representation of the document type.
Rationale: For better user agents.
7. The DZIP file is named using a convention:
- a user agent will use all text before the first "-" as a name that can be presented to the user, e.g. in a menu item, without having to open the DZIP archive;
- the version is any string after the first "-", to which a user agent may attach some policy (e.g., to select the one with the largest string value);
- the extension .dz2 is used. (I use this to keep any future adoption of an XAR extension .xar free.)
A DZIP archive should follow this naming convention: name "-" version "." extension where the string before the first hyphen is a simple name suitable for menu display in applications, the version is a string which (using lexicographical comparison) allows simple versioning, and the extension is dz2 (ultimately xar). In the simple name, any underscores can be converted to spaces for display purposes.
For example, docbook-v1-2-3.dz2
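The naming convention can be sketched as a small parser. The function names parse_dzip_name and newest are hypothetical, and choosing the largest version string is just the example policy mentioned above, not a requirement.

```python
def parse_dzip_name(filename):
    """Split 'name-version.extension' at the first hyphen and the last dot."""
    stem, dot, extension = filename.rpartition(".")
    if not dot:  # no extension at all
        stem, extension = filename, ""
    simple, _hyphen, version = stem.partition("-")
    return {
        "name": simple,
        "display": simple.replace("_", " "),  # underscores become spaces in menus
        "version": version,
        "extension": extension,
    }


def newest(filenames, simple_name):
    """Pick the archive for simple_name with the lexicographically largest version."""
    matches = [f for f in filenames
               if parse_dzip_name(f)["name"] == simple_name]
    return max(matches, key=lambda f: parse_dzip_name(f)["version"], default=None)


print(parse_dzip_name("docbook-v1-2-3.dz2"))
print(newest(["docbook-v1-2-3.dz2", "docbook-v1-2-4.dz2", "tei-p4.dz2"], "docbook"))
```

Because the comparison is purely lexicographical, a version such as v10 sorts before v9; packagers who rely on this policy may want to zero-pad version components.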
A DZIP archive can contain templates. These are partially completed XML documents which reduce the initial load on the user. Templates are any files found in a template/ subdirectory directly inside the archive, regardless of their extension.
Naming rules similar to those for the DZIP archive itself apply. For example, if the archive docbook-1-2-3.xar contains a file template/article-01.xml, then an editing application can offer the user an article template when creating a new DocBook document.
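A minimal sketch of how an editing application might discover templates follows. It assumes that only files directly inside template/ count (nested directories are skipped) and that the display name is the text before the first hyphen with underscores turned into spaces, by analogy with the archive naming rule. The function name list_templates and the sample entries are hypothetical.

```python
import io
import zipfile


def list_templates(zf):
    """Return sorted (display name, archive path) pairs for files in template/."""
    found = []
    for name in zf.namelist():
        if not name.startswith("template/") or name.endswith("/"):
            continue
        rest = name[len("template/"):]
        if "/" in rest:
            continue  # assumption: ignore anything nested deeper than template/
        stem = rest.rsplit(".", 1)[0]      # drop the extension, whatever it is
        simple = stem.split("-", 1)[0]     # text before the first hyphen
        found.append((simple.replace("_", " "), name))
    return sorted(found)


# Build a small in-memory archive with hypothetical entries to try it out.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docbook.dtd", "<!ELEMENT article ANY>")
    zf.writestr("template/article-01.xml", "<article/>")
    zf.writestr("template/book_chapter-02.xml", "<chapter/>")

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    for display, path in list_templates(zf):
        print(display, "->", path)
```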
An XAR-aware application may make use of a processing instruction in an instance document to select or remember which XAR is being used. This is a very simple facility, merely using the PI target XAR and then the simple name of the XAR. For example:
<?XAR docbook ?>
An XAR-aware application will use the name to locate the appropriate XAR: for example, the name docbook would select the DZIP file docbook-v1-2-3.dz2. The name is not a URL and does not have the version details that a full filename needs to have.
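The lookup can be sketched as follows. This is a simplification under stated assumptions: a regular expression stands in for a real XML parser, the function names xar_name and locate_archive are hypothetical, and when several archives share the simple name the one whose filename compares largest is chosen.

```python
import re

# Matches <?XAR name ?> and captures the simple name.
XAR_PI = re.compile(r"<\?XAR\s+([^\s?]+)\s*\?>")


def xar_name(document_text):
    """Return the simple name from the first XAR processing instruction, if any."""
    match = XAR_PI.search(document_text)
    return match.group(1) if match else None


def locate_archive(simple_name, filenames):
    """Find an installed archive whose text before the first hyphen is simple_name."""
    candidates = [f for f in filenames
                  if f.split("-", 1)[0] == simple_name
                  and f.endswith((".dz2", ".xar"))]
    # Assumption: prefer the lexicographically largest filename (newest version).
    return max(candidates, default=None)


document = '<?xml version="1.0"?>\n<?XAR docbook ?>\n<article/>'
installed = ["docbook-v1-2-3.dz2", "tei-p4.dz2"]
name = xar_name(document)
print(name, "->", locate_archive(name, installed))
```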
The other technologies to consider in this area are as follows.
- CATALOGs, a table notation for mapping between names and locations of entities, with an XML version. In a DZIP file, if there is a CATALOG file in the root level it will be treated as an OASIS Catalog file; this file would give relative (i.e. to the root) system identifiers for public identifiers that the document type uses.
- RDDL, Resource Directory Description Language, an XHTML notation for representing the various resources associated with a namespace URI. In a DZIP file, an index.html file in the root will be treated as giving general documentation for the document type; this file could be a RDDL (XHTML) document, as is a manifest format by the Open-EBook consortium. In XPackage terminology, DZIP is a package archive. Given any DZIP file, an XPackage package description instance can be automatically generated, as a manifest. Such an XPackage package description instance can be included in a DZIP file, though there is no particular naming convention at the moment to identify which file is the XPackage.
- DIME, Direct Internet Message Encapsulation, is a proposal from Microsoft researchers for encapsulating multiple payloads together, naming them with IDs and allowing efficient access. Presumably a DZIP file could be unbundled and sent using DIME, however DIME itself does not provide the conventions on which DZIP relies.
For further information, see Wrap Your App by Leigh Dodds, XML.COM