Document-Type Interchange Package (DZIP)
Rick Jelliffe, Topologi, 2001-11-27
DZIP is an implemented format used by Topologi to allow the packaging and interchange of metadata, schemas and scripts for XML document types. We are making its spec available for other software creators to use, and proposing it as the basis for an XAR (XML Application Archive) format.
What It Is
A simple format for bundling and distributing the various schemas, scripts and metadata needed by generic XML desktop applications.
- Brain-friendly: It is just ZIP with some standard naming conventions
for particular files inside it. One convention is that if there is a file
index.html, it is the human-readable documentation.
- User-friendly: Users can download and install a single file
with all the different configuration and metadata files they need to
work on a document type.
- Integrator-friendly: Integrators can readily create a DZIP
file with any value-added files just using the existing utilities on
most developers' PCs. Configuration files for multiple vendors' products
can co-exist, so the same DZIP file can serve a complete system.
- Infrastructure-friendly: Compatible with various manifest
and catalog formats.
- Vendor-friendly: We found it took less than a half day to add DZIP support.
What It Is Not
- For Data: DZIP is not a way to package XML documents; it is aimed only
at configuration files such as schema files. Imagine the kinds of things
that an XML desktop application might require when you click on File>New....
- For Complex Relationships: Manifest formats (sometimes called
packaging formats) allow very sophisticated relationships to
be expressed. However, DZIP contents itself with merely providing very
flexible and convenient ways to distribute basic configuration or schema
files for a single document type.
- Super Efficient: ZIP is not a format with best-of-breed archiving,
indexed access, signing, encryption, compression, localization, etc.
DZIP is appropriate when the convenience factor outweighs the need for
high-end performance. However, ZIP does provide adequate compression,
checksums, random access and permissions.
- For Pay-by-Use: DZIP is targeted mostly at the needs of the
publishing industry. Consequently, aspects such as digital rights are
not at issue. An integrator could charge for providing or maintaining
a DZIP file as part of a service, but there is no special mechanism
built in for copy protection, billing, user tracking, licensing, etc.
- Anti-WWW: DZIP does not assume any model of how the user gets hold of the DZIP file. An application could dynamically source it over the WWW, a vendor might provide it as part of a shrink-wrapped application, or an integrator might deploy it over a corporate intranet.
There is no standard format for packaging and distributing the various schemas, scripts and metadata needed by generic XML applications.
This kind of format is required when online access to each component individually over the WWW is inappropriate: when the user may be offline, when the scripts are to be purchased and installed on the user's computer, when the resources over-ride the default resources retrievable using the WWW.
Scenario: My company, Topologi, is creating markup tools which need to be readily configurable and flexible. It is beyond the expertise and patience of most end-users to configure a large XML or SGML application: we needed to provide some facility where the end user can make one menu selection and have all the appropriate configuration files loaded. DZIP provides a simple way to implement and deploy this.
Following is a screen shot of a selection mechanism in a Topologi product under development, which opens DZIP files for one-click configuration.
Scenario: An integrator, Allette Systems, has long experience in building SGML and XML systems for customers; yet they report it is still often tedious to do because a complete XML system often requires installation and configuration of several different components, each of which may require multiple files. Furthermore, current inflexible systems require more maintenance effort than clients should have to carry. A DZIP system would be simpler to deploy and, potentially, a single DZIP file could be made which includes the resources for each of the components in the system, despite being from different vendors.
1. A DZIP package is a ZIP file.
Rationale: ZIP is a common format, widely available on PCs and supported in Java. This is compatible with the OASIS Interchange Package rules, which do not specify the format.
2. A DZIP file is organized so that
- Files in the root of the ZIP archive give information about that
archive, in particular a file called index.xhtml (or index.html,
index.htm, default.xhtml, default.html, default.htm, or failing those
*.txt) contains documentation about that DZIP file and document type.
(Such a file could also be the RDDL directory, see below.) A Catalog
file can be put here too.
- Files in the second level (i.e., */*) contain configuration files
for the document type, detected by their extensions.
- Files in other levels contain vendor-specific code. It is advisable to use the path of your domain to provide namespacing: vendors may care to make vendor/domainname/ such as vendor/topologi.com/ well-known locations in which their systems will look for specific configuration files.
Rationale: The index file of the first level may require its own CSS stylesheets, DTDs etc. Therefore it is inappropriate to look in the root directory for configuration files. Instead, we look in the second level.
Simplicity demands that there should be no mapping tables if a resource only has a single name. By enforcing that resources should be in subdirectories, the root directory is kept clean, and DZIP packagers are not constrained to follow any naming or organization convention.
Plurality demands that configuration for different vendors' products can co-exist. This is not so much so that a single DZIP can support many different applications of the same class, but rather that an integrator can deploy the configuration files for all the components in a production chain for a particular client.
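The three-level layout in rule 2 can be sketched in a few lines of Python. This is only an illustration, not Topologi's implementation: the function names, the level labels ("root", "config", "vendor") and the entry names in the usage section are all assumptions made for the example.

```python
import io
import zipfile
from pathlib import PurePosixPath


def classify_entries(names):
    """Group ZIP entry names by the DZIP level they occupy."""
    levels = {"root": [], "config": [], "vendor": []}
    for name in names:
        if name.endswith("/"):
            continue  # skip explicit directory entries
        depth = len(PurePosixPath(name).parts)
        if depth == 1:
            levels["root"].append(name)    # documentation, catalog, icon
        elif depth == 2:
            levels["config"].append(name)  # configuration files (*/*)
        else:
            levels["vendor"].append(name)  # deeper, vendor-specific files
    return levels


def classify_archive(path_or_file):
    """Open a DZIP (ZIP) archive and classify its entries."""
    with zipfile.ZipFile(path_or_file) as archive:
        return classify_entries(archive.namelist())


# Usage: an in-memory archive with hypothetical entry names.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for entry in ("index.html",
                  "schemas/memo.xsd",
                  "vendor/topologi.com/settings.xml"):
        archive.writestr(entry, "")
demo_levels = classify_archive(buffer)
```

A user agent that only understands the second level can ignore everything under "vendor" except the paths it has registered as its own well-known location.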
3. A DZIP package may have an OASIS catalog in its root directory. The catalog has the name *.soc or CATALOG. If present, the catalog should be used to map names.
Rationale: This is compatible with the OASIS Interchange Package rules. Note that a simple user-agent may ignore the OASIS catalog, and bear any consequent failure.
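As a rough sketch of rule 3, the following Python finds candidate catalog files in the root of an archive listing and reads the simplest kind of catalog entry. The function names are hypothetical, exact-case matching of CATALOG and .soc is an assumption, and the parser handles only double-quoted PUBLIC entries on the pattern `PUBLIC "public-id" "system-id"`; a real catalog processor supports considerably more syntax.

```python
import re
from pathlib import PurePosixPath

# Matches only the simplest entry form: PUBLIC "public-id" "system-id"
PUBLIC_ENTRY = re.compile(r'PUBLIC\s+"([^"]*)"\s+"([^"]*)"')


def find_root_catalogs(names):
    """Return root-level entries named CATALOG or ending in .soc."""
    catalogs = []
    for name in names:
        path = PurePosixPath(name)
        if name.endswith("/") or len(path.parts) != 1:
            continue  # directories and nested files are not root catalogs
        if path.name == "CATALOG" or path.suffix == ".soc":
            catalogs.append(name)
    return catalogs


def parse_public_entries(catalog_text):
    """Map public identifiers to system identifiers (relative to the root)."""
    return dict(PUBLIC_ENTRY.findall(catalog_text))
```

A simple user agent, as the rationale notes, could skip this step entirely and accept that some names will not resolve.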
4. There should only be one DTD, one W3C XML Schema, one RELAX schema, one Schematron schema, one CSS stylesheet and one XSLT stylesheet.
Rationale: A document may have different stylesheets, or several different DTDs. However, providing more than one requires more vendor support, to show the user choices. A DZIP file may of course have other DTDs or files in deeper, private levels.
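A packager could check rule 4 before shipping a file. The sketch below is an assumption-laden illustration: the function name and the `KINDS` table are hypothetical, and the table lists only the extensions this document names explicitly (.xsl, .xslt, .xsd, .xsi). Extensions for the other file kinds would need to be added by the caller.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Only the extensions named explicitly in the text; extend as needed.
KINDS = {
    ".xsl": "XSLT stylesheet",
    ".xslt": "XSLT stylesheet",
    ".xsd": "W3C XML Schema",
    ".xsi": "W3C XML Schema",
}


def find_duplicates(names, kinds=KINDS):
    """Return kinds that occur more than once in the second level."""
    found = defaultdict(list)
    for name in names:
        path = PurePosixPath(name)
        if name.endswith("/") or len(path.parts) != 2:
            continue  # rule 4 applies to the second level only
        kind = kinds.get(path.suffix.lower())
        if kind:
            found[kind].append(name)
    return {kind: files for kind, files in found.items() if len(files) > 1}
```

Files in deeper, private levels are deliberately not counted, since the rationale allows additional DTDs or stylesheets there.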
5. The following prefixes have significance:
- XML markup declarations
- XML parameter entity declarations
- A safe prefix for submodules, not available as the root of anything
- A CSS stylesheet
- .xsl or .xslt: An XSL stylesheet
- A Schematron schema
- A RELAX NG schema
- .xsd or .xsi: A W3C XML Schema schema
6. In the root directory, a file "dzip_icon16.gif" is a 16x16 icon useful for adding to GUIs by user agents: it is some representation of the document type.
Rationale: For better user agents.
7. The DZIP file is named using a convention:
- a user agent will use all text before the first "-" as
a name that can be presented to the user, e.g. in a menu item, without
having to open the DZIP archive;
- the version number is any string after the first "-"; a user agent
may attach some policy to it (e.g., to select the one
with the largest string value);
- the extension .dzp is used. (I use this to keep any future adoption of an XAR extension .xar free.)
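The naming convention in rule 7 is simple enough to sketch directly. The function names below are hypothetical, and accepting the extension in any letter case is an assumption. The selection policy shown is the one given as an example above, the largest string value, which is a plain string comparison and not a numeric version comparison.

```python
from pathlib import PurePath


def parse_dzip_filename(filename):
    """Split a DZIP file name into (display name, version or None)."""
    path = PurePath(filename)
    if path.suffix.lower() != ".dzp":
        raise ValueError(f"not a DZIP file name: {filename!r}")
    # path.stem drops the final extension only, so dots in the version survive
    name, separator, version = path.stem.partition("-")
    return name, (version if separator else None)


def pick_latest(filenames):
    """Pick the file whose version string compares largest."""
    return max(filenames, key=lambda f: parse_dzip_filename(f)[1] or "")
```

For example, `parse_dzip_filename("docbook-4.1.2.dzp")` yields the name `docbook` and the version `4.1.2`. Because the comparison is on strings, a version `1.10` sorts below `1.9`; a user agent wanting numeric ordering would need its own policy.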
The other technologies to consider in this area are as follows.
- CATALOGs, a table notation for mapping between names and locations
of entities, with an XML version. In a DZIP file, if there is
a CATALOG file in the root level it will be treated as an OASIS
Catalog file; this file would give relative (i.e. to the root) system
identifiers for public identifiers that the document type uses.
- RDDL, Resource Directory Description Language, an XHTML notation for
representing the various resources associated with a namespace URI. In a
DZIP file, an index.html file in the root will be treated as giving general
documentation for the document type; this file could be a RDDL (XHTML)
document.
- XPackage is a manifest format by the Open-EBook consortium. In XPackage
terminology, DZIP is a package archive. Given any DZIP file, an XPackage
package description instance can be automatically generated, as a manifest.
Such an XPackage package description instance can be included
in a DZIP file, though there is no particular naming convention at the
moment to identify which file is the XPackage.
- DIME, Direct Internet Message Encapsulation, is a proposal from Microsoft researchers for encapsulating multiple payloads together, naming them with IDs and allowing efficient access. Presumably a DZIP file could be unbundled and sent using DIME; however, DIME itself does not provide the conventions on which DZIP relies.
For further information, see Wrap Your App by Leigh Dodds, XML.COM.