Rick Jelliffe, Topologi,
2003-01-15
DZIP2 is an implemented format used by Topologi to allow
the packaging and interchange of metadata, schemas and scripts
for XML document types. We are making its specification available
for other software creators to use, and proposing it as the basis
for XAR, an XML Application Archive format. (DZIP
2 improves on DZIP 1, which has been
prototyped in a Topologi product; the design criterion was that
it should take an engineer half a day to implement a basic
DZIP 2 system.)
What It Is
A simple format for bundling and distributing the various
schemas, scripts and metadata needed by generic XML desktop
applications.
- Brain-friendly: It is just ZIP with some standard
naming conventions for particular files inside it. One convention
is that if there is a file index.html, it is the
human-readable documentation.
- User-friendly: Users can download and install a
single file with all the different configuration and metadata
files they need to work on a document type.
- Integrator-friendly: Integrators can readily create
a DZIP file with any value-added files just using the existing
utilities on most developers' PCs. Configuration files for
multiple vendors' products can co-exist, so the same DZIP
file can serve a complete system.
- Infrastructure-friendly: Compatible with various
manifest and catalog formats.
- Vendor-friendly: We found it took less than a half
day to add DZIP support.
What It Is Not
- For Data: DZIP is not a way to package XML documents. It
is aimed only at configuration files such as schema files.
Imagine the kinds of things that an XML desktop application
might require when you click on File>New....
- For Complex Relationships: Manifest formats (sometimes
called packaging formats) allow very sophisticated
relationships to be expressed. However, DZIP contents itself
with merely providing very flexible and convenient ways
to distribute basic configuration or schema files for a
single document type.
- Super Efficient: ZIP is not a format with best-of-breed
archiving, indexed access, signing, encryption, compression,
localization, etc. DZIP is appropriate when the convenience
factor outweighs the need for high-end performance. However,
ZIP does provide adequate compression, checksums, random
access and permissions.
- For Pay-by-Use: DZIP is targeted mostly at the
needs of the publishing industry. Consequently, aspects
such as digital rights are not at issue. An integrator could
charge for providing or maintaining a DZIP file as part
of a service, but there is no special mechanism built in
for copy protection, billing, user tracking, licensing,
etc.
- Anti-WWW: DZIP does not assume any model of how
the user gets hold of the DZIP file. An application could
dynamically source it over the WWW, a vendor might provide
it as part of a shrink-wrapped application, or an integrator
might deploy it over a corporate intranet.
Motivation
There is no standard format for packaging and distributing
the various schemas, scripts and metadata needed by generic
XML applications.
This kind of format is required when online access to each
component individually over the WWW is inappropriate: when
the user may be offline, when the scripts are to be purchased
and installed on the user's computer, or when the resources override
the default resources retrievable using the WWW.
Scenario: My company, Topologi, is creating markup tools
which need to be readily configurable and flexible. It is
beyond the expertise and patience of most end-users to configure
a large XML or SGML application: we needed to provide some
facility where the end user can make one menu selection and
have all the appropriate configuration files loaded. DZIP
provides a simple way to implement and deploy this.
Following is a screen shot of a selection mechanism in a
Topologi product under development, which opens DZIP files
for one-click configuration.
Scenario: An integrator, Allette Systems, has long experience
in building SGML and XML systems for customers; yet they report
it is still often tedious to do because a complete XML system
often requires installation and configuration of several different
components, each of which may require multiple files. Furthermore,
current inflexible systems require more maintenance effort
than clients should have to carry. A DZIP system would be
simpler to deploy and, potentially, a single DZIP file could
be made which includes the resources for each of the components
in the system, despite being from different vendors.
Description
1. A DZIP package is a ZIP file.
Rationale: ZIP is a common format, widely available on PCs
and supported in Java. This is compatible with the OASIS Interchange
Package rules, which do not specify the format.
2. A DZIP file is organized so that
- The root of the ZIP archive holds the basic files. Their
roles can be determined from their standard names or extensions:
for example, a file called index.xhtml (or index.html,
index.htm, default.xhtml, default.html
or, failing those, *.txt) contains
documentation about that DZIP file and document type. (Such
a file could also be the RDDL directory, see below.) The
index file should keep all its subsidiary resources in a
subdirectory (this is how IE saves HTML pages). Catalog files
can be put here too.
- The subdirectory sgml/ is reserved to contain
SGML-specific versions of the configuration files.
- Subdirectories under vendor/ using domain names
are reserved for use for products from the vendors who use
that format: for example, vendor/topologi.com/
would be the reserved location in which Topologi systems
would look for specific configuration files.
Rationale: Just zipping up a DTD and an HTML file
and giving the archive the correct extension is enough to create a
DZIP2 archive: straightforward. (DZIP mark 1 used a more complex
directory system, but that was more than is needed.)
Simplicity demands that there should be no mapping tables
if a resource only has a single name. By enforcing that resources
should be in subdirectories, the root directory is kept clean,
and DZIP packagers are not constrained to follow any naming
or organization convention.
Plurality demands that configuration for different vendors'
products can co-exist. This is not so much so that a single
DZIP can support many different applications of the same class,
but rather so that an integrator can deploy the configuration
files for all the components in a production chain for a particular
client.
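As a rough illustration of the layout described in rule 2, the following Python sketch builds a minimal archive in memory with the standard zipfile module. The helper build_dzip, the file names memo.dtd and vendor/example.com/settings.xml, and their contents are hypothetical and chosen only for the example; they are not part of the DZIP rules.

```python
import io
import zipfile


def build_dzip(files):
    """Return the bytes of a ZIP archive holding the given {path: text} entries."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, content in files.items():
            archive.writestr(path, content)
    return buffer.getvalue()


# Hypothetical contents, for illustration only.
example_files = {
    # Basic files live in the root of the archive.
    "memo.dtd": "<!ELEMENT memo (#PCDATA)>",
    "index.html": "<html><body><p>Memo document type.</p></body></html>",
    # Vendor-specific configuration lives under vendor/<domain name>/.
    "vendor/example.com/settings.xml": "<settings/>",
}

data = build_dzip(example_files)

# Read the archive back and list the entries stored in its root.
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    root_files = [name for name in archive.namelist() if "/" not in name]
print(root_files)
```

Writing the returned bytes to a file such as memo-01.dz2 would then give a package that follows the naming convention in rule 7.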
3. A DZIP package may have an OASIS catalog, in its root
directory. The catalog has the name *.soc or CATALOG. If present,
the catalog should be used to map names.
Rationale: This is compatible with the OASIS Interchange
Package rules. Note that a simple user-agent may ignore the
OASIS catalog, and bear any consequent failure.
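A user agent that does want to honour the catalog first has to find it. The sketch below shows one way to do that from a list of archive entry names; the function name find_root_catalog is hypothetical, and treating the .soc extension case-insensitively is an assumption, since the rule does not say.

```python
def find_root_catalog(entry_names):
    """Return the first root-level entry named CATALOG or *.soc, or None."""
    for name in entry_names:
        if "/" in name:
            # Only the root directory is searched under rule 3.
            continue
        # Assumption: the .soc extension is matched case-insensitively.
        if name == "CATALOG" or name.lower().endswith(".soc"):
            return name
    return None


print(find_root_catalog(["memo.dtd", "sgml/CATALOG", "memo.soc"]))
```

A simple user agent can skip this step entirely, as the rationale notes.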
4. There should be only one DTD, one W3C XML Schema schema, one RELAX
schema, one Schematron schema, one CSS stylesheet and one XSLT
stylesheet.
Rationale: A document type may have several possible stylesheets
or DTDs. However, providing more than
one requires more vendor support, to show the user the choices.
A DZIP file may of course have other DTDs or files in deeper,
private levels.
5. The following extensions have significance:
- .dtd
- XML markup declarations
- .ent
- XML parameter entity declarations
- .mod
- A safe extension for submodules, not available as the root
of anything
- .css
- A CSS stylesheet
- .xsl or .xslt
- An XSL stylesheet
- .sch
- A Schematron schema
- .rng (was .rlx)
- A RELAX NG schema
- .rnc
- A RELAX NG (compact syntax) schema
- .eg
- An Examplotron schema
- .xsd or .xsi
- A W3C XML Schema schema
- .cat
- An XML OASIS-Open Catalog
In an sgml/ subdirectory, the following extensions
have significance:
- .dtd
- SGML markup declarations
- .cat
- An SGML OASIS-Open Catalog
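The two extension tables above can be read as a lookup from an entry's location and extension to its role. The following Python sketch is one possible encoding, not a normative one: the names ROOT_ROLES, SGML_ROLES and role_of are hypothetical, and lower-casing the extension before lookup is an assumption.

```python
import posixpath

# Roles for files in the root of the archive, taken from the list in rule 5.
ROOT_ROLES = {
    ".dtd": "XML markup declarations",
    ".ent": "XML parameter entity declarations",
    ".mod": "submodule",
    ".css": "CSS stylesheet",
    ".xsl": "XSL stylesheet",
    ".xslt": "XSL stylesheet",
    ".sch": "Schematron schema",
    ".rng": "RELAX NG schema",
    ".rnc": "RELAX NG (compact syntax) schema",
    ".eg": "Examplotron schema",
    ".xsd": "W3C XML Schema schema",
    ".xsi": "W3C XML Schema schema",
    ".cat": "XML OASIS-Open Catalog",
}

# Roles for files directly inside the sgml/ subdirectory.
SGML_ROLES = {
    ".dtd": "SGML markup declarations",
    ".cat": "SGML OASIS-Open Catalog",
}


def role_of(entry_name):
    """Return the role of a ZIP entry, or None if it has no standard role."""
    directory, filename = posixpath.split(entry_name)
    # Assumption: extensions are compared case-insensitively.
    extension = posixpath.splitext(filename)[1].lower()
    if directory == "":
        return ROOT_ROLES.get(extension)
    if directory == "sgml":
        return SGML_ROLES.get(extension)
    # Deeper or vendor-specific files are private to whoever put them there.
    return None


print(role_of("memo.dtd"))
print(role_of("sgml/memo.dtd"))
```

Entries under vendor/ or in other subdirectories deliberately get no role here, since the rules leave their interpretation to the vendor or packager.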
6. In the root directory, a file "dzip_icon16.gif" is a 16x16
icon useful for adding to GUIs by user agents: it is some
representation of the document type.
Rationale: For better user agents.
7. The DZIP file is named using a convention:
name-version.dz2
where
- a user agent will use all text before the first "-" as
a name that can be presented to the user, e.g. in a menu
item, without having to open the DZIP archive;
- the version is any string after the first "-", to
which a user agent may attach some policy (e.g.,
to select the one with the largest string value);
- the extension .dz2 is used. (I use this to keep
any future adoption of an XAR extension .xar free.)
Naming
A DZIP archive should follow this naming convention:
name "-" version "." extension where the string before
the first hyphen is a simple name suitable for menu display
in applications, the version is a string which (using lexicographical
comparison) allows simple versioning, and the extension is
dz2 (ultimately xar). In the simple name,
any underscores can be converted to spaces for display purposes.
For example, docbook-v1-2-3.dz2
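The naming convention is simple enough to parse with plain string operations. The sketch below splits a file name into its parts and picks the candidate with the largest version string; the names DzipName, parse_dzip_filename and latest are hypothetical, and the error handling is an assumption about what a user agent might want.

```python
from typing import NamedTuple


class DzipName(NamedTuple):
    name: str          # simple name: text before the first hyphen
    display_name: str  # simple name with underscores shown as spaces
    version: str       # everything between the first hyphen and the extension
    extension: str     # text after the last dot, e.g. "dz2"


def parse_dzip_filename(filename):
    """Split name "-" version "." extension into its parts."""
    stem, dot, extension = filename.rpartition(".")
    name, hyphen, version = stem.partition("-")
    if not dot or not hyphen or not name:
        raise ValueError(f"not a DZIP-style file name: {filename!r}")
    return DzipName(name, name.replace("_", " "), version, extension)


def latest(filenames):
    """Pick the file whose version string compares largest lexicographically."""
    return max(filenames, key=lambda f: parse_dzip_filename(f).version)


print(parse_dzip_filename("docbook-v1-2-3.dz2"))
print(latest(["docbook-v1-2-3.dz2", "docbook-v1-2-4.dz2"]))
```

Because the comparison is purely lexicographical, a version such as v1-10-0 sorts before v1-2-3; packagers who rely on this policy may want to zero-pad version components.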
Templates
A DZIP archive can contain templates. These are partially
completed XML documents which reduce the initial load on the
user. A template is any file found in a template/
subdirectory directly inside the archive, regardless of their
extension.
Similar naming rules to those of the DZIP archive itself apply. For
example, if docbook-1-2-3.xar contains a file
template/article-01.xml, then an editing application
can offer the user an article template when creating
a new DocBook document.
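An editing application could gather the template choices along the following lines. This is a minimal sketch: the function name list_templates is hypothetical, and it assumes that only files directly inside template/ count and that a template's menu name is the text before the first hyphen of its file name.

```python
import io
import zipfile


def list_templates(archive):
    """Map each template's display name to the archive entries that provide it."""
    templates = {}
    prefix = "template/"
    for entry in archive.namelist():
        if not entry.startswith(prefix) or entry.endswith("/"):
            continue  # not in template/, or a directory entry
        filename = entry[len(prefix):]
        if "/" in filename:
            continue  # assumption: nested files are not templates
        # Drop the extension (if any), then keep the text before the first hyphen.
        stem = filename.rpartition(".")[0] or filename
        display = stem.partition("-")[0].replace("_", " ")
        templates.setdefault(display, []).append(entry)
    return templates


# Build a small hypothetical archive in memory to try the function on.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as out:
    out.writestr("docbook.dtd", "<!ELEMENT article (#PCDATA)>")
    out.writestr("template/article-01.xml", "<article/>")
    out.writestr("template/book_chapter-01.xml", "<chapter/>")

with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    found = list_templates(archive)
print(found)
```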
XAR PI
An XAR-aware application may make use of a processing instruction
in an instance document to select or remember which XAR is
being used. This is a very simple facility, merely using the
PI target XAR and then the simple name of the XAR.
For example:
<?XAR docbook ?>
An XAR-aware application will use the name to locate the
appropriate XAR: for example, the DZIP file docbook-v1-2-3.dz2.
The name is not a URL and does not have the version details
that a full filename needs to have.
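To make the lookup concrete, here is a sketch of how an application might read the PI and match it against installed archives. It uses the standard xml.dom.minidom parser; the function names xar_name and locate_archive are hypothetical, as is the policy of preferring the largest version string when several archives share a name.

```python
from xml.dom import minidom


def xar_name(document_text):
    """Return the simple name from a top-level <?XAR ...?> PI, or None."""
    document = minidom.parseString(document_text)
    for node in document.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "XAR"):
            return node.data.strip()
    return None


def locate_archive(simple_name, filenames):
    """Return the matching .dz2 file with the largest version string, or None."""
    matches = [
        f for f in filenames
        if f.endswith(".dz2") and f.partition("-")[0] == simple_name
    ]
    # The version is the text between the first hyphen and the extension.
    return max(
        matches,
        key=lambda f: f.rpartition(".")[0].partition("-")[2],
        default=None,
    )


installed = ["docbook-v1-2-3.dz2", "docbook-v1-2-4.dz2", "tei-p4.dz2"]
name = xar_name("<?XAR docbook ?><article/>")
print(name, locate_archive(name, installed))
```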
Other Technology
The other technologies to consider in this area are as follows.
- OASIS CATALOGs, a table notation for mapping between names
and locations of entities, with an XML
version. In a DZIP file, if there is a CATALOG
file in the root level it will be treated as an OASIS Catalog
file; this file would give relative (i.e. to the root) system
identifiers for public identifiers that the document type
uses.
- RDDL Resource Directory
Description Language, an XHTML notation for representing
the various resources associated with a namespace URI. In
a DZIP file, an index.html file in the root will be treated
as giving general documentation for the document type; this
file could be a RDDL (XHTML) document, as a manifest.
- XPackage is a manifest format
by the Open-EBook consortium. In XPackage terminology, DZIP
is a package archive. Given any DZIP file, an XPackage
package description instance can be automatically
generated, as a manifest. Such an XPackage package description
instance can be included in a DZIP file, though there
is no particular naming convention at the moment to identify
which file is the XPackage.
- DIME, Direct Internet Message Encapsulation, is a proposal
from Microsoft researchers for encapsulating multiple payloads
together, naming them with IDs and allowing efficient access.
Presumably a DZIP file could be unbundled and sent using
DIME; however, DIME itself does not provide the conventions
on which DZIP relies.
For further information, see Wrap
Your App by Leigh Dodds, XML.COM