The Alternative Datatype Library

This page describes a set of primitive datatypes intended as an alternative to the W3C types, especially for publishing. No facet mechanism is used.

Another way to view these types is as parse specifications, like regular fragmentations: each allows a value to be split into an infoset, which can have its own schema.

Exact Quantities

This type allows exact quantities, such as lengths and money. The W3C XML Schemas syntax currently does not allow exact numbers, numbers tied to units, or number/unit combinations not separated by whitespace.

  • The syntax is ( (currency|symbol)? s* "-"? digit+ (("."|"-"|",") digit+)? s* (nmtoken s*)?)+ which allows 1.1 -1.1pt 2 24.005 cm or $ 5-50
  • The value space is an ordered list of symbol/number/unit triples, where the symbol and unit may be absent.

The number is a base 10 number. Implementations may convert the number to their own precision if they must; however, the intent is that the value of a number is the number specified, not the closest floating-point number.
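
As an illustration only, the grammar above could be parsed along the following lines in Python. The names Quantity and parse_quantities are hypothetical, and the sketch makes simplifying assumptions that are not part of the type's definition: symbols are limited to a few currency signs, units are plain runs of ASCII letters rather than full nmtokens, and the ".", "-" or "," separator is read as a decimal point. Numbers are held as decimal.Decimal so that the value is the number written rather than a floating-point approximation.

```python
import re
from decimal import Decimal
from typing import List, NamedTuple, Optional


class Quantity(NamedTuple):
    symbol: Optional[str]   # e.g. "$"; None when absent
    number: Decimal         # exact base-10 value
    unit: Optional[str]     # e.g. "pt"; None when absent


# One symbol/number/unit item. Assumptions: a small set of currency
# signs stands in for (currency|symbol), and ASCII letters for nmtoken.
_ITEM = re.compile(
    r"\s*(?P<symbol>[$€£¥])?\s*"
    r"(?P<sign>-)?(?P<whole>\d+)(?:[.,-](?P<frac>\d+))?\s*"
    r"(?P<unit>[A-Za-z]+)?\s*"
)


def parse_quantities(text: str) -> List[Quantity]:
    """Parse a lexical value into an ordered list of triples."""
    if not text.strip():
        raise ValueError("at least one quantity is required")
    items = []
    pos = 0
    while pos < len(text):
        m = _ITEM.match(text, pos)
        if m is None:
            raise ValueError(f"cannot parse quantity at offset {pos}")
        # Assumption: the separator is always treated as a decimal point.
        digits = (m.group("sign") or "") + m.group("whole")
        if m.group("frac") is not None:
            digits += "." + m.group("frac")
        items.append(Quantity(m.group("symbol"), Decimal(digits), m.group("unit")))
        pos = m.end()
    return items
```

Under these assumptions, "1.1 -1.1pt 2 24.005 cm" yields four triples, and "$ 5-50" yields a single triple with the symbol "$", the number 5.50 and no unit.
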


Picture

A picture is a simple way to represent common kinds of information without needing the full power of regular expressions. The W3C types do not provide a way to name particles in an expression.

  • The lexical space is ( letter+ (delimiter letter+)* ) where all the letters within a group must be the same, e.g., yyyy-mm-dd.
  • The number of letters in each group indicates the number of characters expected in the data.
  • The value space is a list made up of information set items for each group of letters.
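
One possible reading of these rules, sketched in Python: the picture is translated into a regular expression with one capturing group per letter group, and the data is split accordingly. The function name parse_with_picture is hypothetical, and the sketch assumes that picture letters are ASCII and that each letter stands for exactly one letter or digit in the data, neither of which is stated by the type itself.

```python
import re
from typing import List, Tuple

# Assumption: each picture letter stands for exactly one letter or digit.
_DATA_CHAR = "[0-9A-Za-z]"
_TOKEN = re.compile(r"(?P<letters>[A-Za-z]+)|(?P<delim>[^A-Za-z]+)")


def parse_with_picture(picture: str, data: str) -> List[Tuple[str, str]]:
    """Split data according to a picture such as 'yyyy-mm-dd'."""
    pattern = ""
    names = []
    kinds = []
    for tok in _TOKEN.finditer(picture):
        text = tok.group()
        kinds.append(tok.lastgroup)
        if tok.lastgroup == "letters":
            # Every letter in a group must be the same letter.
            if len(set(text)) != 1:
                raise ValueError(f"mixed letters in group {text!r}")
            names.append(text)
            pattern += "(%s{%d})" % (_DATA_CHAR, len(text))
        else:
            # Delimiters must appear literally in the data.
            pattern += re.escape(text)
    if not kinds or kinds[0] != "letters" or kinds[-1] != "letters":
        raise ValueError("picture must start and end with a letter group")
    m = re.fullmatch(pattern, data)
    if m is None:
        raise ValueError(f"{data!r} does not match picture {picture!r}")
    # One (group name, value) item per letter group, in order.
    return list(zip(names, m.groups()))
```

For the picture yyyy-mm-dd and the data 2002-04-15, this returns the items yyyy = 2002, mm = 04 and dd = 15, while 2002-4-15 is rejected because the second group has only one character.
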

Absolute Time

This type represents an absolute time.

  • The lexical space is ( YYYY-MM-DD )? s~+ (hh:mm:ss)? s~+ zone? where the zone might be UTC. (s~+ means "one or more, if there is a string on both the left and the right", to simplify the expression.) This is a kind of ISO 8601 date.
  • If no zone is specified, then the time zone is "UTC". If no time is specified, then the time is 00:00:00. If no date is specified, then the formal default date value is "0000-00-00", which the system may reinterpret from context.

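
The defaulting rules can be sketched in Python as follows. The names AbsoluteTime and parse_absolute_time are hypothetical. Because the syntax of a zone is not given here, the sketch assumes a zone is either an alphabetic name such as UTC or an offset of the form +hh:mm or -hh:mm. Values are kept as strings, since the formal default date 0000-00-00 is not a real calendar date.

```python
import re
from typing import NamedTuple


class AbsoluteTime(NamedTuple):
    date: str
    clock: str
    zone: str


_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
_CLOCK = re.compile(r"\d{2}:\d{2}:\d{2}")
# Assumption: zone syntax is not defined in the text above.
_ZONE = re.compile(r"[A-Za-z]+|[+-]\d{2}:\d{2}")


def parse_absolute_time(text: str) -> AbsoluteTime:
    """Parse 'date? time? zone?' and fill in the stated defaults."""
    tokens = text.split()
    # Defaults taken from the rules above.
    date, clock, zone = "0000-00-00", "00:00:00", "UTC"
    if tokens and _DATE.fullmatch(tokens[0]):
        date = tokens.pop(0)
    if tokens and _CLOCK.fullmatch(tokens[0]):
        clock = tokens.pop(0)
    if tokens and _ZONE.fullmatch(tokens[0]):
        zone = tokens.pop(0)
    if tokens:
        raise ValueError(f"unexpected text: {' '.join(tokens)!r}")
    return AbsoluteTime(date, clock, zone)
```

With this sketch, a value containing only 2002-04-15 is completed with the time 00:00:00 and the zone UTC, and a value containing only 10:30:00 receives the formal default date 0000-00-00.
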
Three-valued logic

This allows simple three-valued logics to be expressed. The W3C boolean type allows only two-valued logic, and only the lexical forms true/false and 1/0.

  • The value space is true, false, other.
  • The lexical space must be declared in each case, for example yes, no, unknown or ja, nein, inherit.

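
A minimal Python sketch of this arrangement: the value space is fixed, and each use declares its own three lexical forms. The names Tri and make_parser are hypothetical and serve only to illustrate the idea.

```python
from enum import Enum
from typing import Callable


class Tri(Enum):
    """The fixed value space: true, false, other."""
    TRUE = "true"
    FALSE = "false"
    OTHER = "other"


def make_parser(true_word: str, false_word: str, other_word: str) -> Callable[[str], Tri]:
    """Declare a lexical space and return a parser for it."""
    table = {true_word: Tri.TRUE, false_word: Tri.FALSE, other_word: Tri.OTHER}
    if len(table) != 3:
        raise ValueError("the three lexical forms must be distinct")

    def parse(lexical: str) -> Tri:
        try:
            return table[lexical.strip()]
        except KeyError:
            raise ValueError(f"{lexical!r} is not in the declared lexical space") from None

    return parse


# Two declarations, using the examples given above.
english = make_parser("yes", "no", "unknown")
german = make_parser("ja", "nein", "inherit")
```

Both parsers map onto the same three values, so english("unknown") and german("inherit") each produce the value other.
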
Script Type

This type sets up simple rules that are almost enough to cope with LISP/C-family languages and many others that follow similar conventions. The W3C regular expressions cannot express the matching of nested delimiters.

  • All Unicode delimiters with symmetrical swapping pairs must have their matching pair and be nested. So aaa(bbb[ccc]ddd)eee is valid but aaa(bbb[ccc)ddd]eee is not.
  • There are two kinds of literals: starting and ending with " and starting and ending with '. No delimiters or comments are recognized inside literals.
  • Comments appear between /* and */ or from // to the end of a line. (Note that because newlines are normalized to spaces in XML attributes, a // in an attribute value makes the rest of the attribute value into a comment.)

The value space (infoset) is a tree, with comments stripped.
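
The rules above can be sketched as a small Python parser that builds the tree. This is an illustration under stated assumptions, not a definitive implementation: the name parse_script is hypothetical, only the ASCII pairs (), [] and {} are handled rather than every Unicode delimiter with a mirrored partner, a literal is taken to end at the next matching quote with no escape mechanism, and each group is represented as a list whose first element is its opening delimiter.

```python
from typing import Any, List

# Assumption: only these ASCII pairs; the type itself covers all Unicode
# delimiters that have a mirrored partner.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def parse_script(text: str) -> List[Any]:
    """Return a tree of text runs, literals and nested groups."""
    root: List[Any] = []
    stack = [(root, None)]  # (children of the open group, expected closer)
    buf: List[str] = []     # characters of the current plain-text run

    def flush() -> None:
        if buf:
            stack[-1][0].append("".join(buf))
            buf.clear()

    i = 0
    while i < len(text):
        ch = text[i]
        if text.startswith("/*", i):            # block comment: stripped
            flush()
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated /* comment")
            i = end + 2
        elif text.startswith("//", i):          # line comment: stripped
            flush()
            end = text.find("\n", i)
            i = len(text) if end == -1 else end + 1
        elif ch in "\"'":                       # literal: kept verbatim
            flush()
            end = text.find(ch, i + 1)
            if end == -1:
                raise ValueError("unterminated literal")
            stack[-1][0].append(text[i:end + 1])
            i = end + 1
        elif ch in PAIRS:                       # open a nested group
            flush()
            group = [ch]
            stack[-1][0].append(group)
            stack.append((group, PAIRS[ch]))
            i += 1
        elif ch in CLOSERS:                     # must close the innermost group
            flush()
            if stack[-1][1] != ch:
                raise ValueError(f"unexpected {ch!r} at offset {i}")
            stack.pop()
            i += 1
        else:
            buf.append(ch)
            i += 1
    flush()
    if len(stack) != 1:
        raise ValueError("unclosed delimiter")
    return root
```

With this sketch, aaa(bbb[ccc]ddd)eee parses into a tree with two nested groups, while aaa(bbb[ccc)ddd]eee is rejected at the first mismatched closing delimiter. A // with no following newline, as in a normalized attribute value, discards everything after it.
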


  Copyright 2002-04 Topologi Pty. Ltd. ABN 74 096 635 102