This page describes a set of primitive datatypes that serve as
an alternative to the W3C types, especially for publishing.
No facet mechanism is used.
Another way to look at these might be as parse specifications,
like regular fragmentations, which allow a value to be split
into an infoset which can have its own schema.
Exact Quantities
This type allows exact quantities, such as lengths and money.
The W3C XML Schemas syntax currently does not allow exact
numbers, nor numbers tied to units, nor number/unit combinations
not separated by whitespace.
- The syntax is ( (currency|symbol)? s* "-"? digit+
(("."|"-"|",") digit+)? s* (nmtoken s*)?)+ which allows
1.1 -1.1pt 2 24.005 cm or $ 5-50
- The value space is an ordered list of symbol/number/unit
triples, where the symbol and unit may be absent.
The number is a base 10 number. Implementations may convert
the number to their own precision if they must; however, the
intent is that the value space of a number is not the closest
floating point number, but the exact number specified.
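The syntax and value space above can be sketched in Python, using `decimal.Decimal` so that the stored number is the number written rather than the nearest float. This is only an illustration under stated assumptions: the set of currency symbols is a small hard-coded sample, a unit is assumed to start with a letter (a narrower rule than nmtoken), and the names `TRIPLE` and `parse_quantities` are hypothetical.

```python
import re
from decimal import Decimal

# One symbol/number/unit triple. Assumptions: only a few sample currency
# symbols are recognized, and a unit must start with a letter.
TRIPLE = re.compile(
    r"\s*(?P<symbol>[$€£¥])?\s*"
    r"(?P<number>-?\d+(?:[.,-]\d+)?)\s*"
    r"(?P<unit>[A-Za-z][A-Za-z0-9]*)?\s*"
)

def parse_quantities(text):
    """Return an ordered list of (symbol, number, unit) triples."""
    triples = []
    pos = 0
    while pos < len(text):
        m = TRIPLE.match(text, pos)
        if m is None:
            raise ValueError("not an exact quantity at offset %d" % pos)
        number = m.group("number")
        sign = "-" if number.startswith("-") else ""
        # Treat ".", "," and "-" between digit groups as the same separator.
        digits = re.sub(r"[,-]", ".", number.lstrip("-"))
        triples.append((m.group("symbol"), Decimal(sign + digits), m.group("unit")))
        pos = m.end()
    return triples
```

Under these assumptions, `parse_quantities("$ 5-50")` yields a single triple whose number is `Decimal("5.50")`, with the symbol `"$"` and no unit.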
Pictures
A picture is a simple way to represent common kinds of information,
without needing the full power of regular expressions. The
W3C types do not provide a way for naming particles in an
expression.
- The lexical space is ( letter+ (delimiter letter+)*
) where every letter within a group must be the same, e.g.,
yyyy-mm-dd.
- The number of letters in each group indicates the number
of characters expected in the data.
- The value space is a list made up of information set items
for each group of letters.
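One way to read a picture is as a recipe for a regular expression with one capture group per run of letters. The sketch below takes that reading. It assumes data characters are word characters (letters or digits) and that delimiters must appear literally in the data; the function names are hypothetical.

```python
import re

def compile_picture(picture):
    """Turn a picture such as 'yyyy-mm-dd' into (group names, regex)."""
    names = []
    parts = []
    # Each token is either a run of one repeated letter or a delimiter run.
    for m in re.finditer(r"([A-Za-z])\1*|[^A-Za-z]+", picture):
        token = m.group(0)
        if m.group(1):
            names.append(token)
            # Assumption: data characters are word characters.
            parts.append(r"(\w{%d})" % len(token))
        else:
            parts.append(re.escape(token))
    return names, re.compile("".join(parts))

def parse_picture(picture, data):
    """Return a list of (group, value) pairs, one per group of letters."""
    names, pattern = compile_picture(picture)
    m = pattern.fullmatch(data)
    if m is None:
        raise ValueError("%r does not fit picture %r" % (data, picture))
    return list(zip(names, m.groups()))
```

With this sketch, `parse_picture("yyyy-mm-dd", "2001-02-03")` gives one pair per group, while a value such as `"2001-2-03"` is rejected because the second group has the wrong length.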
Time
This is an absolute time.
- The lexical space is ( YYYY-MM-DD )? s~+ (hh:mm:ss)?
s~+ zone? where the zone might be UTC. (s~+
means "one or more whitespace characters, if there is a string
on both the left and the right", to simplify the expression.)
This is a kind of ISO 8601 date.
- If no zone is specified, then the time zone is "UTC". If
no time is specified, then the time is 00:00:00. If
no date is specified, then the formal default date value
is "0000-00-00", which the system may reinterpret by
context.
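The defaulting rules can be illustrated with a small parser that splits on whitespace, which gives the "separator only between parts that are present" behaviour of s~+. This is a sketch, not a definitive implementation: the zone syntax (a run of letters or a ±hh:mm offset) is an assumption, since the text above only says the zone might be UTC. The result is returned as strings because the default date "0000-00-00" is not a real calendar date.

```python
import re

DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
CLOCK = re.compile(r"\d{2}:\d{2}:\d{2}")
# Assumption: a zone is a run of letters (e.g. UTC) or a +hh:mm / -hh:mm offset.
ZONE = re.compile(r"[A-Za-z]+|[+-]\d{2}:\d{2}")

def parse_time(text):
    """Return (date, time, zone) strings with the defaults filled in."""
    tokens = text.split()
    date = clock = zone = None
    if tokens and DATE.fullmatch(tokens[0]):
        date = tokens.pop(0)
    if tokens and CLOCK.fullmatch(tokens[0]):
        clock = tokens.pop(0)
    if tokens and ZONE.fullmatch(tokens[0]):
        zone = tokens.pop(0)
    if tokens:
        raise ValueError("unexpected text: %r" % " ".join(tokens))
    return (date or "0000-00-00", clock or "00:00:00", zone or "UTC")
```

A caller that needs a real timestamp would still have to decide how to reinterpret "0000-00-00" from context, as the rule above allows.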
Three-valued logic
This allows simple three-valued logics to be expressed. The
W3C types only allow two-valued logic, and only with the
lexical forms true/false or 1/0.
- The value space is true, false, other.
- The lexical space must be declared in each case. For example
yes, no, unknown or ja, nein, inherit.
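Because the lexical space is declared per use, a three-valued type amounts to a lookup table from three declared words to the fixed values true, false and other. The following is a minimal sketch of that idea; the `Tri` enum and the `make_three_valued` factory are hypothetical names introduced for illustration.

```python
from enum import Enum

class Tri(Enum):
    TRUE = "true"
    FALSE = "false"
    OTHER = "other"

def make_three_valued(true_word, false_word, other_word):
    """Build a parser for one declared lexical space."""
    table = {true_word: Tri.TRUE, false_word: Tri.FALSE, other_word: Tri.OTHER}
    if len(table) != 3:
        raise ValueError("the three lexical forms must be distinct")

    def parse(text):
        try:
            return table[text.strip()]
        except KeyError:
            raise ValueError("%r is not in the declared lexical space" % text) from None

    return parse
```

For example, `make_three_valued("ja", "nein", "inherit")("inherit")` returns `Tri.OTHER`.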
Script Type
This type sets up simple rules which are almost enough to
cope with LISP/C-family languages and many others that follow
similar conventions. The W3C regular expressions cannot express
nesting delimiter matching.
- All Unicode delimiters with symmetrical swapping pairs
must have their matching pair and be nested. So aaa(bbb[ccc]ddd)eee
is valid but aaa(bbb[ccc)ddd]eee is not.
- There are two kinds of literals: starting and ending with
" and starting and ending with '. No delimiters or comments
are recognized inside literals.
- Comments appear between /* and */ or from // to the end
of a line. (Note that because newlines are normalized to
spaces in XML attributes, a // in an attribute value makes
the rest of the attribute value into a comment.)
- The value space (infoset) is a tree, with comments stripped.
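The rules above can be sketched as a single pass with a stack of expected closing delimiters. The sketch is limited by assumption to the three ASCII pairs (), [] and {} rather than every Unicode mirrored pair, and the tree shape (strings for text, `(opener, children)` tuples for nested groups) is a hypothetical representation chosen for brevity.

```python
# Assumption: only these three ASCII pairs are treated as delimiters.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())

def parse_script(text):
    """Return a tree: a list of strings and (opener, children) tuples."""
    root = []
    stack = [(None, root)]  # (expected closer, children of the open group)
    buf = []

    def flush():
        if buf:
            stack[-1][1].append("".join(buf))
            buf.clear()

    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if text.startswith("/*", i):        # block comment: skip it
            end = text.find("*/", i + 2)
            if end < 0:
                raise ValueError("unterminated comment")
            i = end + 2
        elif text.startswith("//", i):      # line comment: skip to end of line
            end = text.find("\n", i)
            i = n if end < 0 else end + 1
        elif ch in "\"'":                   # literal: copy through unchanged
            end = text.find(ch, i + 1)
            if end < 0:
                raise ValueError("unterminated literal")
            buf.append(text[i:end + 1])
            i = end + 1
        elif ch in PAIRS:                   # opener: start a nested group
            flush()
            children = []
            stack[-1][1].append((ch, children))
            stack.append((PAIRS[ch], children))
            i += 1
        elif ch in CLOSERS:                 # closer: must match innermost opener
            flush()
            if stack[-1][0] != ch:
                raise ValueError("mismatched %r at offset %d" % (ch, i))
            stack.pop()
            i += 1
        else:
            buf.append(ch)
            i += 1
    flush()
    if len(stack) != 1:
        raise ValueError("unclosed delimiter")
    return root
```

With this sketch, aaa(bbb[ccc]ddd)eee parses into a three-item tree, and aaa(bbb[ccc)ddd]eee raises an error at the first mismatched closing delimiter.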