Automatic transformation of XML namespaces/User options

From Wikiversity

The user options specify one of the following:

  1. to validate an XML document;
  2. to run the pipeline (see below) of transformations.

The following user options are supplied for both pipeline and validation:

  1. the source XML document;
  2. a (possibly empty) set of RDF files (these RDF files are downloaded before the main loop);
  3. recursive retrieval order: an ordered subset of the three-element set {see also, sources, targets}; (Remark: This set can be empty to turn off recursive retrieval.)
  4. whether to abort processing an RDF file if there is any error in it. (Implementation note: It is easier to program when this option is false.)
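
The four common options above could be held in a small record like the following. This is only a sketch: the class name `CommonOptions`, the field names, and the string spellings of the three retrieval kinds are assumptions made for illustration, not part of this specification.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# The three-element set named in the text; the exact spellings are assumptions.
RETRIEVAL_KINDS = ("see also", "sources", "targets")


@dataclass
class CommonOptions:
    """Hypothetical container for the options shared by pipeline and validation."""
    source_document: str                                 # the source XML document
    rdf_files: List[str] = field(default_factory=list)   # may be empty
    retrieval_order: Tuple[str, ...] = ()                # empty turns recursive retrieval off
    abort_on_rdf_error: bool = False

    def __post_init__(self):
        # An ordered subset: only known kinds, each at most once.
        unknown = [k for k in self.retrieval_order if k not in RETRIEVAL_KINDS]
        if unknown:
            raise ValueError(f"unknown retrieval kinds: {unknown}")
        if len(set(self.retrieval_order)) != len(self.retrieval_order):
            raise ValueError("retrieval order must not repeat an element")

    @property
    def recursive_retrieval_enabled(self) -> bool:
        return bool(self.retrieval_order)
```

Using a tuple for `retrieval_order` keeps the order chosen by the user, which a plain set would lose.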

If the pipeline is to be run, the following user options are also supplied:

  1. the RDF file with a workflow (see below);
  2. the workflow URL (see below);
  3. what to do if, after running an automatic transformation, there are elements or attributes which are not in the target namespaces of the transformer: ignore (and leave these elements and attributes as is), erase these elements and attributes, or fail with an error;
  4. Script and transformer classes (see the example below).
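
The three choices in option 3 (ignore, erase, fail) can be sketched with the standard `xml.etree.ElementTree` module. The names `LeftoverPolicy` and `apply_leftover_policy` are hypothetical. The sketch also makes assumptions the text does not settle: attributes without a namespace are left alone, elements without a namespace count as outside the targets, and a root element outside the targets is an error under both erase and fail.

```python
import xml.etree.ElementTree as ET
from enum import Enum


class LeftoverPolicy(Enum):
    IGNORE = "ignore"
    ERASE = "erase"
    FAIL = "fail"


def namespace_of(name):
    """Return the namespace of an ElementTree '{ns}local' name, or '' if none."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""


def apply_leftover_policy(root, target_namespaces, policy):
    """Handle elements and namespaced attributes outside the target namespaces."""
    if policy is LeftoverPolicy.IGNORE:
        return root
    if namespace_of(root.tag) not in target_namespaces:
        # Assumption: the root cannot be erased, so this is always an error.
        raise ValueError(f"root element {root.tag} is outside the target namespaces")
    for parent in list(root.iter()):
        for name in list(parent.attrib):
            ns = namespace_of(name)
            if ns and ns not in target_namespaces:
                if policy is LeftoverPolicy.FAIL:
                    raise ValueError(f"leftover attribute {name}")
                del parent.attrib[name]
        for child in list(parent):
            if not isinstance(child.tag, str):
                continue  # comments and processing instructions are skipped
            if namespace_of(child.tag) not in target_namespaces:
                if policy is LeftoverPolicy.FAIL:
                    raise ValueError(f"leftover element {child.tag}")
                # Note: removing a child also drops its tail text.
                parent.remove(child)
    return root
```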

If validation is to be run, the following user options are also supplied:

  1. Depth-first or breadth-first validation.
  2. Whether presence of unknown namespaces makes the document invalid.

If recursive download is on, it should also be specified whether to search depth-first or breadth-first.
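
The difference between the two search orders can be illustrated with a small traversal. This is a sketch only: `retrieval_sequence` and the `links` callback (which returns the RDF resources referenced from a given one) are hypothetical names, and the real retrieval also depends on the recursive retrieval order option, which is not modelled here.

```python
from collections import deque


def retrieval_sequence(start, links, depth_first):
    """Return the order in which resources would be visited, each at most once."""
    pending = deque(start)
    seen = set()
    order = []
    while pending:
        iri = pending.popleft()
        if iri in seen:
            continue
        seen.add(iri)
        order.append(iri)
        neighbours = [n for n in links(iri) if n not in seen]
        if depth_first:
            # Put the new resources in front, keeping their relative order.
            pending.extendleft(reversed(neighbours))
        else:
            # Put the new resources behind everything already queued.
            pending.extend(neighbours)
    return order


# Hypothetical link graph: a references b and c, b references d.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
breadth = retrieval_sequence(["a"], graph.__getitem__, depth_first=False)
depth = retrieval_sequence(["a"], graph.__getitem__, depth_first=True)
```

Here `breadth` is `["a", "b", "c", "d"]` and `depth` is `["a", "b", "d", "c"]`. The order matters because RDFs loaded earlier take precedence.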

The user specifies whether to prefer a transformer in the RDF describing the source namespace or in the RDF describing the target namespace. This influences only which RDF resources are loaded and in what order. After an RDF is loaded, it is subject to the “first loaded” rule below.

RDFs loaded earlier take precedence over RDFs loaded later. (Note: This facilitates uniformity of transformations and validations in the sense that different transformers and validation rules are not applied to the same namespaces.)
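
A minimal sketch of the “first loaded” rule, assuming that precedence is decided per namespace (the text does not fix the exact granularity). The class `FirstLoadedRegistry` and its method names are hypothetical.

```python
class FirstLoadedRegistry:
    """Keeps, for each namespace, only the rule from the earliest loaded RDF."""

    def __init__(self):
        self._by_namespace = {}

    def register(self, namespace, rule, rdf_iri):
        """Store the rule unless an earlier RDF already supplied one.

        Returns True if the rule was stored, False if it was ignored.
        """
        if namespace in self._by_namespace:
            return False
        self._by_namespace[namespace] = (rdf_iri, rule)
        return True

    def lookup(self, namespace):
        """Return the (rdf_iri, rule) pair for the namespace, or None."""
        return self._by_namespace.get(namespace)
```

Calling `register` in loading order is enough: later registrations for the same namespace are ignored, so one namespace never ends up with two competing rules.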

Note that an object IRI in the user options should be paired with the IRI of the RDF file in which the object is defined. (This resolves the case where the same IRI is used in two or more different RDF files.)

Script and transformer classes

Script and transformer classes are specified for pairs of namespaces (see the example below).

A script class definition specifies either a particular script or a class of scripts. Likewise for transformers.

Remark: It is OK to specify either a script or a class, because a class of scripts cannot be a script. (Can we prove it?)

TODO: In a chapter below, specify how these are used. Also specify how membership in a class is determined.

Workflow

RDF grammar trees

  •  :Transformer
    • 1..1 :source
    • 1..1 :object
  •  :Script
    • 1..1 :source
    • 1..1 :object
  •  :Auto
    • 1..* :url

It is erroneous for a single object to be of more than one of the following classes: :Transformer, :Script, :Auto.

No more than one :workflow predicate may be specified for a single object. The value (the RDF object) of the :workflow predicate must be a list whose elements are :Transformer, :Script, or :Auto.

Any validity errors in a workflow are fatal.
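
The grammar trees and the rules above can be checked mechanically. The sketch below does not parse RDF. It assumes each workflow element has already been read into a plain dictionary with a `types` set and one list of values per property. That representation, and the names `GRAMMAR`, `WorkflowError`, `validate_step` and `validate_workflow`, are assumptions made for illustration.

```python
# Cardinalities from the grammar trees: (minimum, maximum), None = unbounded.
GRAMMAR = {
    "Transformer": {"source": (1, 1), "object": (1, 1)},
    "Script": {"source": (1, 1), "object": (1, 1)},
    "Auto": {"url": (1, None)},
}


class WorkflowError(Exception):
    """A validity error in a workflow; such errors are fatal."""


def validate_step(step):
    """Check one workflow element and return the name of its class."""
    classes = set(step.get("types", ())) & set(GRAMMAR)
    if len(classes) != 1:
        raise WorkflowError(f"element must have exactly one class, got {sorted(classes)}")
    (cls,) = classes
    for prop, (low, high) in GRAMMAR[cls].items():
        count = len(step.get(prop, ()))
        if count < low or (high is not None and count > high):
            raise WorkflowError(f":{cls} has {count} values for :{prop}")
    return cls


def validate_workflow(steps):
    """Validate every element of the workflow list, in order."""
    return [validate_step(step) for step in steps]
```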

It is recommended that, instead of a workflow, the user be able to specify just a set of IRIs (namespaces). In this case the workflow consists of one automatic transformation with the specified target namespaces.
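
As a sketch of this shorthand, a set of namespaces can be expanded into a one-element workflow. The function name `workflow_from_namespaces` and the dictionary representation of a workflow element are hypothetical. Rejecting an empty set is an inference from the 1..* cardinality of :url, not something stated in the text.

```python
def workflow_from_namespaces(namespaces):
    """Build a workflow with a single :Auto step targeting the given namespaces."""
    targets = sorted(set(namespaces))
    if not targets:
        # :Auto requires at least one :url (cardinality 1..*).
        raise ValueError("at least one target namespace is required")
    return [{"types": {"Auto"}, "url": targets}]
```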

Semantics

Processing is meant to proceed by giving the source XML document to the first element of the workflow list, its result to the second element of the workflow list, and so on until the last element of the workflow list.
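
This chaining amounts to a left fold over the workflow list. In the sketch below each workflow element is modelled as a callable that takes a document and returns the transformed document. That modelling and the name `run_workflow` are assumptions, since the specification describes elements as RDF objects rather than functions.

```python
from functools import reduce


def run_workflow(document, steps):
    """Feed the document through every step in order and return the final result."""
    return reduce(lambda doc, step: step(doc), steps, document)


# Toy steps on strings, standing in for real transformations.
result = run_workflow("<a/>", [str.upper, lambda doc: doc + "<!-- done -->"])
```

With an empty list of steps the document is returned unchanged.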

A workflow list can refer to a transformation (meaning to use a script of this transformation which the implementation chooses), to a particular script, or to an automatic transformation with a given set of destination URLs.

See below in this specification about meanings of scripts, transformations, and automatic transformation.

If no user option specifies which workflow to use, the workflow represented by the object :defaultWorkflow is used, if it is available. If no workflow is specified and there is no :defaultWorkflow object, it is a fatal error.
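
The selection rule can be sketched as follows, assuming the loaded workflows are available as a mapping from object name to workflow. The names `select_workflow` and `FatalError` and the mapping itself are hypothetical. The text does not say what happens when a requested workflow does not exist, so the sketch simply lets the lookup fail.

```python
DEFAULT_WORKFLOW = ":defaultWorkflow"


class FatalError(Exception):
    """Raised when processing cannot continue."""


def select_workflow(workflows, requested=None):
    """Return the requested workflow, or the default one if none was requested."""
    if requested is not None:
        # Not covered by the text: an unknown name raises KeyError here.
        return workflows[requested]
    if DEFAULT_WORKFLOW in workflows:
        return workflows[DEFAULT_WORKFLOW]
    raise FatalError("no workflow requested and no :defaultWorkflow object available")
```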

Example

@prefix : <http://portonvictor.org/ns/workflow/> . # update the namespace in a final version of this document

# TODO: Should we allow :transformerClass and :scriptClass in regular RDF files (not only in options)?

[
  :sourceNamespace <http://example.org/nsA> ;
  :targetNamespace <http://example.org/nsB>
] :useTransformerClass <...> .

[
  :sourceNamespace <http://example.org/nsX> ;
  :targetNamespace <http://example.org/nsY>
] :useScriptClass <...> .

# To refer to a particular transformer or script use :useTransformer or :useScript instead of :useTransformerClass or :useScriptClass

# FIXME: 1. :targetNamespace may be missing; 2. we may need to refer to a particular transformer or script not a class.

:defaultWorkflow :workflow (
  [ a :Transformer ; :source <http://example.org/info.rdf> ; :object <http://example.org/object> ]
  [ a :Auto ;
    :url <http://example.org/ns1> ;
    :url <http://example.org/ns2> 
  ]
  [ a :Script ; :source <http://example.org/info2.rdf> ; :object <http://example.org/object2> ]
) .