Automatic transformation of XML namespaces/Transformations/Automatic transformation

From Wikiversity

An automatic transformation happens when the next element of the workflow refers to an :Auto object.

Figuring out the next enriched script

Rationale: The idea is to select the enriched script with the highest precedence. If several enriched scripts have the same precedence and this precedence is a singleton, then select an order based on grouping.

I call an available chain starting from a given namespace a finite sequence of enriched scripts such that the source namespaces of the first enriched script include the given namespace, the set of source namespaces of each subsequent enriched script intersects the set of target namespaces of the previous enriched script, and the set of target namespaces of the last enriched script is either null or intersects the set of target namespaces of the current automatic transformer.
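The definition of an available chain can be sketched as a depth-first enumeration. This is a minimal Python sketch under several assumptions: the `EnrichedScript` record and its field names are hypothetical, a null set of target namespaces is modeled as `None`, no enriched script is repeated within a chain, and `max_len` is an arbitrary bound added only to keep the enumeration finite.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterator, List, Optional, Sequence

@dataclass(frozen=True)
class EnrichedScript:
    # Hypothetical record; only the fields needed for chains are modeled.
    name: str
    sources: FrozenSet[str]
    targets: Optional[FrozenSet[str]]  # None models a null set of targets

def available_chains(start_ns: str,
                     scripts: Sequence[EnrichedScript],
                     destinations: FrozenSet[str],
                     max_len: int = 8) -> Iterator[List[EnrichedScript]]:
    """Yield every available chain starting from start_ns, depth first."""
    def extend(chain: List[EnrichedScript],
               reachable: FrozenSet[str]) -> Iterator[List[EnrichedScript]]:
        for s in scripts:
            # Sources must meet the previous targets; never reuse a script.
            if s in chain or not (s.sources & reachable):
                continue
            new_chain = chain + [s]
            # A chain may end here if the targets are null or hit a destination.
            if s.targets is None or s.targets & destinations:
                yield new_chain
            # A null target leaves nothing to continue from.
            if s.targets is not None and len(new_chain) < max_len:
                yield from extend(new_chain, s.targets)
    yield from extend([], frozenset({start_ns}))
```

For example, with scripts mapping `a` to `b` and `b` to `c` and the destination `c`, the only chain starting from `a` is the two-step one.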

  1. If the digraph of executed scripts contains a path from a namespace in the document to a destination namespace or to the null namespace, then select the highest-precedence enriched script that is the first step of such a path and transform the XML document with this script. If there are several such enriched scripts, choose one as described below.
  2. Otherwise, among the available enriched scripts whose source is one of the current document's namespaces, choose the highest-precedence enriched script that is the first transformer of an available chain (starting from any namespace in the current XML document) ending either with a destination namespace or with :ignoreTarget set to true. Skip transformers whose source and target are the same. (Note: such transformers may nevertheless be useful as user-specified transformers.) Add the chosen enriched script to the digraph of used enriched scripts. If there are several such enriched scripts, choose one as described below.
  3. If neither item 1 nor item 2 yields a result, then there is no next enriched script.
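The three items above form a cascade, which can be sketched as follows. This is an illustration only: `Script`, `pick` and `next_enriched_script` are hypothetical names, each script is simplified to a single source and a single target namespace, the candidate lists for items 1 and 2 are assumed to have been computed already (from the digraph of executed scripts and from the available chains), and the tie-breaking rule is passed in as a callable.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set

@dataclass(frozen=True)
class Script:
    # Hypothetical stand-in for an enriched script (one source/target is a simplification).
    name: str
    source: str
    target: Optional[str]  # None models the null namespace
    precedence: int

TieBreak = Callable[[List[Script]], Script]

def pick(candidates: List[Script], tie_break: TieBreak) -> Optional[Script]:
    """Return the highest-precedence candidate, delegating ties to tie_break."""
    if not candidates:
        return None
    top = max(s.precedence for s in candidates)
    best = [s for s in candidates if s.precedence == top]
    return best[0] if len(best) == 1 else tie_break(best)

def next_enriched_script(executed_path_heads: List[Script],
                         chain_heads: List[Script],
                         tie_break: TieBreak,
                         used: Set[Script]) -> Optional[Script]:
    # Item 1: first steps of paths in the digraph of executed scripts.
    chosen = pick(executed_path_heads, tie_break)
    if chosen is not None:
        return chosen
    # Item 2: first elements of available chains, skipping scripts whose
    # source and target coincide; record the choice as used.
    chosen = pick([s for s in chain_heads if s.source != s.target], tie_break)
    if chosen is not None:
        used.add(chosen)
    # Item 3: None means there is no next enriched script.
    return chosen
```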

If there are several such executable enriched scripts of the same precedence, and that precedence is a singleton class, choose the executable enriched script for which there exists an available chain with the highest minimal preservance and, among those, the highest priority. Note that only the first element of the chain is actually used; the rest serve only to calculate priorities. If the scripts are not known members of some known singleton precedence, then either terminate the transformation or at least give a warning and choose the first script of the highest-priority chain.
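The tie-breaking rule can be sketched by comparing chains with the key (minimal preservance, priority) and keeping only the first element of the winning chain. In this Python sketch the `Step` record is hypothetical, and `priority` uses the optional product formula described in the next paragraph, with the preservance value standing in for "completeness" (an assumption).

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

@dataclass(frozen=True)
class Step:
    # Hypothetical enriched-script record; field names are assumptions.
    name: str
    preservance: float
    stability: float = 1.0
    preference: float = 1.0

def priority(chain: Sequence[Step]) -> float:
    # Optional formula: product of all values along the chain
    # (preservance used in place of "completeness", an assumption).
    return math.prod(s.preservance * s.stability * s.preference for s in chain)

def choose_first_script(chains: List[Sequence[Step]]) -> Step:
    """Pick the chain with the highest minimal preservance, break ties by
    priority, and return only the chain's first element."""
    best = max(chains, key=lambda c: (min(s.preservance for s in c), priority(c)))
    return best[0]
```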

A sequence of enriched scripts is built in order to carry out a transformation in the best possible way. Each sequence is characterized by a priority, which may be calculated from the preservance, stability, and preference of its scripts. One possible formula for priority, which may (but is not required to) be used, is to choose the path for which the product of all completeness, stability, and preference values along the path is maximal.
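Written out, for a chain $P = (s_1, \dots, s_k)$ this optional formula might read as follows, where $c$, $t$ and $p$ are our own (hypothetical) names for the completeness, stability and preference values of a script:

$$\operatorname{priority}(P) = \prod_{i=1}^{k} c(s_i)\, t(s_i)\, p(s_i), \qquad P^{*} = \operatorname*{arg\,max}_{P} \operatorname{priority}(P).$$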

Rationale: The preservance of a chain is taken as the minimum along the path so that tag-stripping transformations (such as HTML -> plain text) are reliably ranked low, even when they compete with chains of many steps that each have higher preservance.
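A small worked comparison (with made-up numbers, assuming stability and preference equal to 1) illustrates the point. Compare a one-step tag-stripping chain with preservance 0.1 against a 30-step chain in which every step has preservance 0.9:

$$0.9^{30} \approx 0.042 < 0.1, \qquad \min_{1 \le i \le 30} 0.9 = 0.9 > 0.1.$$

A pure product would prefer the tag-stripping chain, while taking the minimum keeps it at the bottom.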

Remark: If an enriched script in the chain has the same source and target as a previously used transformation, then use the previously used enriched script instead. (Rationale: consistency.)
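The reuse rule amounts to remembering one enriched script per (source, target) pair. A minimal Python sketch, assuming scripts are identified by hypothetical string names and that `UsedScripts` is an illustrative helper rather than part of any specified API:

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, Optional[str]]  # (source namespace, target namespace or None)

class UsedScripts:
    """Remembers which enriched script (by hypothetical name) was used
    for each (source, target) pair."""

    def __init__(self) -> None:
        self._by_pair: Dict[Key, str] = {}

    def resolve(self, source: str, target: Optional[str], candidate: str) -> str:
        # Reuse the earlier script for this pair if there is one;
        # otherwise record the candidate and return it.
        return self._by_pair.setdefault((source, target), candidate)
```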

The chosen sequence of enriched scripts must have no loops.

If the processor is unable to determine the next enriched script in the precedence order, it should either report an error, or give a warning and choose the order arbitrarily.

Automatic transformation process

Automatic transformation consists of applying each script in turn to the source document, as described by the algorithm below.

After the pipeline finishes, the processor may, at the user's option, fail if the document contains namespaces not in the destination list. Also at the user's option, it may erase all elements (together with their descendants) and all attributes whose namespaces are not in the destination list.
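Both post-pipeline options can be sketched with Python's standard `xml.etree.ElementTree`. Assumptions: unqualified attributes are kept, elements with no namespace are treated as belonging to the namespace `""`, the root element itself is never removed, and `finish`, `ns_of` and `namespaces_in` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from typing import Set

def ns_of(name: str) -> str:
    # ElementTree spells qualified names as "{uri}local"; "" means no namespace.
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""

def namespaces_in(root: ET.Element) -> Set[str]:
    found: Set[str] = set()
    for el in root.iter():
        found.add(ns_of(el.tag))
        found.update(ns_of(a) for a in el.attrib if a.startswith("{"))
    return found

def finish(root: ET.Element, destinations: Set[str],
           fail_on_extra: bool, strip_extra: bool) -> ET.Element:
    if fail_on_extra:
        extra = namespaces_in(root) - destinations
        if extra:
            raise ValueError(f"namespaces not in destination list: {sorted(extra)}")
    if strip_extra:
        # Snapshot the elements first: the tree must not change during iteration.
        for el in list(root.iter()):
            for child in [c for c in el if ns_of(c.tag) not in destinations]:
                el.remove(child)  # removes the child together with its descendants
            for attr in [a for a in el.attrib
                         if a.startswith("{") and ns_of(a) not in destinations]:
                del el.attrib[attr]
    return root
```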

XML well-formedness should be checked before the transformation starts and after every transformation step. XML validity should also be checked.
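A minimal sketch of the well-formedness checks, using `xml.etree.ElementTree.fromstring`, which raises `xml.etree.ElementTree.ParseError` on malformed input. Each transformation step is modeled as a hypothetical function from XML text to XML text. Validity checking is not shown: the Python standard library has no schema validator, so it would need a third-party library.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

def check_well_formed(xml_text: str) -> ET.Element:
    # Raises ET.ParseError if the text is not well-formed XML.
    return ET.fromstring(xml_text)

def run_pipeline(xml_text: str, steps: List[Callable[[str], str]]) -> str:
    check_well_formed(xml_text)           # before starting the transformation
    for step in steps:
        xml_text = step(xml_text)
        check_well_formed(xml_text)       # after every transformation step
    return xml_text
```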

The algorithm of automatic transformation


Rationale: A phase is a series of transformations that ends either when the target namespace of the last transformation is the destination namespace of the transformation (a complete end of the transformation) or when it is no particular namespace (as in XInclude), in which case we cannot analyze how to continue the transformation without actually running XML transformations.

A phase is an algorithm that determines an enriched script by the following loop:

  1. Build the list of XML namespaces based on the actual current XML document.
  2. Figure out the next enriched script (as described in the "Figuring out the next enriched script" section above).
  3. If there is no next enriched script, download the next RDF file.
  4. Exit the loop if there is neither an available enriched script nor a next RDF file.
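The phase loop might be sketched as follows. `run_phase` and its callable parameters are hypothetical; in particular, the sketch assumes the loop also ends as soon as a next enriched script is found, since the main loop expects the phase to return one.

```python
from typing import Callable, Iterator, Optional, Set, TypeVar

S = TypeVar("S")

def run_phase(document_namespaces: Callable[[], Set[str]],
              figure_next: Callable[[Set[str]], Optional[S]],
              rdf_files: Iterator[str],
              load_rdf: Callable[[str], None]) -> Optional[S]:
    while True:
        namespaces = document_namespaces()   # 1. namespaces of the current document
        script = figure_next(namespaces)     # 2. figure out the next enriched script
        if script is not None:
            return script                    # assumption: a found script ends the phase
        rdf = next(rdf_files, None)          # 3. otherwise take the next RDF file
        if rdf is None:
            return None                      # 4. no script and no more RDF files
        load_rdf(rdf)
```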

The main loop

The main loop of an automatic transformation consists of repeatedly:

  1. If all namespaces in the document are in the destination namespaces list, then exit the loop.
  2. Calculate the next phase.
  3. If the next phase did not return an enriched script, exit the loop.
  4. Apply all enriched scripts in the phase: apply the enriched script returned by the phase and add it to the digraph of executed scripts.
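A minimal sketch of the main loop, with the namespace listing, the phase calculation and the application of a script all passed in as hypothetical callables, and the digraph of executed scripts simplified to a list in execution order (an assumption).

```python
from typing import Any, Callable, List, Optional, Set, Tuple

def automatic_transformation(
        doc: Any,
        destinations: Set[str],
        namespaces_of: Callable[[Any], Set[str]],
        next_phase: Callable[[Any], Optional[Any]],
        apply: Callable[[Any, Any], Any]) -> Tuple[Any, List[Any]]:
    executed: List[Any] = []  # simplified stand-in for the digraph of executed scripts
    while True:
        if namespaces_of(doc) <= destinations:   # 1. every namespace is a destination
            return doc, executed
        script = next_phase(doc)                 # 2. calculate the next phase
        if script is None:                       # 3. the phase returned no script
            return doc, executed
        doc = apply(doc, script)                 # 4. apply the script and record it
        executed.append(script)
```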

Rationale: This algorithm makes it possible to avoid downloading the destination namespace RDF at all if the user requests that destination namespaces be loaded last and the transformation happens to succeed without loading them. It can also do the reverse (that is, load only the destination namespace RDF and not the source namespaces).

Rationale: Considering all namespaces in the current document (not only that of the root element) makes it possible to apply precedences, which is important, for example, for correct XInclude processing.