Automatic transformation of XML namespaces/Transformations/Automatic transformation

From Wikiversity

An automatic transformation happens when one is explicitly requested, or when a workflow is being run and the next element of the workflow refers to an :Auto object.

Figuring out the next enriched script

Rationale: The idea is to select the enriched script with the highest precedence. If there are several enriched scripts with the same precedence and this precedence is a singleton, then select an order based on grouping.

I call a transformer universal if it is marked as :universal true and the "universal precedence" is known to be less than or equal to the precedence of this transformer.

  1. Assign to variable available_chains the set of finite sequences of enriched scripts such that:
    1. the set of source namespaces of the next enriched script is a superset of the set of target namespaces of the previous enriched script;
    2. the set of source namespaces of the first enriched script intersects the set of namespaces of the current XML document;
    3. the set of target namespaces of the last enriched script intersects the set of target scripts of the current automatic transformer, or the last enriched script is universal;
    4. (in addition to the rules above) the sequence is not a subsequence of another available chain. (Consequently, transformers having the same source and target should be skipped. Remark: Such transformers may nevertheless be useful as user-specified transformers.)
  2. If available_chains is the empty set, then there is no next enriched script.
  3. If there is a nonzero number of paths in available_chains which are entirely in the set of executed scripts, assign to available_chains the set of such paths.
    1. Optionally (TODO: add to user options), if there is more than one first step among such executed scripts, give a warning or an error.
  4. Choose an enriched script with the highest precedence among the first transformers of the chains in available_chains.
  5. Disable "all other namespaces" in the generalized queue. [FIXME: Explain: What are "all other namespaces"? Also: What to disable if a namespace is both in the current XML document and elsewhere (in seeAlso or, e.g., in a destination of a transformer)?]
  6. If there is no first transformer with the highest precedence among the chains in available_chains, then load the next asset and repeat. If there is no next asset, fail with an error.
  7. Enable all elements in the generalized queue again.
  8. If there are several such enriched scripts and their precedence is a singleton class, choose the enriched script for which there exists an available chain with the highest minimal preservance and, among them, of the highest priority (see below). Note that only the first element of the chain is actually used; the rest are for calculation of priorities only. If the first elements of the chains are not known members of some known singleton precedence, then terminate transforming, or at least give a warning and choose the first script of the highest-priority chain.
  9. Add the chosen enriched script to the digraph of used enriched scripts.
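
The chain conditions and the precedence-based choice above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the processor's actual implementation: the `Script` class, the integer `precedence` field (larger means higher), the brute-force enumeration, and the `urn:example:` namespaces are all hypothetical. Condition 4 (the subsequence rule), the executed-scripts filter, and the generalized queue are left out.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Script:
    """Hypothetical stand-in for an enriched script."""
    name: str
    sources: frozenset      # source namespaces
    targets: frozenset      # target namespaces
    precedence: int = 0     # assumption: a larger number means higher precedence
    universal: bool = False


def is_chain(seq, doc_ns, dest_ns):
    """Check conditions 1-3 of step 1 for a non-empty sequence of scripts."""
    first, last = seq[0], seq[-1]
    if not first.sources & doc_ns:
        return False
    if not (last.targets & dest_ns or last.universal):
        return False
    return all(nxt.sources >= prev.targets for prev, nxt in zip(seq, seq[1:]))


def available_chains(scripts, doc_ns, dest_ns, max_len=3):
    """Enumerate chains by brute force (condition 4 is not applied here)."""
    return [seq
            for n in range(1, min(max_len, len(scripts)) + 1)
            for seq in permutations(scripts, n)
            if is_chain(seq, doc_ns, dest_ns)]


def first_step_candidates(chains):
    """Step 4: first scripts of the chains that share the highest precedence."""
    firsts = {chain[0] for chain in chains}
    if not firsts:
        return set()
    best = max(s.precedence for s in firsts)
    return {s for s in firsts if s.precedence == best}


# Toy data: two made-up scripts forming the chain md -> xhtml -> text.
md2xhtml = Script("md2xhtml", frozenset({"urn:example:md"}),
                  frozenset({"urn:example:xhtml"}), precedence=2)
xhtml2text = Script("xhtml2text", frozenset({"urn:example:xhtml"}),
                    frozenset({"urn:example:text"}), precedence=1)

chains = available_chains([md2xhtml, xhtml2text],
                          {"urn:example:md"}, {"urn:example:text"})
candidates = first_step_candidates(chains)
```

Using `permutations` means no script appears twice in a candidate chain, which fits the requirement that the chosen sequence has no cycles, but it is only practical for small sets of scripts.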

A sequence of enriched scripts is built in order to carry out a transformation in the best way. The sequence is characterized by a priority, which should be minimized. The priority may be calculated taking into account the preservance, stability, and preference of scripts. Possible formulas for priority which may (but are not required to) be used to choose a path:

TODO: Require .

Rationale: The preservance of a path is taken as the minimum over its steps, and the path with the highest such minimum is preferred. This reliably rules out tag-stripping transformations (such as HTML -> plain text), even when they compete with paths of many steps that each have higher preservance.
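
To illustrate why a minimum is used rather than, say, a product, here is a small sketch. It assumes preservance is a number between 0 and 1 attached to each step; that scale, the script names, and the values are made up for the example and are not taken from the specification.

```python
import math


def chain_preservance(chain):
    """Preservance of a chain: the minimum preservance over its steps."""
    return min(p for _name, p in chain)


def pick_chain(chains):
    """Prefer the chain whose minimal preservance is highest."""
    return max(chains, key=chain_preservance)


# A single tag-stripping step versus six gentle steps (hypothetical values).
strip = [("html2text", 0.3)]
gentle = [("step%d" % i, 0.8) for i in range(1, 7)]

chosen = pick_chain([strip, gentle])

# For comparison: a product-based score would penalize the long chain.
product_of_gentle = math.prod(p for _name, p in gentle)
```

With these values the product over the six gentle steps falls below 0.3, so a product-based score would favor the tag-stripping step, while the minimum-based rule still picks the long chain.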

Remark: If an enriched script in the chain has the same source and target as an earlier used transformation, then use the earlier used enriched script. (Rationale: Keep consistency.)

The chosen sequence of enriched scripts must have no cycles.
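
One possible way to check the no-cycles requirement is a depth-first search over a digraph. The sketch below assumes the graph is given as a plain dictionary mapping a node to the nodes it leads to; how the digraph of used enriched scripts is actually represented is not specified here, so this representation is an assumption.

```python
def has_cycle(graph):
    """Return True if the directed graph (dict: node -> iterable of nodes) has a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}

    def visit(node):
        color[node] = GRAY
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:              # back edge: nxt is on the current path
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


acyclic = {"a": ["b"], "b": ["c"]}
cyclic = {"a": ["b"], "b": ["a"]}
```

Here `has_cycle(acyclic)` is false and `has_cycle(cyclic)` is true. The recursion depth grows with the length of the longest path, which is acceptable for the short chains discussed here.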

If the processor is unable to determine the next enriched script in the precedence order, it should either give an error, or give a warning and choose the order arbitrarily.

Automatic transformation process

Automatic transformation consists of applying every script in turn to the source document, as described by the algorithm below.

After the pipeline is finished, at user option, fail if there are namespaces not in the destination list. At user option, erase all tags (and their descendants) and attributes not in the destination list.
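
A minimal sketch of these two user options, using Python's standard `xml.etree.ElementTree`. The helper names and the `urn:example:` namespaces are hypothetical. As assumptions of this sketch, elements and attributes without any namespace are kept, the root element is never removed, and text that follows a removed element (its tail) is dropped along with it.

```python
import xml.etree.ElementTree as ET


def ns_of(name):
    """Namespace URI of an ElementTree name such as '{uri}local', or None."""
    if isinstance(name, str) and name.startswith("{"):
        return name[1:].split("}", 1)[0]
    return None


def namespaces_in(root):
    """Collect the namespaces of all elements and attributes under root."""
    found = set()
    for el in root.iter():
        for name in (el.tag, *el.attrib):
            if ns_of(name) is not None:
                found.add(ns_of(name))
    return found


def strip_foreign(root, destination):
    """Erase elements (with descendants) and attributes outside destination."""
    for parent in list(root.iter()):
        for attr in list(parent.attrib):
            ns = ns_of(attr)
            if ns is not None and ns not in destination:
                del parent.attrib[attr]
        for child in list(parent):
            ns = ns_of(child.tag)
            if ns is not None and ns not in destination:
                parent.remove(child)


doc = ET.fromstring(
    '<h:html xmlns:h="urn:example:xhtml" xmlns:x="urn:example:extra">'
    '<h:p x:note="n">text<x:widget><h:b/></x:widget></h:p></h:html>'
)
destination = {"urn:example:xhtml"}

leftover = namespaces_in(doc) - destination   # non-empty: the "fail" option applies
strip_foreign(doc, destination)               # the "erase" option
remaining = namespaces_in(doc)
```

A caller implementing the "fail" option would raise an error when `leftover` is non-empty instead of calling `strip_foreign`.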

After every transformation step and before starting the transformation, XML well-formedness should be checked. Also XML validity should be checked.
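
The well-formedness check can be sketched with the standard library parser, as below. The function name is hypothetical. Validity checking (against a DTD or schema) is not covered, since `xml.etree.ElementTree` does not validate; it would need a validating parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


good = is_well_formed("<a><b/></a>")
bad = is_well_formed("<a><b></a>")   # mismatched tag
```

A pipeline could call such a check once before the first step and again on the output of each step, stopping as soon as it returns false.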

The algorithm of automatic transformation

Phases

Rationale: A phase is a series of transformations which ends either when the target namespace of the last transformation is the destination namespace of the transformation (a complete end of the transformation) or when it is no particular namespace (as in XInclude), in which case we cannot continue analyzing the transformation without running actual XML transformations.

A phase is an algorithm which determines an enriched script by the following loop:

  1. Build the list of XML namespaces based on the actual current XML document.
  2. Figure out the next enriched script (as described in the "Figuring out the next enriched script" section above).
  3. If there is no next enriched script, download the next RDF file.
  4. Exit from the loop if there is neither an available enriched script nor a next RDF file.
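
The loop above can be sketched as a small function. The three callables are hypothetical placeholders for the real machinery, and the sketch assumes that the loop also ends as soon as a script has been found, which the steps imply but do not state.

```python
def run_phase(document_namespaces, figure_next_script, load_next_rdf):
    """Return the next enriched script, or None if nothing more can be found.

    document_namespaces() -> set of namespaces in the current document
    figure_next_script(namespaces) -> a script, or None
    load_next_rdf() -> True if another RDF file was loaded, else False
    """
    while True:
        namespaces = document_namespaces()        # step 1
        script = figure_next_script(namespaces)   # step 2
        if script is not None:
            return script
        if not load_next_rdf():                   # steps 3 and 4
            return None


# Toy setup: knowledge about scripts only arrives once an RDF file is loaded.
pending_rdf = [{"urn:example:a": "a2b"}]   # made-up assets, not yet loaded
known = {}


def load_next_rdf():
    if not pending_rdf:
        return False
    known.update(pending_rdf.pop(0))
    return True


def figure_next_script(namespaces):
    for ns in sorted(namespaces):
        if ns in known:
            return known[ns]
    return None


found = run_phase(lambda: {"urn:example:a"}, figure_next_script, load_next_rdf)
missing = run_phase(lambda: {"urn:example:z"}, figure_next_script, load_next_rdf)
```

In the first call the script is found only after the single pending RDF file has been loaded; in the second call there is nothing left to load, so the phase returns no script.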

The main loop

The main loop of an automatic transformation consists of repeatedly:

  1. If all namespaces in the document are in the destination namespaces list, then exit from the loop.
  2. Calculate the next phase.
  3. If the next phase didn't return an enriched script, exit from the loop.
  4. Apply all enriched scripts in the phase. Apply the enriched script returned by the phase, and add this enriched script to the set of executed scripts.
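
A minimal sketch of this main loop, with the phase and the script application passed in as hypothetical callables. The toy scripts are made-up tuples of (name, source namespace, target namespace), and "applying" a script here only updates a set of namespaces rather than transforming real XML.

```python
def transform(document_namespaces, destination, next_phase, apply_script):
    """Run the main loop and return the list of executed scripts."""
    executed = []
    while True:
        if document_namespaces() <= destination:   # step 1
            break
        script = next_phase()                      # step 2
        if script is None:                         # step 3
            break
        apply_script(script)                       # step 4
        executed.append(script)
    return executed


# Toy state: the document is represented only by its set of namespaces.
state = {"namespaces": {"urn:example:a"}}
rules = {
    "urn:example:a": ("a2b", "urn:example:a", "urn:example:b"),
    "urn:example:b": ("b2c", "urn:example:b", "urn:example:c"),
}


def next_phase():
    for ns in sorted(state["namespaces"]):
        if ns in rules:
            return rules[ns]
    return None


def apply_script(script):
    _name, source, target = script
    state["namespaces"] = (state["namespaces"] - {source}) | {target}


executed = transform(lambda: state["namespaces"], {"urn:example:c"},
                     next_phase, apply_script)
names = [name for name, _src, _tgt in executed]
```

The loop stops either because every namespace left in the document is a destination namespace or because the phase could not produce another script; a caller can tell the two cases apart by repeating the subset test afterwards.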

Rationale: This algorithm makes it possible not to download the destination namespace RDF at all, in the case where the user requires destination namespaces to be loaded last and the transformation happens to succeed without loading the destination namespace. It can also do the reverse (that is, load only the destination namespace and not load the source namespaces).

Rationale: Requiring all namespaces in the current document (not only that of the root element) makes it possible to apply precedences (which is important, for example, for correct XInclude processing).