Automatic transformation of XML namespaces/RDF resource format

RDF resource format

An RDF file is valid when it conforms to the grammar forest whose roots are :Namespace and :Transformer.

rdfs:seeAlso predicates

When reading an RDF file, the processor should handle triples of the form:

<http://our-uri> rdfs:seeAlso (IRI1 IRI2 ...) .

This should add the IRIs to the list of RDF files to be downloaded (in the order of recursive retrieval described elsewhere in this specification).

Rationale: We may want the same transformation in the source and target namespaces, so we want to split it into a separate file loaded by an rdfs:seeAlso directive.

Each IRIn may be either an IRI or a blank node like this:

[
  :iri <...>;
  :mode :transform;
  :mode :validate
]

When it's a blank node, the URL is to be loaded only for specified modes. (In the example above, it would be loaded for both modes, that is just like a plain IRI.)
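The mode filtering described above can be sketched in Python. This is only an illustration: the SeeAlsoEntry class, the iris_to_load function, and the example.org file names are hypothetical, and the sketch assumes the rdfs:seeAlso list has already been parsed into entries.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# The two modes that appear in the blank-node example above.
ALL_MODES = frozenset({"transform", "validate"})


@dataclass(frozen=True)
class SeeAlsoEntry:
    """One member of an rdfs:seeAlso list after parsing (hypothetical model)."""
    iri: str
    # A plain IRI is treated as applying to every mode.
    modes: FrozenSet[str] = ALL_MODES


def iris_to_load(entries: List[SeeAlsoEntry], mode: str) -> List[str]:
    """Return the IRIs to download for `mode`, preserving list order."""
    return [entry.iri for entry in entries if mode in entry.modes]


entries = [
    SeeAlsoEntry("http://example.org/common.ttl"),
    SeeAlsoEntry("http://example.org/validators.ttl", frozenset({"validate"})),
    SeeAlsoEntry("http://example.org/both.ttl", frozenset({"transform", "validate"})),
]
```

With these entries, a run in transform mode would skip validators.ttl, while a run in validate mode would load all three files.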

Scripts

A script is something which accepts an input (some XML text, in this specification) and generates an output (a text and/or a program exit status). (A script may be a Unix command, Web service, etc.)

A script is represented as an RDF node with certain properties.

This specification provides the following classes of scripts:

  • command line
  • script in a specified programming language
  • A Web service

The validator kind (see below) is either the entire document (:entire) or by parts (:parts).

Command line

  •  :CommandLine
    • {1..1} :commandString [xsd:string] (command line)
    • {0..1} :transformerKind (transformer kind)
    • {0..1} :validatorKind (validator kind)
    • {0..1} :OkResult (result denoting OK)
    • {0..1} :completeness (completeness)
    • {0..1} :stability (stability)
    • {0..1} :preference (preference)

Example:

 :script
   a :CommandLine ;
   :commandString "xmllint --format" .

In this example, XML is just validated and reformatted, not really modified.
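A minimal sketch of how a :CommandLine script might be run follows. It assumes that the XML is passed on standard input and that the output text is read from standard output; the specification does not say how the XML reaches the command. The function name run_command_line and the default timeout are hypothetical.

```python
import shlex
import subprocess
from typing import Tuple


def run_command_line(command_string: str, xml_text: str,
                     timeout_s: float = 30.0) -> Tuple[int, str]:
    """Run the :commandString value, feeding the XML on stdin (an assumption).

    Returns the exit status and the text written to stdout.
    Raises subprocess.TimeoutExpired if the command runs longer than timeout_s.
    """
    completed = subprocess.run(
        shlex.split(command_string),  # split the string like a POSIX shell would
        input=xml_text,
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return completed.returncode, completed.stdout
```

The command is started without a shell, so shell features such as pipes inside :commandString would not work in this sketch.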

Script in a specified programming language

Script for a named programming language (see below):

  •  :NamedScript
    • {1..1} :language (IRI) (programming language)
    • {0..1} :minVersion (minimum version)
    • {0..1} :maxVersion (maximum version)
    • {1..1} :scriptURL (URL of the script)
    • {0..1} :OkResult (result denoting OK)
    • {0..1} :completeness (completeness)
    • {0..1} :stability (stability)
    • {0..1} :preference (preference)
    • {0..1} :transformerKind (transformer kind)
    • {0..1} :validatorKind (validator kind)

A Web service

  •  :WebService
    • {1..1} :form (IRI) (request IRI)
    • {1..1} :method (HTTP method)
    • {1..1} :xmlField [xsd:string] (field for XML)
    • {0..1} :transformerKind (transformer kind)
    • {0..1} :validatorKind (validator kind)

Validity constraint: :validatorKind must be present for validators and only for validators. :transformerKind must be present for transformers and only for transformers.

RDF describing a namespace

Namespaces are described as instances of the :Namespace class.

Their format tree:

  •  :Namespace
    • {0..*} :validator (validator)
      • _ (script node)

Example:

 <http://purl.org/dc/terms/> 
   a :Namespace ;
   dc:description <http://...> ;
   # Other Dublin Core metadata.
 
   :link [
     :url <http://www.rddl.org/> ;
     :role <http://www.rddl.org/> ;
     :nature <http://www.w3.org/1999/xhtml> ;
     :purpose <http://www.rddl.org/purposes#schema-validation>
   ] ;
   
   :validator [
     a lang:Python ;
     :minVersion "2.1" ;
     :maxVersion "3.2" ;
     :scriptURL <http://example.org/script.py> ;
     :OkResult "OK" ;
     :completeness 0.9 ;
     :stability 0.9 ;
     :preference 0.9 ;
     :validatorKind :entire
   ] .

A :validator is specified in the same way as :script-data (see below), except that the :transformerKind parameter is ignored. The validator may have :OkResult to specify which output of the validator signifies a valid document. In the absence of :OkResult, a valid document is signified by a successful command return value (0 on Unix) for a named script or a :CommandLine; for a :WebService, the value of :OkResult defaults to the empty string.

In :validator, the property :language may also refer to the namespace URL of an XML schema language (such as http://www.w3.org/2001/XMLSchema). In this case :OkResult is ignored.
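The :OkResult rules above can be sketched as a small decision function. The function name and the string labels for the script classes are hypothetical, and comparing the output to :OkResult after stripping surrounding whitespace is an assumption; the specification does not define how the comparison is done.

```python
from typing import Optional


def document_is_valid(script_class: str, ok_result: Optional[str],
                      output: str, exit_status: Optional[int] = None) -> bool:
    """Decide validity from a validator's output and exit status.

    script_class is "CommandLine", "NamedScript" or "WebService".
    ok_result is the :OkResult value, or None when the property is absent.
    """
    if ok_result is None:
        if script_class == "WebService":
            ok_result = ""  # stated default for Web services
        else:
            # Commands and named scripts: success is exit status 0 (on Unix).
            return exit_status == 0
    # Assumption: surrounding whitespace in the output is not significant.
    return output.strip() == ok_result
```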

A human readable description of a namespace should be specified with Dublin Core parameters.

The :link nodes are like resources in RDDL (but with our namespace instead of RDDL namespace).

A namespace description may provide a :validate parameter to specify how to validate documents whose root element is in our namespace. The :validate parameter has a subparameter :nature, which should be understood according to the RDDL specification.

There may be multiple :validate parameters, to allow the use of schemas of different natures.

The :link parameter with subparameters :role and :nature is backward compatible with RDDL and should be understood in accordance with the RDDL specification.

Also a namespace may be a member of the following classes: :NotGrouped, :GroupedWithDescendants, :GroupedAll. See grouping examples.

RDF describing a transformer

Note: Transformers should be run in a secure sandbox, so that they are unable to damage or read the user's files. The time of the entire operation should also be limited. (Rationale: If we limited particular parts of the process rather than the process as a whole, we would be unable to limit the parts of the operations done by the sandboxed application, and the limit would make no sense.) We may also limit the total amount of data transferred through the network, if the operating system supports it. (We can't limit a specific operation inside the sandbox.)

Implementation note: Such sandboxing can be implemented, for example, with SELinux on Linux. It is tempting to use the Java security manager, but as of early 2014 Java security is too buggy and therefore should not be used.

Their formal tree:

  •  :Transformer
    • {1..*} :sourceNamespace (source namespaces)
    • {0..*} :targetNamespace (target namespaces)
    • {0..1} :targetNamespacesSet (always has the object :allNamespaces)
    • {1..1} :precedence (precedence)
    • {0..*} :script (script)
      • _ (script node)

Here is an example of an XSLT transformer:

 <...>
   a :Transformer ;
   dc:description <http://...> ;
   # Other Dublin Core metadata.
   :sourceNamespace <...> ;
   :targetNamespace <...> ;
   :precedence <...> ;
   :script [
     a lang:XSLT ;
     :minVersion "2.0" ;
     :scriptURL <http://example.org/scripts/foo.xslt> ;
     :transformerKind :entire ;
     :argument [
       :name "debug" ;
       :value false
     ] ;
     :argument [
       :name "other" ;
       :value 123
     ] ;
     #:initial-context-node ... ; # See XSLT 2.0 spec.
     :initialTemplate "first" ;
     :initialMode "first" ;
     :completeness 0.9 ;
     :stability 0.9 ;
     :preference 0.9
   ] .

Neither the :sourceNamespace nor the :targetNamespace parameter is required.

It is recommended but not required that objects of predicates :sourceNamespace and :targetNamespace are of :Namespace class.

A transformer may have no target namespace (for example, XInclude). In this case every namespace under consideration can act as the target.

We need to define precedences for different kinds of transformers. For example, we would probably have the precedence “include” for XInclude and other cross-document facilities, “macro” for macros, and “formatting” for a transformer generating XSL formatting objects or SVG.

Common arguments

All transformers are subclasses of the class :Transformer. All transformers accept the following parameters:

  •  :transformerKind may be :entire, :sequential, :upDown, or :downUp. It is used according to the section “Order kinds of document transformers”.
  •  :completeness, :stability and :preference each specify a number in the range 0..1.0. :completeness describes how much of the transformer is implemented. :stability describes how reliable the transformer is (that is, whether it is likely to crash or produce meaningless results). :preference denotes other factors for calculating priority (see below).
  •  The :targetNamespacesSet predicate with the :allNamespaces object denotes that all namespaces are target namespaces for this transformer. It is disallowed to have both :targetNamespace and :targetNamespacesSet predicates for a transformer.

The priority of a chain of transformations is calculated using the completeness, stability, and preference of the links of the chain. The recommended algorithm is to multiply all completenesses, stabilities, and preferences of all links and then sum them.
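The wording of the recommended algorithm leaves room for interpretation. The sketch below takes one possible reading, which is an assumption and not confirmed by this specification: for each link, multiply its completeness, stability, and preference, then sum these products over the links of the chain. The names Link and chain_priority are hypothetical.

```python
from typing import Iterable, NamedTuple


class Link(NamedTuple):
    """One transformer in a chain, with its three 0..1.0 scores."""
    completeness: float
    stability: float
    preference: float


def chain_priority(links: Iterable[Link]) -> float:
    """Sum over links of completeness * stability * preference (assumed reading)."""
    return sum(l.completeness * l.stability * l.preference for l in links)
```

Under this reading a longer chain never has a lower priority than its prefix, which may or may not be the intended behaviour.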

All validators are subclasses of the class :validator. All validators accept the following parameters:

  •  :validatorKind may be :entire or :parts. It is used as described in the Validation chapter.

Particular types of transformers

XSLT, Java, Python, Ruby, et al
 :script [
   a lang:Python ;
   :minVersion "2.1" ;
   :maxVersion  "3.2" ;
   :scriptURL <http://example.org/script.py>
 ]

This example means that the script http://example.org/script.py is run by a Python interpreter of version at least 2.1 and at most 3.2.

  • named script
    • {0..1} :minVersion "2.1" (xsd:string) (minimum version)
    • {0..1} :maxVersion "3.2" (xsd:string) (maximum version)
    • {1..1} :scriptURL (script URL)
    • {0..1} :arguments (only for XSLT) (script arguments)
    • {0..1} :initialTemplate (only for XSLT) (the initial template for XSLT)
    • {0..1} :initialMode (only for XSLT) (specifies the initial mode for XSLT)

Recommendation: If several suitable versions of the interpreter are available, use the maximal allowed version.
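The recommendation above can be sketched as follows. The sketch assumes that version strings are dot-separated integers and that both :minVersion and :maxVersion are inclusive bounds; the function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn "3.10" into (3, 10) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def pick_interpreter(available: Iterable[str],
                     min_version: Optional[str] = None,
                     max_version: Optional[str] = None) -> Optional[str]:
    """Return the highest available version within the bounds, or None."""
    lo = parse_version(min_version) if min_version is not None else None
    hi = parse_version(max_version) if max_version is not None else None
    suitable = [
        v for v in available
        if (lo is None or parse_version(v) >= lo)
        and (hi is None or parse_version(v) <= hi)
    ]
    return max(suitable, key=parse_version, default=None)
```

Comparing tuples of integers rather than raw strings matters here: as a string, "3.10" sorts before "3.2", although it is the later version.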

The following languages should be available:

  • XSLT
  • Python
  • Java
  • Ruby
  • Perl
  • TODO
Web service
 :script
   a :WebService ;
   :form <http://example.org/form> ;
   :method "post" ; # or "get"
   :xmlField "text" .

This sends a POST request to http://example.org/form, which should return an XML document.
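A sketch of how the request for a :WebService script might be built with the Python standard library follows. It assumes the XML is sent as a single form-urlencoded field named by :xmlField; the specification does not state the encoding. The function name build_request is hypothetical, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(form: str, method: str, xml_field: str, xml_text: str) -> Request:
    """Build an HTTP request carrying the XML in the field named by :xmlField."""
    payload = urlencode({xml_field: xml_text})
    if method.lower() == "get":
        # For GET the field goes into the query string.
        separator = "&" if "?" in form else "?"
        return Request(form + separator + payload, method="GET")
    # For POST the field goes into the request body (assumed encoding).
    return Request(
        form,
        data=payload.encode("ascii"),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = build_request("http://example.org/form", "post", "text", "<a/>")
```

Passing req to urllib.request.urlopen would perform the call; the response body would then be the transformed or validated XML.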

List of transformers

Transformers are ordered like this:

<http://our-uri> :transformersList (IRI1 IRI2 ...) .

Describing precedences

:Precedence is an RDF-S class, whose members are RDF-S classes.

It is recommended (but not required) that precedences are members of :Precedence class.

 <http://example.org/precedences/macro>
   a :Precedence ;
   rdfs:subClassOf <...> ;
   :higherThan <...> ;
   :higherThan <...> ;
   :lowerThan <...> .

The predicates :higherThan and :lowerThan can apply to precedences.

The following rules are used to deduce which entities have “higher than” precedence relative to another entity:

  • If a :higherThan parameter is specified inside a :Precedence description, then the described entity is of higher precedence than the referred-to entity.
  • If a :lowerThan parameter is specified inside a :Precedence description, then the referred-to entity is of higher precedence than the described entity.
  • If a class A has higher precedence than an entity B and the entity B has higher precedence than a class C, then the class A has higher precedence than the class C.

The entities are related by “higher than” relation if and only if this relation can be deduced from the above rules (for all currently loaded RDF resources).

If a cycle of precedences is encountered, this is a fatal error.
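The three deduction rules and the fatal-error condition can be sketched as a transitive closure with a loop check. The sketch assumes the :higherThan and :lowerThan triples have already been extracted as pairs of names; the function name and the choice of ValueError for the fatal error are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def higher_than_closure(higher: Iterable[Pair],
                        lower: Iterable[Pair]) -> Dict[str, Set[str]]:
    """Map each precedence to the set of precedences it is higher than.

    higher: pairs (a, b) taken from `a :higherThan b`.
    lower:  pairs (a, b) taken from `a :lowerThan b`, meaning b is higher than a.
    Raises ValueError if the relation contains a loop.
    """
    above: Dict[str, Set[str]] = {}
    for a, b in higher:
        above.setdefault(a, set()).add(b)
    for a, b in lower:
        above.setdefault(b, set()).add(a)

    closure: Dict[str, Set[str]] = {}
    for start in above:
        seen: Set[str] = set()
        stack = list(above[start])
        while stack:  # depth-first walk applies the transitivity rule
            node = stack.pop()
            if node == start:
                raise ValueError(f"loop of precedences through {start}")
            if node not in seen:
                seen.add(node)
                stack.extend(above.get(node, ()))
        closure[start] = seen
    return closure
```

For example, declaring “include” higher than “macro” and “formatting” lower than “macro” lets the closure deduce that “include” is also higher than “formatting”.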

A precedence is singleton when it is declared to be a member of :Singleton class as in the following example:

 <http://example.org/MyPrecedence> a :Singleton .