Automatic transformation of XML namespaces/Recursive downloading the information

From Wikiversity

When we speak about downloading the next RDF file, we mean downloading the next RDF file in depth-first or breadth-first order (as specified by user options).

The tree of RDF files (which we traverse in depth-first or breadth-first order) is defined as follows:

The branches of a node (an RDF file) are the RDF files linked from it through rdf:seeAlso, in the order of the list. If user options so specify, the branches that correspond to current sources and/or targets are processed before the other branches; this is done with a priority queue instead of a regular queue (see https://softwareengineering.stackexchange.com/a/358937/45576).

So we may first load "sources" and "targets" and only then "see also". Rationale: our purpose is to find a connection between sources and targets, while following "see also" may lead us astray from that purpose. (For example, it could lead to an infinite loop even though the task can be accomplished by loading only "sources" and "targets".)
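The ordering described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `traversal_order`, the dictionary `see_also` (a stand-in for actually downloading a file and reading its rdf:seeAlso links) and the set `preferred` (URLs corresponding to current sources and/or targets) are hypothetical and do not come from the specification.

```python
import heapq
from itertools import count

def traversal_order(root, see_also, preferred=frozenset(), breadth_first=True):
    """Yield RDF file URLs in the order they would be downloaded.

    see_also:  hypothetical dict mapping a URL to the ordered list of
               URLs it links to through rdf:seeAlso.
    preferred: hypothetical set of URLs that correspond to current
               sources and/or targets; they are processed first.
    """
    tick = count()
    # Heap entries are (rank, order, url); rank 0 = preferred, 1 = the rest.
    heap = [(1, next(tick), root)]
    seen = set()
    while heap:
        _, _, url = heapq.heappop(heap)
        if url in seen:
            continue  # already downloaded, do not download twice
        seen.add(url)
        yield url
        links = see_also.get(url, [])
        if not breadth_first:
            # Depth-first: newest entries go first, so push siblings in
            # reverse to keep the order of the rdf:seeAlso list.
            links = reversed(links)
        for link in links:
            n = next(tick)
            order = n if breadth_first else -n
            rank = 0 if link in preferred else 1
            heapq.heappush(heap, (rank, order, link))

# Usage with made-up file names:
graph = {"root": ["a", "b", "src"], "a": ["a1"], "src": ["tgt"]}
print(list(traversal_order("root", graph, preferred={"src", "tgt"})))
```

With an empty `preferred` set the sketch degenerates to plain breadth-first or depth-first traversal, which matches the case where the user option for prioritising sources and targets is switched off.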

Upon reading an RDF file, the list of namespaces, the list of transformations, and the list of relations between precedences should be updated according to the grammar described above.

Implementation note: If our task is validation, updating the list of transformations and precedences is not necessary. If our task is transformation, updating the list of namespaces is not necessary.

Note: Retrieving some existing documents designated by namespace URLs returns an RDF Schema. Examples: http://www.w3.org/1999/02/22-rdf-syntax-ns# and http://purl.org/dc/elements/1.1/. In this case :namespaceInfo in the RDFS points to the namespace information, which should be downloaded instead.

As another alternative, the RDF file pointed to by the namespace URL may itself contain namespace and transformer info.

Several namespaces may refer to one RDF file because of the use of # in URLs.
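To illustrate why this happens: the part of a URL after # (the fragment) is not sent to the server, so namespace URLs that differ only in the fragment are served by the same document. A minimal sketch, assuming the hypothetical helper name `group_by_document` and made-up example.org namespaces:

```python
from urllib.parse import urldefrag

def group_by_document(namespace_urls):
    """Group namespace URLs by the document that would actually be fetched."""
    groups = {}
    for ns in namespace_urls:
        # urldefrag() strips the "#fragment" part of the URL.
        document = urldefrag(ns).url
        groups.setdefault(document, []).append(ns)
    return groups

# Both hypothetical namespaces below live in one RDF file.
print(group_by_document([
    "http://example.org/ns#first",
    "http://example.org/ns#second",
]))
```

A downloader can therefore key its list of already retrieved files by the URL without the fragment, so that such a file is fetched only once.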

Backward-compatible feature: If the URL we download from serves an RDDL document, download the RDF specified as an RDDL resource with xlink:role="http://portonvictor.org/ns/trans/". (This is not done recursively; that is, an RDDL document linked from another RDDL document is not downloaded.)
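Finding such resources in an RDDL document might look like the sketch below. It assumes that the document is well-formed XML, that RDDL resources are `resource` elements in the namespace http://www.rddl.org/, and that the link target is given in xlink:href; the function name `rdf_links_from_rddl` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

RDDL_NS = "http://www.rddl.org/"            # assumed RDDL namespace
XLINK_NS = "http://www.w3.org/1999/xlink"
TRANS_ROLE = "http://portonvictor.org/ns/trans/"

def rdf_links_from_rddl(rddl_text):
    """Return the xlink:href values of RDDL resources with the given role."""
    root = ET.fromstring(rddl_text)
    links = []
    for res in root.iter(f"{{{RDDL_NS}}}resource"):
        if res.get(f"{{{XLINK_NS}}}role") == TRANS_ROLE:
            href = res.get(f"{{{XLINK_NS}}}href")
            if href:
                links.append(href)
    return links

# Hypothetical RDDL document with one matching resource.
sample = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rddl="http://www.rddl.org/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <rddl:resource xlink:role="http://portonvictor.org/ns/trans/"
                   xlink:href="http://example.org/info.rdf"/>
    <rddl:resource xlink:role="http://www.w3.org/2001/XMLSchema"
                   xlink:href="http://example.org/schema.xsd"/>
  </body>
</html>"""
print(rdf_links_from_rddl(sample))
```

The returned links would be downloaded as RDF only; if one of them turns out to be another RDDL document, it is not followed.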

If a valid RDF document was not retrieved from a URL (say, because of a 404 HTTP error or because of invalid XML code), the URL should not be retrieved again during the pipeline. However, it is not forbidden to repeat HTTP attempts that failed because of a timeout.
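One possible way to keep track of this is a set of permanently failed URLs. The sketch below is an assumption-laden illustration: `retrieve`, the `fetch` callable (which stands for "download and parse", raising TimeoutError on a timeout and any other exception on a permanent failure) and the `failed` set are all hypothetical names.

```python
def retrieve(url, fetch, failed):
    """Try to get an RDF document; remember permanent failures in `failed`.

    fetch:  hypothetical callable that returns a parsed document, raises
            TimeoutError on a timeout and another exception otherwise
            (for example on HTTP 404 or invalid XML).
    failed: set of URLs that must not be tried again in this pipeline.
    """
    if url in failed:
        return None              # known bad URL, do not retry
    try:
        return fetch(url)
    except TimeoutError:
        return None              # timeouts are not recorded, retry is allowed
    except Exception:
        failed.add(url)          # permanent failure for this pipeline run
        return None

# Usage with a fake fetcher that always reports invalid content.
calls = []
def broken_fetch(url):
    calls.append(url)
    raise ValueError("invalid XML")

failed_urls = set()
retrieve("http://example.org/bad.rdf", broken_fetch, failed_urls)
retrieve("http://example.org/bad.rdf", broken_fetch, failed_urls)
print(len(calls))  # the second call is skipped
```

TimeoutError is handled before the general case because in Python it is itself a subclass of OSError and would otherwise be recorded as a permanent failure.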

Recursive retrieval may be limited to certain URLs. For example, it may be limited only to file:// URLs.
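Such a restriction could be expressed as a simple check on the URL scheme before a link is followed. A minimal sketch, in which the function name `is_allowed` and the parameter `allowed_schemes` are hypothetical:

```python
from urllib.parse import urlsplit

def is_allowed(url, allowed_schemes=frozenset({"file"})):
    """Return True if recursive retrieval may follow this URL."""
    # urlsplit() lower-cases the scheme, so "FILE:" and "file:" match.
    return urlsplit(url).scheme in allowed_schemes

print(is_allowed("file:///tmp/info.rdf"))         # True
print(is_allowed("http://example.org/info.rdf"))  # False
```

Other policies (for example, a list of permitted hosts or URL prefixes) would fit the same shape: a predicate that is consulted before each download.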