Automatic transformation of XML namespaces/Recursive downloading the information
Below is a description of the priority queue object.
We maintain a list of elements. Each list element has an associated priority (an integer).
The following methods are available:
- append(element, priority) - adds an element with the given priority to the end of the list
- pop_front() - removes and returns the first of the elements with the maximum priority; throws an exception if all elements are disabled (or the list is empty)
- pop_back() - removes and returns the last of the elements with the maximum priority; throws an exception if all elements are disabled (or the list is empty)
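A minimal Python sketch of this priority queue, assuming a plain list of (element, priority) pairs that is scanned linearly on each pop. The text above mentions "disabled" elements without defining them, so this sketch does not model that state; it raises IndexError only when the list is empty.

```python
class PriorityQueue:
    """A list of elements, each with an integer priority (illustrative sketch)."""

    def __init__(self):
        # (element, priority) pairs kept in insertion (list) order.
        self._items = []

    def append(self, element, priority):
        """Add an element with the given priority to the end of the list."""
        self._items.append((element, priority))

    def _index_of_max(self, last):
        # "Disabled" elements are not modeled; only an empty list is an error.
        if not self._items:
            raise IndexError("priority queue is empty")
        top = max(p for _, p in self._items)
        indices = [i for i, (_, p) in enumerate(self._items) if p == top]
        return indices[-1] if last else indices[0]

    def pop_front(self):
        """Remove and return the first element with the maximum priority."""
        return self._items.pop(self._index_of_max(last=False))[0]

    def pop_back(self):
        """Remove and return the last element with the maximum priority."""
        return self._items.pop(self._index_of_max(last=True))[0]

    def __len__(self):
        return len(self._items)
```

For example, after appending a, b, c, d with priorities 1, 2, 2, 1, pop_front() returns b and a following pop_back() returns c: ties within the maximum priority are broken by list position.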
A traversal algorithm is an object with two methods: put(obj), which adds an object to the set of "discovered" objects, and get(), which either returns the next object or raises an exception if there are no more such objects.
Modified breadth-first search
This is breadth-first search with a priority queue instead of a plain queue.
Modified depth-first search
This is depth-first search with a priority queue instead of a stack.
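The two modified searches can be sketched as traversal objects in the sense defined above. Instead of a linear scan, this version uses Python's heapq module: the heap key is the negated priority plus an insertion counter, which reproduces pop_front (earliest among the maximum priority) for the modified BFS and pop_back (latest among the maximum priority) for the modified DFS. The priority_of callback and the NoMoreObjects exception are hypothetical names, since the text does not say how an object's priority is obtained. Objects must be hashable (e.g. URL strings) because put() records them in a set of discovered objects.

```python
import heapq
import itertools


class NoMoreObjects(Exception):
    """Raised by get() when no discovered object remains (hypothetical name)."""


class _PriorityTraversal:
    """Traversal object with put(obj) and get(), backed by a binary heap."""

    # +1: earliest first within a priority (BFS); -1: latest first (DFS).
    _order = 1

    def __init__(self, priority_of):
        # priority_of: hypothetical callback mapping an object to an int priority.
        self._priority_of = priority_of
        self._heap = []
        self._counter = itertools.count()
        self._discovered = set()

    def put(self, obj):
        """Add obj to the set of discovered objects (ignored if already there)."""
        if obj in self._discovered:
            return
        self._discovered.add(obj)
        # heapq is a min-heap, so the priority is negated to pop the maximum
        # first; the unique sequence number breaks ties between equal priorities.
        key = (-self._priority_of(obj), self._order * next(self._counter))
        heapq.heappush(self._heap, (key, obj))

    def get(self):
        """Return the next object, or raise NoMoreObjects."""
        if not self._heap:
            raise NoMoreObjects()
        return heapq.heappop(self._heap)[1]


class ModifiedBFS(_PriorityTraversal):
    _order = 1   # behaves like pop_front


class ModifiedDFS(_PriorityTraversal):
    _order = -1  # behaves like pop_back
```

For objects a, b, c, d with priorities 1, 2, 2, 1 put in that order, ModifiedBFS yields b, c, a, d and ModifiedDFS yields c, b, d, a. The heap makes each put() and get() O(log n), at the cost of fixing the tie-breaking direction per traversal object.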
By "downloading the next RDF file" we mean downloading the next RDF file in depth-first or breadth-first order (as specified by user options).
The tree of RDF files (which we traverse in depth-first or breadth-first order) is defined as follows:
The branches of a node (an RDF file) are:
- the namespaces of the current XML document, in order of their first occurrence in document order; [TODO: We may consider more sophisticated loading order (based on precedences).]
- optionally (if so specified by user options), the branches which correspond to the current sources and/or targets;
- the RDF files linked through rdf:seeAlso, in the order of the list.
So, we may load "sources" and "targets" first and only then "see also". Rationale: our purpose is to find a connection between sources and targets, while following "see also" links may lead us astray from that purpose. (For example, it could lead to an infinite loop even though the task could be accomplished by loading only "sources" and "targets".)
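A sketch of computing the ordered branches of a node with Python's xml.etree.ElementTree. It assumes that a namespace "occurs" when an element or attribute name uses it (declarations of unused prefixes are ignored), and that the sources/targets and rdf:seeAlso URLs have already been extracted into lists; the function and parameter names are hypothetical. Duplicates across the three groups are left in place, on the assumption that a traversal object keeping a set of discovered objects skips them.

```python
import xml.etree.ElementTree as ET


def _namespace_of(name):
    """Return the namespace URI of an ElementTree '{uri}local' name, or None."""
    if name.startswith("{"):
        return name[1:].partition("}")[0]
    return None


def namespaces_in_document_order(xml_text):
    """Namespaces used by elements and attributes, in order of first occurrence."""
    seen = []
    root = ET.fromstring(xml_text)
    for elem in root.iter():  # iter() walks the tree in document order
        for name in [elem.tag, *elem.attrib]:
            ns = _namespace_of(name)
            if ns is not None and ns not in seen:
                seen.append(ns)
    return seen


def branches(xml_text, sources_and_targets, see_also, follow_sources_targets):
    """Ordered branches: namespaces, then (optionally) sources/targets, then seeAlso."""
    result = namespaces_in_document_order(xml_text)
    if follow_sources_targets:
        result += list(sources_and_targets)
    result += list(see_also)
    return result
```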
Upon reading an RDF file, the list of namespaces, the list of transformations, and the list of relations between precedences should be updated according to the grammar described above.
Implementation note: If our task is validation, updating the list of transformations and precedences is not necessary. If our task is transformation, updating the list of namespaces is not necessary.
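The implementation note can be sketched as follows. The KnowledgeBase structure and the info dictionary (with keys namespaces, transformations and precedences) are hypothetical stand-ins, since the grammar that produces these lists is defined elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Information accumulated from RDF files (hypothetical structure)."""
    namespaces: list = field(default_factory=list)
    transformations: list = field(default_factory=list)
    precedences: list = field(default_factory=list)


def absorb(kb, info, task):
    """Merge info read from one RDF file into kb, skipping unneeded lists.

    info: hypothetical dict with 'namespaces', 'transformations', 'precedences'.
    task: 'validation' or 'transformation'.
    """
    if task == "validation":
        # Transformations and precedences are not needed for validation.
        kb.namespaces.extend(info.get("namespaces", []))
    elif task == "transformation":
        # The namespace list is not needed for transformation.
        kb.transformations.extend(info.get("transformations", []))
        kb.precedences.extend(info.get("precedences", []))
    else:
        raise ValueError(f"unknown task: {task!r}")
```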
Note: For some existing namespaces, the document retrieved from the namespace URL is an RDF Schema. Examples: http://www.w3.org/1999/02/22-rdf-syntax-ns# and http://purl.org/dc/elements/1.1/. In this case, :namespaceInfo in the RDF Schema points to the namespace information, which should be downloaded instead.
Alternatively, the RDF file pointed to by the namespace URL may itself contain the namespace and transformer information.
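A sketch of choosing between these two cases, assuming the retrieved document has already been parsed into (subject, predicate, object) triples. The full URI of the :namespaceInfo property is passed in as a parameter because its namespace is not given here, and the sketch accepts the property on any subject, which is an assumption.

```python
def rdf_url_to_use(namespace_url, triples, namespace_info_predicate):
    """Return the URL whose RDF holds the namespace information.

    triples: (subject, predicate, object) tuples parsed from the document
    retrieved at namespace_url. namespace_info_predicate: full URI of the
    :namespaceInfo property (hypothetical parameter).
    """
    for _subject, predicate, obj in triples:
        if predicate == namespace_info_predicate:
            # The retrieved document is an RDF Schema: download this instead.
            return obj
    # Otherwise the retrieved RDF file is assumed to contain the info itself.
    return namespace_url
```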
Several namespaces may refer to the same RDF file because their URLs differ only in the part after #.
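Because the fragment (the part after #) is not sent when a URL is retrieved, namespaces that differ only in their fragment can be served by a single download. A sketch using urllib.parse.urldefrag from the Python standard library; group_by_document is a hypothetical helper name.

```python
from urllib.parse import urldefrag


def group_by_document(namespace_urls):
    """Map each document URL (fragment removed) to the namespaces it serves."""
    groups = {}
    for ns in namespace_urls:
        doc_url = urldefrag(ns).url  # drop '#...' so each document is fetched once
        groups.setdefault(doc_url, []).append(ns)
    return groups
```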
Backward-compatibility feature: if the URL we download from contains an RDDL document, download the RDF specified as an RDDL resource with xlink:role="http://portonvictor.org/ns/trans/". (This is not done recursively; that is, an RDDL document linked from another RDDL document is not downloaded.)
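A sketch of this fallback with xml.etree.ElementTree. It assumes the usual RDDL conventions: rddl:resource elements in the http://www.rddl.org/ namespace, carrying xlink:role and xlink:href attributes from the XLink namespace. Relative hrefs are resolved against the URL the RDDL document came from (base_url, a hypothetical parameter). Only the given document is inspected, so linked RDDL documents are never followed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDDL_NS = "http://www.rddl.org/"
XLINK_NS = "http://www.w3.org/1999/xlink"
TRANS_ROLE = "http://portonvictor.org/ns/trans/"


def rddl_rdf_links(rddl_text, base_url):
    """Absolute URLs of rddl:resource elements whose xlink:role is TRANS_ROLE."""
    root = ET.fromstring(rddl_text)
    links = []
    for res in root.iter(f"{{{RDDL_NS}}}resource"):
        if res.get(f"{{{XLINK_NS}}}role") == TRANS_ROLE:
            href = res.get(f"{{{XLINK_NS}}}href")
            if href is not None:
                links.append(urljoin(base_url, href))  # resolve relative hrefs
    return links
```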
If a valid RDF document could not be retrieved from a URL (say, because of an HTTP 404 error or because of invalid XML), it should not be retrieved again during the pipeline. However, HTTP attempts that failed because of a timeout may be repeated.
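A sketch of this retry rule, assuming a fetch callable that raises on failure. Timeouts (raised directly, or wrapped by urllib as URLError.reason) are not remembered, so a later attempt stays possible; any other failure marks the URL as failed for the lifetime of the Retriever, which stands in for one pipeline run. The default fetch only checks XML well-formedness, not RDF validity, and the Retriever class and its methods are hypothetical names.

```python
import socket
import urllib.request
import xml.etree.ElementTree as ET

# socket.timeout is an alias of TimeoutError since Python 3.10; listing both
# keeps older versions working.
_TIMEOUTS = (TimeoutError, socket.timeout)


def _default_fetch(url, timeout=30):
    """Download url and parse it as XML (checks well-formedness only)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return ET.fromstring(response.read())


def _is_timeout(exc):
    # urllib may wrap a connect timeout in URLError, exposing it as .reason.
    return isinstance(exc, _TIMEOUTS) or isinstance(getattr(exc, "reason", None), _TIMEOUTS)


class Retriever:
    """Remembers URLs that failed permanently during one pipeline run."""

    def __init__(self, fetch=_default_fetch):
        self._fetch = fetch
        self._failed = set()

    def retrieve(self, url):
        """Return the parsed document, or None if it could not be obtained."""
        if url in self._failed:
            return None  # failed earlier in this run: do not retrieve again
        try:
            return self._fetch(url)
        except Exception as exc:  # e.g. HTTPError (404) or ET.ParseError
            if not _is_timeout(exc):
                self._failed.add(url)  # permanent failure: remember it
            return None  # timeouts are not remembered, so a retry stays possible
```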
Recursive retrieval may be limited to certain URLs. For example, it may be limited only to file:// URLs.
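Such a limit can be expressed as a predicate consulted before each recursive retrieval. This sketch allows a URL by scheme (only file by default) or by prefix; make_url_filter and both of its options are hypothetical, since the text does not say how the limit is configured.

```python
from urllib.parse import urlsplit


def make_url_filter(allowed_schemes=("file",), allowed_prefixes=()):
    """Build a predicate telling whether recursive retrieval may follow a URL."""
    schemes = {s.lower() for s in allowed_schemes}
    prefixes = tuple(allowed_prefixes)

    def allowed(url):
        # Allowed if the scheme is permitted or the URL has a permitted prefix.
        return urlsplit(url).scheme.lower() in schemes or url.startswith(prefixes)

    return allowed
```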