Automatic transformation of XML namespaces/Recursive downloading the information

TODO: We should support asynchronous downloading of several files at once. Or shall we (for simplicity) download strictly in sequence?

TODO: We should create an ordered list of referenced RDF resources.

When we speak about downloading the next RDF file, we mean the next RDF file in depth-first or breadth-first order (as specified by user options). The edges which point to the next URL are ordered for the downloading algorithm as follows: first by kind, “see also”, “sources”, “targets” (in the order specified by the user options), and then by the order of their start nodes in the RDF file.
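
A minimal Python sketch of such a traversal. The edge-kind labels and the fetch_rdf(url) helper (which downloads and parses one RDF file and returns its outgoing edges grouped by kind) are hypothetical placeholders, not the project's actual vocabulary or API:

    from collections import deque

    # Hypothetical edge-kind labels, in the default order described above;
    # the user may reorder them via options.
    EDGE_KINDS = ["seeAlso", "sources", "targets"]

    def traverse(start_url, fetch_rdf, breadth_first=True, edge_kinds=EDGE_KINDS):
        """Yield RDF file URLs reachable from start_url in BFS or DFS order.

        fetch_rdf(url) is a hypothetical helper returning a mapping
        {edge_kind: [urls in document order of their start nodes]}.
        """
        queue = deque([start_url])
        visited = set()
        while queue:
            # popleft() gives breadth-first order, pop() depth-first.
            url = queue.popleft() if breadth_first else queue.pop()
            if url in visited:
                continue
            visited.add(url)
            yield url
            edges = fetch_rdf(url)
            # Successors ordered first by edge kind, then by start-node order.
            successors = [u for kind in edge_kinds for u in edges.get(kind, [])]
            if breadth_first:
                queue.extend(successors)
            else:
                # Push in reverse so the first successor is popped first.
                queue.extend(reversed(successors))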

Upon reading an RDF file, the list of namespaces, the list of transformations, and the list of relations between precedences should be updated according to the grammar described above.

Implementation note: If our task is validation, updating the lists of transformations and precedences is not necessary. If our task is transformation, updating the list of namespaces is not necessary.
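
One possible way to organize these conditional updates, sketched in Python (the State fields and the structure of the parsed data are assumptions for illustration, not the project's actual data structures):

    from dataclasses import dataclass, field
    from enum import Enum, auto

    class Task(Enum):
        VALIDATION = auto()
        TRANSFORMATION = auto()

    @dataclass
    class State:
        namespaces: list = field(default_factory=list)
        transformations: list = field(default_factory=list)
        precedences: list = field(default_factory=list)

    def update_state(state, parsed, task):
        """Merge items parsed from one RDF file into the accumulated state.

        `parsed` is assumed to be a mapping with keys "namespaces",
        "transformations" and "precedences", produced by a parser for
        the grammar described above (not shown here).
        """
        if task is not Task.TRANSFORMATION:
            # The transformation task does not need the namespace list.
            state.namespaces.extend(parsed.get("namespaces", []))
        if task is not Task.VALIDATION:
            # The validation task does not need transformations or precedences.
            state.transformations.extend(parsed.get("transformations", []))
            state.precedences.extend(parsed.get("precedences", []))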

Note: Retrieving some existing documents designated by namespace URLs yields an RDF Schema. Examples: http://www.w3.org/1999/02/22-rdf-syntax-ns# and http://purl.org/dc/elements/1.1/. In this case :namespaceInfo in the RDFS points to the namespace information which should be downloaded instead.
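
A sketch of following this indirection with the rdflib library; the exact URI of the :namespaceInfo predicate is an assumption here (the project's actual namespace may differ):

    from rdflib import Graph, URIRef

    # Assumed predicate URI for :namespaceInfo.
    NAMESPACE_INFO = URIRef("http://portonvictor.org/ns/trans/namespaceInfo")

    def resolve_namespace_info(url):
        """Fetch the document at a namespace URL; if it carries
        :namespaceInfo, return the URL of the real namespace
        information to download instead."""
        graph = Graph()
        graph.parse(url)  # rdflib guesses the RDF serialization
        for info in graph.objects(None, NAMESPACE_INFO):
            return str(info)  # follow the indirection
        return url  # the document itself is the namespace information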

As another alternative, the RDF file pointed to by the namespace URL may itself contain namespace and transformer info.

Several namespaces may refer to one RDF file due to the use of # in URLs.
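
Therefore, before downloading, the fragment should be stripped and the resulting URLs deduplicated. A minimal Python illustration:

    from urllib.parse import urldefrag

    def download_key(namespace_url):
        """Strip the fragment so that namespaces differing only after '#'
        map to a single RDF file to download."""
        base, _fragment = urldefrag(namespace_url)
        return base

    # Both namespaces lead to downloading the same RDF file:
    assert download_key("http://www.w3.org/1999/02/22-rdf-syntax-ns#type") == \
           download_key("http://www.w3.org/1999/02/22-rdf-syntax-ns#Property")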

Backward compatible feature: If the URL we download from serves an RDDL document, download the RDF specified as an RDDL resource with xlink:role="http://portonvictor.org/ns/trans/". (This is not done recursively; that is, an RDDL document linked from another RDDL document is not downloaded.)
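
A non-recursive scan for such RDDL resources might look like the following Python sketch; it assumes the RDDL document is well-formed XML and identifies resources purely by their XLink attributes:

    import xml.etree.ElementTree as ET

    XLINK = "http://www.w3.org/1999/xlink"
    TRANS_ROLE = "http://portonvictor.org/ns/trans/"

    def rdf_links_from_rddl(rddl_text):
        """Return xlink:href values of RDDL resources whose xlink:role
        is the transformer namespace. Applied once per downloaded URL,
        never to RDDL documents reached through another RDDL document."""
        links = []
        for elem in ET.fromstring(rddl_text).iter():
            if elem.get(f"{{{XLINK}}}role") == TRANS_ROLE:
                href = elem.get(f"{{{XLINK}}}href")
                if href is not None:
                    links.append(href)
        return links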

If a valid RDF document was not retrieved from a URL (say, because of a 404 HTTP error or because of invalid XML code), it should not be retrieved again. However, it is not forbidden to repeat HTTP attempts that failed because of a timeout.
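
One possible bookkeeping scheme, sketched in Python: hard failures (such as a 404) are cached as permanent, while timeouts are not cached and so may be retried; a caller that fails to parse the downloaded XML would add the URL to the same permanent set:

    import socket
    import urllib.error
    import urllib.request

    permanent_failures = set()

    def fetch_once(url, timeout=30):
        """Return the document body, or None on failure.

        404s and other hard errors are recorded so the URL is never
        retried; timeouts are not recorded, so a later attempt is allowed.
        """
        if url in permanent_failures:
            return None
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.read()
        except socket.timeout:
            return None  # transient: may be retried later
        except urllib.error.URLError as error:
            if isinstance(getattr(error, "reason", None), socket.timeout):
                return None  # timeout wrapped by urllib: also transient
            permanent_failures.add(url)  # e.g. a 404 (HTTPError is a URLError)
            return None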

Recursive retrieval may be limited to certain URLs. For example, it may be limited only to file:// URLs.
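
Such a restriction can be expressed as a simple predicate over the URL scheme (a sketch; the set of allowed schemes would come from user options):

    from urllib.parse import urlparse

    def allowed_for_recursion(url, allowed_schemes=("file",)):
        """True if recursive retrieval may follow this URL; here limited,
        as in the example above, to file:// URLs."""
        return urlparse(url).scheme in allowed_schemes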