WikiJournal Preprints/Aggregation of scholarly publications and extracted knowledge on COVID19 and epidemics

This project aims to use modern tools, especially Wikidata (and Wikipedia), R, Java and textmining, together with semantic tools, to create a modern integrated resource of all current published information on viruses and their epidemics. It relies on collaboration and gifts of labour and knowledge.

Availability of resources

All code available at github.com/petermr/openVirus
Open notebook as supplementary material 1

Background

The world faces (and will continue to face) viral epidemics which arise suddenly and where scientific/medical knowledge is a critical resource. Despite over 100 billion USD spent on medical research worldwide, much knowledge is behind publisher paywalls and only available to rich universities. Moreover, it is usually badly published and dispersed, without coherent knowledge tools. This particularly disadvantages the Global South. This project aims to use modern tools, especially Wikidata (and Wikipedia), R, Java and textmining, together with semantic tools, to create a modern integrated resource of all current published information on viruses and their epidemics. It relies on collaboration and gifts of labour and knowledge.

Goals

to collect all freely visible scientific/medical publications on COVID19 and viral epidemics and transform them into a uniform format.
to use Natural Language Processing (NLP) and textmining so that machines can extract meaning from the articles.
to build dictionaries of terms related to viruses and viral epidemics for (a) search, (b) classification and (c) understanding.
to collect knowledge and publish it in WikiJournal of Medicine (a peer-reviewed OA journal with an emphasis on review).
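
As an illustration of how a dictionary of terms can support search and classification, the following is a minimal sketch in Python. It is not the project's implementation: the terms, the placeholder identifiers, the function names and the threshold are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary: term -> placeholder identifier.
# The identifiers are illustrative only and are not real Wikidata IDs.
VIRUS_TERMS = {
    "coronavirus": "term:0001",
    "sars-cov-2": "term:0002",
    "zoonosis": "term:0003",
}


def build_pattern(terms):
    """Compile one case-insensitive regex that matches any term as a whole word."""
    # Longest terms first, so a longer term is preferred over a shorter prefix.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)


def count_terms(text, terms=VIRUS_TERMS):
    """Search: count how often each dictionary term occurs in the text."""
    pattern = build_pattern(terms)
    return Counter(match.group(0).lower() for match in pattern.finditer(text))


def classify(text, terms=VIRUS_TERMS, threshold=2):
    """Classification: flag a text as relevant if it has at least `threshold` hits."""
    return sum(count_terms(text, terms).values()) >= threshold
```

Calling `count_terms` on the text of an article gives per-term frequencies that can be used to rank search results, and `classify` applies a simple cut-off to them. Whole-word matching means that plural or inflected forms are not counted unless they are added to the dictionary as separate terms.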

Results

Discussion

Methods

Contributor framework

This is a digital knowledge-based project (i.e. no laboratory or clinical work). It is open to all who are prepared to contribute components of the system.

Some examples of the skills and knowledge required within the project:

Wikimedia (esp. Wikipedia, Wikidata, Wiki technology, WikiJournal)
Scholarly publications including preprints
Scraping web pages and building metadata
SPARQL/RDF, XML, JSON
Textmining, supervised and unsupervised
Virology
Epidemiology
Computation
Societal aspects of disease (e.g. public health policy)
Language translation (with a scientific emphasis)
Git and GitHub
Open collaborative projects

Our initial framework is based on simple dictionaries and ontologies (e.g. RDF, XML) and on public sources of scientific articles, especially preprints and sources chosen for country-specific inclusivity (e.g. Redalyc and SciELO for Latin America). Current software is mainly Java, R, Node and Python, but because the data are exposed as text files a variety of tools can be used.
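
To show why exposing dictionaries as text files keeps them tool-independent, here is a hedged sketch in Python that writes a small dictionary to XML and reads it back using only the standard library. The element and attribute names (`dictionary`, `entry`, `term`, `description`) are assumptions chosen for illustration and may differ from the layout the project actually uses.

```python
import xml.etree.ElementTree as ET


def dictionary_to_xml(title, entries):
    """Serialise a {term: description} mapping to an XML string.

    The layout (a <dictionary> root holding <entry> children) is hypothetical.
    """
    root = ET.Element("dictionary", {"title": title})
    for term, description in entries.items():
        ET.SubElement(root, "entry", {"term": term, "description": description})
    return ET.tostring(root, encoding="unicode")


def xml_to_dictionary(xml_text):
    """Parse the XML produced above back into (title, {term: description})."""
    root = ET.fromstring(xml_text)
    entries = {e.get("term"): e.get("description") for e in root.iter("entry")}
    return root.get("title"), entries
```

Because the result is plain XML text, the same file could equally be read from Java, R or Node with their own XML parsers, which is the point of keeping the data in text form.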

Work organisation

We will list tasks on github.com/petermr/openVirus/issues. These are things we have to do, including components, integration, bugs, tutorials, etc. There may soon be a large number of "Open" issues; this should be seen as positive, since some issues are ongoing and are never closed.

Open Notebook publication

We are using the Open Notebook philosophy of Jean-Claude Bradley and implicitly of Wikimedia content and of many Free/Open Software projects. Everything is posted publicly as soon as it is created. That means that every iteration is visible and will almost certainly contain bugs/errors. Each subsequent commit fixes some of these. We know from past experience that this is the quickest way to create high-quality content and also gives a feeling of communal ownership.


Additional information

Acknowledgements

We thank the developers of all R packages used as part of this project. Additional thanks to the Wikidata WikiProject COVID-19, WikiProject Medicine, and WikiCite. We are also grateful to the broad community of Wikidata contributors, administrators, and developers.

Competing interests

Any conflicts of interest that you would like to declare. Otherwise, a statement that the authors have no competing interest.

Ethics statement

An ethics statement, if appropriate, on any animal or human research performed should be included here or in the methods section.


References

[Shafee-1] La Trobe University

[Murray-Rust-2] University of Cambridge

[3] 286@cam.ac.uk
