User:OpenScientist/Open grant writing/Wissenswert 2011/Documentation
This page hosts information about the project as it proceeds.
Translated from the German project proposal.
The Open Access Media Importer for Wikimedia Commons is designed in a modular fashion. This should make it easier to add new media types, resources or output formats. In general, every component takes an item out of its queue, processes it and puts the resulting data into the queue of the next component. We envisage that all components will run on the same server.
- The Crawler/Scraper scans a list of Open Access resources for new articles with multimedia files (particularly video, but also audio). This can be achieved via a search API (if available, e.g. at PLoS), a local search (PLoS example) or Google (example: PLoS ONE). For each matching article, the URLs and metadata (creator, description, licensing, original article etc.) of the media files are extracted and stored locally.
- The Downloader downloads these media files and saves them locally.
- The Transcoder converts the media files into Commons-compatible open formats (mainly Ogg Theora, Ogg Vorbis) and embeds the metadata in the resulting files.
- The Categorizer analyses the files and their metadata and suggests Commons categories that may be suitable.
- The Review Tool allows the user to check image and sound quality, licensing, metadata and categories and to fix any errors before the upload is approved.
- The Uploader uploads the files along with metadata and categories.
- The components are to be configured via a protected wiki page, so that non-programmers can also add new resources for the tool to work on.
- Before processing a file, all components check whether it has already been processed or even uploaded to Wikimedia Commons. In such cases, work on the file is skipped by default.
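The queue-based design described above could be sketched roughly as follows. This is a minimal illustration only, not the project's actual code: the function `run_pipeline`, the `done` dictionary used to remember earlier work, the `"id"` key and the two toy stages are all assumptions made for the example.

```python
from collections import deque


def run_pipeline(items, stages, done=None):
    """Pass items through a chain of stages connected by queues.

    items  -- iterable of dicts, each with a unique "id" key (assumed layout)
    stages -- list of (name, function) pairs; each function takes an item
              dict and returns the item dict for the next stage
    done   -- dict mapping (stage name, item id) to the stored result of
              earlier work, so that repeated work is skipped by default
    """
    done = {} if done is None else done
    # One queue in front of every stage, plus one for the final output.
    queues = [deque() for _ in range(len(stages) + 1)]
    queues[0].extend(items)
    for index, (name, process) in enumerate(stages):
        inbox, outbox = queues[index], queues[index + 1]
        while inbox:
            item = inbox.popleft()
            key = (name, item["id"])
            if key not in done:
                # Only do the work if this stage has not seen the item yet.
                done[key] = process(item)
            outbox.append(done[key])
    return list(queues[-1])


def download(item):
    # Toy stand-in for the Downloader: records a hypothetical local path.
    return dict(item, local_path="/tmp/" + item["id"])


def transcode(item):
    # Toy stand-in for the Transcoder: records the target format only.
    return dict(item, format="ogv")
```

Calling `run_pipeline` a second time with the same `done` dictionary reuses the stored results instead of invoking the stage functions again, which mirrors the skip-by-default behaviour described in the last list item.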
To Do 
This list is to be updated as we move forward. Before making changes here, please check the most recent uploads to see whether the proposed change has already been implemented.
- update documentation
- add milestones
- PMCID does not display in commons:Cite journal. Done
- Cite doi template formatting from enwp does not work on Commons
- Bug tracking moved to https://github.com/erlehmann/open-access-media-importer/issues
- RW: Should we add the license and source to each video's metadata? Done
- I'd say yes. --Daniel Mietchen 21:55, 20 February 2012 (UTC)
- RW: Should we inform the corresponding author about the import to WM Commons?
- I don't think so, though I have informed authors in the past about actual use of "their" files. --Daniel Mietchen 21:55, 20 February 2012 (UTC)
- I don't understand the purpose of having three different commands (oa-get, oa-put, and oa-cache), each of which takes as its first argument a longish subcommand (download-metadata, etc.). I would vote either to combine all of these into one command (maybe, "oami"), or to abbreviate the subcommands, or maybe even both. Klortho (talk) 23:14, 3 June 2012 (UTC)
- We do not vote here. If you are willing to submit a patch to overhaul the option parsing system (maybe using python-opster?), feel free to do so.
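For anyone considering such a patch, a single command with subcommands might look roughly like the sketch below. It uses Python's standard `argparse` module rather than python-opster, and it is not the project's existing option parsing: the program name `oami` is the one suggested in the comment above, `download-metadata` is the subcommand mentioned there, and the short alias `dm` and the `source` argument are purely hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for one command with abbreviated subcommands."""
    parser = argparse.ArgumentParser(prog="oami")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # "dm" is a hypothetical short alias for the long subcommand name.
    meta = subparsers.add_parser(
        "download-metadata",
        aliases=["dm"],
        help="download metadata from a source",
    )
    # Hypothetical positional argument naming the source to work on.
    meta.add_argument("source", help="name of the source to work on")
    # Store the canonical name so callers need not know which alias was used.
    meta.set_defaults(action="download-metadata")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.action, args.source)
```

With this layout, the long form and the short alias would both select the same action, so existing habits and shorter invocations could coexist.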
Source Code 
- Source code is hosted on GitHub.
- All source code is licensed under the terms of the GPL v3.
OA Repositories 
- Directory of Open Access Journals
- PubMedCentral: Open Access Subset documentation
- PubMedCentral: Search results for supplementary videos in OA articles
- Example of an article with multiple videos