User:OpenScientist/Open grant writing/Wissenswert 2011/Documentation
From Wikiversity
This page hosts information about the project as it progresses.
Architecture
Translated from the German project proposal.
The Open Access Media Importer for Wikimedia Commons is designed in a modular fashion. This is intended to make it easy to add new media types, sources or output formats. In general, every component takes an item out of a queue, processes it and puts the result into the queue of the next component. We envisage all components running on the same server.
- The Crawler/Scraper scans a list of Open Access sources for new articles with multimedia files (particularly video, but also audio). This can be done via a search API (where available, e.g. at PLoS), via a site's own search (PLoS example) or via Google (example: PLoS ONE). For each matching article, the URLs and metadata (creator, description, licensing, original article etc.) of the media files are extracted and stored locally.
- The Downloader downloads these media files and saves them locally.
- The Transcoder converts the media files into Commons-compatible open formats (mainly Ogg Theora and Ogg Vorbis) and embeds the metadata in the resulting files.
- The Categorizer analyses the files and their metadata and suggests Commons categories that may be suitable.
- The Review Tool allows the user to check image and sound quality, licensing, metadata and categories and to fix any errors before the upload is approved.
- The Uploader uploads the files along with metadata and categories.
- The components are configured via a protected wiki page, so that non-programmers can also add new sources for the tool to work on.
- Before processing a file, all components check whether it has already been processed or even uploaded to Wikimedia Commons. In such cases, work on the file is skipped by default.
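The queue-based flow and the "skip if already processed" rule described above can be sketched as follows. This is a minimal illustration, not the project's code: the stage functions `download` and `transcode`, the item layout (a dict with a `"url"` key) and the in-memory `seen` set standing in for a check against Wikimedia Commons are all hypothetical.

```python
# Minimal sketch of the queue-based component pipeline (hypothetical names).
from queue import Queue
from typing import Callable, Dict, List, Set

Item = Dict[str, str]
Stage = Callable[[Item], Item]


def download(item: Item) -> Item:
    # Hypothetical stand-in for the Downloader: records a local file path.
    filename = item["url"].rsplit("/", 1)[-1]
    return {**item, "local_path": "/tmp/" + filename}


def transcode(item: Item) -> Item:
    # Hypothetical stand-in for the Transcoder: marks the item as Ogg.
    return {**item, "format": "ogg"}


def run_pipeline(items: List[Item], stages: List[Stage], seen: Set[str]) -> List[Item]:
    # One input queue per stage, plus a final queue for finished items.
    queues: List[Queue] = [Queue() for _ in range(len(stages) + 1)]
    for item in items:
        queues[0].put(item)

    for i, stage in enumerate(stages):
        inbox, outbox = queues[i], queues[i + 1]
        while not inbox.empty():
            item = inbox.get()
            # Every component skips files that were already processed or uploaded.
            if item["url"] in seen:
                continue
            outbox.put(stage(item))

    finished = []
    while not queues[-1].empty():
        item = queues[-1].get()
        seen.add(item["url"])  # remember it so that later runs skip it
        finished.append(item)
    return finished


already_done = {"http://example.org/old.avi"}
batch = [{"url": "http://example.org/new.avi"}, {"url": "http://example.org/old.avi"}]
result = run_pipeline(batch, [download, transcode], already_done)
print(result)
# [{'url': 'http://example.org/new.avi', 'local_path': '/tmp/new.avi', 'format': 'ogg'}]
```

Here the stages run one after another for simplicity; since `queue.Queue` is thread-safe, each component could instead run in its own thread on the same server and block on its input queue.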
To Do
This list is to be updated as we move forward. Before making changes here, please check the most recent uploads to see whether the proposed change has already been implemented.
Wiki
- Update documentation
- Add milestones
- PMCID does not display in commons:Cite journal.
Done
- Cite doi template formatting from enwp does not work on Commons
Code
- Bug tracking moved to https://github.com/erlehmann/open-access-media-importer/issues
Discuss
- RW: Should we add the license and source to each video's metadata?
Done
- I'd say yes. --Daniel Mietchen 21:55, 20 February 2012 (UTC)
- RW: Should we inform the corresponding author about the import to WM Commons?
- I don't think so, though I have informed authors in the past about actual use of "their" files. --Daniel Mietchen 21:55, 20 February 2012 (UTC)
- I don't understand the purpose of having three different commands (oa-get, oa-put, and oa-cache), each of which takes as its first argument a longish subcommand (download-metadata, etc.). I would vote either to combine all of these into one command (maybe, "oami"), or to abbreviate the subcommands, or maybe even both. Klortho (talk) 23:14, 3 June 2012 (UTC)
- We do not vote here. If you are willing to submit a patch to overhaul the option parsing system (maybe using python-opster?), feel free to do so.
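For reference, a single command with subcommands and short aliases can be built with Python's standard `argparse` module (used here instead of python-opster). This is only a sketch of the idea raised above: the command name `oami` comes from the suggestion, `download-metadata` from the existing tools, while the `dm` alias, the `source` argument and the `download_metadata` handler are hypothetical.

```python
import argparse


def download_metadata(args: argparse.Namespace) -> str:
    # Hypothetical handler; a real one would fetch article metadata.
    return f"downloading metadata from {args.source}"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="oami")
    subcommands = parser.add_subparsers(dest="command", required=True)
    # "download-metadata" is an existing subcommand name; the "dm" alias
    # and the "source" argument are hypothetical.
    dm = subcommands.add_parser("download-metadata", aliases=["dm"],
                                help="download article metadata")
    dm.add_argument("source", help="name of the data source")
    # Dispatch by attaching the handler to the parsed namespace.
    dm.set_defaults(handler=download_metadata)
    return parser


parser = build_parser()
long_form = parser.parse_args(["download-metadata", "pmc"])
short_form = parser.parse_args(["dm", "pmc"])
print(long_form.handler(long_form))    # downloading metadata from pmc
print(short_form.handler(short_form))  # downloading metadata from pmc
```

Both the long name and the alias reach the same handler, so abbreviations can be added without breaking existing scripts.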
Source Code
- Source code is hosted on GitHub.
- All source code is licensed under the terms of the GPL v3
Links
About
OA Repositories
- Directory of Open Access Journals
- PubMedCentral: Open Access Subset documentation
- PubMedCentral: Search results for supplementary videos in OA articles
- Example of an article with multiple videos
Tools
Blog posts
Blog posts are on the Wikimedian in Residence blog. Here is a list of posts in this category. Some specific posts are:
- December 15, 2011: Announcement and overview
- January 18, 2012: Roadmap and crawler
- March 10, 2012: Frontend
- March 29, 2012: Plugging in your own data source
- April 30, 2012: Encoding
- September 30, 2012: Open Access Media Importer: Presentation at WMDE, Collaborative Coding