User:OpenScientist/Open grant writing/Wissenswert 2011/Documentation

From Wikiversity

This page hosts information about the project in its initial stages. We have since moved the codebase to GitHub entirely, while the bot now runs on Wikimedia Commons.


Translated from the German project proposal.

The Open Access Media Importer for Wikimedia Commons is designed in a modular fashion, which should make it easier to add new media types, resources or output formats. In general, every component takes an item out of its queue, processes it and puts the resulting data into the queue of the next component. We envisage all components running on the same server.

  • The Crawler/Scraper scans a list of Open Access resources for new articles with multimedia files (particularly video, but also audio). This can be achieved via a search API (if available, e.g. at PLoS), a local search (PLoS example) or Google (example: PLoS ONE). For each matching article, the URLs and metadata (creator, description, licensing, original article etc.) of the media files are extracted and stored locally.
  • The Downloader downloads these media files and saves them locally.
  • The Transcoder converts the media files into Commons-compatible open formats (mainly Ogg Theora and Ogg Vorbis) and embeds the metadata in the resulting files.
  • The Categorizer analyses the files and their metadata and suggests Commons categories that may be suitable.
  • The Review Tool allows the user to check image and sound quality, licensing, metadata and categories and to fix any errors before the upload is approved.
  • The Uploader uploads the files along with metadata and categories.
  • The components are to be configured via a protected wiki page, so that non-programmers can also add new resources for the tool to work on.
  • Before processing a file, all components check whether it has already been processed or even uploaded to Wikimedia Commons. In such cases, work on the file is skipped by default.
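
The queue hand-off and the skip-by-default check described above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual code: the names MediaItem, run_stage, run_pipeline and STAGES are hypothetical, the stage functions are placeholders that only record that they ran, and the sets standing in for "already processed" and "already on Commons" are assumptions about how such state might be tracked.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class MediaItem:
    """Hypothetical record for one media file found by the crawler."""
    url: str
    metadata: dict = field(default_factory=dict)
    history: list = field(default_factory=list)


def run_stage(name, work, inbox, outbox, already_done, uploaded):
    """Take items out of inbox, process them, and put them into outbox."""
    while True:
        try:
            item = inbox.get_nowait()
        except queue.Empty:
            break
        # Skip by default if the file is already on Commons or this
        # component has already handled it.
        if item.url in uploaded or (name, item.url) in already_done:
            continue
        work(item)
        item.history.append(name)
        already_done.add((name, item.url))
        outbox.put(item)


def make_placeholder(name):
    """Build a stand-in stage function; a real stage would do actual work."""
    def work(item):
        item.metadata[name] = "done"
    return work


# Assumed stage order, following the component list above.
STAGES = [(name, make_placeholder(name))
          for name in ("download", "transcode", "categorize", "upload")]


def run_pipeline(items, stages, already_done=None, uploaded=None):
    """Chain the stages so each one feeds the queue of the next."""
    already_done = set() if already_done is None else already_done
    uploaded = set() if uploaded is None else uploaded
    inbox = queue.Queue()
    for item in items:
        inbox.put(item)
    for name, work in stages:
        outbox = queue.Queue()
        run_stage(name, work, inbox, outbox, already_done, uploaded)
        inbox = outbox
    results = []
    while not inbox.empty():
        results.append(inbox.get())
    return results


items = [MediaItem("https://example.org/a.mov"),
         MediaItem("https://example.org/b.mov")]
finished = run_pipeline(items, STAGES,
                        uploaded={"https://example.org/b.mov"})
print([item.url for item in finished])  # only a.mov; b.mov was skipped
```

In this sketch the stages run one after another in a single process, which matches the assumption that all components share one server; the Review Tool is left out because it depends on human approval.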

To Do

This list is to be updated as we move forward. Before making changes here, please check the most recent uploads to confirm whether the proposed change has already been implemented.


  • Update documentation
  • Add milestones
  • Cite doi template formatting from the English Wikipedia (enwp) does not work on Commons



  • RW: Should we add the license and source to each video's metadata? Done
I'd say yes. --Daniel Mietchen 21:55, 20 February 2012 (UTC)
  • RW: Should we inform the corresponding author about the import to WM Commons?
I don't think so, though I have informed authors in the past about actual use of "their" files. --Daniel Mietchen 21:55, 20 February 2012 (UTC)
A single e-mail, with an opt-in for further notifications, and a link to an author's listed Commons contributions, seems minimally intrusive. It might also be useful, under "community outreach" or "broader impact" in a grant or tenure application. We'd rely on the corresponding author to forward the e-mail to interested co-authors, but we could also offer opt-in notifications (#icanhazwikicommonsnotifications?). We'd have to code each author with their e-mail address, has-been-notified-once and opted-in, so this would take some work, and may not be worth it. But I doubt anyone will object to one e-mail per e-mail address, ever, except on request. HLHJ (discuss • contribs) 16:11, 15 June 2019 (UTC)
  • I don't understand the purpose of having three different commands (oa-get, oa-put, and oa-cache), each of which takes as its first argument a longish subcommand (download-metadata, etc.). I would vote either to combine all of these into one command (maybe, "oami"), or to abbreviate the subcommands, or maybe even both. Klortho (talk) 23:14, 3 June 2012 (UTC)
We do not vote here. If you are willing to submit a patch to overhaul the option parsing system (maybe using python-opster?), feel free to do so.

Source Code

  • Source code is hosted on GitHub.
  • All source code is licensed under the terms of the GPL v3.



OA Repositories


Blog posts

Blog posts are on the Wikimedian in Residence blog. Here is a list of posts by this category. Some specific posts are: