User:OpenScientist/Open grant writing/Wissenswert 2011/Documentation

From Wikiversity

This page hosts information about the project in its initial stages. We have since moved the codebase to GitHub entirely, while the bot now runs on Wikimedia Commons.

Architecture

Translated from the German project proposal.

The Open Access Media Importer for Wikimedia Commons has a modular design, which is intended to make it easy to add new media types, resources or output formats. In general, every component takes an item out of a queue, processes it and puts the resulting data into the queue of the next component. We envisage all components running on the same server.

  • The Crawler/Scraper scans a list of Open Access resources for new articles with multimedia files (particularly video, but also audio). This can be achieved via a search API (if available, e.g. at PLoS), a local search (PLoS example) or Google (example: PLoS ONE). For each matching article, the URLs and metadata (creator, description, licensing, original article etc.) of the media files are extracted and stored locally.
  • The Downloader downloads these media files and saves them locally.
  • The Transcoder converts the media files into Commons-compatible open formats (mainly Ogg Theora and Ogg Vorbis) and embeds the metadata in the resulting files.
  • The Categorizer analyses the files and their metadata and suggests Commons categories that may be suitable.
  • The Review Tool allows the user to check image and sound quality, licensing, metadata and categories and to fix any errors before the upload is approved.
  • The Uploader uploads the files along with metadata and categories.
  • The components are to be configured via a protected wiki page, so that non-programmers can also add new resources for the tool to work on.
  • Before processing a file, all components check whether it has already been processed or even uploaded to Wikimedia Commons. In such cases, work on the file is skipped by default.
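
The queue-based hand-over between components, together with the default skipping of already processed files, can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the names MediaItem, make_stage, run_pipeline and STAGES are hypothetical, the stages only record that they ran instead of downloading or transcoding anything, and an in-memory set stands in for whatever record of processed files the real tool keeps.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class MediaItem:
    """Hypothetical record handed from one component to the next."""
    url: str
    metadata: dict = field(default_factory=dict)
    history: list = field(default_factory=list)


def make_stage(name, processed):
    """Build a placeholder stage that only records that it has run."""
    def stage(item):
        if (name, item.url) in processed:
            return None  # already processed by this stage: skip by default
        processed.add((name, item.url))
        item.history.append(name)
        return item
    return stage


def run_pipeline(items, stage_names, processed=None):
    """Pass items through one queue per stage and return the finished ones."""
    processed = set() if processed is None else processed
    stages = [make_stage(name, processed) for name in stage_names]
    # One input queue per stage, plus a final queue for finished items.
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    for item in items:
        queues[0].put(item)
    for index, stage in enumerate(stages):
        while not queues[index].empty():
            result = stage(queues[index].get())
            if result is not None:
                queues[index + 1].put(result)
    finished = []
    while not queues[-1].empty():
        finished.append(queues[-1].get())
    return finished


# Assumed stage order, following the component list above.
STAGES = ["download", "transcode", "categorize", "review", "upload"]
```

Calling run_pipeline twice with the same processed set and the same URL returns the item the first time and an empty list the second time, which mirrors the skip-by-default behaviour. In this sketch the stages run one after another in a single process; since all components are envisaged to run on the same server, each could equally be a separate worker reading from its own queue.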

To Do

This list is to be updated as we move forward. Before making changes here, please check the most recent uploads to see whether the proposed change has already been implemented.

Wiki

  • Update documentation
  • Add milestones
  • Cite doi template formatting from enwp does not work on Commons

Code

Discuss

  • RW: Should we add the license and source to each video's metadata? Done
I'd say yes. --Daniel Mietchen 21:55, 20 February 2012 (UTC)
  • RW: Should we inform the corresponding author about the import to WM Commons?
I don't think so, though I have informed authors in the past about actual use of "their" files. --Daniel Mietchen 21:55, 20 February 2012 (UTC)
  • I don't understand the purpose of having three different commands (oa-get, oa-put, and oa-cache), each of which takes as its first argument a longish subcommand (download-metadata, etc.). I would vote either to combine all of these into one command (maybe, "oami"), or to abbreviate the subcommands, or maybe even both. Klortho (talk) 23:14, 3 June 2012 (UTC)
We do not vote here. If you are willing to submit a patch to overhaul the option parsing system (maybe using python-opster?), feel free to do so.

Source Code

  • Source code is hosted on GitHub.
  • All source code is licensed under the terms of the GPL v3.

Links

About

OA Repositories

Tools

Blog posts

Blog posts are on the Wikimedian in Residence blog. Here is a list of posts by this category. Some specific posts are: