User:OpenScientist/Open grant writing/Wissenswert 2011/Documentation

From Wikiversity

This page hosts information about the project as it proceeds.

Architecture

Translated from the German project proposal.

The Open Access Media Importer for Wikimedia Commons has a modular design, which should make it easier to add new media types, resources or output formats. In general, every component takes an item out of its queue, processes it and puts the resulting data into the queue of the next component. We envisage all components running on the same server.

  • The Crawler/Scraper scans a list of Open Access resources for new articles with multimedia files (particularly video, but also audio). This can be achieved via a search API (if available, e.g. at PLoS), a local search (PLoS example) or Google (example: PLoS ONE). For each matching article, the URLs and metadata (creator, description, licensing, original article etc.) of the media files are extracted and stored locally.
  • The Downloader downloads these media files and saves them locally.
  • The Transcoder converts the media files into Commons-compatible open formats (mainly Ogg Theora and Ogg Vorbis) and embeds the metadata in the resulting files.
  • The Categorizer analyses the files and their metadata and suggests Commons categories that may be suitable.
  • The Review Tool allows the user to check image and sound quality, licensing, metadata and categories and to fix any errors before the upload is approved.
  • The Uploader uploads the files along with metadata and categories.
  • The components are to be configured via a protected wiki page, so that non-programmers can also add new resources for the tool to work on.
  • Before processing a file, all components check whether it has already been processed or even uploaded to Wikimedia Commons. In such cases, work on the file is skipped by default.
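
The queue-based hand-off and the skip-by-default check described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the names MediaItem, Component, build_pipeline and run_pipeline are hypothetical, the stage functions are placeholders that merely tag the item, and the set of already uploaded URLs stands in for a real lookup on Wikimedia Commons.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MediaItem:
    """One media file found by the crawler (all field names are illustrative)."""
    url: str
    metadata: dict = field(default_factory=dict)
    history: list = field(default_factory=list)  # names of the stages that handled the item


class Component:
    """A pipeline stage that takes items from its own queue and passes results on."""

    def __init__(self, name, process, next_component=None):
        self.name = name
        self.process = process              # function: MediaItem -> MediaItem
        self.next_component = next_component
        self.queue = deque()
        self.done = set()                   # URLs this stage has already processed
        self.output = []                    # results of the last stage end up here

    def run_once(self, already_uploaded):
        """Handle one queued item; return False when the queue is empty."""
        if not self.queue:
            return False
        item = self.queue.popleft()
        # Skip by default if the file was already processed here or is already on Commons.
        if item.url in self.done or item.url in already_uploaded:
            return True
        result = self.process(item)
        result.history.append(self.name)
        self.done.add(item.url)
        if self.next_component is not None:
            self.next_component.queue.append(result)
        else:
            self.output.append(result)
        return True


def make_stage_function(label):
    """Placeholder for real work (downloading, transcoding, ...): just tags the item."""
    def process(item):
        item.metadata[label] = True
        return item
    return process


def build_pipeline(stage_names):
    """Chain the stages so that each one feeds the queue of the next."""
    components = []
    next_component = None
    for name in reversed(stage_names):
        next_component = Component(name, make_stage_function(name), next_component)
        components.append(next_component)
    components.reverse()
    return components


def run_pipeline(components, items, already_uploaded=frozenset()):
    """Feed items into the first stage and drain every queue in order."""
    components[0].queue.extend(items)
    for component in components:
        while component.run_once(already_uploaded):
            pass
    return components[-1].output


if __name__ == "__main__":
    stages = ["downloader", "transcoder", "categorizer", "review", "uploader"]
    found = [MediaItem("http://example.org/a.mov"), MediaItem("http://example.org/b.mov")]
    finished = run_pipeline(build_pipeline(stages), found,
                            already_uploaded={"http://example.org/b.mov"})
    for item in finished:
        print(item.url, item.history)
```

In this sketch the stages are drained one after another within a single process, which matches the assumption that all components run on the same server; a real deployment might instead use persistent queues so that each component can run independently.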

To Do

Wiki

  • add todos,
  • milestones

Code

  • set up server/accounts for crawler
done

Discuss

  • RW: Should we add the license and source to each video's metadata?
I'd say yes. --Daniel Mietchen 21:55, 20 February 2012 (UTC)
  • RW: Should we inform the corresponding author about the import to WM Commons?
I don't think so, though I have informed authors in the past about actual use of "their" files. --Daniel Mietchen 21:55, 20 February 2012 (UTC)

Source Code

  • Source code is hosted on GitHub.
  • All source code is licensed under the terms of the GPL v3.

Links

About

OA Repositories

Tools

Blog posts
