User:OpenScientist/Open grant writing/Wissenswert 2011/Documentation
From Wikiversity
This page hosts information about the project as it proceeds.
Architecture
Translated from the German project proposal.
The Open Access Media Importer for Wikimedia Commons has a modular design, which should make it easier to add new media types, resources or output formats. In general, every component takes an item out of a queue, processes it and puts the data into the queue of the next component. We envisage all components running on the same server.
- The Crawler/Scraper scans a list of Open Access resources for new articles with multimedia files (particularly video, but also audio). This can be achieved via a search API (if available, e.g. at PLoS), a local search (PLoS example) or Google (example: PLoS ONE). For each matching article, the URLs and metadata (creator, description, licensing, original article etc.) of the media files are extracted and stored locally.
- The Downloader downloads these media files and saves them locally.
- The Transcoder converts the media files into Commons-compatible open formats (mainly Ogg Theora, Ogg Vorbis) and embeds the metadata in the resulting files.
- The Categorizer analyses the files and their metadata and suggests Commons categories that may be suitable.
- The Review Tool allows the user to check image and sound quality, licensing, metadata and categories and to fix any errors before the upload is approved.
- The Uploader uploads the files along with metadata and categories.
- The components are configured via a protected wiki page, so that non-programmers can also add new resources for the tool to work on.
- Before processing a file, all components check whether it has already been processed or even uploaded to Wikimedia Commons. In such cases, work on the file is skipped by default.
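The queue-based design described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the names `MediaItem`, `Component`, `build_pipeline`, the `uploaded` set and the example URLs are all hypothetical, and the real components would do actual downloading, transcoding and uploading where this sketch passes the item through unchanged.

```python
from dataclasses import dataclass, field
from queue import Queue


@dataclass
class MediaItem:
    """Hypothetical record handed from one component to the next."""
    url: str
    metadata: dict = field(default_factory=dict)
    history: list = field(default_factory=list)  # names of components that handled it


class Component:
    """One stage: take an item from the own queue, process it,
    and put it into the queue of the next component."""

    def __init__(self, name, uploaded, process=None):
        self.name = name
        self.uploaded = uploaded  # assumed shared set of URLs already on Commons
        self.process = process or (lambda item: item)  # placeholder work
        self.inbox = Queue()
        self.next_component = None
        self.seen = set()  # URLs this component has already processed

    def run_once(self):
        """Drain the queue and return the items that were actually processed."""
        handled = []
        while not self.inbox.empty():
            item = self.inbox.get()
            # Skip by default if already processed here or already uploaded.
            if item.url in self.seen or item.url in self.uploaded:
                continue
            self.seen.add(item.url)
            item = self.process(item)
            item.history.append(self.name)
            handled.append(item)
            if self.next_component is not None:
                self.next_component.inbox.put(item)
        return handled


def build_pipeline(stage_names, uploaded=None):
    """Create one component per name and chain them in the given order."""
    uploaded = set() if uploaded is None else uploaded
    stages = [Component(name, uploaded) for name in stage_names]
    for current, following in zip(stages, stages[1:]):
        current.next_component = following
    return stages


# Example run: one duplicate URL and one file assumed to be on Commons already.
stages = build_pipeline(
    ["Downloader", "Transcoder", "Uploader"],
    uploaded={"https://example.org/b.mov"},
)
for url in ("https://example.org/a.mov",
            "https://example.org/a.mov",
            "https://example.org/b.mov"):
    stages[0].inbox.put(MediaItem(url))
results = [stage.run_once() for stage in stages]
```

Keeping one queue per component means a new stage (for example the Categorizer) can be slotted in by adding its name to the list, without touching the other stages.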
To Do
Wiki
- add todos
- milestones
Code
- set up server/accounts for crawler
- done
Discuss
- RW: Should we add the license and source to each video's metadata?
- I'd say yes. --Daniel Mietchen 21:55, 20 February 2012 (UTC)
- RW: Should we inform the corresponding author about the import to WM Commons?
- I don't think so, though I have informed authors in the past about actual use of "their" files. --Daniel Mietchen 21:55, 20 February 2012 (UTC)
Source Code
- Source code is hosted on GitHub.
- All source code is licensed under the terms of the GPL v3.
Links
About
OA Repositories
- Directory of Open Access Journals
- PubMedCentral: Open Access Subset documentation
- PubMedCentral: Search results for supplementary videos in OA articles
- Example of an article with multiple videos
Tools
Blog posts
- March 10: Frontend
- January 18: Roadmap and crawler
- December 15: Announcement and overview