Research in programming Wikidata/Artolela-web

From Wikiversity
Jump to navigation Jump to search

Artolela-web is an educational game, the aim of which is to improve language skills. Artolela-web helps users in learning languages and allows to expand the outlook, as during the game the user gets acquainted with the masterpieces of painting.

About application[edit]

Picture "Children with a bird and a cat"
Artist:Eglon van der Neer
Date:1680s

Artolela-web is a web application based on a collection of pictures from the Wikimedia Commons and their description on the Wikidata site, which includes the names of pictures in English, Russian and German. The choice of languages ​​was based on the preferences of the program's developers. In another version of the game the other set of languages could be selected, as the Wikimedia contains descriptions of pictures in many languages, for example, Italian, Bashkir, Ukrainian.

Consider the picture: Children with a bird and a cat. The image file is stored on the Wikimedia Commons. The Wikidata Database contains a description of the picture: the name of the picture in Dutch and its translations into Russian, English and other languages, location, artist, size of the picture (picture's width and height) and more. In this application only the picture names in Russian, English and German were used.

The application makes the opportunity to choose languages and the number of levels, to look through correct and incorrect answers and the nnumber of correct answers. In the future, new languages ​​will be added, new pictures will be added to database from the Wikimedia Commons, registration in the game will be added and statistics of the user will be saved. A student of the Institute of Mathematics and Information Technologies of Petrozavodsk State University Yulia Ipatova is working on this project.

The development languages is PHP, CSS and Javascript.

The interface language is English.

Application description[edit]

The application is designed for testing the language skills in the game form.

Game content[edit]

  1. At the beginning of the game the user selects two languages (the first language for the picture name and the second language for possible answers) and a level (the number of pictures, whose names the user needs to guess).
  2. Then a picture and four variants of the picture name in the first language are shown to the user. The picture name in the first language is hidden from the user.
  3. The user should select the correct variant of the picture name in the first language.

Game script[edit]

  1. The user select two languages.
  2. The user select the number of levels.
  3. The user click the button "START".
  4. The user is shown the picture, four possible answers and the picture link on the Wikimedia Commons.
  5. The user select one of the possible answers.
  6. The user click the button "NEXT".
  7. After the user has passed all levels, he sees his mark (the ratio of the correct answers to the total number of questions) on the screen.
  8. If the user click the button "NEW GAME", points 1-8 are repeated.

Notes[edit]

  1. For 2017, the application supports three languages (Russian, English, German). The number pictures-objects in the Wikidata at the same time having the label in:
    • Russian and English is 3035,
    • English and German is 5012,
    • Russian and German is 724,
    at the same time
  2. If the user click the picture during the game, he sees its full version.
  3. If the user click the link "W" during the game, he moves to the picture page on the Wikimedia Commons.
  4. If the user click the link "About" during the game, he sees the game description.
  5. If the user click the link "Look answers" at the end of the game, he sees the list of all user answers. All levels are shown on the page. There are the picture, its name in the first language, possible answers in the second language, where the correct answer is green and wrong answer is red.

Application architecture[edit]

Database schema for the application Artolela-web

The application consists of 5 basic forms and the database, which includes three tables.

Database[edit]

The database contains three unrelated tables.

  • The table en_de contains the data of the pictures, which has the label in English and German languages on Wikidata at the same time. The table includes five atributes:
    • the picture ID (EDid),
    • the picture name in English (EDlabel_en),
    • the picture name in German (EDlabel_de),
    • the image filename on Wikimedia Commons (EDp_name),
    • the link to the picture on Wikimedia Commons (EDurl).
  • The table en_ru contains the data of the pictures, which has the label in English and Russian languages on Wikidata at the same time. The table includes five atributes:
    • the picture ID (ERid),
    • the picture name in English (ERlabel_en),
    • the picture name in Russian (ERlabel_ru),
    • the image filename on Wikimedia Commons (ERp_name),
    • the link to the picture on Wikimedia Commons (ERurl).
  • The table ru_de contains the data of the pictures, which has the label in Russian and German languages on Wikidata at the same time. The table includes five atributes:
    • the picture ID (RDid),
    • the picture name in Russain (RDlabel_ru),
    • the picture name in German (RDlabel_de),
    • the image filename on Wikimedia Commons (RDp_name),
    • the link to the picture on Wikimedia Commons (RDurl).
The picture "Starry Night"
Artist: Vincent Willem van Gogh
Date: 1889

The example of the table en_de for the picture «Starry Night»:

EDid 56
EDlabel_en The Starry Night
EDlabel_de Sternennacht
EDp_name Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg
EDurl http://commons.wikimedia.org/wiki/Special:FilePath/Van%20Gogh%20-%20Starry%20Night%20-%20Google%20Art%20Project.jpg

The example of the table en_ru for the picture «Starry Night»:

ERid 122
ERlabel_en The Starry Night
ERlabel_ru Звездная ночь
ERp_name Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg
ERurl http://commons.wikimedia.org/wiki/Special:FilePath/Van%20Gogh%20-%20Starry%20Night%20-%20Google%20Art%20Project.jpg

The example of the table ru_de for the picture «Starry Night»:

RDid 72
RDlabel_ru Звездная ночь
RDlabel_de Sternennacht
RDp_name Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg
RDurl http://commons.wikimedia.org/wiki/Special:FilePath/Van%20Gogh%20-%20Starry%20Night%20-%20Google%20Art%20Project.jpg

Note 1: Tables are redundant because url = concat('http://commons.wikimedia.org/wiki/Special:FilePath/', StringEscape(p_name)). This redundancy speeds up the application because we get the data from the database, and we do not need further process of this data and spend time on this process.

Note 2: Before entering the URL in database, you need to escape signs "%".

There are two approaches to create a structure of this database:

  • to create one table that would store all pictures that have a label at least in two of three laguages. If the label in one of the languages misses for a particular picture, then the corresponding field in the table would remain blank;
  • to create three tables for each pair of languages.

The first approach allows to get rid of redundancy in the database, and significantly reduce its volume. However, requests to the database require more time, since the developer have to add a condition to search only those records that contain a label in the first and second languages.

The second approach leads to redundancy in the database, since if the picture has the label in all three language, it repeats in all tables. However, in this case, requests to the database require less time, since there is no need to add a condition to search only those records that contain the label in first and second languages.

Within the framework of this application, it was decided to create three different tables to speed up the word with the database and, accordingly, to accelerate the application.

Forms[edit]

The opening form of the application
The description form of the application
The game form of the application
The result form of the application
The answer form of the application

The application consists of 5 forms:

  1. The opening form of the application. This form sets the application preferences (languages and the number of levels). Elements of the form are:
    • two blocks, each of which consists of three radiobuttons for selecting languages,
    • one slider for selecting the number of levels,
    • one button to begin the game,
    • one block containing a link to the description of the game.
    Note 1: It is necessary to check that the user has chosen different languages as the native language and the foreign language. If the user has chosen the same languages, the user will see the appropriate message and the application will automatically select another foreign language.
    Note 2: The user may select from 1 to 50 levels. The step of the scale is one picture.
  2. The description form of the application. This form has the detailed description of this game. Elements of the form are:
    • the text block for displaying the game description,
    • one button for moving to the opening form.
  3. The game form for application is the main screen of the game. Elements of the form are:
    • one text block for displaying the number of the level,
    • one block for displaying the picture,
    • one block with the link to the picture on the Wikimedia Common as "W" letter,
    • one block, which contains four radiobuttons for displaying possible answers,
    • one button for moving to the next level,
    • one button for starting the game again.
    Note: If the picture can not be downloaded from the server, a blank image will be displayed instead of it.
  4. The result form of application. Elements of the form are:
    • one text block for displaying the results of the game,
    • one button for moving to the opening form,
    • one block, which contains the link for viewing correct and wrong answers.
  5. The answer form of application. This form shows all user answers. Elements of the form are:
    • the text block for displaying the number of the level,
    • one block for displaying the picture,
    • the text block for displaying the picture name,
    • one block containing the list of possible answers,
    • one button for moving to the result form.
    All elements, without the button, repeat as many times as the levels were chosen.

Random selection[edit]

It was necessary to generate the IDs of the picture's data and the answers which we get from the database. For this purpose, the function rand(int $min, int$ max)[1] of the language PHP was used. $min is the smallest value that can be returned and $max is the largest value that can be returned. This function returns a pseudo-random number in the range from $min to $max. The parameter $min is 1, since the numeration of strings in the database starts at 1; the parameter $max is the number of pictures for chosen languages.

Pictures preparation[edit]

The application uses pictures that are stored on the server. Before uploading images to the server, they need to be downloaded and optimized.

SPARQL[edit]

Picture's description, which used in the application, are got from the Wikidata. Let's consider an example of filling the table en_ru. The picture is saved in the application database only if it has a label field in English and Russian and if there is a link to the image on the Wikimedia Commons (image (P18)). To get all the pictures that satisfy these conditions, a SPARQL-скрипт was used, thanks to which 3035 entries were received.

#List of pictures in Russian and English  having property "image" 
SELECT ?picture ?label_ru ?label_en ?image
WHERE
{
  #Instance of painting
  ?picture wdt:P31 wd:Q3305213.

  #Pictures with label in Russian
  ?picture rdfs:label ?label_ru filter (lang(?label_ru) = "ru").

  #Pictures with label in English
  ?picture rdfs:label ?label_en filter (lang(?label_en) = "en").

  #Pictures with link to Wikimedia
  ?picture wdt:P18 ?image.
}

Get links[edit]

After the list of pictures is received, it is necessary to receive links to files with images of these pictures. To do this, the SPARQL script was exported to a JSON file using standard SPARQL query tools.

Upload images[edit]

To download a large number of files, the Download Master program was used. This program was chosen because it has the function of downloading files by importing a list of URLs. Another advantage of Download Master is the ability to export a list of downloads containing links to images stored on the Wikimedia Commons and the names of the downloaded files into an XML file.

Note 1: Using Download Master, it was possible to extract links to pictures from the JSON file without its preliminary processing.

Note 2: Since there was a lot links to the pictures, there were problems with downloading them by Download Master. The main problem is the speed of reading the file by the program. While the file is being read, the computer may hang. If the computer's power is not enough, you need to divide the JSON file into several files. This can be done manually or with the help of several SPARQL scripts and keywords LIMIT and OFFSET. The keyword LIMIT specifies the maximum value of the decisions that will result (that is, the maximum number of rows). The keyword OFFSET allows to not show the first n decisions as a result.

Compress images[edit]

After loading images, the total size of images that have a label at the same time:

  • in English and Russian, was 14,9 Gb (average size of one picture was 5 Mb),
  • in English and German, was 20,0 Gb (average size of one picture was 4 Mb),
  • in Russian and German, was 7,23 Gb (average size of one picture was 10 Mb).

Pictures of a such large size can be downloaded from the Internet very slowly, because of which the user may have a feeling that the application "hangs". Therefore, it was decided to compress pictures to an acceptable size. To compress images FILEminimizer Suite[2] was used, since it allows compressing images in batch mode and provides a good percentage of compression[3] (the average percentage of compression is more than 90%). After compression, the total size of images that have a label at the same time:

  • in English and Russian, is 737 Mb (average size of one picture is 248 Kb),
  • in English and German, is 1,02 Gb (average size of one picture is 213 Kb),
  • in Russian and German, is 177 Mb (average size of one picture is 250 Kb).

Hosting[edit]

After compression, images were uploaded to a remote server.

Installation[edit]

Configuration[edit]

  • The PHP version: PHP version 5.4 and more
  • The Apache version: Apache 2.4 and more
  • The database model: MySQL 4.0 and more.

Application download[edit]

Download the project Artolela-web from the GitHub by the archive. Unzip it, e.g. to:

D:\artolela-web-master

Then you install the application on your own server. From your own server, you must delete the following folders:

  • json,
  • python,
  • sql.

Database installation[edit]

Open the page "Databases"
Open the page "PHPMYADMIN"
Crossing to the tab "Import"

Database creation[edit]

Open "Databases" on your server.

Create your own database, e.g.

pictures

An example of installing a database under Unix[edit]

mysqladmin -p -uroot create DATABASE_NAME
mysql -p -uroot mysql

mysql> CREATE USER 'USER_NAME'@'localhost' IDENTIFIED BY 'your_password';
mysql> GRANT select ON DATABASE_NAME.* TO USER_NAME@'localhost';
mysql> FLUSH PRIVILEGES;

mysql> use DATABASE_NAME 
mysql> charset binary
mysql> source .sql/pictures.sql

PhpMyAdmin[edit]

Open your server control panel. Open PHPMYADMIN control panel.

Log in PHPMYADMIN, by entering your server login and password.

Data download[edit]

Select your database in PhpMyAdmin left panel. Open the tab "Import" in PhpMyAdmin upper panel.

Select file "sql/pictures.sql" in the directory, in which you unzip this project and click "Next". After some time this file will be processed, and the necessary tables appear in the database with the corresponding records.

File config.php[edit]

Open directory in which you unzip this project. Select file "config.php". Change this file, writig in it the server login login, server password, and name of database.

Results[edit]

Source code of the application is available at GitHub in the repository Artolela-web.

References[edit]

External links[edit]