Research in programming Wikidata/Anime

From Wikiversity

This chapter analyzes Wikidata objects of the anime (Q1107) type. Using SPARQL queries on these objects, we obtained a list of seiyu (voice actors) with their number of roles, a line chart of seiyu who have acted in one or more anime, a directed graph connecting seiyu to the anime they voiced, and estimates of the ages of seiyu at the time of their voice work.

Anime objects

Anime is Japanese animation. It has a distinctive visual style, but it also has less obvious features. For instance, anime covers a significantly wider variety of genres than American and European animation, from family and kids’ comedies to dramas, which in Western cinema are usually performed by live actors.

Each anime has its own voice actors. From here on we refer to Japanese voice actors as seiyu; in the context of Japanese animation, the terms seiyu and voice actor are synonymous. The term title usually refers to a certain anime and the associated manga (Japanese comics). In general, a title covers various media products, from novels to films, that share the same name and are based on one another.

In order to work with the anime list from Wikidata, we need the anime object (Q1107) and the instance of property (P31).

Let us retrieve the list of all anime titles, without taking the subclasses into account.

# List of instances of anime
SELECT ?anime ?animeLabel WHERE {
    ?anime wdt:P31 wd:Q1107. # instance of anime
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en,ja" }
}

683 results in 2017 and 216 results in 2021.
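Query results can be exported for further processing, for example as JSON. The following minimal sketch assumes the standard SPARQL 1.1 JSON results layout and flattens it into plain rows. The helper name bindings_to_rows and the sample data are illustrative assumptions, not real query output.

```python
def bindings_to_rows(result):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    variables = result["head"]["vars"]
    rows = []
    for binding in result["results"]["bindings"]:
        # A variable may be unbound in a given row, so fall back to None.
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows


# Hand-written sample in the SPARQL JSON results layout (assumption, not real output).
sample = {
    "head": {"vars": ["anime", "animeLabel"]},
    "results": {
        "bindings": [
            {
                "anime": {"type": "uri", "value": "http://www.wikidata.org/entity/Q1107"},
                "animeLabel": {"type": "literal", "xml:lang": "en", "value": "anime"},
            },
            # Placeholder row without a label, to show the handling of unbound variables.
            {"anime": {"type": "uri", "value": "http://example.org/placeholder"}},
        ]
    },
}

rows = bindings_to_rows(sample)
for row in rows:
    print(row["anime"], row["animeLabel"])
```

Rows without a label keep the key with the value None, so later processing steps can decide how to treat items that have no label in the requested languages.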

There are many more anime objects in Wikidata, but they are instances not of anime itself but of its subclasses, for example, anime series.

Let us execute the following query in order to obtain the list of anime genres and the number of anime that correspond to these genres.

# Select anime and its subclasses with the number of titles in each subclass
SELECT ?subAnime ?subAnimeLabel (COUNT(?subAnimeInstance) AS ?count) WHERE {
  ?subAnime wdt:P279* wd:Q1107.        # anime and all of its subclasses
  ?subAnimeInstance wdt:P31 ?subAnime. # titles that are instances of each subclass
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,ja". }
}
GROUP BY ?subAnime ?subAnimeLabel

This classification of anime by genre is not perfect because it is heavily skewed toward anime television series: among the 4875 anime titles, 2984 are instances of the anime series genre (61.2%). Also, some subclasses correspond not to genres but to particular anime (e.g. Evangelion).

We can visualize this distribution using the Rawgraphs service (Fig. 1).

Fig.1: Anime genres sunburst diagram created with Rawgraphs service (2021)

Let us retrieve the list of all anime titles that are instances of anime subclasses by using the following query:

# List of instances of anime and subclasses of anime
SELECT ?anime ?animeLabel WHERE {
    ?anime wdt:P31/wdt:P279* wd:Q1107. # instance of anime or of its subclasses
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en,ja" }
}

4757 results.

The anime with the most complete information on Wikidata are Gurren Lagann, Space Battleship Yamato and Project A-Ko.

There are also anime with many missing properties, including Doraemon, The Animal Conference on the Environment and Assassins Pride.

According to a profiling of Wikidata using ProWD, Fullmetal Alchemist: The Sacred Star of Milos has the largest number of properties (24) among all the anime titles in Wikidata.

List of seiyu ordered by their number of roles in anime

Naturally, an anime has multiple characters, and different seiyu give voice to them. Most seiyu have taken part in several anime, and some have worked on several dozen titles. Talented seiyu are sometimes invited to voice different characters in one anime. Hiroshi Kamiya is one of the most popular seiyu: he has worked on more than 180 anime and earned many awards. One of the most famous anime he took part in is Attack on Titan, in which he voiced Captain Levi, one of the main characters.

Let us create a list of seiyu ordered according to the number of anime voiced by them.

# List of voice actors (seiyu) ordered by the number of anime they took part in
SELECT ?seiyu ?seiyuLabel (COUNT(?anime) AS ?count) WHERE {
  ?anime wdt:P31/wdt:P279* wd:Q1107;  # instance of anime or its subclasses
         wdt:P725 ?seiyu.             # voice actor (seiyu) of this anime
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,ja" }
}
GROUP BY ?seiyu ?seiyuLabel  # group by seiyu
ORDER BY DESC(?count)        # order by count of voiced anime

SPARQL query, 148 results (2017) and 2910 results (2021)
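The same ranking can be reproduced offline once the (anime, seiyu) pairs have been exported. The sketch below is an illustration only: the identifiers anime_A, seiyu_X and so on are hypothetical placeholders, and, unlike COUNT(?anime) in the query, the helper counts each anime once per seiyu by removing duplicate rows first.

```python
from collections import Counter


def roles_per_seiyu(pairs):
    """Count distinct anime per seiyu and sort by descending count, then by name."""
    unique_pairs = set(pairs)  # drop duplicate (anime, seiyu) rows
    counts = Counter(seiyu for _anime, seiyu in unique_pairs)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical (anime, seiyu) pairs standing in for exported query rows.
pairs = [
    ("anime_A", "seiyu_X"),
    ("anime_B", "seiyu_X"),
    ("anime_C", "seiyu_X"),
    ("anime_A", "seiyu_Y"),
    ("anime_A", "seiyu_Y"),  # duplicate row, counted once
    ("anime_B", "seiyu_Z"),
    ("anime_C", "seiyu_Z"),
]

ranking = roles_per_seiyu(pairs)
for seiyu, count in ranking:
    print(seiyu, count)
```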

Line chart of number of seiyu who worked on one or more anime

We can create a line chart with seiyu plotted according to their total number of roles. The more anime seiyu have voiced, the farther to the right they are on the chart. We can use the following query to create the chart.

# Histogram of the number of seiyu who acted in one or more anime
#defaultView:LineChart                                            # use line chart as result representation
SELECT ?haveseiyu (COUNT(?haveseiyu) AS ?quantity) WHERE {        # outer query: how many groups share each count
     SELECT (COUNT(?seiyu) AS ?haveseiyu) WHERE {             # inner query: count the voice-acting links in each group
       ?anime wdt:P31/wdt:P279* wd:Q1107;                     # instance of anime or its subclasses
              wdt:P725 ?seiyu.                                # voice actor (seiyu) of this anime
       SERVICE wikibase:label { bd:serviceParam wikibase:language "en,ja"}
     }
     GROUP BY ?anime             # group the voice-acting links by anime
     ORDER BY DESC(?haveseiyu)   # order by the obtained count (descending)
}
GROUP BY ?haveseiyu              # group by the obtained count
ORDER BY DESC(?haveseiyu)        # order by the obtained count (descending)

SPARQL-query, 13 results (2017) and 58 results (2021).

Figure 2 shows that the higher the number of voiced anime, the lower the number of seiyu who have that many roles. The chart is limited to 71 anime, as only a few seiyu have worked on a larger number of anime, and extending the line chart farther to the right would not be informative.

As Figure 2 shows, most seiyu have voiced only one anime during their life. On the chart, there are 254 such seiyu. However, seiyu is a profession to which people often devote their lives. The fact that many voiced only one role according to Wikidata seems to be a result of the incompleteness of the data set.
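The distribution described above is a count of counts: first the number of anime per seiyu, then the number of seiyu that share each total. The following sketch shows this two-step aggregation on hypothetical placeholder pairs; the function name seiyu_by_anime_count is an assumption made for illustration.

```python
from collections import Counter


def seiyu_by_anime_count(pairs):
    """Return sorted (number_of_anime, number_of_seiyu) points for a line chart."""
    # Step 1: number of distinct anime voiced by each seiyu.
    per_seiyu = Counter(seiyu for _anime, seiyu in set(pairs))
    # Step 2: number of seiyu that share each total.
    distribution = Counter(per_seiyu.values())
    return sorted(distribution.items())


# Hypothetical (anime, seiyu) pairs standing in for exported query rows.
pairs = [
    ("anime_A", "seiyu_X"),
    ("anime_B", "seiyu_X"),
    ("anime_C", "seiyu_X"),
    ("anime_A", "seiyu_Y"),
    ("anime_B", "seiyu_Z"),
    ("anime_A", "seiyu_W"),
    ("anime_C", "seiyu_W"),
]

points = seiyu_by_anime_count(pairs)
for anime_count, seiyu_count in points:
    print(anime_count, seiyu_count)
```

In this toy data two seiyu voiced one anime, one voiced two and one voiced three, which gives the same falling shape as the chart.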

Fig. 2: Line chart that shows number of anime voiced by different seiyu (2021)

Directed graph that connects seiyu to anime they have voiced

Most of the seiyu give voice to multiple characters from different anime. Let us create a directed graph that connects seiyu to anime they have voiced using the following query.

# Graph of seiyu and anime they took part in
#defaultView:Graph
SELECT DISTINCT ?item ?itemLabel ?rgb ?link
WHERE { # voice actors (seiyu) with more than one anime
  VALUES ?toggle { true false }
  VALUES ?seiyu { wd:Q1207010 wd:Q233902  wd:Q1323728 }
  ?anime  wdt:P31/wdt:P279* wd:Q1107; # instance of anime or its subclass
          wdt:P725 ?seiyu.            # seiyu who voiced this anime
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en,ja"}
  BIND(IF(?toggle,?anime,?seiyu) AS ?item).
  BIND(IF(?toggle,?animeLabel,?seiyuLabel) AS ?itemLabel).
  BIND(IF(?toggle,"FFFFFF","7FFF00") AS ?rgb).
  BIND(IF(?toggle,"",?anime) AS ?link).
}

SPARQL-query, 826 results (2017) and 496 results (2021).

The VALUES ?seiyu clause lists the Wikidata objects of several seiyu, including Bin Shimada. We picked only three seiyu for illustrative purposes, as a graph with more seiyu would be hard to read.

The BIND(IF(?toggle, ?anime, ?seiyu) AS ?item) construction determines the graph node type: if ?toggle is true, the node corresponds to an anime, otherwise to a seiyu. The item label (?itemLabel) and the node color (?rgb) are determined in the same way by the next two BIND lines. The last BIND line (?link) creates the edges linking the seiyu and anime nodes.
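The effect of the ?toggle variable can be imitated outside SPARQL. The sketch below is an illustration under assumptions: the pairs are hypothetical placeholders and labels are omitted. Every (anime, seiyu) pair is expanded into two rows, one for the anime node and one for the seiyu node, and duplicates are dropped as SELECT DISTINCT does.

```python
def graph_rows(pairs):
    """Expand (anime, seiyu) pairs into distinct (item, rgb, link) rows."""
    rows = set()  # a set plays the role of SELECT DISTINCT
    for anime, seiyu in pairs:
        for toggle in (True, False):
            item = anime if toggle else seiyu          # node: anime or seiyu
            rgb = "FFFFFF" if toggle else "7FFF00"     # node color by type
            link = "" if toggle else anime             # edge from seiyu to anime
            rows.add((item, rgb, link))
    return sorted(rows)


# Hypothetical pairs standing in for the query's matches.
pairs = [
    ("anime_A", "seiyu_X"),
    ("anime_B", "seiyu_X"),
    ("anime_A", "seiyu_Y"),
]

rows = graph_rows(pairs)
for row in rows:
    print(row)
```

The anime_A node appears only once although two seiyu voiced it, while each seiyu contributes one row per anime, that is, one edge per role.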

Figure 3 shows part of the graph for several famous seiyu.

Fig. 3: Directed graph that connects seiyu to anime they have voiced (2021)

Completeness of Wikidata

The list of anime in the English Wikipedia contains around 1600 titles. However, specialized anime websites, such as the Gogoanime online cinema, contain information about many more titles. At the time of writing, there were 10,072 anime on Gogoanime (74 pages of 136 titles each plus one page of 8), whereas Wikidata provides information on only about 4875 titles. In addition, we should take into account the rapid pace of anime releases. We can therefore conclude that Wikidata's coverage of anime is incomplete: only 48.4% of the titles listed on Gogoanime are represented.
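The coverage estimate can be checked with a few lines of arithmetic. The figures are the ones quoted in the text; the variable names are chosen here for illustration.

```python
# Gogoanime catalogue size: 74 full pages of 136 titles plus one page of 8.
gogoanime_titles = 74 * 136 + 8

# Number of anime titles found on Wikidata, as quoted in the text.
wikidata_titles = 4875

# Share of Gogoanime titles that are represented on Wikidata.
coverage = wikidata_titles / gogoanime_titles

print(gogoanime_titles)
print(f"{coverage:.1%}")
```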

We cannot consider Gogoanime a reliable source (RS), but it can be used to analyze the incompleteness of Wikidata.

The query in the section on seiyu ordered by their number of roles returned 2910 names of seiyu from Wikidata. However, that query searched only for seiyu who have worked on anime. When we query the names of all voice actors, without the anime restriction, the number of results increases by a factor of five (see the query below). This significant increase reminds us that the voice acting industry covers many more areas than just anime, for example, Western animation, podcasts and video games. This should be taken into account when forming queries.

# List of voice actors ordered by the number of projects they voiced
SELECT ?actor ?actorLabel (COUNT(?project) AS ?count) WHERE {
  ?project wdt:P725 ?actor.  # voice actor of any project
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,ja" }
}
GROUP BY ?actor ?actorLabel
ORDER BY DESC(?count)        # order by number of voiced projects

SPARQL-query, 3965 results (2017) and 14742 results (2021).

The sunburst diagram, Figure 4, is one way to visualize the output of the query. Such a diagram allows us to see the voice actors who contributed the most to the voice acting industry.

Fig. 4. Sunburst diagram of number of roles voiced by different actors, 2021. The diagram is constructed using Rawgraphs

Is the release date of anime available?

Fans of anime often want to know the release dates of their favorite titles, but Wikidata does not always contain complete information on release dates. Let us retrieve the anime for which no release date is available using the following query.

# List of anime that have no release date
SELECT ?anime ?animeLabel WHERE {
    ?anime wdt:P31/wdt:P279* wd:Q1107.                # instance of anime or its subclasses
    FILTER NOT EXISTS { ?anime wdt:P577 [] }          # keep only anime without a release date (P577)
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en,ja" }
}

SPARQL-query, 237 results (2017) and 2940 results (2021).

Release dates are not specified for 2940 of the 4875 anime titles on Wikidata (60.3%). In 2017, 237 of 683 titles (34.7%) had no release date. Unfortunately, it seems that growth in the number of items is not always accompanied by equally complete property information.
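The two shares and the change between the snapshots can be recomputed as follows. This is a small worked example using the counts quoted in the text; the helper name missing_share is an assumption.

```python
def missing_share(missing, total):
    """Fraction of titles that have no release date."""
    return missing / total


share_2017 = missing_share(237, 683)    # counts reported for 2017
share_2021 = missing_share(2940, 4875)  # counts reported for 2021

# Change between the two snapshots, in percentage points.
change_points = (share_2021 - share_2017) * 100

print(f"{share_2017:.1%}")
print(f"{share_2021:.1%}")
print(f"{change_points:.1f}")
```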

Analysis of seiyu age at time of voice work

As in any other profession, seiyu are of a certain age when they work on an anime. SPARQL combined with external data analysis tools, such as the Python programming language, makes it possible to estimate these ages from the data available on Wikidata.

In order to obtain the input data for our study, we need to execute three SPARQL queries and export their output to .csv files. Next, these CSV files are used in a Python script that generates a chart. You can run Python programs on Google Colaboratory.

We can retrieve the list of all seiyu and their birth dates from Wikidata with either of the following two queries, which use the SERVICE command and the rdfs:label construction respectively.

The scripts of the two queries differ in the following ways:

  • The label (name) of a seiyu is retrieved with the ?seiyuLabel variable in the first query (the SERVICE command defines the output languages) and with the rdfs:label property in the second query.
  • In the first query, ?seiyuLabel must also be listed in the GROUP BY clause in order to keep seiyu objects together with their labels.
# Get list of all seiyu objects, their names and birth dates (label service)
SELECT ?seiyu ?seiyuLabel ?bDate WHERE {
  ?anime (wdt:P31/(wdt:P279*)) wd:Q1107;
    wdt:P725 ?seiyu.       # seiyu is a voice actor of an anime
  ?seiyu wdt:P569 ?bDate.  # and has a date of birth
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en,ja"}
}
GROUP BY ?seiyu ?seiyuLabel ?bDate

SPARQL query, 2515 results (2021).

# Get list of all seiyu objects, their names and birth dates (rdfs:label)
SELECT ?seiyu (SAMPLE(?label) AS ?seiyuLabel) ?bDate WHERE {
  ?anime (wdt:P31/(wdt:P279*)) wd:Q1107;
    wdt:P725 ?seiyu.       # seiyu is a voice actor of an anime
  ?seiyu wdt:P569 ?bDate.  # and has a date of birth
  ?seiyu rdfs:label ?label.
}
GROUP BY ?seiyu ?bDate

SPARQL query, 2515 results (2021).

Note that the script retrieves not only the release dates of anime movies (property P577), but also the start dates of the series (property P580).

Let us get the links between seiyu and the anime they have voiced.

# List of links between seiyu and the anime they are involved in
SELECT DISTINCT ?item ?itemLabel ?link ?itemType WHERE {
  VALUES ?toggle { true false }
  ?anime  wdt:P31/wdt:P279* wd:Q1107; # instance of anime or its subclass
          wdt:P725 ?seiyu.            # seiyu who acted in this anime
  BIND(IF(?toggle,?anime,?seiyu) AS ?item).                 # connection of "from anime to seiyu" type
  BIND(IF(?toggle,?animeLabel,?seiyuLabel) AS ?itemLabel).  # similar connection between labels
  BIND(IF(?toggle,?seiyu,?anime) AS ?link).                 # connection of "from seiyu to anime" type
  BIND(IF(?toggle,?seiyu,"seiyu") AS ?itemType).            # service column to distinguish seiyu and anime items:
                                                            # if the item describes a seiyu, its value is "seiyu",
                                                            # the link is kept otherwise
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en,ja"}
}

SPARQL query, 27106 results (2021).

Analysis results can be shown as a histogram (Fig. 5). To create it we use the Python libraries Pandas and Matplotlib. The script that generates the final histogram is published on GitHub.

The histogram displays age in years along its X-axis and the total number of roles voiced by seiyu of this age along the Y-axis.

Fig. 5: Histogram of number of anime voiced by seiyu of different ages (2021)

Curiously, Wikidata sometimes lists a seiyu who was born after the release date of an anime in which they performed. This issue is probably caused by missing information about new seasons or reboots of old anime series. For example, in 2021 this was the case for the anime series Sazae-san and the seiyu Nobunaga Shimazaki: the seiyu was born in 1988, whereas the series started in 1969.
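The age calculation and the check for such anomalies can be sketched as follows. This is not the script published on GitHub but a minimal illustration using only the standard library. All identifiers and dates are hypothetical placeholders; only the years 1988 and 1969 echo the example above, with month and day invented.

```python
from collections import Counter
from datetime import date


def parse_wikidata_date(value):
    """Parse the date part of a timestamp such as '1988-01-01T00:00:00Z'."""
    return date.fromisoformat(value[:10])


def age_on(birth, moment):
    """Difference in whole years between birth and moment (negative if born later)."""
    years = moment.year - birth.year
    if (moment.month, moment.day) < (birth.month, birth.day):
        years -= 1
    return years


# Hypothetical exports of the queries: birth dates, anime dates, links.
births = {"seiyu_X": "1980-05-20T00:00:00Z", "seiyu_Y": "1988-01-01T00:00:00Z"}
anime_dates = {
    "anime_A": "2010-04-01T00:00:00Z",
    "anime_B": "2010-10-01T00:00:00Z",
    "anime_old": "1969-01-01T00:00:00Z",
}
links = [
    ("anime_A", "seiyu_X"),
    ("anime_B", "seiyu_X"),
    ("anime_A", "seiyu_Y"),
    ("anime_old", "seiyu_Y"),
]

histogram = Counter()  # age -> number of roles
anomalies = []         # roles dated before the seiyu was born
for anime, seiyu in links:
    age = age_on(parse_wikidata_date(births[seiyu]),
                 parse_wikidata_date(anime_dates[anime]))
    if age < 0:
        anomalies.append((seiyu, anime, age))
    else:
        histogram[age] += 1

print(sorted(histogram.items()))
print(anomalies)
```

Keeping the negative ages in a separate list, rather than silently dropping them, makes it easy to review which items need additional start dates on Wikidata.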

Future work

  1. Find the 10 most popular anime released in the current year. Anime popularity is estimated by the number of articles in different language sections. For example, if an article about anime is present in English, Russian and Spanish Wikipedia, then its popularity score is three.
  2. Find five anime in which the greatest number of female seiyu are involved.
  3. Create a bubble chart of the distribution of anime by genre (including the number of anime in each genre) using the subclass property.
  4. Mark the voice actors’ places of birth on the map.
  5. Create a histogram or bubble chart of voice actor nationalities.
  6. Create a histogram of the number of released anime by year, or of the number of voice actors by year of birth.
  7. Create histograms similar to the seiyu age histogram above, but taking into account the gender identity of the voice actor (one for males, one for females, and one for other).

Test

1 There are some anime:
Rave Master (Shan T Lao Fu Zi)
Tetsujin 28-go (Tetsujin 28-gou)
Grenadier (Grenadier)
Attack on Titan (Shingeki no Kyojin)
Match each anime with the images below.

1 (Rave Master),2 (Tetsujin 28-go),3 (Grenadier),4 (Attack on Titan)

2 There are anime:
Gurren Lagann (Tengen Toppa Gurren Lagann)
Steins;Gate (Steins;Gate)
Hellsing (Hellsing)
Elfen Lied (Elfen Lied)
The years in which these anime were created are known: 2011, 2007, 2004, 2001.
Arrange the anime from newest to oldest (1st place is the newest anime, 4th place is the oldest one).

1 place (2011),2 place (2007),3 place (2004),4 place (2001)
Gurren Lagann
Elfen Lied

3 Which anime does this description refer to?
Brief description: "And what will happen after death? Countless generations of people have asked this question ..."
Genres: Drama, Action, Comedy, School
Seiyu (fem.): Kana Hanadzawa
Publication date: 2005
Note: Punctuation marks and spaces matter, if the title contains any.

References

  • Andrew Krizhanovsky; Daria Boollieva (2017). "Аниме" [Anime]. Authorea.
  • Parenchenkov E. (2021). "Anime". ProWD. Retrieved 2021-09-24.