Research in programming Wikidata/Countries

From Wikiversity

This chapter studies countries using the knowledge base of the international Wikidata project. SPARQL queries were used to analyse and compare "country" items in Wikidata. A list of all currently existing countries, a list of countries ordered by date of creation and a list of countries' demonyms were generated. A bubble chart of countries' forms of government, a graph of neighboring countries and a map of Russia's neighboring countries were constructed. In addition, conclusions were drawn about the completeness of Wikidata on this topic.

Note: "country" is too ambiguous a word, so it would be better to replace it everywhere with the class sovereign state.

List of countries

Let's build a list of all countries in English and Russian.

#List of countries in English and Russian
SELECT ?country ?label_en ?label_ru
WHERE
{
	?country wdt:P31 wd:Q6256. # instance of country
	?country rdfs:label ?label_en filter (lang(?label_en) = "en").
	?country rdfs:label ?label_ru filter (lang(?label_ru) = "ru").
}

SPARQL query. The result contains 205 countries in 2017 and 175 in 2020.
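Outside the query service, results can be downloaded in the standard SPARQL 1.1 Query Results JSON format and post-processed. The sketch below parses a hand-written two-row sample, shaped like the output of the query above, into (QID, English label, Russian label) tuples. The sample rows and the helper name `rows` are illustrative assumptions, not a real download.

```python
import json

# A hand-written sample in the SPARQL 1.1 JSON results format, shaped like
# the output of the English/Russian label query. Illustrative only.
SAMPLE = """
{
  "head": {"vars": ["country", "label_en", "label_ru"]},
  "results": {"bindings": [
    {"country":  {"type": "uri", "value": "http://www.wikidata.org/entity/Q159"},
     "label_en": {"type": "literal", "xml:lang": "en", "value": "Russia"},
     "label_ru": {"type": "literal", "xml:lang": "ru", "value": "Россия"}},
    {"country":  {"type": "uri", "value": "http://www.wikidata.org/entity/Q142"},
     "label_en": {"type": "literal", "xml:lang": "en", "value": "France"},
     "label_ru": {"type": "literal", "xml:lang": "ru", "value": "Франция"}}
  ]}
}
"""


def rows(result_json: str) -> list[tuple[str, str, str]]:
    """Turn SPARQL JSON bindings into (QID, English label, Russian label) tuples."""
    data = json.loads(result_json)
    out = []
    for b in data["results"]["bindings"]:
        # Keep only the trailing "Qxxx" part of the entity IRI.
        qid = b["country"]["value"].rsplit("/", 1)[-1]
        out.append((qid, b["label_en"]["value"], b["label_ru"]["value"]))
    return out


for qid, en, ru in rows(SAMPLE):
    print(qid, en, ru)
```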

Depending on how completely their properties are filled in on Wikidata, one can distinguish between "full" and "empty" countries.

According to ProWD, examples of the most complete and well-developed countries on Wikidata are Israel, France and the United States of America. The leaders in terms of the number of properties are Israel and France (127 properties each), while the Democratic Republic of Vietnam has the fewest (24 properties).
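ProWD's exact method is not described here. A minimal sketch of one plausible completeness metric, assuming completeness is measured as the number of distinct properties an item uses (repeated statements of one property count once), could look like this; the item IDs and statements are hypothetical.

```python
from collections import defaultdict

# Hypothetical (item, property) pairs, one per statement; not real Wikidata data.
CLAIMS = [
    ("Q_A", "P31"), ("Q_A", "P31"), ("Q_A", "P571"), ("Q_A", "P1549"), ("Q_A", "P47"),
    ("Q_B", "P31"), ("Q_B", "P571"),
    ("Q_C", "P31"), ("Q_C", "P1549"), ("Q_C", "P47"),
]


def distinct_property_counts(claims):
    """Count how many different properties each item uses."""
    props = defaultdict(set)
    for item, prop in claims:
        props[item].add(prop)
    return {item: len(p) for item, p in props.items()}


def rank_by_completeness(claims):
    """Most complete items first; ties broken by item ID."""
    counts = distinct_property_counts(claims)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(rank_by_completeness(CLAIMS))
```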

Age of countries

Let's build a list of countries sorted by the date of the country's foundation (the first mention of the country).

The query is as follows:

# List of countries sorted by inception 
SELECT ?country ?countryLabel ?inception
WHERE
{
	?country wdt:P31 wd:Q6256.    # instance of country
	?country wdt:P571 ?inception. # the first mention
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
ORDER BY (?inception)

SPARQL query. The result contains 112 countries with a filled-in founding date in 2017 and 187 in 2021.

Executing this query yields a rather short list: only 184 countries as of 2020. Let's use Russia as an example to find out why. The item Russia (Q159) has not one but eight values in its "instance of" field, including country (Q6256).

On the Wikidata page "Request a query", some editors ask how to write a particular query and other editors answer. It is worth using this forum.

The answer to this question was found on the page "Wikidata:Request a query", in the section available at https://w.wiki/tLm.

The point is that the wdt: prefix returns only "truthy" values, that is, statements of the best rank: if an item has preferred-rank statements for a property, only those are returned. For Russia, the preferred value in the "instance of" field is sovereign state, not country. To check all the values of "instance of" for Russia, including normal-rank ones, you need to use the p:/ps: construction.
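The difference between wdt: and p:/ps: can be modelled in a few lines. This is a simplified sketch, not the query service's implementation: it assumes that country (Q6256) has normal rank on Russia's "instance of" and models only two of its eight values; `Statement`, `truthy` and `all_values` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    value: str  # item ID of the value, e.g. "Q6256"
    rank: str   # "preferred", "normal" or "deprecated"


def truthy(statements):
    """Like wdt:: preferred statements if any exist, otherwise normal ones; never deprecated."""
    preferred = [s.value for s in statements if s.rank == "preferred"]
    if preferred:
        return preferred
    return [s.value for s in statements if s.rank == "normal"]


def all_values(statements):
    """Like p:/ps:: every statement, regardless of rank."""
    return [s.value for s in statements]


# Simplified "instance of" of Russia (assumed ranks, two of eight values).
russia_p31 = [
    Statement("Q3624078", "preferred"),  # sovereign state
    Statement("Q6256", "normal"),        # country
]

print("Q6256" in truthy(russia_p31))      # wdt:P31 wd:Q6256 does not match
print("Q6256" in all_values(russia_p31))  # p:P31 [ps:P31 wd:Q6256] matches
```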

Thus, the query that retrieves all 232 countries sorted by inception date is shown in the next listing.

# List of countries sorted by inception date
SELECT ?country ?countryLabel
(MIN(?year) AS ?min_year)
WHERE
{
	?country p:P31 [ps:P31 wd:Q6256]. # instance of a country 
	?country p:P571 [ps:P571 ?inception]. # all inception dates
	BIND(YEAR(?inception) AS ?year)
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
GROUP BY ?country ?countryLabel
ORDER BY ?min_year

SPARQL query. The result contains 112 countries with a filled-in founding date in 2017 and 235 in 2021.
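The MIN aggregate matters because an item may carry several inception statements, and the p:/ps: pattern returns one row per statement. A sketch of what GROUP BY ?country with MIN(?year), followed by ORDER BY ?min_year, computes on hypothetical rows ("country A" and so on are placeholders):

```python
# Hypothetical (country, inception year) rows: one row per inception statement.
ROWS = [
    ("country A", 862),
    ("country A", 1991),
    ("country B", 2011),
    ("country C", 463),
    ("country C", 843),
]


def earliest_year(rows):
    """GROUP BY ?country with MIN(?year), then ORDER BY ?min_year."""
    best = {}
    for country, year in rows:
        if country not in best or year < best[country]:
            best[country] = year
    return sorted(best.items(), key=lambda kv: kv[1])


print(earliest_year(ROWS))
```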

To remove countries that no longer exist, that is, instances of historical country (Q3024240), from this list, use the MINUS operator.
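Because both patterns share only the variable ?country, MINUS here behaves like a set difference on countries. The sketch illustrates this with hypothetical item IDs. In general, MINUS removes a solution only when it shares at least one bound variable with a matching solution of the MINUS pattern, so it is not always a plain set difference.

```python
# Hypothetical item sets standing in for the two graph patterns of the query.
countries = {"Q_A", "Q_B", "Q_C", "Q_D"}  # ?country p:P31 [ps:P31 wd:Q6256]
historical = {"Q_C", "Q_E"}               # ?country p:P31 [ps:P31 wd:Q3024240]

# MINUS drops every ?country that also matches the historical-country pattern.
existing = countries - historical
print(sorted(existing))
```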

Using the following query, 211 non-historical countries with a known founding date were obtained.

# List of countries sorted by inception date
SELECT ?country ?countryLabel 
(MIN(?year) AS ?min_year)
WHERE
{
	?country p:P31 [ps:P31 wd:Q6256]. # instance of a country 
	MINUS {?country p:P31 [ps:P31 wd:Q3024240]}. # except historical countries
	?country p:P571 [ps:P571 ?inception]. # all inception dates
	BIND(YEAR(?inception) AS ?year)
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
GROUP BY ?country ?countryLabel
ORDER BY ?min_year

SPARQL query. The result contains 112 countries with a filled-in founding date in 2017 and 211 in 2021.

For example: France, 463; Russia, 862; Republic of Kosovo, 2008; South Sudan, 2011. The years in which the largest numbers of countries appeared are 1960 (16 countries), 1991 (15 countries), 1962 (6 countries) and 1821 (6 countries).
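Counts of countries per founding year, like those above, can be obtained by grouping the ?min_year column. A Python sketch using `collections.Counter` on hypothetical years (the same grouping could also be done in SPARQL with a second GROUP BY over ?min_year):

```python
from collections import Counter

# Hypothetical founding years, one per country (the ?min_year column).
years = [1960, 1960, 1991, 1960, 1821, 1991, 2011, 1962]

per_year = Counter(years)
# Years with the most newly founded countries first.
for year, n in per_year.most_common(2):
    print(year, n)
```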

Let's display a list of countries with an empty "inception" property.

# List of countries with an empty inception date
SELECT ?country ?countryLabel
WHERE
{
	?country wdt:P31 wd:Q6256.      # country
	MINUS { ?country wdt:P571 [] }. # inception of country is empty
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

SPARQL query. The result contains 100 countries without a filled-in founding date in 2017 and 7 in 2020.

Completeness of Wikidata

Let's analyze how complete Wikidata is with respect to historical and modern countries.

According to the "Russian classification of countries of the world", there are 251 countries in the world.

This count does not take into account ancient states that no longer exist (for example, Assyria (Q41137)), since in Wikidata they are instances of "historical country" rather than "country".

Let's build a list of historical states with the following query. There are about three thousand such former states, an order of magnitude more than the number of modern states.

# List of historical countries
SELECT ?country ?countryLabel
WHERE
{
	?country p:P31 [ps:P31 wd:Q3024240]. # instance of a historical country 
	SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en,ru"} 
}

SPARQL query. The result contains 3025 historical countries in 2021.
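The "order of magnitude" claim can be checked with a quick calculation using the counts given in this section: 3025 historical countries, 251 countries in the Russian classification and 206 in the English Wikipedia list. One order of magnitude corresponds to a ratio between 10 and 100.

```python
import math

HISTORICAL = 3025  # historical countries in Wikidata, 2021
SOURCES = {
    "Russian classification of countries of the world": 251,
    "English Wikipedia, List of sovereign states": 206,
}


def orders_of_magnitude(a: float, b: float) -> int:
    """Whole number of powers of ten by which a exceeds b (assumes a > b > 0)."""
    return math.floor(math.log10(a / b))


for name, modern in SOURCES.items():
    ratio = HISTORICAL / modern
    print(f"{name}: ratio {ratio:.1f}, "
          f"orders of magnitude {orders_of_magnitude(HISTORICAL, modern)}")
```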

According to the Russian Wikipedia's "Alphabetical list of countries and territories", there are 252 countries.

According to the English Wikipedia's "List of sovereign states", there are 206 countries.

It is not always possible to specify a country's exact founding date, for various reasons: written sources may be missing, scarce or contradictory. For example, the origin of the Old Russian state is associated with the calling of the Varangian prince Rurik in 862, but there is no exact date (item Russia (Q159)). Also, some modern countries were preceded by a series of other states, and which of their founding dates should count as the country's date of creation is an open question (for example, Mongolia (Q711)).

List of demonyms in English

A demonym is the name for the inhabitants of a particular place, derived from its toponym. For example, the demonyms for Russia include Russians, a Russian (man) and a Russian (woman); for the Czech Republic, Czechs.

Besides geography, demonyms that indicate origin or affiliation can also derive from the ethnic, political or religious characteristics of people.

Demonyms can be derived from the names of various features of the earth's surface, such as mountains, islands and continents. The designation of people's place of origin may also depend on political and administrative divisions, for example to denote citizenship: Thailand gives Thai people, Canada gives Canadians. Divisions within a state can also give rise to new names: Crimea gives Crimeans.

Let's build a list of countries that have demonyms in English.

The query is as follows:

# List of countries with demonyms in English
SELECT ?country ?countryLabel 
WHERE
{
	?country p:P31 [ps:P31 wd:Q6256]. # instance of a country
	?country wdt:P1549 ?demonym .     # has demonym
	FILTER((LANG(?demonym)) = "en")
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
GROUP BY ?country ?countryLabel

SPARQL query. The result contains 197 countries with demonyms in 2017 and 209 in 2021.
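Two details of the query are worth spelling out: FILTER keeps only demonyms whose language tag is exactly "en", and GROUP BY ?country collapses the several demonyms of one country into a single row. A Python sketch on hypothetical (country, demonym, language) rows, where "Country X" is an invented placeholder:

```python
# Hypothetical rows from ?country wdt:P1549 ?demonym, with the literal's language tag.
ROWS = [
    ("Russia", "Russians", "en"),
    ("Russia", "Russian", "en"),
    ("Russia", "россияне", "ru"),
    ("Czech Republic", "Czechs", "en"),
    ("Country X", "иксы", "ru"),  # placeholder: a Russian demonym only
]


def countries_with_demonym(rows, lang="en"):
    """FILTER(LANG(?demonym) = lang), then GROUP BY ?country (each country once)."""
    seen = []
    for country, _demonym, tag in rows:
        if tag == lang and country not in seen:
            seen.append(country)
    return seen


print(countries_with_demonym(ROWS))
```

Note that LANG(?demonym) = "en" does not match regional tags such as "en-gb"; langMatches(LANG(?demonym), "en") would.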

List of demonyms

Let's build a list of all demonyms in English.

# List of demonyms of countries in English
SELECT ?country ?countryLabel ?demonym
WHERE
{
	?country p:P31 [ps:P31 wd:Q6256]. # instance of a country
	?country wdt:P1549 ?demonym .     # has demonym
	FILTER((LANG(?demonym)) = "en")
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

SPARQL query. The result contains 237 demonyms in 2017 and 296 in 2021.

Countries with unfilled demonyms

Let's build a list of countries which do not have demonyms in English.

#List of countries without demonyms in English
SELECT ?country ?countryLabel 
WHERE
{
	?country p:P31 [ps:P31 wd:Q6256].  # instance of a country
	MINUS { ?country wdt:P1549 ?demonym.    # without demonyms
		FILTER((LANG(?demonym)) = "en") # in English
	}
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
GROUP BY ?country ?countryLabel

SPARQL query. The result contains 5 countries without demonyms in 2017 and 9 in 2021.

Thanks to the MINUS construction, countries that have a demonym in English were excluded from the final list.

Number of demonyms in countries

In a given language, a country can have anywhere from zero demonyms (if the data is not filled in) to three or four. For example, Turkey has three names for its inhabitants (Turks, a Turk, a Turkish woman), and Ethiopia has four, all of which translate into English as "Ethiopian".

Let's display the list of countries ordered by the number of demonyms filled in on Wikidata.

# List of countries ordered by number of demonyms
SELECT  ?country ?countryLabel (COUNT(*) AS ?demonyms)
WHERE
{
	?country p:P31 [ps:P31 wd:Q6256].# instance of a country
	?country p:P1549 [ps:P1549 []].  # has demonym
	SERVICE wikibase:label {bd:serviceParam wikibase:language "en"}
}
GROUP BY ?country ?countryLabel 
ORDER BY DESC(?demonyms)

SPARQL query. The result contains 199 countries with at least one demonym in 2017 and 215 in 2021. Since the query has no language filter, demonyms in all languages are counted.
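COUNT(*) counts one row per demonym statement, so with no language filter every language contributes. A Python sketch of the same grouping and descending sort, on hypothetical statements:

```python
from collections import Counter

# Hypothetical demonym statements as (country, language) pairs.
STATEMENTS = [
    ("Germany", "en"), ("Germany", "de"), ("Germany", "fr"),
    ("Canada", "en"), ("Canada", "fr"),
    ("Chad", "en"),
]

# GROUP BY ?country with COUNT(*): one count per country, all languages together.
counts = Counter(country for country, _lang in STATEMENTS)
# ORDER BY DESC(?demonyms)
ranking = sorted(counts.items(), key=lambda kv: -kv[1])
print(ranking)
```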

According to data for 2017, the United States of America has the largest number of demonyms (41), followed by Great Britain (40), Germany (40) and Canada (36). In 2021, the leaders are Germany (64 demonyms), Russia (61), Canada (60) and the United States (60). Thus, from 2017 to 2021, roughly 20 demonyms were added to each of the leading countries.
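The estimate of roughly 20 added demonyms can be checked for the three countries listed for both years: the United States, Germany and Canada. Great Britain and Russia each appear in only one of the two lists, so they are left out.

```python
# Demonym counts given above for countries listed in both years.
COUNTS_2017 = {"United States of America": 41, "Germany": 40, "Canada": 36}
COUNTS_2021 = {"United States of America": 60, "Germany": 64, "Canada": 60}

increase = {c: COUNTS_2021[c] - COUNTS_2017[c] for c in COUNTS_2017}
mean_increase = sum(increase.values()) / len(increase)
print(increase)
print(round(mean_increase, 1))
```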

Basic forms of government

Let's build a bubble chart of countries' forms of government, where the size of each bubble corresponds to the number of countries with that form of government.

The query is as follows:

# Forms of government ordered by number of countries
#defaultView:BubbleChart
SELECT ?bfog ?form (COUNT(*) AS ?countries)
WHERE 
{
	?country p:P31 [ps:P31 wd:Q6256]. # instance of a country
	?country p:P122 [ps:P122 ?bfog].  # basic form of government
	OPTIONAL {
	?bfog rdfs:label ?form
		FILTER (LANG(?form) = "en")
	}
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en"}
}
GROUP BY ?bfog ?form
ORDER BY DESC(?countries) ASC(?form)

SPARQL query. The result contains 30 basic forms of government in 2017 and 41 in 2020.

The variable ?bfog (short for "basic form of government") holds the Wikidata item for a form of government, for example, "republic".

The last line of the query sorts first in descending order (DESC) and then in ascending order (ASC). Thus, forms of government are sorted by the number of countries (?countries), and forms with the same number of countries are sorted alphabetically by label (?form).
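The two-key ordering corresponds to sorting by a tuple key in which the count is negated. A Python sketch on hypothetical labels and counts:

```python
# Hypothetical (form label, number of countries) rows.
ROWS = [
    ("unitary state", 12),
    ("republic", 40),
    ("federation", 12),
    ("absolute monarchy", 5),
]

# ORDER BY DESC(?countries) ASC(?form): negate the count for descending order,
# then compare labels for ascending order among ties.
ordered = sorted(ROWS, key=lambda row: (-row[1], row[0]))
for form, n in ordered:
    print(n, form)
```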

As a result of the query, we get a bubble chart of the most common forms of government in 2017 and in 2020.

Bubble chart of countries' forms of government, 2017
Bubble chart of countries' forms of government, 2020


Thus, from 2017 to 2020 the form of government "republic" became more common, while the number of countries with the form "mixed republic" decreased significantly. New values appeared in the data, such as democratic centralism, democratic republic, democracy, Islamic state and parliamentary democracy.

Neighboring countries

Countries can share a common border. In Wikidata this relation is expressed by the property shares border with (P47). Using this property, let's build a graph of neighboring countries.

The query is as follows:

# Graph of countries which share border
#defaultView:Graph
SELECT ?country ?countryLabel ?border ?borderLabel
WHERE
{
	?country p:P31 [ps:P31 wd:Q6256]. # instance of a country
	OPTIONAL { ?country wdt:P47 ?border } # neighbors, if any
	SERVICE wikibase:label {bd:serviceParam wikibase:language "en"}
}

SPARQL query. The result contains 795 neighboring countries in 2017 and 912 in 2020.

As a result of the query, we get a graph with 787 edges in 2017 and 912 edges in 2020, where each edge connects two neighboring countries. The graph consists of several connected components, since there are island countries that have no neighbors (for example, Mauritius, the Maldives and Madagascar).

Neighboring countries graph, 2017
Neighboring countries graph, 2020
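The connected components of the neighbor graph can be found with a breadth-first search. The sketch treats "shares border with" as symmetric, since a border may be recorded on only one of the two items, and uses a small illustrative subset of countries rather than the full query result.

```python
from collections import defaultdict, deque

# A small illustrative subset of P47 edges; island states have no edges.
EDGES = [
    ("Russia", "Finland"),
    ("Finland", "Sweden"),
    ("Sweden", "Norway"),
    ("Norway", "Russia"),
    ("Norway", "Finland"),
]
NODES = {"Russia", "Finland", "Sweden", "Norway", "Mauritius", "Maldives"}


def connected_components(nodes, edges):
    """Breadth-first search over an undirected graph; returns a list of sets."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)  # treat "shares border with" as symmetric
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(comp)
    return components


for comp in connected_components(NODES, EDGES):
    print(sorted(comp))
```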


Neighboring countries of Russia

Let's build a map of Russia's neighboring countries.

# Map of neighboring countries of Russia
#defaultView:Map
SELECT ?border_country ?border_countryLabel ?coords ?layer
WHERE 
{                                         # border_country
	?border_country p:P47 [ps:P47 wd:Q159]. #   has border with Russia
	?border_country p:P31 [ps:P31 wd:Q6256].#   is a country
	OPTIONAL {?border_country wdt:P3896 ?coords.}
	BIND (?coords AS ?layer)
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

SPARQL query. The result contains 17 neighboring countries in 2021.

The line of the query with the comment "is a country" checks that each object listed as sharing a border with Russia is a country. This excludes from the list, for example, the Georgian region of Racha-Lechkhumi and Kvemo Svaneti and the Japanese island of Hokkaido, which are listed among the border objects.
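The effect of the "is a country" line can be sketched as a membership test on each border object's classes. The class sets below are simplified, hypothetical stand-ins for the objects' "instance of" values.

```python
COUNTRY = "country"  # stands for wd:Q6256

# Hypothetical slice of data: objects listed as bordering Russia and
# simplified "instance of" classes for each one.
BORDER_OBJECTS = {
    "Finland": {"country", "sovereign state"},
    "Mongolia": {"country", "sovereign state"},
    "Hokkaido": {"island"},
    "Racha-Lechkhumi and Kvemo Svaneti": {"region"},
}


def bordering_countries(objects):
    """Keep only border objects that are instances of country,
    like ?border_country p:P31 [ps:P31 wd:Q6256] in the query."""
    return sorted(name for name, classes in objects.items() if COUNTRY in classes)


print(bordering_countries(BORDER_OBJECTS))
```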

As a result of the query execution, we get a map of neighboring countries of Russia, including 17 countries, namely: Japan, Norway, USA, Finland, Sweden, Poland, Lithuania, People's Republic of China, Belarus, Estonia, Latvia, Ukraine, Azerbaijan, Georgia, Kazakhstan, DPRK and Mongolia.

Map of neighboring countries of Russia, 2020.


Future work

  1. Build a list of country flags and mottos. Not all countries have mottos.
  2. Mark the capitals of modern countries on the map.
  3. In each part of the world, calculate the top five countries with the highest population density.
  4. Build a bar chart showing the distribution of the number of countries by form of government. Evaluate whether this distribution has a "heavy tail".
  5. Print the list of countries sorted by the number of neighbors. Which countries have the most and least neighbors, what is the average number of neighbors? Is there a correlation between this indicator and any other country dimension?

Tasks

1 Which of the roughly two hundred countries existing today emerged in the years that were most productive in terms of newly formed countries? Match each year to its group of countries.

1821, 1918, 1971, 1991
16 countries: Russia, Moldova, Belarus, Ukraine, Estonia, Slovenia, Republic of Macedonia, Croatia, Azerbaijan, Georgia, Kazakhstan, Uzbekistan, Armenia, Kyrgyzstan, Tajikistan
6 countries: Greece, Peru, Guatemala, Honduras, Costa Rica, Nicaragua
5 countries: Latvia, Lithuania, Poland, Estonia, Georgia
4 countries: Bangladesh, Bahrain, Qatar, Sri Lanka

2 Latvia has 119, Thailand 77, Denmark 5 and Russia 81. What are these numbers?

The number of cities with a population of more than one million?
The number of higher education institutions?
The number of administrative units?
The number of official languages?

3 Area and population of four Asian countries:
  • Israel: 20,770 square kilometers, population 8,463,400
  • Mongolia: 1,566,000 square kilometers, population 2,953,190
  • Republic of Korea: 100,295 square kilometers, population 50,219,669
  • Singapore: 719.1 square kilometers, population 5,781,728
Arrange the flags of these Asian countries in order of increasing population density.

1st place, 2nd place, 3rd place, 4th place
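To check an answer to task 3: population density is population divided by area. A Python sketch using the figures given in the task:

```python
# Area (square kilometers) and population, as given in the task.
DATA = {
    "Israel": (20770, 8463400),
    "Mongolia": (1566000, 2953190),
    "Republic of Korea": (100295, 50219669),
    "Singapore": (719.1, 5781728),
}


def density(area_km2: float, population: int) -> float:
    """Population density in people per square kilometer."""
    return population / area_km2


# Increasing population density.
ranking = sorted(DATA, key=lambda name: density(*DATA[name]))
for name in ranking:
    print(f"{name}: {density(*DATA[name]):.1f} people per square kilometer")
```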

4 Which of these languages are official in Russia?

Abaza
Moksha
Erzya
Belarusian


SPARQL queries with answers:

References


  • "Общероссийский классификатор стран мира" [Russian classification of countries of the world]. All-Russian classifiers. 2016. Retrieved May 3, 2017.
  • Balakireva M. (2020). "Countries". ProWD. Retrieved September 24, 2020.

Links