Research in programming Wikidata/Cities

From Wikiversity

This article studies four types of cities, corresponding to four Wikidata items: "Town", "City", "Big city" and "City with millions of inhabitants". SPARQL queries to Wikidata were used to count the instances of each item and to gather the following information:

  • Population of different types of cities
  • Number of cities without sister cities
  • List of cities ordered by number of sister cities
  • Number of cities with a given number of sister cities
  • Country with the most sister cities
  • Closest neighbours of Russia by number of sister cities


Item lists

"Town"

SELECT ?city ?cityLabel WHERE {
  ?city wdt:P31 wd:Q3957.
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

SPARQL query, 13800 records (2020).

"City"

  • Wikidata element: Q515
SELECT ?city ?cityLabel WHERE {
  ?city wdt:P31 wd:Q515.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

SPARQL query, 20800 records (2017), 9260 records (2020).

The most complete items include San Francisco, Berlin, Petrozavodsk, …

Almost empty items include Madinat Zayed, Muzaffarpur, Willow River, …

According to ProWD, Singapore has the most properties among the world's cities (104 properties). Among Russian cities, Novorossiysk has the most, with 31 properties.

"Big city"

SELECT ?city ?cityLabel WHERE {
  ?city wdt:P31 wd:Q1549591.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

SPARQL query, 198 records (2017), 3075 records (2020).

The most complete items include Bern, Berlin, Geneva, …

Almost empty items include Balanga (Nigeria), Ungaran, Kayes, …

According to ProWD, Singapore also has the most properties among the world's big cities (104 properties). Among Russian big cities, Moscow has the most, with 76 properties.

"City with millions of inhabitants"

SELECT ?city ?cityLabel WHERE {
  ?city wdt:P31 wd:Q1637706.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

SPARQL query, 616 records (2020).

Different types of cities

SELECT ?city ?cityLabel WHERE {                                     # Selecting items which are ...
  { ?city wdt:P31 wd:Q3957 } UNION                                  # ... instances of "town" ...
  { ?city wdt:P31 wd:Q515 } UNION                                   # ... instances of "city" ...
  { ?city wdt:P31 wd:Q1549591 } UNION                               # ... instances of "big city" ...
  { ?city wdt:P31 wd:Q1637706 }                                     # ... instances of "city with millions of inhabitants"
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

SPARQL query, 26751 records (2020).

Population

"Town"

Query used:

SELECT (SUM(?population_city) as ?sum) WHERE {                    # Selecting total population of items which are
  SELECT (MAX(xsd:integer(REPLACE(STR(?population),"\\.",""))) as ?population_city) ?city WHERE {
    ?city wdt:P31 wd:Q3957.                                       # ... instances of "town" ...
    ?city wdt:P1082 ?population                                   # ... with filled property "population"
  }
  GROUP BY ?city
}

SPARQL query, 53.30 million people (2020).

"City"

Query used:

SELECT (SUM(?population_city) as ?sum) WHERE {                    # Selecting total population of items which are
  SELECT (MAX(xsd:integer(REPLACE(STR(?population),"\\.",""))) as ?population_city) ?city WHERE {
    ?city wdt:P31 wd:Q515.                                        # ... instances of "city" ...
    ?city wdt:P1082 ?population                                   # ... with filled property "population"
  }
  GROUP BY ?city
}

SPARQL query, 1,133.56 million people (2020).

"Big city"

Query used:

SELECT (SUM(?population_city) as ?sum) WHERE {                    # Selecting total population of items which are
  SELECT (MAX(xsd:integer(REPLACE(STR(?population),"\\.",""))) as ?population_city) ?city WHERE {
    ?city wdt:P31 wd:Q1549591.                                    # ... instances of "big city" ...
    ?city wdt:P1082 ?population                                   # ... with filled property "population"
  }
  GROUP BY ?city
}

SPARQL query, 2,538.49 million people (2020).

"City with millions of inhabitants"

Query used:

SELECT (SUM(?population_city) as ?sum) WHERE {                    # Selecting total population of items which are
  SELECT (MAX(xsd:integer(REPLACE(STR(?population),"\\.",""))) as ?population_city) ?city WHERE {
    ?city wdt:P31 wd:Q1637706.                                    # ... instances of "city with millions of inhabitants" ...
    ?city wdt:P1082 ?population                                   # ... with filled property "population"
  }
  GROUP BY ?city
}

SPARQL query, 2,118.39 million people (2020).

Analysis

Different countries use different characters, such as a point, a comma or a space, as thousands separators, so the value of the population property may be written in different ways. A point is problematic, because Wikidata uses this character to separate the integer and fractional parts of a number. To resolve the ambiguity, the queries use the REPLACE function to remove the point. This conversion does not change the value itself, since population is an integer and the separators only make it easier to read. In addition, a city may have several population statements (for example, from different censuses), so the queries take the maximum value per city (MAX with GROUP BY ?city) before summing, so that no city is counted more than once within a type.
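The normalization and per-city aggregation can be mirrored in a few lines of Python. This is a minimal sketch on hypothetical rows (the city names and values are made up); it assumes, like the queries, that every point in a population value is a separator, so a value genuinely stored with a fractional part would be inflated by this step.

```python
from collections import defaultdict

# Hypothetical (city, STR(?population)) rows, one per population statement.
rows = [
    ("CityA", "12.345"),  # point used as a thousands separator
    ("CityA", "12000"),   # an older population statement for the same city
    ("CityB", "250000"),
]

def normalize(value: str) -> int:
    # Mirrors xsd:integer(REPLACE(STR(?population), "\\.", "")): drop every point.
    return int(value.replace(".", ""))

def total_population(rows):
    # MAX(...) ... GROUP BY ?city: keep the largest value per city.
    best = defaultdict(int)
    for city, value in rows:
        best[city] = max(best[city], normalize(value))
    # Outer SUM(?population_city): add up the per-city maxima.
    return sum(best.values())

print(total_population(rows))  # 262345 (12345 + 250000)
```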

The table below summarizes the population of each type of city and its share of the world population, which was approximately 7.8 billion people in 2020[1]. According to Wikidata, almost three quarters of the world's population live in cities. This figure may be inflated, because an item can be an instance of several city types at once (see the Shanghai example below) and is then counted in each of them.

City type                             Population (million people)   % of world population
"Town"                                                      53.30                   0.7 %
"City"                                                   1,133.56                  14.5 %
"Big city"                                               2,538.49                  32.5 %
"City with millions of inhabitants"                      2,118.39                  27.1 %
Total                                                    5,843.74                  74.8 %
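The shares in the table can be recomputed from the query results. This sketch takes the world population as a parameter (in millions); the value passed in is an assumption, not a query result.

```python
# Population by city type in million people, as returned by the queries above (2020).
POPULATION = {
    "Town": 53.30,
    "City": 1133.56,
    "Big city": 2538.49,
    "City with millions of inhabitants": 2118.39,
}

def shares(population, world_millions):
    """Percentage of the world population in each city type, rounded to 0.1."""
    return {t: round(100 * p / world_millions, 1) for t, p in population.items()}

total = sum(POPULATION.values())
print(round(total, 2))                 # 5843.74
print(shares(POPULATION, 7800.0))      # world population assumed to be 7.8 billion
print(round(100 * total / 7800.0, 1))  # 74.9
```

With exactly 7.8 billion, the last type and the total come out 0.1 percentage point above the table (27.2 % and 74.9 %); the table's values are consistent with a world population of about 7.81 billion, so the difference lies only in the rounding of the world figure.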

Sister cities

Sister cities are cities in different countries that have established lasting friendly relations with each other to strengthen international ties in culture, the economy, the development and management of urban infrastructure, civil society, and other areas[2].

How many cities don't have a single sister city?

Query used:

SELECT (COUNT(?city) as ?count) WHERE {                             # Counting items which are ... 
  { ?city wdt:P31 wd:Q3957 } UNION                                  # ... instances of "town" ...
  { ?city wdt:P31 wd:Q515 } UNION                                   # ... OR instances of "city" ...
  { ?city wdt:P31 wd:Q1549591 } UNION                               # ... OR instances of "big city" ...
  { ?city wdt:P31 wd:Q1637706 }                                     # ... OR instances of "city with millions of inhabitants"
  FILTER NOT EXISTS { ?city wdt:P190 [] }                           # ... with unfilled property "sister city"
}

SPARQL query, 21479 cities (2020).

In 2020, Wikidata listed 26751 items of the four city types (see the query for different types of cities). Thus, sister cities are recorded for only about 20% of them ((26751 - 21479) / 26751 ≈ 19.7%).
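The arithmetic behind the 20% figure is sketched below. Both counts come from UNION queries, which return an item once per matching class, so they count records rather than distinct cities; this may be one reason why the grouped list below contains 4046 cities with sister cities rather than 5272.

```python
# Record counts from the two UNION queries above (2020).
total_records = 26751    # items of the four city types
without_sister = 21479   # items with no "sister city" (P190) statement

with_sister = total_records - without_sister
share = 100 * with_sister / total_records
print(with_sister, f"{share:.1f} %")  # 5272 19.7 %
```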

List of cities ordered by number of sister cities

All

Query used:

SELECT ?city ?cityLabel (COUNT(?sister) AS ?sisterCount) WHERE {     # Counting sister cities of cities which are ...
  { ?city wdt:P31 wd:Q3957 } UNION                                   # ... instances of "town" ...
  { ?city wdt:P31 wd:Q515 } UNION                                    # ... OR instances of "city" ...
  { ?city wdt:P31 wd:Q1549591 } UNION                                # ... OR instances of "big city" ...
  { ?city wdt:P31 wd:Q1637706 }                                      # ... OR instances of "city with millions of inhabitants"
  ?city wdt:P190 ?sister.                                            # ... with filled property "sister city"
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?city ?cityLabel                                            # Grouping by city
ORDER BY DESC(?sisterCount)                                          # Sorting by number of sister cities (descending)

SPARQL query, 4046 cities with sister cities (2020).

Russia

Query used:

SELECT ?city ?cityLabel (COUNT(?sister) AS ?sisterCount) WHERE {           # Counting sister cities of cities which are ...
  VALUES ?cityTypes {wd:Q3957 wd:Q515 wd:Q1549591 wd:Q1637706}
  ?city wdt:P31 ?cityTypes.                                          # ... instances of different types of cities ...
  ?city wdt:P17 wd:Q159.                                             # ... belonging to Russia ...
  ?city wdt:P190 ?sister.                                            # ... with filled property "sister city"
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?city ?cityLabel                                            # Grouping by city
ORDER BY DESC(?sisterCount)                                          # Sorting by number of sister cities (descending)

SPARQL query, 82 cities with sister cities (2020).

In 2020, more cities were twinned with Russia's cultural capital, Saint Petersburg (230 sister cities), than with its official capital, Moscow (134 sister cities). Omsk (58), Volgograd (56) and Kaliningrad (54) had similar numbers of sister cities, and Petrozavodsk, Perm, Vladimir and Belgorod had 14 each.

Number of cities with a given number of sister cities

All

Query used:

#defaultView:LineChart                                                   # Do line chart as result representation
SELECT ?sisterCount (COUNT(?sisterCount) AS ?FreqNSister) WHERE {        # Count No. of cities having ?sisterCount sister cities                                                                        
                                                                         # and number of sister cities themselves
  {
     SELECT (COUNT(?sister) AS ?sisterCount) WHERE {                     # Count sister cities of cities which are ...
       VALUES ?cityTypes {wd:Q3957 wd:Q515 wd:Q1549591 wd:Q1637706}
       ?city wdt:P31 ?cityTypes.                                         # ... instances of different types of cities ...
       ?city wdt:P190 ?sister.                                           # ... with filled property "sister city"
     }
     GROUP BY ?city                                                      # Group list by city
  }
}
GROUP BY ?sisterCount                                                    # Group by number of sister cities
ORDER BY DESC(?sisterCount)                                              # Order by number of sister cities (descending)

SPARQL query, 90 distinct sister-city counts (2020).
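The query nests two aggregations: first it counts sister cities per city, then it counts how many cities share each count. A minimal Python sketch of the same two steps on hypothetical (city, sister city) pairs follows; as in the query, cities without any P190 statement never appear, so there is no row for S = 0.

```python
from collections import Counter

def sister_count_histogram(pairs):
    """Reproduce the nested aggregation of the query above on (city, sister) pairs."""
    # Inner SELECT: COUNT(?sister) ... GROUP BY ?city
    per_city = Counter(city for city, _sister in pairs)
    # Outer SELECT: COUNT(?sisterCount) ... GROUP BY ?sisterCount
    freq = Counter(per_city.values())
    # ORDER BY DESC(?sisterCount)
    return sorted(freq.items(), reverse=True)

# Hypothetical (city, sister city) pairs, one per P190 statement.
pairs = [("A", "X"), ("A", "Y"), ("B", "X"), ("C", "X"), ("C", "Y"), ("C", "Z"), ("D", "Z")]
print(sister_count_histogram(pairs))  # [(3, 1), (2, 1), (1, 2)]
```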

Relation between the number of sister cities a city has (S) and the number of cities worldwide with that many sister cities (N), 2020
Relation between the number of sister cities a city has (S) and the logarithm of the number of cities worldwide with that many sister cities (N), 2020


Slightly more than four thousand cities (4046) have at least one sister city. Of these:

  • 32% (1314 cities) have relations with more than five cities;
  • 18% (728 cities) have at least 11 sister cities;
  • 9% (345 cities) are twinned with more than 20 cities;
  • 2% (94 cities) have 50 or more sister cities.
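Given the full output of the frequency query (pairs of S and N), threshold counts like those above are cumulative sums over S. The sketch below demonstrates a hypothetical helper on made-up data and then recomputes the percentages from the counts quoted in the list.

```python
def cities_with_at_least(freq, k):
    """Number of cities with k or more sister cities.

    freq maps a sister-city count S to the number of cities N that have exactly
    S sister cities, i.e. the (?sisterCount, ?FreqNSister) rows of the query.
    """
    return sum(n for s, n in freq.items() if s >= k)

TOTAL = 4046  # cities with at least one sister city (2020)
# Counts quoted in the text; "more than 5" is ">= 6" and "more than 20" is ">= 21".
thresholds = {6: 1314, 11: 728, 21: 345, 50: 94}
for k, n in thresholds.items():
    print(f">= {k}: {n} cities, {100 * n / TOTAL:.0f} %")
```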

The plots suggest that the relation between the number of sister cities a city has and the number of cities with that many sister cities follows a distribution close to a power law.
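One rough way to check the power-law impression is to fit a straight line to log N against log S: for N ≈ C·S^(-α) the slope is -α. (A straight line on a plot of log N against S itself would instead indicate exponential decay.) The sketch below is an ordinary least-squares fit on the log-log points, tested on synthetic data rather than on the Wikidata results; it is an informal check, and maximum-likelihood methods are generally preferred for rigorous power-law fitting.

```python
import math

def loglog_slope(points):
    """Least-squares slope of log N against log S.

    For N ~ C * S**(-alpha), the points lie on a line with slope -alpha.
    Points with S <= 0 or N <= 0 are skipped, since their logarithm is undefined.
    """
    xs, ys = [], []
    for s, n in points:
        if s > 0 and n > 0:
            xs.append(math.log(s))
            ys.append(math.log(n))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Synthetic check on exact power-law data, N = 1000 * S**-2 (not Wikidata data).
synthetic = [(s, 1000 * s ** -2) for s in range(1, 21)]
print(round(loglog_slope(synthetic), 6))  # -2.0
```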

Russia

Query used:

#defaultView:LineChart                                                   # Do line chart as result representation
SELECT ?sisterCount (COUNT(?sisterCount) AS ?FreqNSister) WHERE {        # Count No. of cities having ?sisterCount sister cities                                                                        
                                                                         # and number of sister cities themselves
  {
     SELECT (COUNT(?sister) AS ?sisterCount) WHERE {                     # Count sister cities of cities which are ...
       VALUES ?cityTypes {wd:Q3957 wd:Q515 wd:Q1549591 wd:Q1637706}
       ?city wdt:P31 ?cityTypes.                                         # ... instances of different types of cities ...
       ?city wdt:P17 wd:Q159.                                            # ... belonging to Russia ...
       ?city wdt:P190 ?sister.                                           # ... with filled property "sister city"
     }
     GROUP BY ?city                                                      # Group list by city
  }
}
GROUP BY ?sisterCount                                                    # Group by number of sister cities
ORDER BY DESC(?sisterCount)                                              # Order by number of sister cities (descending)

SPARQL query, 24 distinct sister-city counts (2020).

Relation between the number of sister cities a Russian city has (S) and the number of Russian cities with that many sister cities (N), 2020


Slightly fewer than a hundred Russian cities (82) have at least one sister city, and only 48% of them (39 cities) are twinned with more than five cities.

Which country has the most sister cities?

Query used:

#defaultView:BubbleChart
SELECT ?countryLabel (COUNT(?sister) as ?sisterCount) WHERE {       # Selecting number of distinct sister cities of particular country cities which are ... 
  SELECT DISTINCT ?countryLabel ?sister WHERE {                           
    VALUES ?cityTypes {wd:Q3957 wd:Q515 wd:Q1549591 wd:Q1637706}
    ?city wdt:P31 ?cityTypes.                                       # ... instances of different types of cities ...
    ?city wdt:P17 ?country.                                         # ... with filled property "country" ...
    ?city wdt:P190 ?sister.                                         # ... with filled property "sister city"
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  }                                 
}
GROUP BY ?countryLabel
ORDER BY DESC(?sisterCount)

SPARQL query, 208 countries (2020).

Bubble chart of countries; the bubble size is the number of distinct sister cities of each country's cities, 2020


In 2020, Germany had the largest number of sister cities (1375).
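The inner SELECT DISTINCT of the query keeps each (country, sister city) pair once, so a sister city shared by several cities of the same country is counted only once for that country. A minimal sketch of that aggregation on hypothetical pairs:

```python
from collections import defaultdict

# Hypothetical (country of city, sister city) pairs; two cities of CountryA
# are twinned with the same city "X".
pairs = [
    ("CountryA", "X"),
    ("CountryA", "X"),
    ("CountryA", "Y"),
    ("CountryB", "X"),
]

def distinct_sisters_per_country(pairs):
    # SELECT DISTINCT ?countryLabel ?sister: one set of sister cities per country.
    sisters = defaultdict(set)
    for country, sister in pairs:
        sisters[country].add(sister)
    # COUNT(?sister) ... GROUP BY ?countryLabel ORDER BY DESC(?sisterCount)
    counts = {country: len(s) for country, s in sisters.items()}
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

print(distinct_sisters_per_country(pairs))  # [('CountryA', 2), ('CountryB', 1)]
```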

List of countries having sister cities with Germany

Query used:

SELECT ?country ?countryLabel (COUNT(DISTINCT ?sister) as ?sisterCount) WHERE {  
                                                                         # Selecting number of distinct particular country sister cities of cities which are ...
    VALUES ?cityTypes {wd:Q3957 wd:Q515 wd:Q1549591 wd:Q1637706}
    ?city wdt:P31 ?cityTypes.                                            # ... instances of different types of cities ...
    ?city wdt:P17 wd:Q183.                                               # ... belonging to Germany ...
    ?city wdt:P190 ?sister.                                              # ... with filled property "sister city" which are ...
    ?sister wdt:P17 ?country.                                            # ... with filled property "country" ...
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?country ?countryLabel
ORDER BY DESC(?sisterCount)

SPARQL query, 93 countries (2020).

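The table lists the ten countries with the most sister cities of German cities (2020). Percentages are relative to the 1375 distinct sister cities of German cities; Germany itself ranks second because Wikidata also records twinning between German cities.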

#   Country                     Number of sister cities   % of total
1   France                                          247       18.0 %
2   Germany                                         195       14.2 %
3   United Kingdom                                  120        8.7 %
4   Italy                                            86        6.3 %
5   Poland                                           81        5.9 %
6   United States of America                         60        4.4 %
7   Austria                                          41        3.0 %
8   Russia                                           39        2.8 %
9   Hungary                                          39        2.8 %
10  Belgium                                          33        2.4 %
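The percentage column can be reproduced by dividing each count by the 1375 distinct sister cities of German cities. This sketch uses only the figures from the table.

```python
GERMANY_TOTAL = 1375  # distinct sister cities of German cities (2020)
TOP10 = [
    ("France", 247), ("Germany", 195), ("United Kingdom", 120), ("Italy", 86),
    ("Poland", 81), ("United States of America", 60), ("Austria", 41),
    ("Russia", 39), ("Hungary", 39), ("Belgium", 33),
]

for rank, (country, n) in enumerate(TOP10, start=1):
    print(f"{rank:>2}  {country:<26} {n:>4}  {100 * n / GERMANY_TOTAL:4.1f} %")

# Combined share of the ten countries.
top10_total = sum(n for _, n in TOP10)
print(top10_total, f"{100 * top10_total / GERMANY_TOTAL:.1f} %")  # 941 68.4 %
```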

Closest neighbours of Russia by number of sister cities

Query used:

#defaultView:Map
SELECT ?country ?countryLabel ?sisterCount ?shape ?layer WHERE {
  { # Selecting number of distinct particular country sister cities of cities which are ...
    SELECT ?country ?countryLabel (COUNT(DISTINCT ?sister) as ?sisterCount) WHERE {  
      VALUES ?cityTypes {wd:Q3957 wd:Q515 wd:Q1549591 wd:Q1637706}
      ?city wdt:P31 ?cityTypes.         # instances of different types of cities
      ?city wdt:P17 wd:Q159.            # city belongs to Russia
      ?city wdt:P190 ?sister.           # city has "sister city"
      ?sister wdt:P17 ?country.         # which belongs to "country"
      FILTER(?country NOT IN(wd:Q159))  # excluding Russia itself
      SERVICE wikibase:label {bd:serviceParam wikibase:language "en"}
    }
    GROUP BY ?country ?countryLabel
    ORDER BY DESC(?sisterCount)
  }
  OPTIONAL {?country wdt:P3896 ?shape.} # country has "geoshape"
  BIND(
    IF(?sisterCount < 5, "<5",
    IF(?sisterCount <= 10, "5-10",
    IF(?sisterCount <= 20, "11-20",
    IF(?sisterCount <= 30, "21-30",
    IF(?sisterCount <= 40, "31-40",
    ">40"))))) AS ?layer).
}

SPARQL query, 102 countries (2020).
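The nested IF(...) inside BIND assigns each country to a map layer. The same binning written as a plain Python function makes the boundaries explicit: "<5" is 0 to 4, and each upper bound of the other ranges is inclusive.

```python
def layer(sister_count: int) -> str:
    """Python equivalent of the nested IF(...) in the BIND clause of the map query."""
    if sister_count < 5:
        return "<5"
    if sister_count <= 10:
        return "5-10"
    if sister_count <= 20:
        return "11-20"
    if sister_count <= 30:
        return "21-30"
    if sister_count <= 40:
        return "31-40"
    return ">40"

# Counts quoted below: United States of America 46, Ukraine 28.
print(layer(46), layer(28))  # >40 21-30
```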

Map of closest neighbours of Russia by number of sister cities, 2020


Russian cities have more than twenty sister cities in countries such as the United States of America (46), China (46), Germany (44), Ukraine (28), Bulgaria (25), Poland (24), France (23) and Italy (22).

Wikidata completeness and disadvantages

A city is a type of human settlement whose inhabitants are mostly not engaged in agriculture. However, countries use different criteria, chiefly population, when granting city status to settlements, and some countries do not define the term "city" at all. In France, for example, only one unit of this kind is used, the commune, regardless of its population or the occupations of its residents. It can therefore be difficult to determine which settlements are cities and which are not.

In practice, a Wikidata item can be an instance of several city types at once. For example, Shanghai is an instance of three of the classes under study: city, big city, and city with millions of inhabitants. Such multiple classification affects the results of SPARQL queries, in particular those using UNION, which returns one row for each branch that matches. This can be checked by running, for example, the SPARQL query for different types of cities: Shanghai appears in its results three times.
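The duplication can be reproduced outside SPARQL: each UNION branch is evaluated on its own and the solutions are concatenated. The sketch below uses a simplified, hypothetical set of instance-of statements. In SPARQL, SELECT DISTINCT removes such duplicate rows, and COUNT(DISTINCT ?x) does the same inside aggregates; a plain COUNT(?sister) grouped by city counts each sister city once per matching class.

```python
# Simplified, hypothetical P31 (instance of) statements.
instance_of = {
    "Shanghai": {"Q515", "Q1549591", "Q1637706"},  # city, big city, millions
    "Petrozavodsk": {"Q515"},
}
CITY_TYPES = ["Q3957", "Q515", "Q1549591", "Q1637706"]

def union_rows(instance_of, classes):
    # One pass per UNION branch; an item matching k branches yields k rows.
    return [item for cls in classes
            for item, types in instance_of.items() if cls in types]

rows = union_rows(instance_of, CITY_TYPES)
print(rows.count("Shanghai"))  # 3
print(len(set(rows)))          # 2 distinct items, as with SELECT DISTINCT
```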

Wikidata expresses inheritance through the subclass of (P279) property: if an item is an instance of big city, it is implicitly also an instance of city, because big city is a subclass of city. The Shanghai duplication described above could therefore be resolved by keeping only the most specific class, city with millions of inhabitants. Note, however, that replacing the UNION construction with a subclass path is not equivalent, as the two queries below show.

# Selecting items which are ...
SELECT ?city ?cityLabel WHERE {
	# ... instances of "city" ...                  
	{ ?city wdt:P31 wd:Q515 } UNION    
	# ... instances of "big city" ...                               
	{ ?city wdt:P31 wd:Q1549591 } UNION     
	# ... instances of "city with millions of inhabitants"
	{ ?city wdt:P31 wd:Q1637706 }                                     
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

SPARQL query

# Selecting items which are ...
SELECT ?city ?cityLabel WHERE {
	# ... instances of "city" or of any of its subclasses
	?city wdt:P31/wdt:P279* wd:Q515.
	SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

SPARQL query

Shanghai appears four times in the results of this query. The subclass path can match once for each class of the item that is city or one of its subclasses, and besides the classes under study there are many other classes derived from city, for example lost city, free imperial city, autonomous city and even ideal city.
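The property path wdt:P31/wdt:P279* can be mimicked by walking the subclass hierarchy upwards from each direct class. The hierarchy below is a simplified, hypothetical fragment, and only the three classes of Shanghai named above are modelled, so the sketch yields three matches rather than the four returned by Wikidata.

```python
# Simplified, hypothetical fragment of the "subclass of" (P279) hierarchy.
SUBCLASS_OF = {
    "big city": {"city"},
    "city with millions of inhabitants": {"big city"},
    "lost city": {"city"},
    "ideal city": {"city"},
}

def superclasses(cls):
    """cls plus every class reachable from it by P279 steps (like wdt:P279*)."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def path_matches(direct_classes, target):
    """Direct (P31) classes through which ?item wdt:P31/wdt:P279* target matches."""
    return [c for c in direct_classes if target in superclasses(c)]

shanghai = ["city", "big city", "city with millions of inhabitants"]
print(len(path_matches(shanghai, "city")))  # 3, one row per matching class
print(len(path_matches(["city with millions of inhabitants"], "city")))  # 1
```

Keeping only the most specific class, as suggested above, leaves a single match.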

Probably because the criteria for city status are ambiguous, country-specific subclasses have also been created, such as city in Chile, city in Cyprus, city of Japan and so on. Russian cities are no exception, as can be seen by comparing the results of the SPARQL query for instances of "City": in 2020 most Russian cities belonged to the city/town class.

According to the Russian Census (2010)[3] and the Crimean Federal District Census (2014)[4], Russia had 1117 cities in 2014. Every Russian city has an article in both the Russian and the English Wikipedia.

The number of Wikidata items that are Russian cities is 1126[5], slightly more than the census figure. It can therefore be assumed that Wikidata covers at least Russian cities completely.

Future work

  1. Construct a graph of Russian sister cities.
  2. Get a list of Russian cities located beyond the Arctic Circle.
  3. On which river in Russia are the most cities located?
  4. Which country has the largest ratio of sister-city links within the country to sister-city links with other countries?

Tests

1 Which of the following cities were named after toponyms?

Tolyatti
Tula
Chernyakhovsk
Kurilsk
Vologda
Obninsk

2 Which of the following flags belong to these cities: Nizhnevartovsk, Petropavlovsk-Kamchatsky, Neftekamsk, Karabulak?

3 Which of the following cities were founded more than 400 years ago?

Moscow
Sarov
Kazan
Astrakhan
Samara
Voronezh

Check yourself:

  1. cities named after toponyms
  2. flags of cities
  3. founded more than 400 years ago

Addon

  1. Total number of sister city statements per country

References

  • Menshikova E. (2020). "Cities in Russia". ProWD.
  • Menshikova E. (2020). "Big cities in Russia". ProWD.

Links