Research in programming Wikidata/Human settlements

This article explores the Wikidata object "human settlement" and its properties. The following tasks were solved with SPARQL queries: finding instances of the object "human settlement", building a list of countries ordered by the total population living in "human settlements", and listing the objects that accompany "human settlement" in the "instance of" property. A graph was also constructed that shows the proportion of each country's population living in "human settlements". The diagram shows that a high percentage of the population living in "human settlements" is typical of less industrialized countries, while industrialized countries have a small percentage. In addition, the completeness of Wikidata is analyzed on the basis of the solved tasks. The property "instance of" was added to several objects to improve the results.

Instances of the object "human settlement"

Let's build a list of all the human settlements.

#list of all human settlements
SELECT ?hum ?humLabel 
WHERE 
{
  ?hum wdt:P31 wd:Q486972. # instance of human settlement
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]"}
}

SPARQL query: 411,393 results in 2017. In 2021 the full list can no longer be obtained, because the number of objects has grown so much that the query takes too long to run.
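One way to still inspect part of the list is to cap the number of settlements before labels are resolved. The query below is a minimal sketch, assuming the Wikidata Query Service with its predefined prefixes (as in the other queries of this article); the limit of 1000 is an arbitrary choice.

```sparql
# A capped sample of human settlements (LIMIT value chosen arbitrarily)
SELECT ?hum ?humLabel
WHERE
{
  # Inner query: pick at most 1000 settlements first
  { SELECT ?hum WHERE { ?hum wdt:P31 wd:Q486972. } LIMIT 1000 }

  # Labels are then resolved only for the selected items
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]"}
}
```

Placing the LIMIT in a subquery keeps the label service from being applied to every settlement, which is likely to be the expensive part of the full listing.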

Let's count all settlements.

# Number of human settlements
SELECT (COUNT(?hum) AS ?count) 
WHERE {
  ?hum wdt:P31 wd:Q486972. # instance of human settlement
}

SPARQL query: the count is 563,416 in 2021, that is, Wikidata contained 563,416 unique human settlements.

The most complete and detailed human settlements on Wikidata are Antakya, General Roca, and Padre Las Casas.

Nearly empty, less informative human settlements were Belomorsk, Segezha, and Yanishpole.

Duplicate objects were found and merged for Belomorsk, Segezha, and Yanishpole.

According to ProWD, Tokyo leads human settlements around the world in the number of properties (73 properties). Yalta has 36 properties, the maximum among Russian human settlements.

List of countries by total population

Let us construct an ordered list of countries by the total number of people living in "human settlements".

# List of countries by population in settlements
SELECT ?country ?countryLabel (SUM(?population) as ?sumPopulation)
WHERE
{
  ?hum wdt:P31 wd:Q486972; # instance of human settlement
       wdt:P17 ?country;   # settlement in the ?country
       wdt:P1082 ?population. # settlement has ?population
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]"}
}
GROUP BY ?country ?countryLabel 
ORDER BY DESC (?sumPopulation)

SPARQL-query, 161 results (2017), 213 results (2021).

The human settlements are grouped by country with the GROUP BY clause:

GROUP BY ?country ?countryLabel

Figure: bubble diagram of countries by the total number of people living in "human settlements".

The bubble diagram shows countries by the total number of people living in "human settlements". The diagram and the query show that the largest populations living in "human settlements" are found in Brazil (12 million), Pakistan (10 million), Mexico (8 million), Yemen (8 million), India (7 million), and Bangladesh (7 million). These countries have climatic and geographic conditions suitable for comfortable living in human settlements.

Checking that the script is executed correctly

To verify the calculations, let's write a query that lists the human settlements, together with their number of inhabitants, for the country with the smallest total population in the previous result. That country is Montenegro, so we list the settlements of Montenegro and their populations.

# List of settlements and population of Montenegro
SELECT ?humLabel ?hum ?population
WHERE {
  ?hum wdt:P31 wd:Q486972;    # instance of human settlement
       wdt:P17 wd:Q236;       # settlement in the Montenegro
       wdt:P1082 ?population. # settlement has ?population
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]"}
 }

SPARQL query, 1 result (2017), 25 results (2021).

The test was successful: this query shows the population living in the "human settlements" of Montenegro, and the number matches the total reported for Montenegro by the previous query, which lists countries by the total number of people living in "human settlements".
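When the list contains more than one settlement, the comparison can be done without adding the values by hand. The following sketch, which assumes the same Wikidata Query Service prefixes as the queries above, sums the populations for Montenegro so that the result can be compared directly with the Montenegro row of the country list.

```sparql
# Total population and number of settlements counted for Montenegro
SELECT (SUM(?population) AS ?sumPopulation) (COUNT(?hum) AS ?settlements)
WHERE {
  ?hum wdt:P31 wd:Q486972;    # instance of human settlement
       wdt:P17 wd:Q236;       # settlement in Montenegro
       wdt:P1082 ?population. # settlement has ?population
}
```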

Completeness of the Wikidata

Human settlement is a common name for places with permanent residents. According to the editors of Wikidata, the concept of a human settlement includes cities, villages, hamlets and others. The complete list can be seen in the section "The list of objects that accompany "human settlement" in "instance of"" of this article. No exact information on the number of human settlements in the world was found. Therefore, the completeness of the human settlements present in Wikidata is checked against the task solved above: building an ordered list of countries by the total number of people living in "human settlements".

To do this, let us construct a query that shows the human settlements with an empty 'population' property (SPARQL query). The query showed that there are 372997 such settlements. So out of 411393 settlements (based on the query in the section "Instances of the object "human settlement""), only 38396, or 9.3%, have a 'population' property. Now let's look at the settlements that have no country (SPARQL query). There were 8427 such objects. Therefore, solving this task gave only an incomplete picture of the total population in settlements by country.
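The text of the completeness queries is not reproduced here. A possible form of the first one is sketched below, assuming the Wikidata Query Service prefixes used throughout this article; it is not necessarily the query that produced the figures above, and on the current data it may hit the timeout limit.

```sparql
# Number of human settlements without a 'population' statement
SELECT (COUNT(?hum) AS ?count)
WHERE {
  ?hum wdt:P31 wd:Q486972.                          # instance of human settlement
  FILTER NOT EXISTS { ?hum wdt:P1082 ?population. } # no population (P1082)
}
```

Replacing wdt:P1082 with wdt:P17 in the FILTER NOT EXISTS block gives the analogous count of settlements without a country.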

According to the project "Human settlements of Russia/Statistics", the Russian Wikipedia contains approximately 75000 articles about the settlements of Russia. According to the 2010 census, there are 155510 settlements in Russia. Let's check how many objects about Russian settlements Wikidata contains with the help of the following SPARQL query. The query returns 4113 objects, which is 2.6% of the total number of settlements. Thus, Wikidata contains too little information about the settlements of Russia.
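A count of this kind can be obtained with a query along the following lines. This is a sketch under the assumption that only objects whose "instance of" is directly "human settlement" and whose country is Russia (wd:Q159) are counted, as in the other queries of this article.

```sparql
# Number of human settlements located in Russia
SELECT (COUNT(?hum) AS ?count)
WHERE {
  ?hum wdt:P31 wd:Q486972; # instance of human settlement
       wdt:P17 wd:Q159.    # country: Russia
}
```

Objects typed only with a more specific class (for example, only "village") are not matched by this pattern, which is one reason the count stays far below the census figure.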

So, the coverage of human settlements in Wikidata is low. Namely, some cities, towns, villages and other settlements in Wikidata have no "instance of" property whose value could be "human settlement". In addition, there are nearly empty and poorly completed objects. To solve these problems, these properties need to be filled in and the Wikidata objects need to be linked to each other.

Filling in the Wikidata

For 100 objects describing human settlements in Russia whose "instance of" property was empty, that property was set to "human settlement".

As of October 25, 2017, Wikidata contained 4207 objects about the human settlements of Russia, which was 2.7% of the total number of settlements according to the 2010 census (155510) and 5.6% of the approximately 75000 articles in the Russian Wikipedia. This can be seen with the following SPARQL query.

The proportion of the country's population living in the "human settlement"

Let us construct a list of countries ordered by the ratio, in percent, of the population living in "human settlements" to the total number of inhabitants of the country.

# An ordered list of the ratio of the number of people living in "settlement" to the number of inhabitants in the country.
SELECT ?country ?countryLabel (SUM(?population / ?pop) as ?proportionPopulation) (?proportionPopulation * 100 as ?percentPopulation)
WHERE {
  ?hum wdt:P31 wd:Q486972.    # instance of human settlement  
  ?hum wdt:P17 ?country.      # country 
  ?hum wdt:P1082 ?population. # population
  ?country wdt:P1082 ?pop.    # population in the country
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]"}
}
GROUP BY ?country ?countryLabel
ORDER BY DESC (?percentPopulation)

SPARQL query, 158 results (2017), 206 results (2021).

Diagram of the share of the population of the country living in "human settlements"


The curve in the figure shows, for each country, the ratio of the number of people living in "human settlements" to the number of inhabitants of the country. The highest percentages belong to Kiribati (78%), Niue (70%), Greece (53%), Tuvalu (48%), Comoros (43%), and Mauritius (42%). It is interesting to note that these are mostly small island states. Probably, most of the inhabitants of these countries are concentrated in settlements.

Consider the G8 countries: Russia (2.98%), the USA (1.76%), Japan (0.80%), Canada (0.26%), France (0.20%), Germany (0.24%), Great Britain (0.18%), Italy (0.07%). Note that these are industrialized countries.
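The G8 figures can be recomputed separately by restricting the proportion query to these eight countries. The sketch below assumes the Wikidata Query Service prefixes; the item identifiers in the VALUES clause are the ones believed to correspond to the eight countries and should be checked against Wikidata before use.

```sparql
# Share of the population living in "human settlements", G8 countries only
SELECT ?country ?countryLabel (SUM(?population / ?pop) * 100 AS ?percentPopulation)
WHERE {
  # Assumed identifiers: Russia, USA, Japan, Canada, France, Germany, United Kingdom, Italy
  VALUES ?country { wd:Q159 wd:Q30 wd:Q17 wd:Q16 wd:Q142 wd:Q183 wd:Q145 wd:Q38 }
  ?hum wdt:P31 wd:Q486972;    # instance of human settlement
       wdt:P17 ?country;      # country
       wdt:P1082 ?population. # population of the settlement
  ?country wdt:P1082 ?pop.    # population of the country
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]"}
}
GROUP BY ?country ?countryLabel
ORDER BY DESC(?percentPopulation)
```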

Let us propose the following hypothesis: a high percentage of a country's population living in "human settlements" indicates a more agrarian country, since such territories offer the possibility of developing agriculture. The graph and the query show that the highest percentages belong to island, southern, hot countries, in which developing industry is impractical (small territory, small population, remoteness from the continents), while the industrialized countries (G8) have a very low percentage of the population living in "human settlements". Consequently, the hypothesis is confirmed.

The list of objects that accompany "human settlement" in "instance of"

Let's construct the list of the objects accompanying "human_settlement" in the "instance of" property.

# List of objects accompanying "human_settlement" in the property "instance of"
SELECT ?inst ?instLabel (COUNT(?hum) as ?sumHum) 
WHERE{ 
  ?hum wdt:P31 wd:Q486972;  # instance of human settlement
       wdt:P31 ?inst.       # other objects in instance
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]"}
}  
GROUP BY ?inst ?instLabel

SPARQL query. This query takes too long to run and yields the error message "Query timeout limit reached". To speed it up, the version below drops the ?instLabel column from SELECT and GROUP BY; further constraints that reduce the number of result objects are added afterwards.

# List of objects accompanying "human_settlement" in the property "instance of"
SELECT ?inst (COUNT(?hum) as ?sumHum) 
WHERE{          
  ?hum wdt:P31 wd:Q486972; # instance of human settlement
       wdt:P31 ?inst.      # other objects in instance
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]"}
}  
GROUP BY ?inst

SPARQL query, 610 results (2017), 1245 results (2021).

First, let's exclude the value "human settlement" itself from ?inst, so that settlements whose only "instance of" value is "human settlement" drop out of consideration. The result does not get worse, since only the "human settlement" type itself is lost. To this end, we add a filter to the query.

Secondly, we do not consider values of the variable ?inst that have the property "country". This cuts off hundreds of settlement types specific to individual countries, for example, administrative-territorial unit of Russia.

With these restrictions the query runs for all countries of the world in an acceptable time (87 ms).

# Modernized list of objects accompanying "human_settlement" in the property "instance of"
SELECT ?inst ?instLabel (COUNT(?hum) as ?sumHum) 
WHERE{ 
  ?hum wdt:P31 wd:Q486972;  # instance of human settlement
       wdt:P31 ?inst.       # other objects in instance
  
  MINUS {?inst wdt:P17 []}. # without country
  FILTER(?inst != wd:Q486972 ). # without human settlement
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]"}
}  
GROUP BY ?inst ?instLabel

SPARQL query, 355 results (2017), 707 results (2021).

Such objects include (with the number of settlements for each):

  1. Village - 2844.
  2. Municipality - 1181.
  3. Hamlet - 662.
  4. Archaeological site - 425.
  5. Locality - 425.
  6. Destroyed city - 423.
  7. City - 322.
  8. Town - 277.
  9. Abandoned village - 254.
  10. Quarter - 207.
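A ranking like the one above can be read off directly if the modernized query is sorted by the count. The sketch below adds ORDER BY and LIMIT to that query; the limit of 10 is chosen only to match the length of the list, and the query is otherwise the same as in the article.

```sparql
# Ten most frequent objects accompanying "human settlement" in "instance of"
SELECT ?inst ?instLabel (COUNT(?hum) AS ?sumHum)
WHERE {
  ?hum wdt:P31 wd:Q486972;   # instance of human settlement
       wdt:P31 ?inst.        # other objects in instance

  MINUS { ?inst wdt:P17 [] }   # without country
  FILTER(?inst != wd:Q486972)  # without human settlement
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]"}
}
GROUP BY ?inst ?instLabel
ORDER BY DESC(?sumHum)
LIMIT 10
```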

Future work

  • Count and list famous people born in human settlements (by country).
  • Calculate and plot the ratio of the total area of human settlements to the area of the country.
  • Find human settlements founded in the 21st century.
  • Consider only those settlements that no longer exist. Construct a list of such settlements, ordered by how long each settlement existed.
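As a possible starting point for the third item, the following sketch selects settlements by their "inception" date (P571). It assumes the Wikidata Query Service prefixes and takes the 21st century to start in 2001; settlements without an inception statement are not found by it.

```sparql
# Human settlements founded in the 21st century (sketch)
SELECT ?hum ?humLabel ?inception
WHERE {
  ?hum wdt:P31 wd:Q486972;  # instance of human settlement
       wdt:P571 ?inception. # inception (date of foundation)
  FILTER(YEAR(?inception) >= 2001)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]"}
}
ORDER BY ?inception
```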

Exercises

1 Which populated place in Russia has the lowest population density?

Aleisk town in the Altai Krai
Zarechny town in the Sverdlovsk Oblast
Barabinsk town in the Novosibirsk Oblast
Zverevo town in the Rostov Oblast

2 Choose which of the presented coats of arms belong to settlements of the Russian Federation and which do not.

Belongs, does not belong

3 Which country does the panorama of this human settlement belong to?



SPARQL-query with answers:

References


  • Maksimenko L. (2021). "Human Settlements in Russia". ProWD. Retrieved 2021-09-24.