Research in programming Wikidata/Programming languages

From Wikiversity

We explore the properties of programming languages based on the Wikidata knowledge base. Using SPARQL queries over Wikidata objects of the "programming language" type, we solve a number of tasks. We obtain a list of all programming languages under permissive licenses. We construct a bubble chart by the number of source file formats. We build maps showing the locations of the institutions and companies where the people involved in creating programming languages studied or worked. We obtain a list of all object-oriented programming languages and draw a conclusion about the completeness of Wikidata with respect to object-oriented programming languages.

Formulation of the problem

We study programming languages, in particular, information about them in Russian Wikipedia, English Wikipedia and Wikidata.

Tasks:

  1. Construct an ordered list of programming languages by the number of interlinks.
  2. Construct a list of languages by the number of visits of articles in Russian Wikipedia.
  3. Construct a directed acyclic graph of the dependencies of programming languages on each other (or find cycles in the dependencies, if such a graph cannot be constructed). See, for example, the "influenced by" property of Java.
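Task 3 amounts to checking whether the "influenced by" graph contains a cycle. A minimal Python sketch of that check is given here, assuming the graph has already been exported from Wikidata into a dictionary; the function name `find_cycle` and the placeholder language names are hypothetical and do not come from Wikidata.

```python
from typing import Dict, List, Optional


def find_cycle(influenced_by: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle (first node repeated at the end) or None if the graph is a DAG."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for parent in influenced_by.get(node, []):
            state = color.get(parent, WHITE)
            if state == GRAY:
                # The parent is already on the current path, so we closed a cycle.
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for start in list(influenced_by):
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


# Toy data with placeholder names (not real Wikidata statements).
acyclic = {"lang_c": ["lang_b"], "lang_b": ["lang_a"], "lang_a": []}
cyclic = {"lang_x": ["lang_y"], "lang_y": ["lang_z"], "lang_z": ["lang_x"]}

print(find_cycle(acyclic))
print(find_cycle(cyclic))
```

The sketch uses a recursive depth-first search, which is adequate for a graph of a few thousand items; an iterative version would be safer for very deep chains.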

Instances of the "Programming Language" object

Let's build a list of all languages.

#added 2016-10
#List of `instances of` "programming language" 
SELECT ?lang ?langLabel
WHERE
{
    ?lang wdt:P31 wd:Q9143. # instances of programming language
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

SPARQL-query, 732 results (2017), 1423 results (2020).

👍 The most complete and well-developed programming languages on Wikidata for 2017 were: Java, Python, C. For 2020 the most well-developed programming languages on Wikidata are: C++ (26 properties), Java (26 properties), JavaScript (25 properties), R (25 properties).

👎 The most sparsely described languages in 2017 were: CLIPS, Dylan, Go!.

A drawback of the resulting list is that some objects have no name on Wikidata ("No label defined"). Let's get a list of only those languages whose "label" field is non-empty.

#List of `instances of` "programming language" only with a label.
SELECT ?item ?item_label
WHERE
{
    ?item wdt:P31 wd:Q9143 # instances of programming language
    ; rdfs:label ?item_label . 

    FILTER (LANG(?item_label) = "en") . 
}

SPARQL-query, 709 results (2017), 1422 results (2020).

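In 2017 this returned about two dozen fewer results (709 instead of 732); in 2020 the difference was a single item (1422 instead of 1423).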

Demonstration of work with operations on sets in SPARQL

Output all programming languages that are free (open-source) software or were influenced by at least one of the following programming languages: C, Python, Java. Exclude the languages developed by Sun Microsystems or the Johnson Space Center.

Used:

#2017-02
SELECT DISTINCT ?item ?item_label
WHERE
{
    ?item wdt:P31 wd:Q9143 # instances of programming language
    ; rdfs:label ?item_label . 

    FILTER (LANG(?item_label) = "en") . 

    {
      { ?item wdt:P737 wd:Q15777 } UNION # influenced by C
      { ?item wdt:P737 wd:Q28865 } UNION # influenced by Python
      { ?item wdt:P737 wd:Q251   } UNION # influenced by Java
      { ?item wdt:P31  wd:Q341   }         # instance of free software
    } MINUS
    {
      { ?item wdt:P178 wd:Q14647  } UNION # developer Sun Microsystems
      { ?item wdt:P178 wd:Q208371 }       # developer Johnson Space Center
    }
}

SPARQL-query, 115 results (2017), 122 results (2020).
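The UNION and MINUS blocks in this query behave like set union and set difference on the `?item` variable. The following Python sketch mirrors that logic on toy data; the function name `select_languages` and all language names are placeholders chosen for illustration, not results from Wikidata.

```python
def select_languages(influenced_by_c, influenced_by_python, influenced_by_java,
                     free_software, by_sun, by_jsc):
    """Mirror the query: union of four inclusion sets minus union of two exclusion sets."""
    included = influenced_by_c | influenced_by_python | influenced_by_java | free_software
    excluded = by_sun | by_jsc
    return included - excluded


# Placeholder data, only to show how the operators combine.
result = select_languages(
    influenced_by_c={"lang_a", "lang_b"},
    influenced_by_python={"lang_b", "lang_c"},
    influenced_by_java={"lang_d"},
    free_software={"lang_e"},
    by_sun={"lang_d"},
    by_jsc={"lang_a"},
)
print(sorted(result))
```

A language is kept if it appears in at least one inclusion set and in none of the exclusion sets, which is why `lang_a` and `lang_d` drop out of the toy result.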

Permissive licenses

We will output all programming languages under permissive licenses, that is, licenses that place almost no restrictions on what users and developers may do with the software.

Used:

#2017-03
SELECT DISTINCT ?item ?item_label
WHERE
{
    ?item wdt:P31 wd:Q9143 # instances of programming language
    ; rdfs:label ?item_label . 

    FILTER (LANG(?item_label) = "en") . 
  
      { ?item wdt:P275 wd:Q308915  }  UNION  # license Mozilla Public
      { ?item wdt:P275 wd:Q334661  }  UNION  # license MIT
      { ?item wdt:P275 wd:Q191307  }  UNION  # license BSD
      { ?item wdt:P275 wd:Q6905323 }         # license CC
}

SPARQL-query, 37 results (2017), 82 results (2020).

In 2017 this list of 37 "free" languages included, for example, CoffeeScript, Go, and Haml.


Consider the ratio of languages under proprietary (closed) licenses to languages under permissive licenses.

#2020-10-07
#The script calculates the number of languages with a closed license as a percentage of the number of languages with a free license
SELECT (COUNT(?not_free)* 100 / (COUNT(?free)) as ?total) WHERE
{ 
{
    SELECT ?free WHERE 
    {
         ?free wdt:P31 wd:Q9143 # instances of programming language
         ; rdfs:label ?item_label . 

         FILTER (LANG(?item_label) = "en") . 
  
         { ?free wdt:P275 wd:Q308915  }  UNION  # license Mozilla Public
         { ?free wdt:P275 wd:Q334661  }  UNION  # license MIT
         { ?free wdt:P275 wd:Q191307  }  UNION  # license BSD
         { ?free wdt:P275 wd:Q6905323 }         # license CC
    }
}
UNION
{
    SELECT ?not_free WHERE 
    {
      ?not_free wdt:P31 wd:Q9143 # instances of programming language
      ; rdfs:label ?lang_label . 
      FILTER (LANG(?lang_label) = "en") .
  
      { ?not_free wdt:P275 wd:Q6165015 } UNION # Java Research License
      { ?not_free wdt:P275 wd:Q218616 } UNION # proprietary software
      { ?not_free wdt:P275 wd:Q3238057 } UNION # proprietary license 
      { ?not_free wdt:P275 wd:Q31202214 } UNION # proprietary software license 
      { ?not_free wdt:P275 wd:Q979794 } # Aladdin Free Public License
    }
}
}

SPARQL-query; for 2020 the ratio is 25%.

Number of source file formats

Depending on the programming language, the source code files for programs may have different extensions. Let's construct a bubble diagram by the number of valid formats of the source code files.

Used:

#added 2017-04
#defaultView:BubbleChart
SELECT ?lang_name (count(*) as ?count)
WHERE
{
    ?lang wdt:P31 wd:Q9143. # instance of programming language
    ?lang wdt:P1195 ?extension. # file extension (named ?extension so it does not clash with the ?count alias)
  	?lang rdfs:label ?lang_name.
    filter (lang(?lang_name) = "en").
}

GROUP BY ?lang_name 
ORDER BY DESC(?count)

SPARQL-query.

Bubble chart by the number of formats of source code files (2020).
Bubble chart by the number of formats of source code files (2017).


The figure shows that the languages with the most source file formats and extensions are C++ (10 formats), Geometric Description Language (8), and Racket (7). For example, a Racket program file can have the extension rkt, rktl, rktd, scrbl, plt, ss or scm.

By 2020, other languages had also moved toward the top: Raku (9 formats), Geometric Description Language (8 formats), REXX (6 formats), Java (5 formats), and Wolfram Language (5 formats).
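The bubble chart query groups rows by language and counts them. The same grouping can be sketched in Python, assuming the query results are available as (language, extension) pairs. The Racket extensions are the ones quoted in the text; "ExampleLang" and its extensions are hypothetical, and `count_extensions` is an illustrative name.

```python
from collections import Counter

# Rows as (language, extension); Racket extensions are taken from the text above,
# "ExampleLang" is a made-up language added only for contrast.
rows = [("Racket", ext) for ext in ("rkt", "rktl", "rktd", "scrbl", "plt", "ss", "scm")]
rows += [("ExampleLang", "ex"), ("ExampleLang", "exl")]


def count_extensions(pairs):
    """Count extensions per language, most extensions first (like GROUP BY + ORDER BY DESC)."""
    counts = Counter(lang for lang, _ext in pairs)
    return counts.most_common()


for lang, n in count_extensions(rows):
    print(lang, n)
```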

Countries in which people and organizations, associated with the creation of programming languages, live

Let's map the countries in which the people and organizations connected with the creation of programming languages are located. Note that the developer of a language can be either an organization or an individual. To determine the location (property: coordinate location) of an organization, we use the coordinates of its headquarters (property: headquarters location); for a person, we use the coordinates of their place of birth (property: place of birth).

Used:

#2017-05
#defaultView:Map
SELECT ?item_label ?developer_label ?location_label ?coord
WHERE
{
    ?item wdt:P31 wd:Q9143 # instances of programming language
    ; rdfs:label ?item_label.     
    FILTER (LANG(?item_label) = "en"). 
  
    ?item wdt:P178 ?developer. # developer
    ?developer rdfs:label ?developer_label. 
    FILTER (LANG(?developer_label) = "en"). 
      		
    { ?developer wdt:P159 ?location. } UNION # headquarters location
    { ?developer wdt:P19  ?location  }       # place of birth
    ?location rdfs:label ?location_label. 
    FILTER (LANG(?location_label) = "en").
    
    ?location wdt:P625 ?coord. # coordinate location

    SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en".
    }   	
}

SPARQL-query.

The most favorable countries for the emergence of people capable of developing programming languages (2020)

We will also construct a bubble chart to identify the countries that produced the most developers of programming languages and hosted the most headquarters. The figure shows that the leading countries were the United States (159 people and headquarters) and the United Kingdom (15). Only two of the programming languages were developed in Russia: Refal and the embedded programming language of 1C:Enterprise.

For 2020, the number of headquarters in the US is 241, in the UK - 24, in France - 18, and in Russia - 5.

Universities where people who developed programming languages studied

Let's display on the map the educational institutions where future developers of programming languages studied.

Used:

#2017-05
#defaultView:Map
SELECT ?item_label ?developer_label ?educational_institution_label ?coord
WHERE
{
    ?item wdt:P31 wd:Q9143 # instances of programming language
    ; rdfs:label ?item_label. 
    FILTER (LANG(?item_label) = "en"). 
    
    ?item wdt:P178 ?developer. # developer
    ?developer rdfs:label ?developer_label. 
    FILTER (LANG(?developer_label) = "en"). 
    	
    ?developer wdt:P69 ?educational_institution. # educated at
    ?educational_institution rdfs:label ?educational_institution_label. 
    FILTER (LANG(?educational_institution_label) = "en").
    
    ?educational_institution wdt:P625 ?coord. # coordinate location
    
    SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en".
    } 	
}

SPARQL-query, 142 results (2017), 282 results (2020).

The map shows that most of the people involved in the creation of programming languages studied in Europe or the United States.

Let's construct a bubble chart of the most popular educational institutions among future developers of programming languages. The figure shows that the first places went to Princeton University (8) and Stanford University (8). Moscow State University (MSU) also appears in this list of 142 universities of the world, near the end: Tony Hoare, who developed ALGOL 60, and Valentin Turchin, who developed Refal, studied there.

Professions of the creators of programming languages

Let's construct a bubble diagram showing which professions prevail among people who develop programming languages.

Used:

#2017-05
#defaultView:BubbleChart
SELECT ?occupation_label (count(*) as ?count)
WHERE
{
    ?item wdt:P31 wd:Q9143. # instances of programming language 
    ?item wdt:P178 ?developer. # developer
    ?developer wdt:P106 ?occupation. # occupation
    ?occupation rdfs:label ?occupation_label. 
    FILTER (LANG(?occupation_label) = "en"). 
}
GROUP BY ?occupation_label 
ORDER BY DESC(?count)

SPARQL-query, 48 results (2017), 74 results (2020).

Which professions prevail among people developing programming languages. (2020).
Which professions prevail among people developing programming languages.(2017).

The most common professions were computer scientist, engineer, and teacher. Interestingly, the list also includes professions such as jazz musician and politician (Herbert A. Simon). In 2020, computer scientists were the largest group among developers of programming languages (172 people), followed by 96 engineers, 57 teachers, 56 programmers and 43 mathematicians.

Object-oriented programming languages

List all object-oriented programming languages.

Used:

#2017-4
SELECT DISTINCT ?item ?item_label
WHERE
{
    ?item wdt:P31 wd:Q899523 # instances of object-oriented programming language
    ; rdfs:label ?item_label . 

    FILTER (LANG(?item_label) = "en") . 
}

SPARQL-query, 116 results (2017), 118 results (2020).

Thus, in 2017 about 16% of the programming languages on Wikidata (116 of 732) were object-oriented.
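The share can be rechecked from the result counts reported in this article: 116 object-oriented languages out of 732 languages in 2017, and 118 out of 1423 in 2020. A small Python sketch of the arithmetic follows; the helper name `share_percent` is illustrative.

```python
def share_percent(part: int, total: int) -> float:
    """Return part as a percentage of total, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * part / total, 1)


share_2017 = share_percent(116, 732)   # counts reported for 2017
share_2020 = share_percent(118, 1423)  # counts reported for 2020
print(share_2017, share_2020)  # prints 15.8 8.3
```

The 2020 share is lower because the total number of language items nearly doubled while the number of items typed as object-oriented languages barely changed.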

Completeness of Wikidata

According to the Bourabai Research University [1], there are at least 26 programming languages that support the object-oriented paradigm. Articles devoted to object-oriented programming add another 4[2] and 3[3] programming languages to this list, for a total of 33. The SPARQL query returned 116 results. It is difficult to judge the completeness of the three sources cited above, since there are many little-known, obsolete and narrowly focused languages that authoritative sources do not cover. Because Wikidata lists several times more languages than these sources combined, it can be concluded that Wikidata provides a fairly complete list of object-oriented programming languages.

Filling objects

Let's list all people who are involved in the development of programming languages and whose objects have an English 'label' field:

#2017-05
SELECT ?item_label ?item ?developer_label ?developer
WHERE
{
    ?item wdt:P31 wd:Q9143 # instances of programming language
    ; rdfs:label ?item_label. 
    FILTER (LANG(?item_label) = "en"). 

    ?item wdt:P178 ?developer. # developer 
    ?developer wdt:P31 wd:Q5.  # instances of human
    ?developer rdfs:label ?developer_label. 
    FILTER (LANG(?developer_label) = "en").  
}

SPARQL-query, 133 results (2017), 223 results (2020).

As of 21.05.17 there are 133 such results. A similar list, but requiring the 'label' field in Russian, gives 88 results. After filling in the label and description fields in Russian for these objects, we print the result:

#2017-05
SELECT ?item_label ?item ?developer_label ?developer
WHERE
{
    ?item wdt:P31 wd:Q9143 # instances of programming language
    ; rdfs:label ?item_label. 
    FILTER (LANG(?item_label) = "en"). 

    ?item wdt:P178 ?developer. # developer 
    ?developer wdt:P31 wd:Q5.  # instances of human
    ?developer rdfs:label ?developer_label. 
    FILTER (LANG(?developer_label) = "ru").  # developer label in Russian
}

SPARQL-query, 133 results (2017), 183 results (2020).

Future work

  1. Output all programming languages with the "mascot character" property.
  2. Calculate the number of programming languages created before 1992 (property: "inception").
  3. Construct a bar chart that shows the number of known hashtags in Twitter for each programming language (property: "Twitter hashtag").

Tests

1 Relate the programming language and its developer.

Developers: J. Ichbiah, C. Moore, J. Armstrong
Languages: Ada, Forth, Erlang

2 Select the logo of the programming language LOLCODE:

KTurtle logo.svg
Camelia.svg
LOLCode logo.png
Micropython-logo.svg

3 Fill the gaps.

Fortran is in the first place in the number of its dialects. Their number reaches the order of ____. In the second place is Lisp, with ____ dialects. The third place is shared by Standard ML and Object Pascal with ____ dialects.



References

  1. "Object-oriented programming (OOP-1)". bourabai.ru.
  2. "Object-oriented programming". fandom.wikia.com.
  3. Igor Garshin. "First languages of object-oriented programming (OOP)".
  • Burdin Grigoriy. "Programming languages". ProWD. Retrieved 2020-09-28.

Links