Research in programming Wikidata/Programming languages

From Wikiversity

We explore the properties of programming languages using the knowledge base of the international Wikidata project. SPARQL queries over Wikidata items of the "programming language" type are used to solve a number of tasks. We obtain a list of all programming languages released under permissive licenses, build a bubble chart of languages by number of file formats, and map the locations of the institutions and companies where the people involved in creating programming languages studied or worked. We also obtain a list of all object-oriented programming languages and draw a conclusion about how complete Wikidata's coverage of object-oriented programming languages is.

Formulation of the problem

We study programming languages, in particular, information about them in Russian Wikipedia, English Wikipedia and Wikidata.

Tasks:

  1. Construct an ordered list of programming languages by the number of interlinks.
  2. Construct a list of languages by the number of visits of articles in Russian Wikipedia.
  3. Construct a directed acyclic graph of the dependencies between programming languages (or find the cycles, if no such graph can be constructed). See the "influenced by" property of Java.
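Task 3 can be sketched as a cycle search over "influenced by" edges. The following is only a minimal illustration, assuming the edges have already been exported as (language, influenced_by) pairs; the function name `find_cycle` and the sample pairs are hypothetical and are not taken from Wikidata.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return one cycle as a list of nodes (first node repeated at the end),
    or None if the "influenced by" graph is acyclic.

    edges: iterable of (language, influenced_by) pairs.
    """
    graph = defaultdict(list)
    for lang, source in edges:
        graph[lang].append(source)

    done = set()  # nodes whose outgoing edges were fully explored
    for start in list(graph):
        if start in done:
            continue
        path = [start]      # current depth-first path
        on_path = {start}   # same nodes, for fast membership tests
        stack = [iter(graph[start])]
        while stack:
            for nxt in stack[-1]:
                if nxt in on_path:
                    # Back edge: the path from nxt to here closes a cycle.
                    return path[path.index(nxt):] + [nxt]
                if nxt not in done:
                    path.append(nxt)
                    on_path.add(nxt)
                    stack.append(iter(graph.get(nxt, [])))
                    break
            else:
                # All neighbours explored: retreat one step.
                node = path.pop()
                on_path.discard(node)
                done.add(node)
                stack.pop()
    return None


# Hypothetical sample edges, for illustration only.
acyclic = [("LangC", "LangB"), ("LangB", "LangA")]
cyclic = [("LangA", "LangB"), ("LangB", "LangC"), ("LangC", "LangA")]
print(find_cycle(acyclic))
print(find_cycle(cyclic))
```

If the function returns None, the edges form a directed acyclic graph; otherwise the returned list names one set of mutually dependent languages.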

Instances of the "Programming Language" object

Let's build a list of all languages.

#added 2016-10
#List of `instances of` "programming language" 
SELECT ?lang ?langLabel
WHERE
{
    ?lang wdt:P31 wd:Q9143. # instances of programming language
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

SPARQL-query, 732 results.
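The queries on this page can also be run outside the web interface. Below is a minimal sketch that assumes the public endpoint at https://query.wikidata.org/sparql and the standard SPARQL JSON results format; the helper names (`build_request`, `rows`, `run`) and the User-Agent string are illustrative choices, not part of the original study.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"  # public Wikidata query endpoint

QUERY = """
SELECT ?lang ?langLabel
WHERE
{
    ?lang wdt:P31 wd:Q9143. # instances of programming language
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
"""


def build_request(query):
    """Build an HTTP GET request that asks the endpoint for JSON results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    headers = {
        "Accept": "application/sparql-results+json",
        # Placeholder identification string; replace with your own.
        "User-Agent": "wikiversity-sparql-example/0.1",
    }
    return urllib.request.Request(url, headers=headers)


def rows(result):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]


def run(query):
    """Send the query over the network and return the flattened rows."""
    with urllib.request.urlopen(build_request(query)) as response:
        return rows(json.load(response))
```

Calling `run(QUERY)` needs network access, and the number of rows will differ from the 732 reported here because Wikidata changes over time.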

👍> The most complete and well-developed programming language entries on Wikidata are: Java, Python, C.

👎> The sparsest and least informative entries are: CLIPS, Dylan, Go!.

A drawback of the resulting list is that a number of items have no English label on Wikidata (shown as "No label defined"). Let's restrict the list to languages whose "label" field is non-empty.

#List of `instances of` "programming language" only with a label.
SELECT ?item ?item_label
WHERE
{
    ?item wdt:P31 wd:Q9143 # instances of programming language
    ; rdfs:label ?item_label . 

    FILTER (LANG(?item_label) = "en") . 
}

SPARQL-query, 709 results.

That is 23 fewer results.
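The items missing from the second list can be picked out of the first query's output. The sketch below assumes the usual behaviour of the Wikidata label service, which returns the item identifier (for example Q123) in place of a label when none exists in the requested language; the function name `unlabelled` and the placeholder identifier are hypothetical.

```python
first_query_results = 732   # all instances of "programming language"
labelled_results = 709      # instances with an English label
print(first_query_results - labelled_results)  # 23


def unlabelled(rows):
    """Return the identifiers of rows whose label is just the identifier.

    rows: dicts with keys 'lang' (entity URI) and 'langLabel'.
    """
    missing = []
    for row in rows:
        qid = row["lang"].rsplit("/", 1)[-1]  # last path segment of the URI
        if row["langLabel"] == qid:
            missing.append(qid)
    return missing


sample = [
    {"lang": "http://www.wikidata.org/entity/Q251", "langLabel": "Java"},
    # Placeholder item standing in for an entry without an English label.
    {"lang": "http://www.wikidata.org/entity/Q0000000", "langLabel": "Q0000000"},
]
print(unlabelled(sample))  # ['Q0000000']
```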

Demonstration of set operations in SPARQL

List all programming languages that are free (open-source) software or were influenced by at least one of the following languages: C, Python, Java. Exclude languages developed by Sun Microsystems or the Johnson Space Center.

Used:

#2017-02
SELECT DISTINCT ?item ?item_label
WHERE
{
    ?item wdt:P31 wd:Q9143 # instances of programming language
    ; rdfs:label ?item_label . 

    FILTER (LANG(?item_label) = "en") . 

    {
      { ?item wdt:P737 wd:Q15777 } UNION # influenced by C
      { ?item wdt:P737 wd:Q28865 } UNION # influenced by Python
      { ?item wdt:P737 wd:Q251   } UNION # influenced by Java
      { ?item wdt:P31  wd:Q341   }       # instance of free software
    } MINUS
    {
      { ?item wdt:P178 wd:Q14647  } UNION # developer Sun Microsystems
      { ?item wdt:P178 wd:Q208371 }       # developer Lyndon B. Johnson Space Center
    }
}

SPARQL-query, 115 results.
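The logic of the query above, UNION followed by MINUS, corresponds to ordinary set union and set difference on the matched items. A small sketch with made-up language names (all names below are hypothetical) shows the same selection in Python:

```python
def union_minus(include_sets, exclude_sets):
    """Union all include_sets, then remove anything found in exclude_sets."""
    included = set().union(*include_sets)
    excluded = set().union(*exclude_sets)
    return included - excluded


# Hypothetical data standing in for the matches of each graph pattern.
influenced_by_c = {"LangA", "LangB"}
influenced_by_python = {"LangB", "LangC"}
influenced_by_java = {"LangD"}
free_software = {"LangE"}
developed_by_sun = {"LangD"}
developed_by_jsc = {"LangE", "LangF"}

selected = union_minus(
    [influenced_by_c, influenced_by_python, influenced_by_java, free_software],
    [developed_by_sun, developed_by_jsc],
)
print(sorted(selected))  # ['LangA', 'LangB', 'LangC']
```

LangB appears in two of the included sets but only once in the result, which is the role DISTINCT plays in the SPARQL query.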

Permissive licenses

Let's list all programming languages released under permissive licenses, that is, licenses that place practically no restrictions on what users and developers may do with the software.

Used:

#2017-03
SELECT DISTINCT ?item ?item_label
WHERE
{
    ?item wdt:P31 wd:Q9143 # instances of programming language
    ; rdfs:label ?item_label . 

    FILTER (LANG(?item_label) = "en") . 
  
      { ?item wdt:P275 wd:Q308915  }  UNION  # license Mozilla Public
      { ?item wdt:P275 wd:Q334661  }  UNION  # license MIT
      { ?item wdt:P275 wd:Q191307  }  UNION  # license BSD
      { ?item wdt:P275 wd:Q6905323 }         # license CC
}

SPARQL-query, 37 results.

This list of 37 "free" languages includes, for example, CoffeeScript, Go and Haml.

Number of source file formats

Depending on the programming language, source code files may have different extensions. Let's build a bubble chart of languages by the number of file extensions recorded for their source code files.

Used:

#added 2017-04
#defaultView:BubbleChart
SELECT ?lang_name (COUNT(?ext) AS ?count)
WHERE
{
    ?lang wdt:P31 wd:Q9143.  # instance of programming language
    ?lang wdt:P1195 ?ext.    # file extension
    ?lang rdfs:label ?lang_name.
    FILTER (LANG(?lang_name) = "en").
}
GROUP BY ?lang_name
ORDER BY DESC(?count)

SPARQL-query.

Bubble chart by the number of formats of source code files


The chart shows that the languages with the most file formats and extensions are C++ (10 formats), Geometric Description Language (8) and Racket (7). For example, a Racket program file can have the extension rkt, rktl, rktd, scrbl, plt, ss or scm.
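The grouping step of the query above (GROUP BY with a count) can be reproduced offline on exported rows. This is a sketch only: the Racket extensions come from the text, "ExampleLang" is a made-up entry, and unlike the query the function counts each (language, extension) pair once.

```python
from collections import Counter


def extensions_per_language(pairs):
    """Count distinct file extensions per language.

    pairs: iterable of (language_label, extension) tuples, one per result row.
    """
    return Counter(lang for lang, _ext in set(pairs))


# Racket's extensions are listed in the text; "ExampleLang" is hypothetical.
rows = [("Racket", ext) for ext in ("rkt", "rktl", "rktd", "scrbl", "plt", "ss", "scm")]
rows += [("ExampleLang", "ex"), ("ExampleLang", "ex")]  # duplicate row counted once

counts = extensions_per_language(rows)
for lang, n in counts.most_common():  # same order as ORDER BY DESC(?count)
    print(lang, n)
```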

Countries where people and organizations associated with the creation of programming languages are located

Let's map the places associated with the people and organizations that created programming languages. Note that the developer of a language can be either an organization or an individual. To determine the location (property: coordinate location), we use the headquarters of an organization (property: headquarters location) and the place of birth of a person (property: place of birth).

Used:

#2017-05
#defaultView:Map
SELECT ?item_label ?developer_label ?location_label ?coord
WHERE
{
    ?item wdt:P31 wd:Q9143 # instances of programming language
    ; rdfs:label ?item_label.     
    FILTER (LANG(?item_label) = "en"). 
  
    ?item wdt:P178 ?developer. # developer
    ?developer rdfs:label ?developer_label. 
    FILTER (LANG(?developer_label) = "en"). 
      		
    { ?developer wdt:P159 ?location. } UNION # headquarters location
    { ?developer wdt:P19  ?location  }       # place of birth
    ?location rdfs:label ?location_label. 
    FILTER (LANG(?location_label) = "en").
    
    ?location wdt:P625 ?coord. # coordinate location

    SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en".
    }   	
}

SPARQL-query.

Countries where people and organizations associated with the creation of programming languages are located


We also build a bubble chart to identify the countries that produced the most developers of programming languages and host the most headquarters. The chart shows that the leading countries are the United States (159 people and headquarters) and the United Kingdom (15). According to these data, only two programming languages were developed in Russia: Refal and the 1C:Enterprise embedded programming language.
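When the map query's results are processed outside the query service, the ?coord values have to be parsed. The sketch below assumes they arrive as WKT literals of the form Point(longitude latitude), which is how the Wikidata query service normally serializes the coordinate location property; the function name `parse_point` and the sample value are illustrative.

```python
import re

# Matches "Point(<longitude> <latitude>)"; longitude comes first in WKT.
_POINT = re.compile(r"Point\(\s*([^\s()]+)\s+([^\s()]+)\s*\)", re.IGNORECASE)


def parse_point(wkt):
    """Convert a WKT point literal into a (latitude, longitude) tuple."""
    match = _POINT.fullmatch(wkt.strip())
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    longitude = float(match.group(1))
    latitude = float(match.group(2))
    return latitude, longitude


# Illustrative coordinate value, not taken from the query results.
print(parse_point("Point(-122.4194 37.7749)"))  # (37.7749, -122.4194)
```

Returning latitude first matches the order most mapping libraries expect, so the swap is made once here rather than at every call site.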

Universities where people who developed programming languages studied

Let's map the educational institutions attended by people who later developed programming languages.

Used:

#2017-05
#defaultView:Map
SELECT ?item_label ?developer_label ?educational_institution_label ?coord
WHERE
{
    ?item wdt:P31 wd:Q9143 # instances of programming language
    ; rdfs:label ?item_label. 
    FILTER (LANG(?item_label) = "en"). 
    
    ?item wdt:P178 ?developer. # developer
    ?developer rdfs:label ?developer_label. 
    FILTER (LANG(?developer_label) = "en"). 
    	
    ?developer wdt:P69 ?educational_institution. # educated at
    ?educational_institution rdfs:label ?educational_institution_label. 
    FILTER (LANG(?educational_institution_label) = "en").
    
    ?educational_institution wdt:P625 ?coord. # coordinate location
    
    SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en".
    } 	
}

SPARQL-query, 142 results.

Universities where people who developed programming languages studied


Let's build a bubble chart of the educational institutions most common among future developers of programming languages. The chart shows Princeton University (8) and Stanford University (8) in first place. Moscow State University (MSU) appears at the end of the list, which contains 142 entries: Tony Hoare, listed as a developer of ALGOL 60, and Valentin Turchin, who developed Refal, studied there.

Professions of the creators of programming languages

Let's build a bubble chart showing which occupations prevail among people who develop programming languages.

Used:

#2017-05
#defaultView:BubbleChart
SELECT ?occupation_label (COUNT(*) AS ?count)
WHERE
{
    ?item wdt:P31 wd:Q9143. # instances of programming language
    ?item wdt:P178 ?developer. # developer
    ?developer wdt:P106 ?occupation. # occupation
    ?occupation rdfs:label ?occupation_label.
    FILTER (LANG(?occupation_label) = "en").
}
GROUP BY ?occupation_label
ORDER BY DESC(?count)

SPARQL-query, 48 results.

Professions of the creators of programming languages


The most common occupations are computer scientist, engineer and teacher. Interestingly, the list also includes occupations such as jazz musician and politician (Herbert A. Simon).

Object-oriented programming languages

List all object-oriented programming languages.

Used:

#2017-4
SELECT DISTINCT ?item ?item_label
WHERE
{
    ?item wdt:P31 wd:Q899523 # instances of object-oriented programming language
    ; rdfs:label ?item_label . 

    FILTER (LANG(?item_label) = "en") . 
}

SPARQL-query, 116 results.

Thus, roughly 16% of the programming languages on Wikidata are classed as object-oriented (116 of the 732 instances found above).
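The 16% figure can be checked against both language counts reported earlier on this page. This is a rough sketch: it assumes every object-oriented language is also counted among the instances of "programming language", which the queries themselves do not guarantee.

```python
def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


object_oriented = 116   # instances of object-oriented programming language
all_languages = 732     # instances of programming language
labelled = 709          # instances with an English label

print(round(share(object_oriented, all_languages), 1))  # 15.8
print(round(share(object_oriented, labelled), 1))       # 16.4
```

Either denominator rounds to 16%.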

Completeness of Wikidata

According to the Bourabai Research University [1], at least 26 programming languages support the object-oriented paradigm. Two other articles on object-oriented programming add a further 4 [2] and 3 [3] languages to this list. The SPARQL query returned 116 results. It is hard to judge the completeness of the three sources cited above, since there are many little-known, obsolete and narrowly specialized languages that authoritative sources do not cover. Even so, Wikidata lists several times more languages than these sources combined, which suggests that it provides a fairly complete list of object-oriented programming languages.

Filling in objects

Let's list all people involved in the development of programming languages whose objects have the 'label' field filled in in English:

#2017-05
SELECT ?item_label ?item ?developer_label ?developer
WHERE
{
    ?item wdt:P31 wd:Q9143 # instances of programming language
    ; rdfs:label ?item_label. 
    FILTER (LANG(?item_label) = "en"). 

    ?item wdt:P178 ?developer. # developer 
    ?developer wdt:P31 wd:Q5.  # instances of human
    ?developer rdfs:label ?developer_label. 
    FILTER (LANG(?developer_label) = "en").  
}

SPARQL-query, 133 results.

As of 21.05.17 there are 133 such results. A similar query for developers with a Russian 'label' returns 88 results. After filling in the label and description fields in Russian for the remaining objects, we print the result:

#2017-05
SELECT ?item_label ?item ?developer_label ?developer
WHERE
{
    ?item wdt:P31 wd:Q9143 # instances of programming language
    ; rdfs:label ?item_label. 
    FILTER (LANG(?item_label) = "en"). 

    ?item wdt:P178 ?developer. # developer 
    ?developer wdt:P31 wd:Q5.  # instances of human
    ?developer rdfs:label ?developer_label. 
    FILTER (LANG(?developer_label) = "ru").  
}

SPARQL-query, 133 results.

Future work

  1. Output all programming languages with the "mascot character" property.
  2. Calculate the number of programming languages created before 1992 (property: "inception").
  3. Construct a bar chart that shows the number of known hashtags in Twitter for each programming language (property: "Twitter hashtag").
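As a starting point for these tasks, the sketch below generates candidate SPARQL queries in the style used on this page. The property identifiers are assumptions that should be checked on Wikidata before use: P822 for "mascot", P571 for "inception" and P2572 for the hashtag property. The function names are hypothetical.

```python
def languages_with_property(prop_id):
    """Build a query listing programming languages that have the given property."""
    return f"""SELECT ?item ?itemLabel ?value
WHERE
{{
    ?item wdt:P31 wd:Q9143.      # instances of programming language
    ?item wdt:{prop_id} ?value.  # property under study (assumed identifier)
    SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" }}
}}"""


def count_created_before(year):
    """Build a query counting programming languages with inception before `year`."""
    return f"""SELECT (COUNT(DISTINCT ?item) AS ?count)
WHERE
{{
    ?item wdt:P31 wd:Q9143.      # instances of programming language
    ?item wdt:P571 ?inception.   # inception (assumed identifier)
    FILTER (YEAR(?inception) < {year})
}}"""


print(languages_with_property("P822"))   # task 1: mascot (assumed P822)
print(count_created_before(1992))        # task 2
print(languages_with_property("P2572"))  # task 3: hashtag (assumed P2572)
```

The generated text can be pasted into the Wikidata query service; no results are claimed here because none of these queries were part of the original study.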

Tests

1

Match each programming language to its developer.

Developers: J. Ichbiah, C. Moore, J. Armstrong
Languages: Ada, Forth, Erlang

2

Select the logo of the programming language LOLCODE:

KTurtle logo.svg
Camelia.svg
LOLCode logo.png
Micropython-logo.svg

3

Fill in the gaps.

Fortran ranks first by number of dialects; their number is of the order of ___. Lisp is second with ___ dialects. Third place is shared by Standard ML and Object Pascal with ___ dialects.


SPARQL queries with replies:

References

  1. "Object-oriented programming (OOP-1)". bourabai.ru.
  2. "Object-oriented programming". fandom.wikia.com.
  3. Igor Garshin. "First languages of object-oriented programming (OOP)".

Links