Research in programming Wikidata/Programming languages

From Wikiversity

We explore the properties of programming languages using the knowledge base of the international Wikidata project. Using SPARQL queries over Wikidata items of the "programming language" type, we solve a number of tasks. A list of all programming languages under permissive licenses is obtained. A bubble chart of languages by number of source file formats is constructed. Maps are built showing the locations of the institutions and companies where people involved in creating programming languages studied or worked. A list of all object-oriented programming languages is obtained, and a conclusion is drawn about how completely Wikidata covers object-oriented programming languages.

Formulation of the problem

We study programming languages, in particular, information about them in Russian Wikipedia, English Wikipedia and Wikidata.

Tasks:

  1. Build a list of programming languages.
  2. Find the ratio of closed languages to free languages, as a percentage.
  3. Show on a map the places of study and residence of the developers of programming languages.

Objects Used in SPARQL Queries

Properties Used in SPARQL Queries

Instances of the "Programming Language" object

Let's build a list of all languages.

#List of `instances of` "programming language" 
SELECT ?lang ?langLabel
WHERE
{
    ?lang wdt:P31 wd:Q9143. # instances of programming language
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

SPARQL-query, 732 results (2017), 1422 results (2020).
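
For readers who want to run such a query outside the web interface, the following is a minimal sketch of how the request URL could be assembled in Python. It assumes the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql with the `query` and `format` parameters; the helper name `build_query_url` is hypothetical, and the sketch only builds the URL without sending a request.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed endpoint of the public Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# The query from the listing above, unchanged.
LANGUAGES_QUERY = """
SELECT ?lang ?langLabel
WHERE
{
    ?lang wdt:P31 wd:Q9143. # instances of programming language
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
"""


def build_query_url(sparql: str, endpoint: str = WDQS_ENDPOINT) -> str:
    """Return a GET URL asking the endpoint for JSON results (hypothetical helper)."""
    return endpoint + "?" + urlencode({"query": sparql.strip(), "format": "json"})


url = build_query_url(LANGUAGES_QUERY)

# Round-trip check: decode the URL again to see what the server would receive.
decoded = parse_qs(urlparse(url).query)
print(decoded["format"])
```

The resulting URL could then be fetched with any HTTP client; since result counts change over time, the figures reported here (732 and 1422) should not be expected to reproduce exactly.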

👍 The most complete and well-developed programming languages on Wikidata for 2017 were: Java, Python, C. For 2020 the most well-developed programming languages on Wikidata are: C++ (26 properties), Java (26 properties), JavaScript (25 properties), R (25 properties).

👎 The most sparsely described languages in 2017 were: CLIPS, Dylan, Go!.

A drawback of the resulting list is that a number of items have no label on Wikidata ("No label defined"). Let's get a list of only those languages whose "label" field is non-empty.

#List of `instances of` "programming language" only with a label.
SELECT ?lang ?langLabel
WHERE
{
    ?lang wdt:P31 wd:Q9143 ;         # instances of programming language
          rdfs:label ?langLabel .    # explicitly stored label
    FILTER (LANG(?langLabel) = "en") # keep English labels only
}

SPARQL-query, 709 results (2017), 1422 results (2020).

In 2017 the filtered list was 23 results shorter (709 versus 732), whereas in 2020 both queries return 1422 results, so every language has an English label.

Demonstration of work with operations on sets in SPARQL

Output all programming languages that are open (free) software or were influenced by at least one of the languages C, Python, or Java, excluding those developed by Sun Microsystems or Johnson Space Center.

SELECT DISTINCT ?lang ?langLabel
WHERE
{
    ?lang wdt:P31 wd:Q9143.
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }.
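    {
      { ?lang wdt:P737 wd:Q15777 } UNION   # influenced by (P737) one of the listed languages
      { ?lang wdt:P737 wd:Q28865 } UNION
      { ?lang wdt:P737 wd:Q251   } UNION
      { ?lang wdt:P31 wd:Q341    }         # or an instance of free software
    } MINUS
    {
      { ?lang wdt:P178 wd:Q14647  } UNION  # developer (P178) is one of the excluded organizations
      { ?lang wdt:P178 wd:Q208371 }
    }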
}

SPARQL-query, 115 results (2017), 122 results (2020).
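
The UNION and MINUS operators used above behave much like set union and set difference. The sketch below mirrors the shape of the query with Python sets; it is an illustration only, and the placeholder names (`LangA` to `LangG`) and their memberships are made up, not taken from Wikidata.

```python
# Hypothetical sample data: which placeholder languages match each pattern.
influenced_by_c = {"LangA", "LangB", "LangC"}
influenced_by_python = {"LangB", "LangD"}
influenced_by_java = {"LangE"}
free_software = {"LangD", "LangF"}
developed_by_excluded = {"LangC", "LangG"}  # stands in for the two excluded developers

# UNION: a language qualifies if it matches at least one branch.
candidates = influenced_by_c | influenced_by_python | influenced_by_java | free_software

# MINUS: drop every candidate that also matches the excluded pattern.
result = candidates - developed_by_excluded

print(sorted(result))
```

Here `LangC` is removed because it matches the excluded pattern, while `LangG` never qualified in the first place, so subtracting it changes nothing.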

Permissive licenses

We list all programming languages released under permissive licenses, that is, licenses that place practically no restrictions on what users and developers may do with the software.

SELECT DISTINCT ?lang ?langLabel
WHERE
{
    ?lang wdt:P31 wd:Q9143 .               # instances of programming language
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } .
    { ?lang wdt:P275 wd:Q308915  }  UNION  # license Mozilla Public
    { ?lang wdt:P275 wd:Q334661  }  UNION  # license MIT
    { ?lang wdt:P275 wd:Q191307  }  UNION  # license BSD
    { ?lang wdt:P275 wd:Q6905323 }         # license CC
}

SPARQL-query, 37 results (2017), 82 results (2020).

The 2017 list of 37 "free" languages included, for example, CoffeeScript, Go, and Haml.

Next, consider the ratio between languages under permissive licenses and languages under proprietary or closed licenses.

# Computes the number of programming languages with a closed license
# as a percentage of the number of languages with a free license.
SELECT (COUNT(?notFree) * 100 / COUNT(?free) AS ?total) WHERE
{
  {
    SELECT ?free WHERE {
      ?free wdt:P31 wd:Q9143 .                  # instances of programming language
      { ?free wdt:P275 wd:Q308915  } UNION      # license Mozilla Public
      { ?free wdt:P275 wd:Q334661  } UNION      # license MIT
      { ?free wdt:P275 wd:Q191307  } UNION      # license BSD
      { ?free wdt:P275 wd:Q6905323 }            # license CC
    }
  } UNION {
    SELECT ?notFree WHERE {
      ?notFree wdt:P31 wd:Q9143 .               # instances of programming language
      { ?notFree wdt:P275 wd:Q6165015  } UNION  # Java Research License
      { ?notFree wdt:P275 wd:Q218616   } UNION  # proprietary software
      { ?notFree wdt:P275 wd:Q3238057  } UNION  # proprietary license
      { ?notFree wdt:P275 wd:Q31202214 } UNION  # proprietary software license
      { ?notFree wdt:P275 wd:Q979794   }        # Aladdin Free Public License
    }
  }
}

SPARQL-query; for 2020 the ratio of closed languages to free languages is 25%.
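
The arithmetic behind the query above can be sketched in Python as follows. The rows are hypothetical placeholders, not Wikidata results, and the function name `closed_to_free_percent` is an assumption made for this illustration. The sketch also shows one caveat: the query counts matching rows, so a language with two matching licenses is counted twice unless the counts are made distinct.

```python
# Hypothetical (language, license) rows standing in for query results.
rows = [
    ("LangA", "MIT"),
    ("LangB", "BSD"),
    ("LangB", "MIT"),            # LangB carries two free licenses
    ("LangC", "Mozilla Public"),
    ("LangD", "CC"),
    ("LangE", "proprietary"),
    ("LangF", "proprietary"),
]

FREE = {"Mozilla Public", "MIT", "BSD", "CC"}
CLOSED = {"proprietary", "Java Research License", "Aladdin Free Public License"}


def closed_to_free_percent(rows, distinct=False):
    """Closed-license count as a percentage of the free-license count."""
    free = [lang for lang, lic in rows if lic in FREE]
    closed = [lang for lang, lic in rows if lic in CLOSED]
    if distinct:
        # Count each language once, however many matching licenses it has.
        free, closed = set(free), set(closed)
    return len(closed) * 100 / len(free)


per_row = closed_to_free_percent(rows)                       # 2 * 100 / 5
per_language = closed_to_free_percent(rows, distinct=True)   # 2 * 100 / 4
print(per_row, per_language)
```

In SPARQL, the per-language variant would correspond to counting with DISTINCT; whether this changes the reported 25% depends on how many languages carry several of the listed licenses.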

Number of source file formats

Depending on the programming language, the source code files for programs may have different extensions. Let's construct a bubble diagram by the number of valid formats of the source code files.

#defaultView:BubbleChart
SELECT ?langLabel (COUNT(?format) AS ?count)
WHERE {
 ?lang wdt:P31 wd:Q9143.   # instances of programming language
 ?lang wdt:P1195 ?format.  # file extension
 SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }.
}
GROUP BY ?langLabel
ORDER BY DESC(?count)

SPARQL-query.

Bubble chart by the number of formats of source code files (2020).
Bubble chart by the number of formats of source code files (2017).

The figure shows that the programming languages with the most source file formats and extensions are C++ (10 formats), Geometric Description Language (8), and Racket (7). For example, files with a program in the Racket language can have the extensions rkt, rktl, rktd, scrbl, plt, ss or scm.

By 2020, languages such as Raku (9 formats), Geometric Description Language (8 formats), REXX (6 formats), Java (5 formats), and Wolfram Language (5 formats) had also moved toward the top.
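
The grouping and counting behind the bubble chart can be sketched in Python as follows. The Racket extensions are the seven listed in the text above; the `LangX` and `LangY` rows are made-up placeholders, so the ranking is illustrative only.

```python
from collections import Counter

# One (language, extension) pair per file-extension statement.
# Racket's extensions come from the text; LangX and LangY are hypothetical.
pairs = [("Racket", ext) for ext in ("rkt", "rktl", "rktd", "scrbl", "plt", "ss", "scm")]
pairs += [("LangX", "lx"), ("LangX", "lxh"), ("LangY", "ly")]

# GROUP BY language, COUNT the extensions in each group.
counts = Counter(lang for lang, _ in pairs)

# ORDER BY DESC(count): most_common() sorts by count, largest first.
ranking = counts.most_common()
print(ranking)
```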

Countries where developers and organizations associated with the creation of programming languages are located

Let's map the countries where the people and organizations connected with the creation of programming languages are located. Note that the developer of a language may be either an organization or an individual. To determine the location (property: coordinate location), we use the coordinates of the headquarters for an organization (property: headquarters location) and the coordinates of the place of birth for a person (property: place of birth).

#defaultView:Map
SELECT ?langLabel ?developerLabel ?locationLabel ?coord
WHERE {
  ?lang wdt:P31 wd:Q9143. # instances of programming language
  ?lang wdt:P178 ?developer. # developer
  { ?developer wdt:P159 ?location. } UNION # headquarters location
  { ?developer wdt:P19 ?location. } # place of birth
  ?location wdt:P625 ?coord. # coordinate location
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } 	
}

SPARQL-query.

The most favorable countries for the emergence of people capable of developing programming languages (2020)

We will also construct a bubble chart to identify the countries most favorable for the emergence of people capable of developing programming languages and for locating headquarters. The figure shows that the leading countries were the United States (159 people and headquarters) and the United Kingdom (15). According to these data, only two programming languages were developed in Russia: Refal and the Embedded Programming Language 1C: Enterprise.

For 2020, the number of headquarters in the US is 241, in the UK 24, in France 18, and in Russia 5.

Universities where people who developed programming languages studied

Let's display on a map the educational institutions where people who later developed programming languages studied.

#defaultView:Map
SELECT ?langLabel ?developerLabel ?educationalInstitutionLabel ?coord
WHERE
{
  ?lang wdt:P31 wd:Q9143. # instances of programming language
  ?lang wdt:P178 ?developer. # developer		
  ?developer wdt:P69 ?educationalInstitution. # educated at
  ?educationalInstitution wdt:P625 ?coord. # coordinate location
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } 	
}

SPARQL-query, 142 results (2017), 282 results (2020).

The map shows that most of the people involved in the creation of programming languages studied in Europe or the United States.

Let's construct a bubble chart of the educational institutions most popular among future developers of programming languages. The figure shows that first place was shared by Princeton University (8) and Stanford University (8). Moscow State University (MSU) also appears in this list, which includes 142 universities worldwide, near the end: Tony Hoare, who developed ALGOL 60, and Valentin Turchin, who developed Refal, studied there.

Professions of the creators of programming languages

Let's construct a bubble diagram showing which professions prevail among people who develop programming languages.

#defaultView:BubbleChart
SELECT ?occupationLabel (COUNT(*) AS ?count)
WHERE {
 ?lang wdt:P31 wd:Q9143. # instances of programming language
 ?lang wdt:P178 ?developer. # developer
 ?developer wdt:P106 ?occupation. # occupation
 SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
GROUP BY ?occupationLabel
ORDER BY DESC(?count)

SPARQL-query, 48 results (2017), 74 results (2020).

Which professions prevail among people developing programming languages (2020).
Which professions prevail among people developing programming languages (2017).

The most common professions were computer scientist, engineer, and teacher. Interestingly, the list also includes professions such as jazz musician and politician (Herbert A. Simon). In 2020, computer scientists were the largest group among the developers of programming languages (172 people), followed by 96 engineers, 57 teachers, 56 programmers, and 43 mathematicians.

Object-oriented programming languages

Popularity of programming paradigms for 2020

In addition to the programming languages themselves, Wikidata also describes programming paradigms. The script https://w.wiki/oLg and the illustration show that, by number of programming languages, the most popular paradigm is object-oriented programming (399 languages in 2020), followed by procedural programming (297 languages in 2020). Multi-paradigm programming (programming that uses several paradigms at once) is also represented by a large number of programming languages.

Let's list all the object-oriented programming languages.

SELECT DISTINCT ?lang ?langLabel
WHERE
{
 ?lang wdt:P31 wd:Q899523 # instances of object-oriented programming language
 SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

SPARQL-query, 116 results (2017), 118 results (2020).

Thus, in 2017 about 16% of the programming languages on Wikidata (116 of 732) were classified as object-oriented.
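
The share can be recomputed from the result counts reported above (116 of 732 results in 2017, 118 of 1422 in 2020). A short worked calculation in Python; the helper name `share_percent` is hypothetical.

```python
def share_percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(part * 100 / whole, 1)


# Result counts taken from the two queries: object-oriented vs. all languages.
oop_2017 = share_percent(116, 732)
oop_2020 = share_percent(118, 1422)
print(oop_2017, oop_2020)  # 15.8 8.3
```

The 2017 value rounds to the 16% quoted in the text. With the 2020 counts the share is lower, because the total number of languages nearly doubled while the number of items typed as object-oriented barely changed.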

Completeness of Wikidata

According to the Bourabai Research University [1], there are at least 26 programming languages that support the object-oriented paradigm. Articles devoted to object-oriented programming add another 4 [2] and 3 [3] programming languages to this list. The SPARQL query returned 116 results. It is difficult to judge the completeness of the data in the three sources cited above, since many little-known, obsolete, and narrowly focused languages are not covered in authoritative sources. Since the query returns several times more languages than these sources list, it can be concluded that Wikidata provides a fairly complete list of object-oriented programming languages.

Filling objects

Let's list all people involved in the development of programming languages whose items have a filled-in 'label' field in English:

SELECT ?langLabel ?lang ?developerLabel ?developer
WHERE
{
 ?lang wdt:P31 wd:Q9143. # instances of programming language
 ?lang wdt:P178 ?developer. # developer 
 ?developer wdt:P31 wd:Q5. # instances of human
 SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}

SPARQL-query, 133 results (2017), 223 results (2020).

In 2017 there were 133 such results. A similar list restricted to items with a filled-in 'label' field in Russian contained 88 results. After filling in the label and description fields in Russian for these items, we print the result:

SELECT ?langLabel ?lang ?developerLabel ?developer
WHERE
{
 ?lang wdt:P31 wd:Q9143. # instances of programming language
 ?lang wdt:P178 ?developer # developer
 ; rdfs:label ?langLabel FILTER (LANG(?langLabel) = "ru"). # Russian label of the language
 ?developer wdt:P31 wd:Q5 # instances of human
 ; rdfs:label ?developerLabel FILTER (LANG(?developerLabel) = "ru") # Russian label of the developer
}

SPARQL-query, 133 results (2017), 183 results (2020).

Future work

  1. Output all programming languages with the "mascot character" property.
  2. Calculate the number of programming languages created before 1992 (property: "inception").
  3. Construct a bar chart showing the number of known Twitter hashtags for each programming language (property: "Twitter hashtag").
  4. Construct a list of programming languages ordered by the number of interlinks.
  5. Construct a list of languages ordered by the number of visits to their articles in Russian Wikipedia.
  6. Construct a directed acyclic graph of the dependencies of programming languages on each other (or find cycles in the dependencies, if such a graph cannot be constructed). See the "influenced by" property in Java.
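
Item 6 above asks whether the "influenced by" relation forms a directed acyclic graph. One possible way to check this, sketched in Python rather than SPARQL, is a depth-first search that reports the first cycle it meets. The function name `find_cycle` and the edges between placeholder languages are hypothetical; real edges would come from the "influenced by" property.

```python
def find_cycle(graph):
    """Return one cycle as a list of nodes, or None if the graph is acyclic.

    graph maps a language to the languages it was influenced by.
    """
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # nxt is already on the current path, so we closed a loop.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical edges: "LangB was influenced by LangA", and so on.
acyclic = {"LangB": ["LangA"], "LangC": ["LangA", "LangB"]}
cyclic = {"LangA": ["LangB"], "LangB": ["LangC"], "LangC": ["LangA"]}

print(find_cycle(acyclic))
print(find_cycle(cyclic))
```

The recursive form is adequate for small samples; for the full set of languages an iterative traversal may be preferable to stay within Python's recursion limit.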

Tests

1. Match each programming language with its developer.

Developers: J. Ichbiah, C. Moore, J. Armstrong
Languages: Ada, Forth, Erlang

2. Select the logo of the programming language LOLCODE:

3. Fill in the gaps.

Fortran ranks first by the number of its dialects; their number is on the order of ___. Lisp is in second place with ___ dialects. Third place is shared by Standard ML and Object Pascal with ___ dialects.


SPARQL queries with replies:

References

  1. "Object-oriented programming (OOP-1)". bourabai.ru.
  2. "Object-oriented programming". fandom.wikia.com.
  3. Igor Garshin. "First languages of object-oriented programming (OOP)".
  • Burdin Grigoriy. "Programming languages". ProWD. Retrieved 2020-09-28.