Internationalisation of Programming Languages
This is a research project at Wikiversity.
This research project explores the internationalisation of programming languages: that is, the internationalisation and localisation of programming languages themselves, as opposed to how these languages are used to produce localised end-user interfaces.
The goals of this research include:
- explaining the issues related to this topic
- establishing the state of the art on the topic
- exposing the prominent difficulties this field involves, from both a social and a technical point of view
- exploring existing theoretical solutions to these problems, and possibly making original proposals
- classifying effective implementations of internationalised programming languages and compilers
Although they fall outside the core topic of this research, programming languages with a single-language lexicon may be explored if they offer compatibility facilities with programming languages based on a different lexicon.
This research does not aim to cover other topics; in particular, it will not cover compilers and programming languages beyond internationalisation and localisation.
- 1 Overview of Related and Underlying Issues
- 2 State of the Art
- 2.1 Assembly language
- 2.2 Basic
- 2.3 C
- 2.4 C#
- 2.5 C++
- 2.6 Go
- 2.7 Java
- 2.9 MATLAB
- 2.10 Objective-C
- 2.11 Pascal
- 2.12 Perl
- 2.13 PHP
- 2.14 PL/SQL
- 2.15 Python
- 2.16 R
- 2.17 Ruby
- 2.18 Rust
- 2.19 Scala
- 2.20 Scratch
- 2.21 Shell
- 2.22 Swift
- 2.23 TypeScript
- 3 Noticeable Difficulties
- 4 Theoretic solutions
- 5 Effective implementations
- 6 Related resources
- 7 Notes
Overview of Related and Underlying Issues
Whenever possible, it matters to consider the impact of our choices on linguistic diversity and multilingualism on the Internet. Programming languages are an important part of the infrastructure of the Internet, and of the digital revolution as a whole. In light of these issues, allowing people to write software using whichever lexical inventory they prefer should be an important concern for every programming language.
On a side note, protocols are clearly also an important part of the digital infrastructure. Some of them do include spoken-language mnemonics, like HELO in SMTP. Other protocols rely more or less extensively on a numeric nomenclature, like HTTP with its status codes, which is a far more internationalisation-friendly approach. However, protocols are mostly outside the scope of this research and will not be developed further hereafter.
Although most programming languages require only a few dozen mandatory keywords, it is usually the whole ecosystem built around this small kernel that reinforces the positive feedback loop of using a single spoken language as the lexical source of the abstract interface that program source code constitutes.
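To give a rough sense of the proportions, the following Python sketch compares the number of reserved keywords with the number of public names Python exposes by default through its built-in namespace. It only uses the standard keyword and builtins modules; exact counts vary between Python versions, and the built-ins are themselves only a small fraction of the wider ecosystem (standard library, third-party packages, documentation).

```python
import builtins
import keyword

# The mandatory kernel: reserved words that the parser treats specially.
hard_keywords = list(keyword.kwlist)

# Names usable in any module without an import: built-in functions,
# types, exceptions and constants, mostly English words or abbreviations.
builtin_names = sorted(name for name in dir(builtins) if not name.startswith("_"))

print(f"{len(hard_keywords)} reserved keywords, e.g. {hard_keywords[:5]}")
print(f"{len(builtin_names)} public built-in names, e.g. {builtin_names[-5:]}")
```

Even before any import, a localisation limited to the keywords would leave most of the names a programmer actually types untranslated.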
Of course, there are many cases where using a language widely spoken at the international level brings many benefits. This research project does not deny these positive aspects of a common international working language. They include code that is useful in many more situations, needing only minor adaptations to local constraints, as well as a larger pool of potential reviewers for libre software projects.
That said, these advantages are not always so useful. Many users may need to carry out small programmatic tasks for very specific cases, in the form of very short code. Some projects are heavily tied to local contexts, like a system implementing local laws. Other cases, such as education, illustrate situations where using a non-native language may be an unnecessary burden on the programming task.
On the other hand, a proliferation of completely unrelated in-house languages, or of incompatible localised derivatives of existing programming languages, combines the drawbacks of both situations. Hence the idea of internationalising and localising programming languages themselves.
State of the Art
This section lists past and existing technologies related to the research topic. Each subsection reviews, as comprehensively as possible, one documented language. No restriction is placed on the languages covered, but priority goes to the most popular languages and to those with the most advanced features on the topic. Contributions covering any programming language are welcome. Subsections are ordered alphabetically[note 1].
The research will notably look at the top 20 of the TIOBE index and at the most popular languages on open source-code sharing platforms such as GitHub.
Assembly language
Assembly language is generally a simple mapping between a set of words (mnemonics) and a list of numbered operations, although some assemblers provide "macro" keywords that perform a typical sequence of operations in a single command. For this kind of language, translexicalisation into any language is therefore rather straightforward, since mapping words to numbered operations is how such a programming language works anyway.
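The idea can be illustrated with a toy assembler in Python. The instruction set, its opcode numbers and the French mnemonics below are all hypothetical and correspond to no real processor; the point is only that, because the assembler already maps words to numbers, supporting a second lexicon amounts to adding a table of aliases pointing to the same opcodes.

```python
# Hypothetical opcodes of a toy machine (not a real instruction set).
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0xFF}

# Hypothetical French mnemonics, each an alias for a canonical mnemonic.
FRENCH_MNEMONICS = {"CHARGE": "LOAD", "AJOUTE": "ADD", "RANGE": "STORE", "ARRÊT": "HALT"}


def assemble(source: str, aliases: dict[str, str]) -> bytes:
    """Translate 'MNEMONIC [operand ...]' lines into opcode and operand bytes."""
    out = bytearray()
    for raw_line in source.splitlines():
        line = raw_line.split(";", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        word, *operands = line.split()
        # Localised mnemonics resolve to canonical ones; any other word is
        # looked up directly and raises KeyError if it is not a valid mnemonic.
        mnemonic = aliases.get(word.upper(), word.upper())
        out.append(OPCODES[mnemonic])
        # Operands are single bytes; int(..., 0) accepts decimal or 0x-hex.
        out.extend(int(op, 0) for op in operands)
    return bytes(out)


english = "LOAD 10\nADD 0x05 ; add five\nSTORE 32\nHALT"
french = "CHARGE 10\nAJOUTE 0x05 ; ajoute cinq\nRANGE 32\nARRÊT"
print(assemble(english, {}) == assemble(french, FRENCH_MNEMONICS))  # prints True
```

Both programs assemble to the same bytes, which is what makes relexicalising an assembly language comparatively cheap.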
Basic
Visual Basic .NET
Go
- GopherCon 2017: Aditya Mukerjee - Translating Go to Other (Human) Languages, and Back Again - YouTube
Perl
Perl 6 comes with native facilities for creating grammars, which can then be fully integrated into the interpretation stack; such an integrated grammar is called a slang. A slang can be as simple as a relexicalisation, like Mosdef, which enables the use of def rather than sub as the keyword introducing a new function. But the mechanism is flexible enough to parse a whole programming language: the v5 slang parses Perl 5, which, despite the name, is based on a completely unrelated technology stack. In fact, Perl 6 itself is parsed and executed using a Perl 6-style grammar.
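Python offers no counterpart to slangs, but the simplest case, a relexicalisation, can be approximated outside the language by a source-to-source pass over the token stream, much as Mosdef swaps one keyword for another. The sketch below uses only the standard tokenize module; the French keyword table is hypothetical and is not an official localisation of Python. Unlike a slang, this pass is not integrated into the interpreter: the rewritten source has to be produced before it is compiled.

```python
import io
import keyword
import tokenize

# Hypothetical French-to-English keyword table, for illustration only.
LEXICON = {
    "déf": "def",
    "si": "if",
    "sinon": "else",
    "retourne": "return",
    "pour": "for",
    "dans": "in",
}
# Sanity check: every target must be a real Python keyword.
assert all(keyword.iskeyword(target) for target in LEXICON.values())


def relexicalise(source: str, lexicon: dict[str, str]) -> str:
    """Rewrite localised keywords into canonical Python keywords.

    Only NAME tokens are substituted, so string literals and comments
    containing a localised word are left untouched.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in lexicon:
            tok = tok._replace(string=lexicon[tok.string])
        tokens.append(tok)
    # untokenize keeps the original layout from the token positions.
    return tokenize.untokenize(tokens)


source = (
    "déf signe(x):\n"
    "    si x < 0:\n"
    "        retourne -1\n"
    "    sinon:\n"
    "        retourne 1\n"
)
namespace = {}
exec(relexicalise(source, LEXICON), namespace)
print(namespace["signe"](-7))  # prints -1
```

Because every NAME token is checked, a variable that happens to be called pour would also be rewritten, which is one reason a grammar-level mechanism such as a slang is more robust than token substitution.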
Rust
Rust currently does not provide any facility for internationalising and localising the language itself. Discussions on the topic are, however, ongoing:
- The internationalization of Rust itself - community - The Rust Programming Language Forum
- Localization team discussion issue · Issue #178 · rust-community/team
- Pre-RFC for the official localisation team by sebasmagri · Pull Request #2 · rust-community/localisation
Related resources
- English in computing
- International Components for Unicode
- ISO/IEC 15897
- Non-English-based programming languages
- Internationalization of Compilers
- The Open Group Base Specifications Issue 7 IEEE Std 1003.1-2008, 2016 Edition – Internationalization Variables
- Lexicalization and Institutionalization – The State of the Art in 2004
- Grammars – Perl 6 Website
Notes
- Using the collation order of LANG=C sort if further ordering is required