Theory of Programming Languages/Introduction

This is a lesson in in the course, Theory of Programming Languages, which is a part of The School of Computer Science

Objective

In this lesson, students will be introduced to programming languages and the distinctions between them.

Students will be able to answer:

What is a language?
How is a programming language different than a natural language?
What is the difference between a High Level Language(HLL) and a Low Level Language(LLL)?

Introduction

Programming languages are special purpose languages used to instruct machines and express the semantics of algorithms. They were invented to make machines easier to use, as their processes could be automated in logical ways. New languages are still being written to this day to make different logical or mechanical problems easier to solve. Programming languages are strikingly different from human languages as they are highly formal and logical. Yet many of the concepts applied can be directly compared to human language. Programming languages can be divided in many ways, but the clearest distinction is between "Low Level Languages" (LLL) and "High Level Languages" (HLL).

Basic Automata Theory

In order to understand what a language is, one must also understand the state machines that correspond to each type of language. Theoretically, there exists an infinite number of languages. There exist languages for practically every application, including natural language. There exists languages for parsing the syntactic structure of DNA sequences, for example, which is a branch of Regulated Rewriting. Here, we will not go into advanced topics such as this, but the important thing to note is that everything is a language. Basic interactions with electrical devices, zipping and unzipping a backpack, making a sandwich, are all examples of strings in a language. For the computer scientist, everything is a language.

The most basic type of automaton is a Deterministic Finite Automaton (DFA). This is also called a finite-state machine, although a Turing machine also may have a finite number of states, so this is not the best term in a mathematical context. A DFA has a starting state, a set of final (accepting) states, a set of all states, an alphabet. A string is a sequence of characters from the alphabet. A language is defined as the set of all strings that will take you from the starting state to an accepting state. Thus, if a DFA has a state that is a "dead end," and doesn't lead anywhere, and is not an accepting state, then the string that transitions to that state is not in the language.

Low Level Languages

LLLs are very simple, often based directly on the machines which they control - different brands of computer chips use different LLLs. Low Level Languages are also unique in that they can be directly represented by machine code, made of electrical signals or other physical processes. These languages can be represented using simple strings of text but even with that level of detail, humans have great difficulty reading the code - they are primarily the mother tongue of the machine.

This is an example of a lower level language called MIPS assembly:

add    $t3, $a0, $t0	
lw     $t3, 0($t0)	
sub    $v0, $0, $t1
srl    $v2,$t1, $s7

Because MIPS assembly (or assembly language) is a LLL, it can be directly represented in binary form. Here is that same program written in its binary counterpart:

00000000100010000101100000100000
10001101000010110000000000000000
00000000000010010001000000100010
00000000000010010000010111000010

Binary is often written in hexadecimal format to reduce size:

As you can clearly tell, it is hard to read "008858208D0B000000091022000905C2" and discern the program behind it. This is why people tend to avoid LLLs unless they are directly working with or learning about the hardware within a computer. Around the time of the beginning of the advent of computing, Assembly language was considered high-level, and made it possible to program a microprocessor to perform complicated tasks. In this same respect, to someone who is designing computer hardware, the Assembly language is relatively high level even today.

Assembly language is assembled by an assembler into a table of address-value pairs. The best way to understand low-level programming in assembly is to assemble your program yourself by hand. Different types of instructions might take up different amounts of memory. For example, ALU instructions will usually be completed in two clock cycles. In the first clock cycle, the instruction is loaded to the instruction register. In the second clock cycle, the instruction is fed from the instruction register to the ALU, which then performs the calculation before the next clock cycle. The main reason for relying on clock cycles in CPU design (apart from storing memory in registers) is to provide a type of buffer for the digital circuits so that the propagation delay of one part of a circuit does not cause unexpected behavior in another part of the circuit. After executing the ALU (r-type) instruction, the program counter is incremented. Since the program counter might be driving the address bus, the data bus will be read from the next piece of memory.

Assembly programming can be uniquely powerful because each assembled instruction is built directly into the hardware, so performance is practically unmatched. Also, state machines can be easily mapped to equivalent assembly programs. The mathematical theory of languages is concerned primarily with 2 equivalent representations of languages: state machines and formal grammar. Thus, a solid understanding of assembly programming is vital to connecting abstract mathematical representations of languages to the physical hardware.

High Level Languages

High Level languages, while just as logical as their lower level siblings, provide other useful features. Besides being much easier for a human to read, these languages are not based on a specific machine - they are translated down by "compilers", "interpreters" and "virtual machines" into LLLs. (This process will be covered in later lessons, at this point simply understand that HLLs can run on a variety of machines while LLLs are limited to a single type.) HLLs are also less efficient than LLLs because they need to be translated to machine code, but this extra step gives a computer the chance to check code for logical errors. Generally, HLLs are easier for people to learn and are more pleasing to work with - but they are less efficient than LLLs and are more distantly connected to the machines they run on.

High Level Languages fall into several main varieties: Imperative, Object Oriented, Functional, and Logical. Later lessons will go into detail describing the pros and cons of each type.

Assignments

Make a list of 15 programming languages. Write down if they are a HLL or LLL - if they are a HLL write down which type they are (Logical, Imperative, etc).
Find out which kind of LLL each computer in your house uses (this includes cell phones). To do this you will need to look up what type of processor it uses, and then find what language that processor runs.
Make a list of your favorite programs, find out what languages they are written in.
Read The Introduction from your textbook.

Completion status: this resource is just getting off the ground. Please feel welcome to help!