Arrays, strings and records




To understand this section, let's study this step by step.

There are two parts to consider while programming. The first is how the computer calculates or treats your data. The second is how the user views the result. Here we will discuss how the data is treated in the background. This is the part that a programmer must study, because it determines everything that the user will see and how they will see it.

Terminology

Variable

Computers can only handle binary patterns (i.e. numbers), but a variable has a type that determines how the binary pattern is treated and viewed. For example, an integer variable is viewed as a plain whole number; a real/float variable's binary pattern is decoded to the corresponding real number using an appropriate encoding (such as IEEE 754); and a character variable is viewed as the character that corresponds to the number according to a certain encoding (such as ASCII or UTF-8). A computer uses many tricks to represent strings and other complex data as numbers, the details of which are determined by the particular architecture of the computer and the level of abstraction that the programming language uses.
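To make this concrete, here is a minimal Python sketch that views one binary pattern in three different ways. The four bytes and the variable names are chosen purely for illustration, and big-endian byte order is an assumption of the example, not something the lesson prescribes.

```python
import struct

# Four bytes holding one fixed binary pattern (chosen for illustration).
pattern = bytes([0x42, 0x28, 0x00, 0x00])

# Viewed as a 32-bit unsigned integer (big-endian byte order assumed).
as_integer = int.from_bytes(pattern, byteorder="big")

# Viewed as a 32-bit IEEE 754 float (same bytes, same byte order).
as_float = struct.unpack(">f", pattern)[0]

# Viewed as text: only the first byte, decoded as ASCII.
as_character = pattern[:1].decode("ascii")

print(as_integer)    # 1109917696
print(as_float)      # 42.0
print(as_character)  # B
```

The bytes never change; only the type used to interpret them does.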

Null (variously called Nil, None, Nothing)

Null (or one of its many synonyms) is a special value indicating that a variable does not currently hold any meaningful value. In some languages, variables that have not yet been assigned a value are null by default.
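As a small illustration, Python spells this value None. The variable name below is hypothetical. Note that Python does not make unassigned variables None automatically (using a name that was never assigned raises a NameError), so the sketch assigns None explicitly.

```python
# No value is known yet, so the variable is explicitly set to None.
middle_name = None

# "is None" is the usual Python test for the null value.
missing_before = middle_name is None

# Once a real value is assigned, the variable is no longer null.
middle_name = "Ada"
missing_after = middle_name is None

print(missing_before)  # True
print(missing_after)   # False
```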

Bit, Byte, and Word

A bit is a single binary digit; it may be 0 or 1. A byte is an 8-bit binary number; it can represent numbers from 0 to 255. The size of a word depends on the architecture of the CPU: it is the largest chunk of data that the CPU can process natively. Common word sizes are 8, 16, 32, and 64 bits.
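The ranges above follow from simple arithmetic: n bits can represent 2^n different patterns, so the largest unsigned value is 2^n - 1. A short sketch, where max_unsigned is a hypothetical helper name introduced only for this example:

```python
def max_unsigned(bits):
    """Largest unsigned value that fits in the given number of bits."""
    return 2 ** bits - 1

print(max_unsigned(1))  # 1   (a bit is 0 or 1)
print(max_unsigned(8))  # 255 (a byte is 0 to 255)

# The common word sizes mentioned above.
for word_size in (8, 16, 32, 64):
    print(word_size, max_unsigned(word_size))
```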

Arrays, Strings, and Records

An array is internally represented as a contiguous block of memory reserved to contain multiple variables of the same type. A string is an array of bytes. It is usually used to hold human-readable messages, because a byte corresponds well to the de facto standard encoding (ASCII), but a string is not limited to characters: it can contain any arbitrary binary data.

The Difference Between Array and Variable

A variable is a container for a single number. A variable can be thought of as a box: the number of items in the box determines the value of the box, and the type of items in the box must be uniform.

An array is a collection of such boxes aligned together. The analogy is a medicine box that is specially designed so that it can only hold a specific type of medicine. Each partition in the box can contain any amount of medicine (as long as the capacity permits), but you can only put in the medicine that the box is designed for. In an array, we use an "index" to address each partition; an index is a number that uniquely identifies each partition in the medicine box. So we'd say "the second partition" to identify the partition that holds the medicine for Tuesday (assuming the left-most partition is Monday).

In many programming languages, arrays are 0-indexed, meaning that the first item in the array is at index 0, while the final element is at index (array size - 1). For example, in an array of 8 variables, you would access the first variable at index 0 and the 8th variable at index 7.
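The indexing rule can be sketched in Python using the standard array module, which, like the medicine box, only accepts one type of item. The name doses and the values are made up for the example.

```python
from array import array

# An array of 8 integers; the type code "i" means every element is a signed int.
doses = array("i", [10, 20, 30, 40, 50, 60, 70, 80])

first = doses[0]              # first element lives at index 0
second = doses[1]             # "the second partition" is index 1
last = doses[len(doses) - 1]  # 8th element lives at index 7

print(first, second, last)    # 10 20 80

# The array only holds the type it was designed for.
try:
    doses.append("aspirin")
    rejected = False
except TypeError:
    rejected = True

print(rejected)               # True
```

Asking for doses[8] would raise an IndexError, because the valid indices of an 8-element array are 0 through 7.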

The Difference Between Array and String

Have you ever wondered how computers store words and sentences like this paragraph? As said earlier, computers can only handle and operate on binary patterns (i.e. numbers), so how could they store something other than numbers? The answer is encoding. An encoding is a standard that specifies how a binary pattern is treated and viewed. For example, in the ASCII encoding the letter 'A' is encoded as the number 65 (binary pattern 01000001), 'B' as the number 66 (01000010), and so on. ASCII defines 26 upper-case letters ('ABC'), 26 lower-case letters ('abc'), 10 number digits ('123'), several whitespace characters (space, tab, newline), some special characters (#!@$;), and some obsolete control characters from the age of teletypes.
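The mapping between characters and numbers can be checked directly. In this sketch, to_binary is a hypothetical helper name; ord, chr, and format are Python built-ins.

```python
def to_binary(character):
    """Return the 8-digit binary pattern of a single ASCII character."""
    return format(ord(character), "08b")

for letter in "AB":
    print(letter, ord(letter), to_binary(letter))
# A 65 01000001
# B 66 01000010

# The mapping also works in the other direction.
print(chr(65))  # A
```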

Now, back to the difference between an array and a string. A string is a special case of an array: it is an array of single bytes. A string can thus contain any arbitrary binary data. One of the most common uses of strings is to hold words and sentences, because the de facto standard ASCII encoding fits nicely into a single byte.
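A rough Python sketch of the same idea follows. One caveat: Python's own str type holds Unicode text, so the closest match to the "array of bytes" described here is Python's bytes type, which is what the example uses. The variable names are illustrative.

```python
# Text encoded as ASCII becomes an array of single bytes.
message = "Hello".encode("ascii")

# Each element is just a number between 0 and 255.
codes = list(message)
print(codes)       # [72, 101, 108, 108, 111]

# Like any array, it is indexed from 0.
print(message[0])  # 72, the ASCII code for 'H'

# The same type can hold arbitrary binary data that is not text at all.
blob = bytes([0, 255, 128, 7])
print(len(blob))   # 4
```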