Applied Programming/Strings

This lesson introduces strings and string processing.

Objectives and Skills

Objectives and skills for this lesson include:[1]

  • Evaluate an expression to identify the data type assigned to each variable
    • Identify str, int, float, and bool data types
  • Perform data and data type operations
    • Convert from one data type to another type; construct data structures; perform indexing and slicing operations
  • Determine the sequence of execution based on operator precedence
    • Assignment; Comparison; Logical; Arithmetic; Identity (is); Containment (in)
  • Select the appropriate operator to achieve the intended result
    • Assignment; Comparison; Logical; Arithmetic; Identity (is); Containment (in)

Readings

  1. Wikipedia: String (computer science)

Multimedia

  1. YouTube: Python Tutorial: Slicing Lists and Strings
  2. YouTube: Python Tutorial: String Formatting
  3. YouTube: Python Simple String Manipulation
  4. YouTube: Strings, Escape Sequences and Comments : Python Tutorial #4
  5. YouTube: Implement Run-length Encoding of Strings
  6. YouTube: Iterating Over a Python String
  7. YouTube: Python Tutorial for Beginners 7: Loops and Iterations - For/While Loops
  8. YouTube: Computer Programming - Strings

Examples

Activities

  1. Review Wikipedia: Run-length encoding. Create a program that asks the user for an input string of alphabetic characters. Convert the string to a run-length encoded (RLE) string of characters and numbers. Use the compressed format, where a single instance of a character has no count. For example, AAABCC would be A3BC2. Use a separate function for string processing. Avoid using global variables by passing parameters and returning results. Include appropriate data validation and parameter validation. Add program and function documentation, consistent with the documentation standards for your selected programming language.
  2. Enhance the RLE program above to check to see if a string has numbers in it. If so, it is already in RLE format. Decode RLE strings and display the results. Strings that have no encoding should be encoded as above. Use a separate function for decoding. Validate parameters and update program and function documentation, consistent with the documentation standards for your selected programming language.
  3. Review Wikipedia: Escape sequence. Enhance the RLE program above by allowing a # symbol to be used as an escape sequence, indicating that the following number is a number character rather than an encoding count. Use a pair of # symbols (##) to indicate a # character. This change should allow any input sequence to be encoded. Enhance the decoding function to support the new format. Validate parameters and update program and function documentation, consistent with the documentation standards for your selected programming language.
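
As a starting point for the first activity, the compressed encoding step might look like the sketch below. It is one possible approach rather than a reference solution: the function name rle_encode and the choice to raise ValueError on invalid input are assumptions made for illustration, and the user-input prompt, decoding, and escape handling from the activities are left out.

```python
def rle_encode(text):
    """Return the compressed run-length encoding of an alphabetic string.

    A run of one character is written without a count, so "AAABCC"
    becomes "A3BC2". (Hypothetical helper, not part of the lesson.)
    """
    # Parameter validation: only non-empty alphabetic strings are accepted.
    if not isinstance(text, str) or not text.isalpha():
        raise ValueError("text must be a non-empty alphabetic string")

    pieces = []
    run_char = text[0]
    run_length = 1
    for char in text[1:]:
        if char == run_char:
            run_length += 1
        else:
            # Close the finished run; omit the count when it is 1.
            pieces.append(run_char + (str(run_length) if run_length > 1 else ""))
            run_char = char
            run_length = 1
    # Close the final run.
    pieces.append(run_char + (str(run_length) if run_length > 1 else ""))
    return "".join(pieces)


print(rle_encode("AAABCC"))  # A3BC2
```

Keeping the processing in its own function, with the input passed as a parameter and the result returned, matches the activity's request to avoid global variables.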

Lesson Summary

  • A string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. Strings are used in almost all programming languages.[2]
  • A character in a programming language is the smallest unit of textual information.[3]
    • Strings are lists or containers of individual characters 'strung' together, integrated with other useful functionality (like the ability to 'find' a character or a sub-string).[3]
  • Characters may be alphabetic letters, numeric digits, punctuation marks and other symbols, whitespace, or 'control characters'.[3]
    • Control characters are not printed like other symbols; instead, they communicate a more abstract idea, like the signaling of a newline or the ringing of a bell.[4]
  • In some languages, such as C++ or Java, individual characters have a specific data type, usually named 'char'.[3]
  • Since information stored in a computer is represented in binary format, the need for a standardized table, pairing numbers with characters, became readily apparent. Thus, ASCII, the American Standard Code for Information Interchange, was born.[5]
  • ASCII is a near-ubiquitous example of an encoding scheme or a character set, the systematic mapping of code points to symbols of language.[5]
  • Originally, ASCII defined only 128 characters, encompassing the English alphabet (both lower- and upper-case), the digits (0-9), and other characters of importance.[5]
    • It was subsequently expanded through various unofficial revisions in order to make full use of the 2^8, or 256, possible values of one byte of data.[6]
    • Although still relevant, ASCII is being superseded by Unicode, a superset of ASCII that extends the original set of 128 characters to cover symbols from many languages, not just English.[7]
  • A string literal is a quoted sequence of characters (formally "bracketed delimiters"), as in x = "foo", where "foo" is a string literal with value foo; the quotes are not part of the value.[8]
  • There are numerous alternate notations for specifying string literals and the exact notation depends on the individual programming language.[8]
  • An empty string is written as a pair of quotes with no characters in between.[8]
  • The string must begin and end with the same kind of quotation mark and the type of quotation mark may give slightly different semantics.[8]
  • A number of languages provide for paired delimiters, where the opening and closing delimiters are different.[8]
    • For example: “Hi There!”, ‘Hi There!’, „Hi There!“, «Hi There!»
  • String literals can contain literal newlines, spanning several lines. Alternatively, newlines can be escaped, most often as \n.[8]
  • Python has a special form of a string, designed for multiline literals, called triple quoting. These literals are especially used for inline documentation, known as docstrings.[8]
  • Strings are a data type typically implemented as an array data structure of bytes.[2]
  • In general, there are two types of strings: fixed-length and variable-length strings.[2]
    • Fixed-length strings have a fixed maximum length, determined at compile time, and use the same amount of memory whether or not the maximum is needed.[2]
    • Variable-length strings do not have a fixed length and can use varying amounts of memory at runtime. Variable-length strings are also more common in modern programming languages.[2]
  • There are a variety of algorithms for processing strings:[2]
    • String searching algorithms for finding a given substring or pattern
    • String manipulation algorithms
    • Sorting algorithms
    • Regular expression algorithms
    • Parsing a string
    • Sequence mining
  • Most programming languages offer string functions in order to manipulate a string. Refer to String functions for a list of string functions used in various languages.
  • String indexing and slicing:
    • Characters and substrings can be accessed by index. In many languages, including Python, the first character has an index of 0, and each following character has an index one greater than that of the previous character.
    • All characters, including whitespace, are given an index.
    • Though implementation varies between languages, operations such as slicing and concatenation can be performed using indexes. Refer to String functions.
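
The points above about characters, ASCII, and Unicode can be illustrated with a short Python sketch. It relies only on the built-in ord, chr, format, and str.encode; the variable names are chosen for illustration, and UTF-8 is used here as one common Unicode encoding, which the summary itself does not name.

```python
# Map a character to its code point and back.
code_point = ord("A")              # 65 in both ASCII and Unicode
same_letter = chr(code_point)      # "A"

# The same number written as the 8 bits of one byte.
bits = format(code_point, "08b")   # "01000001"

# Control characters have code points too, even though they are not printed as symbols.
newline_code = ord("\n")           # 10 (line feed)
bell_code = ord("\a")              # 7 (bell)

# Each ASCII character fits in one byte; a character outside ASCII
# needs more than one byte when encoded as UTF-8.
ascii_bytes = "Hi".encode("ascii")       # 2 bytes
utf8_bytes = "\u00e9".encode("utf-8")    # the letter e with an acute accent: 2 bytes

print(code_point, same_letter, bits)
print(newline_code, bell_code)
print(len(ascii_bytes), len(utf8_bytes))
```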
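
The notes above on string literals can be tried out in Python as follows. This is a small sketch, not a complete tour of literal syntax; the function greet and the variable names are hypothetical. Note that a triple-quoted literal keeps its indentation as written, so the sketch uses textwrap.dedent from the standard library when the common leading whitespace should be removed.

```python
import textwrap

empty = ""                 # a pair of quotes with nothing in between
single = 'foo'             # single and double quotes give the same value in Python
double = "foo"

# A newline written as an escape sequence, and the same text as a multiline literal.
escaped = "line one\nline two"
multiline = """line one
line two"""


def greet():
    """Return a fixed greeting.

    This docstring is itself a triple-quoted string literal.
    """
    return "Hi There!"


# Indentation inside a triple-quoted literal is part of the value
# until it is removed explicitly.
indented = """
    first
    second
"""
dedented = textwrap.dedent(indented)

print(len(empty), single == double, escaped == multiline)
print(greet.__doc__.splitlines()[0])
print(repr(dedented))
```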
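
The indexing and slicing rules above can be seen in a few lines of Python. The sample text and variable names are only illustrative; other languages use different syntax for the same operations.

```python
text = "Hi There!"

first = text[0]            # "H": the first character has index 0
space = text[2]            # " ": whitespace has an index like any other character
last = text[-1]            # "!": in Python, negative indexes count from the end

word = text[3:8]           # "There": a slice from index 3 up to, but not including, 8
head = text[:2]            # "Hi": an omitted start means the beginning of the string

position = text.find("There")   # 3: index where the substring starts
missing = text.find("x")        # -1: find reports a missing substring this way

# Slicing a string in two and concatenating the parts gives back the original.
rejoined = text[:2] + text[2:]

print(first, repr(space), last, word, head)
print(position, missing, rejoined == text, len(text))
```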

Key Terms

concatenation
The joining of two strings end to end: the sequence of symbols in string S followed by the sequence of symbols in string T, denoted ST.[2]
fixed-length strings
Fixed-length strings have a fixed maximum length, determined at compile time, and use the same amount of memory whether the maximum is needed or not.[2]
prefix
A string A = a1 a2 … an has a prefix a1 a2 … am for any m ≤ n. A proper prefix of A is a prefix that is not equal to A itself (0 ≤ m < n).[2]
reversal
The reverse of a string is a string with the same symbols but in reverse order.[2]
rotation
A string s = uv is said to be a rotation of t if t = vu.[2]
string
Traditionally a sequence of characters, either as a literal constant or as some kind of variable.[2]
string datatype
A datatype modeled on the idea of a formal string.[2]
string literal
A string that appears literally in source code; also known as an anonymous string.[2]
substring
A string that is a prefix of a suffix of an original string, or equivalently a suffix of a prefix.[2]
suffix
Any substring of an original string that includes the original string’s last letter, including the string itself. A proper suffix of a string is a suffix that is not equal to the original string.[2]
variable-length strings
Variable-length strings have a length that is not arbitrarily fixed and can use varying amounts of memory depending on the actual requirements at run time.[2]
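
Several of the key terms above map directly onto Python string operations, as sketched below. The helper is_rotation is a hypothetical function written for this illustration; it uses the observation that if s = uv and t = vu, then t appears inside s + s.

```python
s = "abc"
t = "def"
concatenated = s + t                     # concatenation ST: "abcdef"

word = "banana"
has_prefix = word.startswith("ban")      # "ban" is a prefix of "banana"
has_suffix = word.endswith("nana")       # "nana" is a suffix of "banana"
has_substring = "nan" in word            # "nan" is a substring of "banana"

# Proper prefixes: every prefix of length m with 0 <= m < n,
# so the empty string is included and the whole word is not.
proper_prefixes = [word[:m] for m in range(len(word))]

reversal = word[::-1]                    # same symbols in reverse order: "ananab"


def is_rotation(s, t):
    """Return True if t is a rotation of s (s = uv and t = vu)."""
    return len(s) == len(t) and t in s + s


print(concatenated, has_prefix, has_suffix, has_substring)
print(proper_prefixes)
print(reversal, is_rotation("abcde", "cdeab"), is_rotation("abcde", "abced"))
```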

See Also

References