Jump to content

Automata theory

From Wikiversity
Navigation
School of Computer Science — Automata theory 1.1 — Automata Theory
Module tutor
IsmAvatarWikiversity talk page | E-mail tutor
Please contact the module tutor, the person who maintains this page, to request help or to give feedback.


Educational level: this is a tertiary (university) resource.

Sometimes also referred to as the Theory of Computation or a superset thereof, Automata Theory is a field within Set Theory and Computer Science, and serves as the theoretical framework through which computers and modern computing came about. Although somewhat esoteric, as indicated by the strange names you will encounter throughout, it has many applications, ranging across programming machines, string computations (such as regular expressions and parsing), and problem solving of decision problems. It can also be an entertaining exercise in programming to try and solve common problems (such as number addition) by writing machines to solve them. The machines constructed will largely resemble flow charts and will behave very similarly.

Before starting this course, it is recommended you have some familiarity with Set Theory and Set Notation, although the first few uses of it will briefly review the meanings of the notations. You should also be somewhat familiar with Formal Definitions and Formal Proofs, or at least understand what they are. Even though this is a topic within Computer Science, an understanding of computers, programming, or the likes is not necessary for this. Many topics of this depth use a lot of jargon, formulas, and symbols, and can get very confusing very fast for someone who doesn't have a background in that stuff. I try my best to avoid that and keep it as humanly readable as possible. If there is something you don't understand, feel free to ask about it on the Discuss page or direct it to the instructor.

Terminology and symbols

[edit | edit source]

Throughout the different books and materials on Automata Theory and computability, different symbols and terminology may be used, as there is no standard for symbols and terminology. Generally, however, the concepts are the same, there's just slight deviations in symbols or terminology, so it's not too difficult to decipher. Herein we will use one set of symbols and terminology, and leave it as an exercise to the reader to decipher other symbols and terminology used in other sources when they are encountered.

Introduction

[edit | edit source]

First we must define the basic components that will make up our definitions.

  • Alphabet - A finite set of symbols or characters. Denoted uppercase Greek letter Sigma: ∑
  • String - A finite sequence of symbols from some alphabet ∑, usually surrounded by "quotes". If your alphabet is { a, b, c }, a possible string would be "acbab".
    • The empty string, or a string with no characters in it, is denoted with lowercase Greek letter lunate epsilon: ϵ = ""
    • The set of all possible strings over an alphabet ∑ is denoted ∑* (the Kleene star)
      • For an alphabet ∑ = { a, b }, ∑* would be { ϵ, "a", "b", "aa", "ab", "ba", "bb", "aaa", ... } (the set is infinite, but only consists of a's and b's)
  • Language - A set of valid strings over a finite alphabet. Denoted L, and with the context of alphabet ∑ it can be denoted ∑L.
    • Empty Language - A language containing no strings, denoted {} or ∅.
      • Do not confuse with the empty string, which is not in the empty language. ϵ ≠ ∅ and ϵ ∉ ∅.
    • Natural Languages (like the English language) will not be dealt with. They are far too complex and oftentimes can't even be expressed with a formal grammar.

Latin alphabet letters frequently used in notation:

  • c - usually represents symbols or characters.
  • s, t, w - usually represent strings
  • x, y - usually represent variables
  • i, j, k, n - usually represents numbers

Function notation:

  • |s| - Length function. Returns the number of symbols in s. For instance, |"hello world"| = 11
  • #c(s) - Occurrence function. Returns the number of times c occurs in s. For instance, #l("hello world") = 3
  • s||t or st - Concatenation function. Returns a single string whose prefix is s and whose suffix is t, essentially combining s and t to form one string.
    • Concatenation is associative. (st)w = s(tw).
    • ϵ is identity for concatenation. ϵs = sϵ = s. That is, any string s concatenated with ϵ remains s.
  • si - Replication function. Returns the string s repeated i times. For instance, a3 = aaa and "ab"3 = "ababab".
    • w0 = ϵ
    • wi+1 = wiw
  • sR - Reversal function. "abc"R = "cba"
    • Reversing any string of length 1 or 0 will return the original string (in the latter case, ϵ).
    • (st)R = tRsR

String terminology:

  • Substring - A sequence of consecutive symbols from a string. "beg" is a substring of "begin".
    • A substring t is a proper substring of s iff s ≠ t. That is, "hi" is a substring of "hi", but not a proper substring.
  • Prefix - A substring that comes at the beginning of a string. Formally, s is a prefix of t iff ∃w ∈ Σ*(t = sw)
  • Suffix - A substring that comes at the end of the string. Formally, s is a suffix of t iff ∃w ∈ Σ*(t = ws)
  • Proper - A proper substring is any substring s of t where s ≠ t. Likewise, you can also have a proper prefix and a proper suffix.
  • ϵ is a substring, prefix, and suffix of every string.

Next we will discuss Finite Automata.

Other topics that will be discussed for automata theory:

  • Finite Automata
  • Regular Expressions
  • Regular Grammars
  • Regular Languages
    • Deciding whether a language is Regular
  • Context-Free Grammars
    • Deciding whether a language is Context-Free
  • Push-down Automata
  • Turing Machines
  • Decidability (Halting, etc)
  • Proofs