Automata theory/Finite automata
|This page is under construction. Content is likely to be revised significantly in the near future.|
Now that we've formalized the language that we use for strings and such, we need to introduce a few terms to describe our machines.
- Input - A string fed to a machine which the machine will determine whether it is part of the language that the machine was designed for. The string must only be made up of symbols from the machine's alphabet (e.g. it doesn't make sense to feed a 3 to a machine that only reads 0's and 1's). An input is read by a machine in a forward fashion, one symbol at a time.
- Return - The results of running the machine on a given input. Initially, this will either be accepted or rejected, indicating whether the input string is respectively part of the language or not. Later, we will introduce machines capable of generating an output string to return in addition to its standard return value.
- State - A resting place while the machine reads more input, if more input is available. States are typically named. Canonical names for states (which we will get to later) commonly consist of the lowercase letter 'q' followed by a number, e.g. q0. State names must be unique. Graphically, states are represented by a circle with the name inside.
- Start State - For programmers, this is known as the program entry point. It is the state that the machine naturally starts in before it reads any input. The name for the start state will usually either be S or, canonically, q0 (it is numbered with the lowest number of all the states). Graphically, the start state is represented as a state with an unlabeled arrow pointing from nothing to it.
- Accepting State or Final State - A set of states which the machine may halt in, provided it has no input left, in order to accept the string as part of the language. Throughout this course we will exclusively use the term Accepting because I think it is a more descriptive term, but be aware that other places will use the term Final to mean the same thing. Graphically, accepting states are indicated distinctly from other states in any of a number of ways. Common ways include bolding or underlining the name, or using a double circle instead of a single. If the machine has only 1 accepting state, it is oftentimes named with the letter A. It is worth noting that a machine with no accepting states does not accept any string as part of the language (not even the empty string) - it is essentially the Empty Language, ∅.
- Rejecting State - Any state in the machine which is not denoted as an accepting state. The string is only rejected if the machine halts in a rejecting state with no more input left. Rejecting states have no special representation, because every state is either accepting or rejecting.
- Dead State - A rejecting state that is essentially a dead end. Once the machine enters a dead state, there is no way for it to reach an accepting state, so we already know that the string is going to be rejected. Graphically, the dead state is often omitted and assumed for any input that the machine does not have explicit instructions on what to do with. A machine may have multiple dead states, but at most only one dead state is needed per machine.
- Transition - A way for a machine to go from one state to another given a symbol from the input. Graphically, a transition is represented as an arrow pointing from one state to another, labeled with the symbol or symbols that it can read in order to move the machine from the state at the tail of the arrow to the tip of the arrow. Transitions may even point back to the same state that they came from, which is called a loop.
Using this terminology, we can logically deduce that a Finite State Automaton or Finite State Machine (FSM) is a machine that has a finite number of states.
Deterministic Finite Automata
Machines are categorized into different types. Each type behaves radically different from other types, and tend to appear graphically different (especially the labels for the transitions). The first type we will deal with is the Finite Automata (FA), and specifically, the Deterministic Finite Automata (DFA), since it is one of the simplest types. It is called a Finite Automata because we know that given a finite input, the machine will execute in reasonable finite time and give us a result. This is called Halting, and we will get into that more in depth much later. It is Deterministic in that every state has exactly one transition per symbol, so there are no ambiguities as to which state to go to on a given symbol of input. In the next section, we will introduce the Non-deterministic Finite Automata (NFA).
Again, different sources will represent this differently. I will use the form that I find most logical.
Formally, a DFA is represented by a 5-tuple. M = (Q, ∑, δ, q0, A) where
- Q is a finite set of states.
- ∑ is the alphabet that the input and the machine will use.
- δ is a transition function mapping Q × Σ → Q. That is, given any state in Q and any symbol in Σ, our transition function will give us a state in Q to transition to.
- q0 is the start state such that q0 ∈ Q.
- A is the set of accepting states such that A ⊆ Q.
Because the transition function can quickly become a rather large mapping, we rarely write out formal machines. Instead, we tend to let computers and algorithms handle it for us, write mathematical/set rules that define a machine, or use the graphical representation. If we do need to write out the transition function, we usually do so in a table format, called a transition table, where rows and columns are the states and symbols to transition from, and the entries are the states to transition to.
Reading a Finite Automata Diagram
This is probably the part that most of you have been waiting for.
Remember that states are circles and transitions are arrows. The first step is to identify the Start State. As a reminder, this is the state with an arrow pointing to it from nowhere. It looks like this:
Next, identify any accepting states. Remember, they are either bold, underlined, or double circled.
Now let's put this machine in motion. In order to do this, we will need a string for input. Let's take the following strings:
For the first input, "011", we read the first symbol, 0. From our start state, we see a loop transition on 0 back to the start state. The next symbol of input is 1. We are currently still in the start state, and we see a transition on 1 to state q1. The final symbol of input is 1. From q1, we see a transition on 1 to state A. We have no more symbols left in our input, and we are in state A, which we determined is an accepting state. Therefore, the string "011" is accepted by this machine.
The next 3 strings will demonstrate strings that are not part of our language, and are appropriately rejected (not accepted) by our machine.
- "01" starts with a 0, so it takes the loop from the start state back to itself. The next symbol is a 1, which takes the transition to state q1. At this point, we've run out of input, and we are in state q1. q1 is not an accepting state, so we reject.
- "10" starts with a 1, so it takes the transition from the start state to q1. The next symbol is a 0, but we don't have any transitions from q1 for 0. This means that there is an implied dead state that we would transition to. In other words, we reject the string.
- "110" takes the 1 to q1. From q1, the next symbol of input we read is a 1, so we transition to state A. Beware, just because we are in an accepting state does not mean that we accept. We have to run out of input, too. We still have one symbol left in our input, 0. There are no transitions for 0, so there is an implied dead state that we would transition to. The input is not accepted.
As a quick exercise to the reader, create some strings of your own consisting only of 0's and 1's, and test them on this machine. Keep a record of which are accepted and which are not accepted. Try to recognize the pattern and determine, in plain English, what kind of strings this machine accepts.
Answer: As it turns out, this machine accepts strings with any number of 0's followed by at least two 1's. Or formally, L = 0i111j
There is a formal notation for processing a given input on a given machine. To do this, we must introduce a few more vocab words, a tuple, and a symbol.
A configuration pairs the current state the machine is in with the remaining input that it still has to process. This configuration can be represented with the tuple
(q ∈ Q, s ⊆ w) where:
- q ∈ Q is the machine's current state.
- s ⊆ w is the remaining input.
When you start a machine, the initial configuration is
(q0,w) where q0 is the start state, of course.
Assuming a machine reaches an accepting state with no input, and thus accepts the string, the final configuration would be
(q ∈ A,ϵ). Remember that A denotes the set of accepting states and ϵ denotes the empty string (no more input left). This is called an accepting configuration. A rejecting configuration would then be
(q ∉ A,ϵ).
Yields represents the transition from one configuration to another.
- Yields in one step is a transition to the next state over 1 symbol of input. This is indicated by the yields or right tack symbol: ⊢ . Sometimes it is useful to disambiguate which machine you are computing, in which case you simply subscript the machine, ⊢M.
- For abbreviating the transition across multiple steps, like a closure (skipping to the last computation), you may use the yields symbol followed by the Kleene star: ⊢*M.
A computation is combining configurations with yields. The full set of steps taken by a machine on a given input is called the computation history. It can be denoted
(q0,w) ⊢* (q ∈ Q,ϵ).
Here is the computation history of the string "011" on our machine from before: (S,011) ⊢ (S,11) ⊢ (q1,1) ⊢ (A,ϵ)
As an exercise to the reader, generate the computation histories of the other strings from before, assuming the dead state is D, and remember that D ∉ A. Indicate whether each final configuration (along with our first string's final configuration (A,ϵ)) is an accepting or rejecting configuration, and why.