## 4. Deterministic Finite Automata

### Deterministic Finite Automata (DFA)

We now begin the machine view of processing a string over alphabet Σ. The machine has a semi-infinite tape of squares holding one alphabet symbol per square. The machine has a finite state set, K, with a known start state. Initially we want the processing to be deterministic, that is, there is only one possible outcome from processing a string. Here is how we process it:
• place string on a tape with one symbol in each square
• place machine in start state and the read head on the first square
• a computation step is done by considering the (current state, current tape symbol) and based the value of this pair, move to a new state and move the tape head one square to the right
• stop the machine when there are no more symbols to process
According to the description, the state change operation is a function K × Σ ⇾ K. There are a number of interpretations we can give to this processing method, but the one of most interest to us is that of accepting the input string based on terminal state. Because of the deterministic behavior, we can say more strongly that it is deciding this string, as to whether it belongs to the language or not. The natural interpretation is means that an accepted string's terminal belongs to a set of final, or accepting states, F ⊆ K.

The description of a DFA M = (K, Σ, δ, s, F) is that which is defined in the textbook.

We must define precisely what it means to compute an input string. Because the machine never goes back, and stops after the last symbol, we can characterize the machine state as a configuration, which is an element of K × Σ* representing (current machine state, remaining portion of string to process). We define the binary relation:
```⊢  ⊆  (K×Σ*) × (K×Σ*)      the yields in one step relation
```
It is defined as follows: for all σ ∈ Σ,
```(p,σw) ⊢ (q,w)  for all w ∈ Σ*     if and only if    δ(p,σ) = q
```
This makes rigorous several notions about processing in a DFA:
1. symbols are processed from left to right, processing each one only once
2. the state change only consults the current symbol (what is ahead in the string is irrelevant)
Define * to be the reflexive, transitive closure of . This is the yields relation.

In a DFA, computing the string w means to put the machine in the start configuration:
```(s,w)
```
A terminal state, q, is one in which
```(s, w) ⊢* (q, ε)
```
Because the state transition is a function, it implies that there is only one state p such that (q,w) ⊢* (p,ε). It is sometimes convenient to express the state transition as a function:
```δ* : K × Σ* ⇾ K
```
where
```δ*(q, w) = p   if and only if    (q, w) ⊢* (p, ε)
```
Because of the nature of the DFA, this function is well-defined.

#### Acceptance

We say the string w is accepted by a DFA if, proceeding from the start state, the terminal state we reach is final i.e., ∈ F. Concisely, w is accepted if δ*(s,w) ∈ F. Because of the determinism, it is sometimes said that w is decided by a DFA.

The language accepted by a DFA is the set of all strings accepted by it.

#### DFA and Regular Language Equivalence

One of the main goals of Chapter 2 is to show the following:

Theorem: A language is accepted by a DFA if and only if it is a regular language (i.e., has a regular expression).

### Graphical representation of DFA

Finite automata lend themselves readily to graphical interpretation.
• Each state is a node in the graph:
• The start state is designated by: and final states are designated by:
• A transition δ(p,σ) = q is represented by the labeled edge (p,σ,q):
Although it seems obvious, we still should state the equivalence of the two representations, in that state transitions by a string is equivalent to paths in the graph:

Claim: (q01,q1) (q12,q2) ... (qn-1n,qn) are labeled edges if and only if (q0, σ1σ2...σn) ⊢* (qn, ε)

The empty path (no labeled edges) corresponds to the empty string (no symbols).

#### Example 2.1.1 in textbook:

This is the even-parity checker for the number of b's in a string, i.e., the machine accepts the language
```L = { w : #b's in w is even }
```
Accepted strings include: ε, a*, a*ba*b.

We'll soon see the construction of a RE from a DFA, but this one is easy. You can see two looping paths from the start/final state back to itself by either a or ba*b. Choose from (a∪ba*b) represents one loop going by two general "paths". We can repeat this 0 or more times, getting (a∪ba*b)* as our RE for the DFA.

Switching final and non-final states gives us a DFA which represents the complement of the previous language:
```L = { w : #b's in w is odd }.
```
Note that the RE doesn't easily "complement" in any way.

#### Example 2.1.2 in textbook:

```L = { w : w does not contain the substring bbb }
```
The state q3 is called a dead state, because no paths beyond this point can reach a final state. Again, switch final and non-final states constructs the DFA for the complement language:
```L = { w : w contains the substring bbb }
```
It's also easy to see that a regular expression for L is (a∪b)*bbb(a∪b)*, since we only need to find the substring bbb somewhere in the string, not necessarily finding the first occurrence, which is what the DFA does.

### DFAs for substring acceptance

The previous example can be generalized. Consider the language
```{ w ∈ Σ* : w contains the substring u }
```
This is very easily expressed by the regular expression *)u(Σ*). To generate a DFA, write
```u = σ1...σn
```
Draw the partial DFA representing the "success" transitions:
```(q0,σ1,q1) (q1,σ2,q2) ... (qn-1,σn,qn)
```
Start state = q0, final state = qn.

What is missing are the failure transitions:
```(qi-1, σ, ??),  for σ ≠ σi
```
We don't always go back to the start state when we fail. In general, look at the failure string:
```y = σ1...σi-1σ  where σ ≠ σi
```
What we want to find is the string, x where
x = largest suffix of the failure string, y, which is also a prefix of the success string, u
Take this string x and run the DFA from the start state to state p, and add the failure transition
```(qi-1, σ, p)
```
Here is another example with Σ = {a,b}:
```L = { w : w contains the substring abaa }
```
The setup is this:
For example, consider the failure string at q3: abab. Observe that, for x = ab:
```ab·x     = failure string
x·aa  = success string
```
and that x is the largest possible substring satisfying this requirement. Therefore, to get the target state of the failure transition, run x from the start, thereby adding:
```( q3, b, q2 )
```
Completing this procedure, we get this DFA:

#### Ends-with substring example

Let's consider this language:
```L = { w : w ends with the substring abaa }
```
As a regular expression, we would write:
```(a ∪ b)*abba
```
In this case, we don't automatically accept anything string beyond the final state, and so we treat both transitions out of the final state as failures, looking for the state to go back to. The diagram is this:
Considering the a transition out of the final state, we get match the a suffix:
```abaaa     (fail)
abaa  (success)
```
This means that δ(q4, a) = q1.

Considering the b transition out of the final state, we get match the ab suffix:
```abaab     (fail)
abaa   (success)
```
This means that δ(q4, b) = q2.

### Complement

We will state precisely one concept that we have been suggesting in the above examples.

Theorem: If a language L is accepted by a DFA, then there is a derived DFA which accepts L = Σ* - L.

Given the DFA (K,Σ,δ,s,F), the derived DFA is (K,Σ,δ,s,K-F). Namely, the final state set of the derived DFA is the complement of the final state set of the original DFA.

### Product Construction: intersection and union

Theorem: If languages L1 and L2 over Σ are accepted by a DFAs, then there are derived DFAs which accept
• the intersection: L1 ∩ L2
• the union: L1 ∪ L2

Given the DFAs (K1,Σ,δ1,s1,F1) and (K2,Σ,δ2,s2,F2) for L1 and L2, respectively, the derived DFAs are of the form
`(K1×K2,Σ,δ,(s1,s2),F)`
where
```δ(p,q) = ( δ1(p), δ2(q) )
```
and F is either:
• for the intersection: F = F1×F2
• for the union: F = F1×K2 ∪ K1×F2
Intuitively the idea is to run the two DFA's simultaneously to a terminal state (q1,q2) and then accept the string if
• (for the intersection) both q1 and q2 are final in their respective DFAs.
• (for the union) at least one of q1 and q2 are final in their respective DFAs.

#### Intersection Example

Find a DFA for the language over {a,b}:
```{ w : w has an even number of b's and  does not contain the substring bb }
```
Here are the two languages and their DFAs:
 { w : w has an even number of b's } { w : w does not contain the substring bb }
The intersection construction gives us a machine with 6 states: {Ax, Ay, Az, Bx, By, Bz} derived from all state pairs. The hard part is usually figuring out how to meaningfully draw the constructed DFA. We can lay the states out in a 2x3 grid, but the crossing of the edges tends to obscure any simple sense of the behavior. Here in one possible rendering:
 ≡
Both states Az and Bz are both dead and can effectively be replaced by a single state.

#### Minimization

Reducing the number of states is a step towards minimizing the DFA. There is a procedure for doing so and it concerns finding states which are equivalent in the sense that the set of strings which lead to final states starting from any of them is the same. In the case of a dead state, the set of such strings is simply .

Equivalent states can be replaced by a single state