## Regular Language / FA Equivalence

### Regular Language Closure Properties

Since DFAs are equivalent to NFAs, we can talk simply about "acceptance by an automaton," using whichever form NFA or DFA which suits us. The main result we want to prove is the following closure properties.

Theorem: Languages accepted by automata are closed under union, concatenation, Kleene*, complementation, intersection.

#### Special form for finite automata

This normalization concept is discussed in the textbook on page 81. Normalization simplifies many of the necessary constructions. Every finite automaton is equivalent to one with these properties:
• there is only one final state
• the final state and start state are distinct (and so the start state is non-final)
• there are no transitions into the start state
• there are no transitions out of the final state
The argument is simple:
• If necessary, create a new state s′, add the transition (s′,ε,s) to Δ, make s′ the new start state.
• If necessary a new state f′. Add (f,ε,f′) to Δ, for each f ∈ F. Set F = {f′}.

The closure proofs for union, concatenation, Kleene-star use NFA(s) in special form and construct a derived NFA also in special form. The proofs of union and concatenation assume languages L1 and L2 with respective special form NFAs M1 = (K1,Σ,Δ1,s1,F1) and M2 = (K2,Σ,Δ2,s2,F2). Assume that K1 and K2 have no states in common. The starting point of both proofs begins like this:
• create the state set K = K1 ∪ K2   (the states of both)
• create the transition relation Δ = Δ1 ∪ Δ2 ⊆ K×(Σ∪{ε})×K   (the transitions of both)

Union:

The idea is to merge the start states as a single start state and do the same for the final states. The concept of merge means that we add new start and final states s and f to K representing the start state set {s1,s2}, and final state set {f1,f2}, respectively.

The reason this works is that neither the start states have incoming edges and neither final states have outgoing edges. Any string which is accepted must forever leave the start state, follow a path which is exclusively within one machine or the other, and then terminate in one final state or another without reentry. The idea is conceptually simple, but here are the details:
 for each transition (s1,x,q) ∈ Δ1, (x = ε, or x = σ ∈ Σ): add (s,x,q) and remove (s1,x,q) for each transition (s2,x,q) ∈ Δ2: add (s,x,q) and remove (s2,x,q) remove s1 and s2 from K make s the start state for each transition (q,x,f1) ∈ Δ1: add (q,x,f) and remove (q,x,f1) for each transition (q,x,f2) ∈ Δ2: add (q,x,f) and remove (q,x,f2) remove f1 and f2 from K make f a final state (the only one)

Concatentation:

The idea is to create a non-final merged state {f1,s2} and remove both f1 and s2. An accepting string must start at s1, pass through the merged state getting a w1 ∈ L1, and continue to f2 adding a w2 ∈ L2, thereby creating the accepting string w1·w2 ∈ L1·L2.

 Add a new state j (for join) to K which represents the merge of states {f1,s2}. for each transition (q,x,f1) ∈ Δ1: add (q,x,j) and remove (q,x,f1) for each transition (s2,x,q) ∈ Δ2: add (j,x,q) and remove (s2,x,q) remove f1 and s1 from K

Kleene-star:

 Create an automaton in normal form for L with start state s and final state f. Create a new start state s′ and new final state f′. Add the following transitions to Δ: (s′,ε,s) connect to the old start state (f,ε,f′) connect from the old final state (f,ε,s) loop from old final to old start to allow repetitions (s′,ε,f′) insure that ε is accepted

Complementation:

Take a DFA for L (convert an NFA to a DFA using the Rabin-Scott construction). Then interchange final and non-final states, namely, set F = K - F. Note that all transitions must be present — i.e., it must be a true DFA — for this construction to work properly.

Intersection

We illustrated the intersection construction before using the Cartesian produce of the state sets. However, the closure on intersection follows because you can derive intersection from union and complementation (DeMorgan's Laws):
L1 ∩ L2 = Σ* - ( (Σ* - L1) ∪ (Σ* - L2) )

### Equivalence of Regular Languages and Automata

This is Theorem 2.3.2 in the textbook. The result is credited to Kleene in 1956 and is called Kleene's Theorem.

I. Every regular language is accepted by an automaton.

The set of regular languages is the closure (smallest possible set of subsets of Σ*) containing and σ ∈ Σ under the operations union, concatenation, Kleene*. It is evident that and σ ∈ Σ are accepted by automata, and so the class of regular languages must be contained in the class of languages accepted by automata. Here are depictions of the "base" NFAs in special form as defined above:
 {σ}: ∅: {ε}:
Technically, here is no need to make a "base-case" automaton for {ε} because {ε} = ∅*.

#### Comparison with textbook

Our construction of an NFA from regular expressions differs from the textbook's version based on our requirement that the NFA must be of the special form. The special form requirement makes union and concatenation simpler at the cost of making Kleene-star more complex. For example, our construction generates this:
 ab∪aa   =
The textbook construction would give this:

#### Example 2.3.1

Try this for size: (ab∪aab)*. Our version is this:
The textbook construction would give this:

II. The language accepted by an automaton is regular.

Assume we have an automaton M in special form. Assume the states are the integers 1..n+1, for n ≥ 0. The start state s=0 and the final state is f=n+1. The idea is to eliminate all the intermediate states 1..n in order, effectively adding regular expression transitions to make up for the eliminated states.

Define, for 0 ≤ i < n+1, 0 < j ≤ n+1, and 0 ≤ k ≤ n
R(i,j,k) = { w ∈ Σ* : (i,w) ⊢* (j,ε)  along a path with intermediate states ≤ k }
in particular, when k=0:
(A) R(i,j,0) = strings representing any edges from i to j with no intermediate states
Thus, R(i,j,0) = the union of symbols on transitions from i to j.

Continuing,
R(i,j,1) = strings representing any edges from i to j with only 1 as an intermediate state
Think of it like this:
R(i,j,1) =   any strings we already have, plus any string from i to 1 omitting 1,
followed by zero or more repetitions of strings from 1 to 1, omitting 1,
followed by a string from 1 to j, omitting 1
formally written this is:
R(i,j,1) = R(i,j,0)  ∪  R(i,1,0) · R(1,1,0)* · R(1,j,0)
This equation may seem problematic if either i or j is 1, but it is correct; e.g.
R(1,1,1)   =     R(1,1,0)      ∪     R(1,1,0) · R(1,1,0)* · R(1,1,0)
"from 1 to 1"    "single loop transition"           "two or more loop transitions"
In general, for k > 0,
(B) R(i,j,k) = R(i,j,k-1)  ∪  R(i,k,k-1) · R(k,k,k-1)* · R(k,j,k-1)
When k = n, we have:
R(i,j,n) = strings from i to j where you can pass through any intermediate state
In particular
R(0,n+1,n) = the language accepted by the automaton

The claim is that, for all 0 ≤ i < n+1, 0 < j ≤ n+1, and 0 ≤ k ≤ n, R(i,j,k) is regular.

The base case is expressed as (A) above. The induction step is expressed by (B) above.
Thus we've proved our result. ■

### NFA to Regular Expression Examples

In the construction of the regular expression of an automaton, when a state is eliminated with n incoming (non-loop) edges and m outgoing (non-loop) edges, we will have to add n*m new regular expression edges and consolidate these with older ones.

The internal states (non-start and non-final) can be eliminated in any order, however, in practice it is best to eliminate states which are "less complex" first. The complexity of a state can be judged by these factors:
• the complexity of the loop expression, if there is one
• the value of n = # of incoming (non-loop) edges
• the value of m = # of outgoing (non-loop) edges
It may be impossible to completely quantify the complexity based on these three factors, but you can use them to weigh the choices of order of state elimination, in order of increasing complexity. As the construction proceeds, these complexity measures may change and you may need to revise the order of elimination of the remaining states. When you get down to the last state, you can usually read off the answer directly from the graph.

#### Example 1

Consider the DFA for the language over {a,b} of strings with an even number of a's:
First put this into special form NFA:
State q0 has one 2 incoming and 2 outgoing edges whereas state q1 has only one of each. So eliminate q1 first, getting:
The final regular expression can be read as:   (b∪ab*a)*

#### Example 2

Consider the DFA for the language over {a,b} of strings with an odd number of a's:
First put this into special form NFA:
State q0 has one 2 incoming and 1 outgoing edges whereas state q1 1 incoming and 2 outgoing, so they're basically equal. We choose q0 to eliminate first, getting:
The final regular expression can be read as:   b*a(b∪ab*a)*

#### Example 3

Consider the language over {a,b} of strings with which have bb as a substring. Using a DFA that we have already constructed, put this into a special form NFA:
State q2 is obviously the simplest, so eliminate it, getting:
Of the two remaining, q1 is the simplest. Eliminate it, getting:
The final regular expression can be read as:   (a∪ba)*bb(a∪b)*.

Of course, (a∪b)*bb(a∪b)* is the more obvious regular expression, but the one derived here from the DFA expresses the first occurrence of bb in the string.

#### Example 4

Consider this example that we have visited previously:
L = { w : w does not contain the substring bbb }
Here are two forms of the automaton. The first, you might say, is a partial DFA in that it is deterministic, but the dead state has been omitted. The second is the special form version which is the starting point of the algorithm.

The states are numbered in order of what I consider increasing complexity according to the special form. Compare the factors:
state# incoming edges# outgoing edges
112
213
332
Here are the steps. To help minimize complexity, we will use convenience quantifiers ?, +, {n,m} when it suits, namely
(r)?     = r ∪ {ε}
(r)+     = r·r*
(r){m,n} = rm ∪ rm+1 ∪ ... ∪ rn
 Eliminate 1: Use the original special form NFA above. The path [(2,b,1), (1,ε,f)] gives (2,b,f). Combine with (2,ε,f) gives (2,b?,f). The path [(2,b,1), (1,a,3)] gives (2,ba,3). Combine with (2,a,3) gives (2,b?a,f).

 Eliminate 2: Use the graph after eliminating 1. After comparing 2 and 3, we still think 2 is less complex. The path [ (3,b,2), (2,b?,f) ] gives (3,bb?,f). Combine with (3,ε,f) gives (3,b{0,2},f). The path [ (3,b,2), (2,b?a,3) ] gives (3,bb?a,3). Combine with (3,a,3) gives (3,b{0,2}a,3).

The final regular expression can be read as:   (b{0,2}a)*b{0,2}.

The answer we get is in a surprisingly concise form which we may not have been able to derive easily without this algorithmic construction.