Regular Language Closure Properties
Since DFAs are equivalent to NFAs, we can talk simply about "acceptance by an automaton," using whichever form NFA or DFA which suits us. The main result we want to prove is the following closure properties.Theorem: Languages accepted by automata are closed under union, concatenation, Kleene*, complementation, intersection.
Special form for finite automata
This normalization concept is discussed in the textbook on page 81. Normalization simplifies many of the necessary constructions. Every finite automaton is equivalent to one with these properties:- there is only one final state
- the final state and start state are distinct (and so the start state is non-final)
- there are no transitions into the start state
- there are no transitions out of the final state
- If necessary, create a new state s′, add the transition (s′,ε,s) to Δ, make s′ the new start state.
- If necessary a new state f′. Add (f,ε,f′) to Δ, for each f ∈ F. Set F = {f′}.
The closure proofs for union, concatenation, Kleene-star use NFA(s) in special form and construct a derived NFA also in special form. The proofs of union and concatenation assume languages L_{1} and L_{2} with respective special form NFAs M_{1} = (K_{1},Σ,Δ_{1},s_{1},F_{1}) and M_{2} = (K_{2},Σ,Δ_{2},s_{2},F_{2}). Assume that K_{1} and K_{2} have no states in common. The starting point of both proofs begins like this:
- create the state set K = K_{1} ∪ K_{2} (the states of both)
- create the transition relation Δ = Δ_{1} ∪ Δ_{2} ⊆ K×(Σ∪{ε})×K (the transitions of both)
Union:
The idea is to merge the start states as a single start state and do the same for the final states. The concept of merge means that we add new start and final states s and f to K representing the start state set {s_{1},s_{2}}, and final state set {f_{1},f_{2}}, respectively. The reason this works is that neither the start states have incoming edges and neither final states have outgoing edges. Any string which is accepted must forever leave the start state, follow a path which is exclusively within one machine or the other, and then terminate in one final state or another without reentry. The idea is conceptually simple, but here are the details:
for each transition (s_{1},x,q) ∈ Δ_{1}, (x = ε, or x = σ ∈ Σ): add (s,x,q) and remove (s_{1},x,q) for each transition (s_{2},x,q) ∈ Δ_{2}: add (s,x,q) and remove (s_{2},x,q) remove s_{1} and s_{2} from K make s the start state for each transition (q,x,f_{1}) ∈ Δ_{1}: add (q,x,f) and remove (q,x,f_{1}) for each transition (q,x,f_{2}) ∈ Δ_{2}: add (q,x,f) and remove (q,x,f_{2}) remove f_{1} and f_{2} from K make f a final state (the only one) |
Concatentation:
The idea is to create a non-final merged state {f_{1},s_{2}} and remove both f_{1} and s_{2}. An accepting string must start at s_{1}, pass through the merged state getting a w_{1} ∈ L_{1}, and continue to f_{2} adding a w_{2} ∈ L_{2}, thereby creating the accepting string w_{1}·w_{2} ∈ L_{1}·L_{2}.Add a new state j (for join) to K which represents the merge of states {f_{1},s_{2}}. |
for each transition (q,x,f_{1}) ∈ Δ_{1}: add (q,x,j) and remove (q,x,f_{1}) for each transition (s_{2},x,q) ∈ Δ_{2}: add (j,x,q) and remove (s_{2},x,q) remove f_{1} and s_{1} from K |
Kleene-star:
Create an automaton in normal form for L with start state s and final state f. Create a new start state s′ and new final state f′. |
Add the following transitions to Δ:
(s′,ε,s) connect to the old start state (f,ε,f′) connect from the old final state (f,ε,s) loop from old final to old start to allow repetitions (s′,ε,f′) insure that ε is accepted |
Complementation:
Take a DFA for L (convert an NFA to a DFA using the Rabin-Scott construction). Then interchange final and non-final states, namely, set F = K - F. Note that all transitions must be present — i.e., it must be a true DFA — for this construction to work properly.Intersection
We illustrated the intersection construction before using the Cartesian produce of the state sets. However, the closure on intersection follows because you can derive intersection from union and complementation (DeMorgan's Laws):L_{1} ∩ L_{2} = Σ^{*} - ( (Σ^{*} - L_{1}) ∪ (Σ^{*} - L_{2}) )
Equivalence of Regular Languages and Automata
This is Theorem 2.3.2 in the textbook. The result is credited to Kleene in 1956 and is called Kleene's Theorem.I. Every regular language is accepted by an automaton.
The set of regular languages is the closure (smallest possible set of subsets of Σ^{*}) containing ∅ and σ ∈ Σ under the operations union, concatenation, Kleene*. It is evident that ∅ and σ ∈ Σ are accepted by automata, and so the class of regular languages must be contained in the class of languages accepted by automata. Here are depictions of the "base" NFAs in special form as defined above:{σ}: | ∅: | {ε}: |
Comparison with textbook
Our construction of an NFA from regular expressions differs from the textbook's version based on our requirement that the NFA must be of the special form. The special form requirement makes union and concatenation simpler at the cost of making Kleene-star more complex. For example, our construction generates this:ab∪aa = |
Example 2.3.1
Try this for size: (ab∪aab)*. The book's version is in Figure 2-14. Our version is this:II. The language accepted by an automaton is regular.
Assume we have an automaton M in special form. Assume the states are the integers 1..n+1, for n ≥ 0. The start state s=0 and the final state is f=n+1. The idea is to eliminate all the intermediate states 1..n in order, effectively adding regular expression transitions to make up for the eliminated states. Define, for 0 ≤ i < n+1, 0 < j ≤ n+1, and 0 ≤ k ≤ nR(i,j,k) = { w ∈ Σ* : (i,w) ⊢* (j,ε) along a path with intermediate states ≤ k }
in particular, when k=0:
(A) R(i,j,0) = strings representing any edges from i to j with no intermediate statesThus, R(i,j,0) = the union of symbols on transitions from i to j. Continuing,
R(i,j,1) = strings representing any edges from i to j with only 1 as an intermediate stateThink of it like this:
R(i,j,1) = any strings we already have, plus
any string from i to 1 omitting 1,
followed by zero or more repetitions of strings from 1 to 1, omitting 1,
followed by a string from 1 to j, omitting 1
formally written this is:
followed by zero or more repetitions of strings from 1 to 1, omitting 1,
followed by a string from 1 to j, omitting 1
R(i,j,1) = R(i,j,0) ∪ R(i,1,0) · R(1,1,0)* · R(1,j,0)This equation may seem problematic if either i or j is 1, but it is correct; e.g.
R(1,1,1) = R(1,1,0) ∪ R(1,1,0) · R(1,1,0)* · R(1,1,0) "from 1 to 1" "single loop transition" "two or more loop transitions"In general, for k > 0,
(B) R(i,j,k) = R(i,j,k-1) ∪ R(i,k,k-1) · R(k,k,k-1)* · R(k,j,k-1)When k = n, we have:
R(i,j,n) = strings from i to j where you can pass through any intermediate stateIn particular
R(0,n+1,n) = the language accepted by the automaton
The claim is that, for all 0 ≤ i < n+1, 0 < j ≤ n+1, and 0 ≤ k ≤ n, R(i,j,k) is regular. The base case is expressed as (A) above. The induction step is expressed by (B) above.
Thus we've proved our result. ■
NFA to Regular Expression Examples
In the construction of the regular expression of an automaton, when a state is eliminated with n incoming (non-loop) edges and m outgoing (non-loop) edges, we will have to add n*m new regular expression edges and consolidate these with older ones. The internal states (non-start and non-final) can be eliminated in any order, however, in practice it is best to eliminate states which are "less complex" first. The complexity of a state can be judged by these factors:- the complexity of the loop expression, if there is one
- the value of n = # of incoming (non-loop) edges
- the value of m = # of outgoing (non-loop) edges
Example 1
Consider the DFA for the language over {a,b} of strings with an even number of a's:Example 2
Consider the DFA for the language over {a,b} of strings with an odd number of a's:Example 3
Consider the language over {a,b} of strings with which have bb as a substring. Using a DFA that we have already constructed, put this into a special form NFA:Example 4
Consider this example that we have visited previously:L = { w : w does not contain the substring bbb }
Here are two forms of the automaton. The first, you might say, is a partial DFA in that it is deterministic, but the
dead state has been omitted. The second is the special form version which is the starting point of the algorithm.
state | loop expression? | # incoming edges | # outgoing edges |
---|---|---|---|
1 | no | 1 | 2 |
2 | no | 1 | 3 |
3 | yes | 2 | 2 |
(r)? = r ∪ {ε} (r)+ = r·r* (r){m,n} = r^{m} ∪ r^{m+1} ∪ ... ∪ r^{n}
Eliminate 1: |
Use the original special form NFA above.
The path [(2,b,1), (1,ε,f)] gives (2,b,f).
Combine with (2,ε,f) gives (2,b?,f). The path [(2,b,1), (1,a,3)] gives (2,ba,3). Combine with (2,a,3) gives (2,b?a,f). |
Eliminate 2: |
Use the graph after eliminating 1.
After comparing 2 and 3, we still think 2 is less complex. The path [ (3,b,2), (2,b?,f) ] gives (3,bb?,f). Combine with (3,ε,f) gives (3,b{0,2},f). The path [ (3,b,2), (2,b?a,3) ] gives (3,bb?a,3). Combine with (3,a,3) gives (3,b{0,2}a,3). |
The final regular expression can be read as: (b{0,2}a)*b{0,2}.
The answer we get is in a surprisingly concise form which we may not have been able to derive easily without this algorithmic construction.