Minimization Theory in a Nutshell
String equivalence with respect to a language
Definition 2.5.1 in the textbook. Given a languge, L, write x ≈ y iffor all z ∈ Σ*, xz ∈ L ⇔ yz ∈ LLoosely we say that x and y "share a common fate" in terms of acceptance. Claim: The relation ≈ is an equivalence relation. Thus, we can talk about the equivalence classes with respect to ≈. Write [x] as the equivalence class containing string x.
The δ* function
Given a DFA M=(K,Σ,δ,s,F), defineδ*: K × Σ* → Kin terms of the yeilds relation:
δ*(p,w) = q if and only if (p,w) |* (q,ε)
By virtue of being a DFA, δ* is well-defined. We defined this function
aleady in the Deterministic Finite Automata document, but have not
made much use of it yet.
Unfortunately the Lewis textbook
does not define this function. Let's consider this common usage of |*:
(p, x) |* (q, ε) (q, y) |* (r, ε)then
(p, xy) |* (q, ε·y) |* (r, ε)i.e.,
(p, xy) |* (r, ε)Expressing this in the δ* notation:
q = δ*(p,x) r = δ*(q,y)and likewise
r = δ*(p,xy)and so we write:
δ*(p,xy) = δ*(q,y) = δ*(δ*(p,x),y)or, eliminating the middle expression:
δ*(p,xy) = δ*(δ*(p,x),y)
Relationship of DFA to string-equivalence classes
Let L be a regular language. Let KL be the set of all string equivalence classes with respect to the ≈ equivalence relation. Given any DFA M=(K,Σ,δ,s,F) for L. Consider the mapping: T : K → KL defined by:T( δ*(s,x) ) = [x]We need to argue that the T function is well-defined, meaning that:
δ*(s,x) = δ*(s,y) ⇒ x ≈ yThis is pretty obvious. In poetic English,
If x and y run to the same state,The fact that we can choose any string x implies that the r function is a surjection (an onto mapping) from K → KL. Consequently,
then x and y must share the same fate.
- |K| ≥ |KL|, and therefore
- |KL| represents the minimal number of states of any DFA for L
Myhill-Nerode Theorem
Define Min(L) = (KL,Σ,δL,sL,FL) where:
KL = { [x] : x ∈ Σ* }
sL = [ε]
FL = { [x] : x ∈ L }
δL([x],σ) = [xσ]
The theorem states that this is the unique minimal
DFA for L.
A key step in proving the theorem is showing that
δL is well-defined, i.e.
x ≈ y ⇒ xσ ≈ yσObserve that Min(L) is defined in terms of the language L per se, affording it a status of being independent of any DFA which recognizes L. We know that Min(L) is minimal in terms of the number of states, but it is important to understand better what it means to be the unique DFA for L. Loosely, is that any other minimal DFA M = (K,Σ,δ,s,F) is essentially the same as Min(L). More precisely, consider the mapping
T(q) = [x], where q = δ*(s,x)T is onto (surjective) and if M is minimal then the number of states must be equal. Consequently T must be one-to-one (injective), making T a bijection, or a one-to-one correspondence between states of M and states of Min(L). The additional key point is that T is structure-preserving, i.e.,
T(δ(p,σ)) = δL(T(p),σ)In English, this means that the states of M and Min(L) correspond to each other in a way that the transitions also correspond. In algebraic terms, T is called an isomorphism.
Simple Example
Let L ⊂ {a,b}* be the language of all strings with an even number of a's. We can easily argue that:- x ≈ ε if and only if x ∈ L
- x ≈ a if and only if x ∉ L
Recognizing a minimal DFA
Regarding the Lewis textbook, we have outlined the material through page 97. The Myhill-Nerode Theorem is theoretically useful, but it is ultimately only an existence proof, not the construction of a DFA. We need to better understand how to proceed with a given DFA for a language L. To start, we need to define yet another equivalence relation among states of a DFA. Write p ≡ q if,
for all z ∈ Σ*, δ*(p,z) ∈ F ⇔ δ*(q,z) ∈ F.
If p ≡ q, we say p and q are
indistinguishable, otherwise, they are distinguishable.
This is basically expressing the same "share a common fate"
idea as is expressed by ≈, except
that it is expressed in terms of states.
We want to say that the two equivalence relations, ≈
and ≡ express exactly the
same idea. Technically, this is what needs to be proved:
(A) x ≈ y ⇔ δ*(s,x) ≡ δ*(s,y)Here's the proof:
x ≈ y ⇔ for all z, xz ∈ L ⇔ yz ∈ L
⇔ for all z, δ*(s,xz) ∈ F ⇔ δ*(s,xz) ∈ F
⇔ for all z, δ*(δ*(s,x),z) ∈ F ⇔ δ*(δ*(s,x),z) ∈ F
⇔ δ*(s,x) ≡ δ*(s,y)
Alternative representation of the minimal DFA
Given DFA M = (K,Σ,δ,s,F), the equivalence class of state p is [p] = { q : p ≡ q }. We can now give another more concrete representation of the minimal DFA:M′ = (K′,Σ,δ′,s′,F′)where
K′ = { [q] : q ∈ K }
s′ = [s]
F′ = { [f] : f ∈ F }
δ′([q],σ) = [ δ(q,σ) ]
In addition to other steps
we have to argue that δ′ is well-defined, i.e.,
p ≡ q ⇒ δ(p,σ) ≡ δ(q,σ)The correspondence between the states of M′ and Min(L) is then written as follows:
T′: K′ → KL T′( [ δ′*([s],x) ] ) = [ x ]The statement (A) above guarantees that T′ is well-defined and one-to-one, implying that M′ is indeed (another representation of) the minimal DFA.
Test for being a minimal DFA
One consequence of this minimal DFA construction is thatA DFA is minimal if and only if all states are distinguishable.This means that no two states are "≡" equivalent, or better this:
A DFA is minimal if, for every pair of states p and q, we can find a distinguishing string x ∈ Σ* such that δ*(p,x) ∈ F and δ*(q,x) ∉ F, or vice-versa.
Rabin-Scott "worst case" example is minimal
Reconsider the example 2.2.2 in the textbook. We want to argue the worst case scenario of Rabin-Scott construction exists, namely an NFA with O(n) states whose minimal DFA contains O(2n) states. In particular our NFA contains exactly n+1 states and the DFA exactly 2n states. The language in question is constructed from alphabet Σ with n symbols defined as follows:
{ w : w omits at least one symbol σ ∈ Σ }
Create an NFA with the non-start states of the labelled by the
alphabet symbols indicating strings from
this state which omit the symbol.
We have already visited the example of NFA
for the four-symbol alphabet {a,b,c,d} in the
Nondeterministic Finite Automata document.
The language is this:
(b∪c∪d)*∪(a∪c∪d)*∪(a∪b∪d)*∪(a∪b∪c)*and the NFA is this:
The minimal DFA has all 2n
states matching every subset of ΣUP, where n = |Σ|.
The state transitions of the DFA are easy to define based on the way
the NFA is set up with symbols corresponding to state names:
δ(Q,σ) = Q - { upper_case(σ) }, for all Q ⊆ ΣUP, σ ∈ Σ
There are two parts to proving this claim:
- The DFA constructed by the Rabin-Scott construction consists of all 2n states as indicated. We have already argued this point.
- Any pair of states of the DFA are distinguishable.
Take any two distinct states P and Q,
represented by subsets of ΣUP.
For simplicity, ignore the distinction between upper case states and lower case symbols.
Both P-Q and Q-P cannot be empty,
and so assume, without loss of generality,
that P-Q is non-empty. We're saying that
there is at least one symbol in P which is not in =Q.
Create the string
w = σ1···σk, where Q = { σ1, ..., σk } (w = ε if Q = ∅)
Then, because w contains every symbol of Q, running the DFA from Q will deplete the set, i.e.,δ*(Q,w) = ∅ (a non-final state)However, since P contains symbols which are not in Q, running the DFA from Q will leave the remaining symbols, i.e.,δ*(P,w) = P-Q ≠ ∅ (a final state)This proves that that P and Q are distinguishable.
Algorithmic construction of the Minimal DFA
The Myhill-Nerode theorem gives us a theoretical representation of the minimal DFA in terms of string equivalence classes. The previous section gives as a less theoretical representation in terms of state-equivalence classes; however, we do not yet know an algorithmic way of constructing these state-equivalence classes. Saying that two states are indistinguishable means that they both run to final states or both to non-final states for all strings. Obviously we do not test all strings. The idea is to compute the indistinguishability equivalence classes incrementally. We say:p and q are k-distinguishable if they are distinguishable by a string of length ≤ kIt's easy to understand the inductive property of this relation:
p and q are k-distinguishable if they are (k-1)-distinguishable per se, orThe construction of the equivalence classes starts like this:
δ(p,σ) and δ(q,σ) are (k-1)-distinguishable for some symbol σ ∈ Σ
- p and q are 0-indistinguishable if they are both final or both non-final. So we start the algorithm with the states divided in two partitions: final and non-final.
- within each of these two partitions: p and q are 1-distinguishable if there is a symbol σ so that δ(p,σ) and δ(q,σ) are 0-distinguishable, i.e. one is final and the other not. By doing so we futher partition each group into sets of 1-indistinguishable states.
p and q
within a partition set are k-distinguishable if there is a symbol
σ so that δ(p,σ)
and δ(q,σ) are (k-1)-distinguishable.
At some point we cannot subdivide the partitions further.
At that point terminate the
algorithm, because no further step can produce any new subdivision.
When we terminate, we have the indistinguishability equivalence classes which
form the states of the minimal DFA.
The transition from one equivalence class to another is obtained by
choosing an arbitrary state in the source class, applying the transition and
then taking the entire target class as the target state.
Example from figure 2-20
- Start by separating final and non-final: {q1,q3}, {q2,q4,q5,q6}
- Distinguish states within the groups: {q1,q3}, {q4,q6}, {q5}, {q2}
- We cannot split the two non-singleton groups further, the algorithm terminates.
| a | b | |
| {q1,q3} | {q2} | {q4,q6} |
| {q4,q6} | {q1,q3} | {q5} |
| {q2} | {q5} | {q1,q3} |
| {q5} | {q5} | {q5} |
Example 2
- Start by separating final and non-final states: {1,2}, {0,3,4,5}
-
All of 0,3,4 go to final states by some transition,
5 does not.
Basically, 5 is a dead state, but none of the others are dead.
{1,2}, {0,3,4}, {5} - An a-transition sends 3 outside the {0,3,4} set: {1,2}, {0,4}, {3}, {5}
- We cannot make any further sub-partitions.