DFA Minimization

Minimization Theory in a Nutshell

String equivalence with respect to a language

Definition 2.5.1 in the textbook. Given a language, L, write x ≈ y if
for all z ∈ Σ*, xz ∈ L  ⇔  yz ∈ L
Loosely we say that x and y "share a common fate" in terms of acceptance.

Claim: The relation is an equivalence relation.

Thus, we can talk about the equivalence classes with respect to . Write [x] as the equivalence class containing string x:
[x] = { u : x ≈ u }

The minimal DFA

It turns out that these equivalence classes, if finitely many, form the minimal DFA for L:
Min(L) = (KL,Σ,δL,sL,FL)
where:
KL = { [x] : x ∈ Σ* }
We will talk more about the other DFA structure features later.

As an example, let
L = { w ∈ {a,b}*: #a(w) is even }
Take the string "a". It is equivalent to "ab" w.r.t. L because
for all z ∈ Σ*,
az has an even # of a's if and only if abz has an even # of a's
There are only 2 equivalence classes, depending on the parity (even or odd) of the number of a's:
[ε] = { ε, aa, b, bb, aba, baa, ... }
[a] = { a, ab, ba, bab, abb, aaba, ... }
These 2 classes form the states of the minimal DFA for L:
Another related example is
L = { w ∈ {a,b}*: #a(w) is even and #b(w) is odd }
There are 4 equivalence classes based on the parity of the number of a's and the number of b's. The minimal DFA for L is:

The δ* function

Given a DFA M=(K,Σ,δ,s,F), define
δ*: K × Σ* ⇾ K
in terms of the yields relation:
δ*(p,w) = q    if and only if   (p,w) ⊢* (q,ε)

Relationship of any DFA to the DFA Min(L)

Let L be a regular language. and let this be some arbitrary DFA for L:
M=(K,Σ,δ,s,F)
Consider the mapping: T : K ⇾ KL defined by:
T( δ*(s,x) ) = [x]
We need to argue that the T function is well-defined, meaning that:
δ*(s,x) = δ*(s,y)  ⇒  x ≈ y 
This is pretty obvious. In poetic English,
If x and y run to the same state,
then x and y must share the same fate.
The fact that we can choose any string x implies that the T function is a surjection (an onto mapping) from K ⇾ KL. Consequently, Then we follow with the Myhill-Nerode Theorem which proves that the equivalence classes in fact do form the states of a DFA for L.

Myhill-Nerode Theorem

Define Min(L) = (KL,Σ,δL,sL,FL) where:
KL = { [x] : x ∈ Σ* }
sL = [ε]
FL = { [x] : x ∈ L }
δL([x],σ) = [xσ]
The theorem states that this is the unique minimal DFA for L. A key step in proving the theorem is showing that δL is well-defined, i.e.
x ≈ y  ⇒  xσ ≈ yσ
Observe that Min(L) is defined in terms of the language L per se, affording it a status of being independent of any DFA which recognizes L.

Uniqueness of Min(L)

We know that Min(L) is minimal in terms of the number of states, but it is important to understand better what it means to be the unique DFA for L. Loosely, is that any other minimal DFA M = (K,Σ,δ,s,F) is essentially the same as Min(L). More precisely, consider the mapping
T(q) = [x], where q = δ*(s,x)
T is onto (surjective) and if M is minimal then the number of states must be equal. Consequently T must be one-to-one (injective), making T a bijection, or a one-to-one correspondence between states of M and states of Min(L). The additional key point is that T is structure-preserving, i.e.,
T(δ(p,σ)) = δL(T(p),σ)
In English, this means that the states of M and Min(L) correspond to each other in a way that the transitions also correspond. In algebraic terms, T is called an isomorphism.

Recognizing a minimal DFA

Regarding the Lewis textbook, we have outlined the material through page 97. The Myhill-Nerode Theorem is theoretically useful, but it is ultimately only an existence proof, not the construction of the minimal DFA. We need to better understand how to proceed with a given DFA for a language L. To start, we need to define yet another equivalence relation among states of a DFA.

Write p ≡ q if,
for all z ∈ Σ*, δ*(p,z) ∈ F ⇔ δ*(q,z) ∈ F.
If p ≡ q, we say p and q are indistinguishable, otherwise, they are distinguishable.

This is basically expressing the same "share a common fate" idea as is expressed by , except that it is expressed in terms of states.

We want to say that the two equivalence relations, and express exactly the same idea. Technically, this is what needs to be proved:
(A)  x ≈ y   ⇔   δ*(s,x) ≡ δ*(s,y)
Here's the proof:
x ≈ y   ⇔   for all z, xz ∈ L ⇔ yz ∈ L
        ⇔   for all z, δ*(s,xz) ∈ F ⇔ δ*(s,xz) ∈ F
        ⇔   for all z, δ**(s,x),z) ∈ F ⇔ δ**(s,x),z) ∈ F
        ⇔   δ*(s,x) ≡ δ*(s,y)

Alternative representation of the minimal DFA

Given DFA M = (K,Σ,δ,s,F), the equivalence class of state p is [p] = { q : p ≡ q }. We can now give another more concrete representation of the minimal DFA:
M′ = (K′,Σ,δ′,s′,F′)
where
K′ = { [q] : q ∈ K }
s′ = [s]
F′ = { [f] : f ∈ F }
δ′([q],σ) = [ δ(q,σ) ]
In addition to other steps we have to argue that δ′ is well-defined, i.e.,
p ≡ q  ⇒  δ(p,σ) ≡ δ(q,σ)
The correspondence between the states of M′ and Min(L) is then written as follows:
T′: K′ ⇾ KL
T′( [ δ′*([s],x) ] ) = [ x ]
The statement (A) above guarantees that T′ is well-defined and one-to-one, implying that M′ is indeed (another representation of) the minimal DFA.

Test for being a minimal DFA

One consequence of this minimal DFA construction is that
A DFA is minimal if and only if all states are distinguishable.
This means that no two states are "" equivalent, or better this:
A DFA is minimal if, for every pair of states p and q, we can find a distinguishing string x ∈ Σ* such that δ*(p,x) ∈ F and δ*(q,x) ∉ F, or vice-versa.

Rabin-Scott "worst case" example is minimal

Reconsider the example 2.2.2 in the textbook. We want to argue the worst case scenario of Rabin-Scott construction exists, namely an NFA with O(n) states whose minimal DFA contains O(2n) states. In particular our NFA contains exactly n+1 states and the DFA exactly 2n states.

The language in question is constructed from alphabet Σ with n symbols defined as follows:
{ w : w omits at least one symbol σ ∈ Σ }
We have already visited the example of NFA for the four-symbol alphabet {a,b,c,d} in the Nondeterministic Finite Automata document. The language is this:
(b∪c∪d)*∪(a∪c∪d)*∪(a∪b∪d)*∪(a∪b∪c)*
and the NFA is this:
In general, the state set of the NFA is {0} ∪ ΣUP, where ΣUP signifies the upper-case version of the alphabet symbols. The derived DFA is the following, consisting of NFA state sets. All states are final except the empty set.
Our claim is that
The minimal DFA consists of all 2n states matching every subset of ΣUP, where n = |Σ|.
The state transitions of the DFA are easy to define based on the way the NFA is set up with symbols corresponding to state names:
δ(Q,σ) = Q - { upper_case(σ) }, for all Q ⊆ ΣUP, σ ∈ Σ
There are two parts to proving this claim:
  1. The DFA constructed by the Rabin-Scott construction consists of all 2n states as indicated. We have already argued this point.

  2. Any pair of states of the DFA are distinguishable.

    Take any two distinct states P and Q, represented by subsets of ΣUP. For simplicity, ignore the distinction between upper case states and lower case symbols.

    Both P-Q and Q-P cannot be empty, and so assume, without loss of generality, that P-Q is non-empty. We're saying that there is at least one symbol in P which is not in Q.

    Create the string
    w = σ1···σk,  where Q = { σ1, ..., σk }  (w = ε if Q = ∅)
    
    Then, because w contains every symbol of Q, running the DFA from Q will deplete the set, i.e.,
    δ*(Q,w) = ∅  (a non-final state)
    
    However, since P contains symbols which are not in Q, running the DFA from Q will leave the remaining symbols, i.e.,
    δ*(P,w) = P-Q ≠ ∅  (a final state)
    
    This proves that that P and Q are distinguishable.

Algorithmic construction of the Minimal DFA

The Myhill-Nerode theorem gives us a theoretical representation of the minimal DFA in terms of string equivalence classes. The previous section gives as a less theoretical representation in terms of state-equivalence classes; however, we do not yet know an algorithmic way of constructing these state-equivalence classes.

Saying that two states are indistinguishable means that they both run to final states or both to non-final states for all strings. Obviously we do not test all strings. The idea is to compute the indistinguishability equivalence classes incrementally. We say:
p and q are k-distinguishable if they are distinguishable by a string of length ≤ k
It's easy to understand the inductive property of this relation:
p and q are k-distinguishable if they are (k-1)-distinguishable per se, or
δ(p,σ) and δ(q,σ) are (k-1)-distinguishable for some symbol σ ∈ Σ
The construction of the equivalence classes starts like this:
  1. p and q are 0-indistinguishable if they are both final or both non-final. So we start the algorithm with the states divided in two partitions: final and non-final.
  2. within each of these two partitions: p and q are 1-distinguishable if there is a symbol σ so that δ(p,σ) and δ(q,σ) are 0-distinguishable, i.e. one is final and the other not.

    By doing so we further partition each group into sets of 1-indistinguishable states.
The idea then is to keep splitting these partitioning sets as follows:
p and q within a partition set are k-distinguishable if there is a symbol σ so that δ(p,σ) and δ(q,σ) are (k-1)-distinguishable.
At some point we cannot subdivide the partitions further. At that point terminate the algorithm, because no further step can produce any new subdivision. When we terminate, we have the indistinguishability equivalence classes which form the states of the minimal DFA.

The transition from one equivalence class to another is obtained by choosing an arbitrary state in the source class, applying the transition and then taking the entire target class as the target state.

Example from figure 2-20

  1. Start by distinguishing final and non-final:   {q1,q3}, {q2,q4,q5,q6}
  2. Distinguish states within the groups, to get the next partition:
    {q1,q3}, {q4,q6}, {q5}, {q2}
    
    How?

    b distinguishes q2, q4:
    δ(q2,b) ∈ {q1,q3},  δ(q4,b) ∈ {q2,q4,q5,q6}
    
    b distinguishes q2, q5:
    δ(q2,b) ∈ {q1,q3},  δ(q5,b) ∈ {q2,q4,q5,q6}
    
    a distinguishes q4, q5:
    δ(q4,a) ∈ {q1,q3},  δ(q5,a) ∈ {q2,q4,q5,q6}
    
    neither a nor b distinguishes (q4,q6)
  3. We cannot split the two non-singleton groups further, the algorithm terminates.
The minimal DFA has start state {q1,q3}, single final state {q4,q6} with this transition function:
ab
{q1,q3}{q2}{q4,q6}
{q4,q6}{q1,q3}{q5}
{q2}{q5}{q1,q3}
{q5}{q5}{q5}

Examples from NFAs Document

Recall the examples in The first was a NFA constructed to accept (a∪b)*aba(a∪b)*, here with states rename for convenience:
Our diagram suggests that the states are all equivalent. Let's follow it through a bit.
  1. The initial grouping is by final/non-final: {A,B,C} {D,E,F}
  2. a distinguishes C from {A,B}, getting {A,B}, {C}, {D,E,F}
    δ(C,a) ∈ {D,E,F},  δ(A,a), δ(B,a) ∈ {A,B,C}
    
  3. b distinguishes A and B, getting {A}, {B}, {C}, {D,E,F}
    δ(A,b) ∈ {A,B},  δ(B,b) ∈ {C}
    
  4. Nothing distinguishes states in {D,E,F}

The second example is the DFA constructed from an NFA which accepts the language
(a*∪(ab)*)b*
where the states have been relabeled after the NFA-to-DFA construction.
We want to prove that this is the minimal DFA. To do so we use the construction of the minimal DFA and discover that each state ends up in its own equivalence class, i.e., every state is distinguishable from every other state.
Final      Non-final
  

Example 2

  1. Start by separating final and non-final states:   {1,2}, {0,3,4,5}
  2. All of 0,3,4 go to final states by a b transition, 5 does not (5 is a dead state). So we have 3 groups:
    {1,2},  {0,3,4},  {5}
    
  3. An a-transition takes 3 to 5 and 0,4 to 3. So we split {0,3,4} into two groups, getting:
    {1,2},  {0,4},  {3},  {5}
    
We cannot make any further sub-partitions and so the mimimal DFA is:


© Robert M. Kline