Non-regular Languages

The existence of non-regular languages is guaranteed by the fact that the regular languages of any alphabet are countable, and we know that the set of all subsets of strings is not countable. Nevertheless, the point of establishing non-regular languages is not so much one of existence, but of illustrating that certain languages which are "computable" in some sense are not regular.

Example

The example we want to show is non-regular is this one:
L = { w ∈ {a,b}* : w = aibi, for some i ≥ 0 }
What is it about this language which makes it non-regular? Imagine processing a string w ∈ L. As we see a's we would have to remember how many we've seen and then use that number to match off b's and accept if the numbers are equal. If we knew that there were, say, no more than 5 a's, then the following DFA would serve (missing transitions go to dead states):
The problem is that we only have finite memory, and so at some point we will exhaust the number of a's we can remember. The idea is to make precise the finite memory limitation. It is done so by the following classical Pumping Lemma.

Pumping Lemma

Let L be a regular language, then there is a N > 0 which can be used as follows.

For each w ∈ L such that |w| ≥ N, w can be written as
w = xyz
such that:

Proof:

Take a DFA, M, for L. Let N = the number of states. Take a string w, let n = |w| and assume n ≥ N. Write w = σ1...σn and consider the transitions which would occur when processing w from start state q0:
δ(q01) = q1, δ(q12) = q2, ..., δ(qn-1n) = qn ∈ F
Consider the N+1 states {q0,q1,...,qN} related to the first N transitions. Here is a depiction:
By the pigeonhole principle, at least two are equal: qi = qj, where 0 ≤ i < j ≤ N. Then write w = xyz where
(q0,xyz) ⊢* (qi,yz)  ⊢* (qj,z)  ⊢* (qn,ε) 
The substring y makes the transition from qi to qj, where i < j, and so |y| > 0. Write q = qi = qj as the common value. Here is a depiction:
Rewriting the transitions with q's substitution gives:
(q0,xyz) ⊢* (q,yz)  ⊢* (q,z)  ⊢* (qn,ε) 
and so we can "pump" the substring y, looping from q back to itself any number of times:
(q,yiz)  ⊢* (q,z), for all i ≥ 0
Thus,
(q0,xyiz) ⊢* (q,yiz)  ⊢* (q,z)  ⊢* (qn,ε), for all i ≥ 0
meaning that xyiz is accepted for all i ≥ 0. ■

Common Proofs

The result of the pumping lemma says that
w = xyz  and  xyiz ∈ L
You want to create a false statement due to the extra inclusions in L, and usually you can generate the false statement by one of these two statements:
xz ∈ L     (i = 0)
xyyz ∈ L   (i = 2)

Classic non-regular example

The proof of non-regularity of a language using the pumping lemma is a proof by contradiction. The goal is to assume that the language is regular and then derive strings which are not in the language, thereby contradicting the regularity assumption.

This language is not regular:
L0 = { w ∈ {a,b}* : w = aibi, for some i ≥ 0 }
Suppose this were regular. Let N be the constant selected by the pumping lemma. Consider the string w = aNbN. Then |w| = 2·N > N, and so we can write w = xyz according to the consequences of the lemma. Since |xy| ≤ N and |y| > 0, it must be that y is a non-empty substring of the aN part, i.e. y = ak for some k > 0.

According to the pumping lemma, xyyz = aN+kbN must also belong to L for k > 0, a contradiction.

Related non-regular languages

These languages are not regular:
L = { w ∈ {a,b}* : w = ambn, for m > n }
L = { w ∈ {a,b}* : w = ambn, for m < n }
From the given N, use this as the string in the former:
w = aN+1bN
and this in the latter:
w = aNbN+1
This language is not regular:
L = { w ∈ {a,b}* : #a(w)  =  #b(w) } 
where #σ(w) is the number of occurrences of σ in w.

For a slightly different approach, this proof by contradiction relies on the previous example and closure properties. If L were regular then, since a*b* is regular, this language would also have to be regular:
L ∩ a*b* = { w ∈ {a,b}* : w = aibi, for some i ≥ 0 } = L0
We just proved the non-regularity of L0, which is a contradiction. Therefore L cannot be regular.

This language is not regular:
L = { w·wR : w ∈ {a,b}* }
This is the set of all even-length palindromes. One approach is to simplify by restricting the possibilities as we did Consider:
L ∩ a*bba*
Try completing the argument.


© Robert M. Kline