## Non-CFLs & Closure Properties

### Pumping Lemma for CFLs

For regular languages, the pumping lemma arises out of the presence of a repeated state in a DFA when the accepting string is sufficiently long. What's the analogy for CFLs? It's the occurrence of a repeating symbol in a derivation of a terminal string w from S.

The preferred model for understanding and proving the CFL pumping lemma is the parse tree. The diagram in Figure 3-9 in the textbook tells almost the whole story:
A recurring symbol A appears as a non-terminal on some path from the root (S) to a leaf (a terminal symbol or ε). Simply saying that A occurs twice in a derivation:
```S ⇒* α1Aβ1 ⇒+ α2Aβ2 ⇒* w
```
does not express the true idea we want to convey, because we really mean to say that the second A appears as a result of a derivation from the first A, i.e.,
```S ⇒* α1Aβ1
A ⇒+ α2Aβ2
α1α2Aβ2β1 ⇒* w
```
Parse trees present the idea much more succinctly. Given the structure in the diagram above we can understand the decomposition of the target string:
```w = uvxyz
```
as portions of the yield corresponding to sections of the parse tree. We can use this structure to "pump" the repeated portion, for example, deriving the strings
```uxz  and  uvvxyyz
```
by the respective "tree surgery" procedures:

Mapping this picture to derivations, we have:
```S ⇒* uAz
A ⇒* vAy
A ⇒* x
```
giving us:
```S ⇒* uAz ⇒* uvAyz ⇒* uvxyz
```

#### Pumping Lemma Statement

Let G be a CFG, then there is a number N, such that if w ∈ L(G) and |w| > N, then
```w = uvxyz
```
where v ≠ ε or y ≠ ε and
```uvixyiz ∈ L(G), for all i ≥ 0
```

#### Proof

The first thing to determine is what N should be. The idea has to do with the so-called fanout of a grammar G. Define
f = the largest number of symbols on the right-hand side of any production.
It's a fairly simple inductive argument that any parse tree with height h has at most fh leaves. This implies that the yield of a parse tree of height h has length at most fh. Now let
t = |V - Σ|, i.e., the number of non-terminal symbols.
and let N = ft.

Restating, a parse tree with height t has yield of length at most N.

Suppose we have a string w ∈ L(G) for which |w| > N. Then w has a parse tree with root S and yield w whose height is greater than t.

This means:
• there is a path from the root to a leaf of length at least t+1, and so
• there are at least t+2 nodes in the path, and so
• there are at least t+1 non-terminal symbols on this path.
By the pidgeonhole principle, there must be one non-terminal, A, which repeats.

So we conclude that in every parse tree of w, there is a repeating symbol as we have indicated. The remaining key point to argue is that there is some parse tree for which v and y are not both empty.

Consider all parse trees for w. Choose a parse tree T which has the smallest number of nodes. Then consider the first diagram above. If v and y were both empty, then we could generate the same string by replacing T′ by T″, getting a parse tree for w with fewer nodes and thereby contradicting the minimality of T. ■

### The classical non-CFL language

The classical result is to show that the language
```{ anbncn : n ≥ 0 }
```
is not context-free. The proof is obvious but quite tedious. Clearly we would have to be able to pump on 3 or 0 substrings to get a string back in the language, but the lemma allows either 1 or 2 pumping substrings. Express:
```anbncn = uvxyz
```
and look at one repeat
uv2xy2z where v ≠ ε or y ≠ ε.
No matter how it breaks down, this string cannot be in the language. Consider some possibilities:
• suppose either or both of v and y contains more than one symbol, e.g., if v ≠ ε, then
```v = rasbt, or rwhere v ≠ ε or y ≠ ε. sct, or rasct
```
where r,s,t ∈ {a,b,c}*, then v2 will have symbols out of order for L
• suppose v ∈ a+, y ∈ a*, then uv2xy2z will have more a's than b's
• suppose v ∈ a+, y ∈ b*, then uv2xy2z will have more a's than c's
• suppose v ∈ b*, y ∈ c+, then uv2xy2z will have more c's than a's

#### Failure of closure under intersection

Consider the languages
```{anbn: n ≥ 0}·c*  and  a*·{bncn: n ≥ 0}
```
These are both context free, but the intersection is the language above which is not.

#### Failure of closure under complementation

This is an application of DeMorgan's Laws. CFLs are closed under union. If they were closed under complementation then they would consequently be closed under intersection by
```L1 ∩ L2 = L1 ∪ L2
```

### CFLs closed under intersection with a regular language

This is a direct consequence of the equivalence of CFLs and PDAs. The textbooks discusses this in section 3.5. The proof uses a version of the product construction which can be used to argue that intersection of two regular languages is regular. Suppose we have:
```PDA: (K1,Σ,Γ,Δ1,s1,F1)
DFA: (K2,Σ,δ,s2,F2)
```
The product PDA is:
```(K1×K2, Σ, Γ, Δ, (s1,s2), F1×F2)
```
where Δ has the following transitions:
1. For σ ∈ Σ, for each transition pair
```((p1,σ,α),(q1,β)) ∈ Δ1
δ(p2,σ) = q2
```
```( (p1,p2),σ,α), ((q1,q2),β) ) ∈ Δ
```
2. For each transition
```((p1,ε,α),(q1,β)) ∈ Δ1
```
for all p ∈ K2, add to Δ
```( (p1,p),ε,α), ((q1,p),β) ) ∈ Δ
```
The first type of transition added reflects processing the same symbol σ ∈ Σ in both the PDA and the DFA. The second type of transition added reflects that the PDA does not consume input and therefore the DFA leaves the state alone.

#### Example:

```{ wwR : w ∈ {a,b}* } ∩ { w ∈ {a,b}*: w contains no bb }
```
Using the machine representations, we want to create:
 ∩

The outcome is:
As you see, the product construction for PDA and DFA is more like mixing apples and oranges because the DFA focuses on the states and the PDA on the stack.

#### Equal a's, b's, c's not a CFL

Using the closure under intersection with a regular language, we can argue that this is not a CFL:
```L = { w ∈ {a,b,c}*: #a(w) = #b(w) = #c(w)}
```
If it were, then so would be
```L ∩ a*b*c* = { anbncn : n ≥ 0 }
```
Note that we cannot use the Pumping Lemma directly, by pumping the string
```w = anbncn = uvxyz
```
Because we could end up with
```u = ab, v = c
```
in which case pumping will not go outside the language L!

### Closure Properties of DCFLs

DCFLs are defined in a totally different way from CFLs. A CFL is defined a being generated from a grammar and a DCFL is defined as being accepted by a DPDA. The link is the result of section 3.4:
Every CFL is accepted by a PDA, the language accepted by a PDA is a CFL (i.e., generated by a grammar).
The language { w·wR : w ∈ {a,b}* } is intuitively not a DCFL, but finding a proof of this fact appears to be very difficult.
1. DCFLs are CFLs.
This sounds right, but technically you have to argue about what to do when the "\$" is no longer part of the picture.
2. Regular Languages are DCFLs.
This is obvious because a DFA is basically a DPDA with no stack usage.
3. DCFLs are closed complementation.
This is proved in Theorem 3.7.1 in the textbook. It is a difficult technical result which illustrates how a DPDA for a language L can be used to construct a DPDA for Σ*-L.

From this we can argue that the language is a DCFL:
```L = {w ∈ {a,b}*: #a(w) ≠ #b(w)}
```
4. DCFLs are not closed under intersection.
Assuming that {anbncn: n ≥ 0} is not a CFL (argued in section 3.5), we use the argument that CFLs are not closed under intersection:
```{anbncn: n ≥ 0} = {anbn: n ≥ 0}·c*  ∩  a*·{bncn: n ≥ 0}
```
Both the component languages making up the intersection are DCFLs. It's easy to make up DPDAs for both. However the intersection is not a CFL, hence not a DCFL.
5. DCFLs are not closed under union.
If they were, then, by DeMorgan's Laws, using closure under complementation, they would have to be closed under intersection.

The failure of closure under union for DCFLs makes sense because union, offering a choice of two alternatives is precisely where determinism might break down.
6. DCFLs, like CFLs, are closed under intersection with a regular language.
Section 3.4 illustrates the product construction proving that:
The intersection of a CFL with a regular language is a CFL.
This same product construction works to prove that DCFL intersected with a regular language is a DCFL, the difference is that we start with a DPDA. A key point is that the DFA used in the product construction cannot introduce non-determinism into the constructed DPDA.
7. DCFLs are not closed under concatenation.
The problem is that a string in the first language loses the presence of the "\$" marker. For example, consider the concatenation of these two DCFLs:
```{ambn: m < n} · { wcwR: w ∈ {a,b}* }
```
This string abbb·bbbacabbb is accepted, but how would you know on which b you should "jump across" from one language to the other and stop consuming the b and start stacking it?

#### Example of a CFL language which is not DCFL

Refer to the classic non-CFL language
```L0 = {anbncn: n ≥ 0}
```
We can argue that complement Σ*-L0 is a CFL. Rewrite as follows:
```Σ* - L0 = (Σ* - a*b*c*) ∪ {ambncp: m ≠ n or n ≠ p or m ≠ p}
```
The first component of the union is regular. The second component is the union of these 6 CFLs:
```{ambncp: m < n}
{ambncp: m > n}
{ambncp: n < p}
{ambncp: n > p}
{ambncp: m < p}
{ambncp: m > p}
```
Therefore Σ* - L0 is a CFL. However, it cannot be a DCFL, because if it were, then its complement, L0, would be a DCFL and hence a CFL, a contradiction.