## Grammar Undecidable Problems

### Undecidability for unrestricted grammars

We've shown that unrestricted, type-0 grammars are equivalent to TMs in the sense that a language L is accepted by a TM if and only if it can be generated by an unrestricted grammar. Therefore the following theorem is no surprise:

Theorem 5.5.1 Each of these are undecidable:
1. Given a grammar G and string w, is w ∈ L(G)?
2. Given a grammar G, is ε ∈ L(G)?
3. Given a grammar G, is L(G) = ∅?
4. Given a grammar G, is L(G) = Σ*?
5. Given grammars G1, G2, is L(G1) = L(G2)?
These are all proved by problem reduction as described in the previous section, although there is no clear underlying language reduction.

Proof of (a). Reduce the halting problem the grammar problem (a).
```f( (M,w) ) = (G,w)
```
where G is the grammar equivalent to M. The conclusion is obvious.

Recall that these problems are decidable:
1. Given a context-free grammar G and string w, is w ∈ L(G)?

Convert to CNF, use CYK algorithm.
2. Given a context-free grammar G, is ε ∈ L(G)?

Proof: Remove useless productions, are there any remaining ε rules.
3. Given a context-free grammar G, is L(G) = ∅?

Proof: Remove useless productions. Are there any productions remaining?
However, these problems are undecidable:
1. Given a context-free grammar G, is L(G) = Σ*?
2. Given context-free grammars G1, G2, is L(G1) = L(G2)?
3. Given context-free grammars G1, G2, is L(G1) ∩ L(G2) = ∅?
These related problems about PDAs are also undecidable:
1. Given PDAs M1, M2, is L(M1) = L(M2)?
2. Find a mimimal PDA (in terms of number of states) for a context-free language.
Note that trying to use "complementation" to solve (a) will not work, because the complement of a context-free language is not necessarily context free. In fact, the following proof effectively shows that the complement of a context-free language may not even be recursive (i.e. decidable by a TM).

### Decidability issues for all grammars

The following chart is taken from Introduction to Automata Theory, Languages and Computation, by Hopcroft & Ullman. The key is:
D = Decidable
U = Undecidable
? = Open question
 w ∈ L? L = ∅? L = Σ*? L1 ⊆ L2? L1 = L2? L1 ∩ L2 = ∅? Regular(type 3) D D D D D D Det. Context Free D D D U ? U Context Free(type 2) D D U U U U Context Sensitive(type 1) D U U U U U Recursive D U U U U U Recursively Enumerable(type 0) U U U U U U

### Undecidability of L(G) = EveryThing

We want to prove is the following reduction:

Given a general grammar G, find a context-free language D, such that
L(G) = ∅   if and only if   D = Σ*
This language D, as you may expect, is rather convoluted. The idea is that its complement, D, somehow represents derivations in G. To say L(G) is empty is equivalent to saying D is empty, or that D = Σ*. The point is that, miraculously, D is context-free!

The language D

For every rule α → β, introduce a new non-terminal symbol A and replace this rule by two rules:
```α → A
A → β
```
For example, for the anbncn grammar:
```S → aSBC      ⇒    S → A1,   A1 → aSBC
S → ε         ⇒    S → A2,   A2 → ε
CB → BC       ⇒    CB → A3,  A3 → BC
aB → ab       ⇒    aB → A4,  A4 → ab
bB → bb       ⇒    bB → A5,  A5 → bb
bC → bc       ⇒    bC → A6,  A6 → bc
cC → cc       ⇒    cC → A7,  A7 → cc
```
Consider the derivation of, say, "aabbcc" with this modified grammar:
```S⇒A1⇒aSBC⇒aA1BC⇒aaSBCBC⇒aaA2BCBC⇒aaBCBC⇒aaBA3C⇒aaBBCC
⇒aA4BCC⇒aabBCC⇒aaA5CC⇒aabbCC⇒aabA6C⇒aabbcC⇒aabbA7⇒aabbcc
```
We've written this derivation smashed together for a reason. We can think of this derivation as a string in the language
```Σ∪{⇒}
```
written:
```x0 =  S⇒x1⇒x2⇒x3⇒...⇒xn
```
The characteristics of this string are:
1. starts with S, ends with a string in Σ*
2. The number of xi's, and hence the length of the derivation must be even.
3. each special non-terminal appears in each odd xi and not in any even xi
The language D is a subtle modification of these type of strings. It consists of all strings of the form:
```S⇒x1R⇒x2⇒x3R⇒...⇒xn
```
The odd positions, with the special non-terminals are reversed. This is how a string in the language D is constructed. For example, the string in D corresponding to the above derivation example is:
```S⇒A1⇒aSBC⇒CBA1a⇒aaSBCBC⇒CBCBA2aa⇒aaBCBC⇒CA3Baa⇒aaBBCC
⇒CCBA4a⇒aabBCC⇒CCA5aa⇒aabbCC⇒CA6baa⇒aabbcC⇒A7bbaa⇒aabbcc
```
This string in D can be characterized in this way:
1. The string is of the form (V*⇒)+V*
2. Starts with "S⇒"
3. Ends with "⇒w", where w ∈ Σ*
4. Contains an even number of symbols.
5. Counting strings separating occurrences, the odd-even portions are of this form:
```βRAiαR⇒αγβ
```
using the rule Ai → γ
6. Counting strings separating occurrences, the even-odd portions are of this form:
```αγβ⇒βRAiαR
```
using the rule Ai → γ
What we mean by characterized, is that if at least one of these properties does not hold, then the string is not in D. Again we repeat the mantra:
L(G) = ∅ if and only if D = ∅
And so,
L(G) = ∅ if and only if D = Σ*
To be in D means it has to violate one of the characterizations above, and so
D = "not-satisfy-0" ∪ "not-satisfy-1" ∪ "not-satisfy-2" ∪ "not-satisfy-3" ∪ "not-satisfy-4" ∪ "not-satisfy-5"
The languages "satisfy-0", "satisfy-1", "satisfy-2", "satisfy-3" are all regular, and so these as well as their complements "not-satisfy-0", "not-satisfy-1", "not-satisfy-2", "not-satisfy-3". What remains of the proof is to show that these are context-free:
"satisfy-0,1,2,3-but-not-4" and "satisfy-0,1,2,3-but-not-5"
We use a PDA to show that "satisfy-0,1,2,3-but-not-4" is context-free; the proof for "satisfy-0,1,2,3-but-not-5" is similar.

The idea is to look at each odd-even portion (which can be easily controlled by the PDA) and accept the string if at least one of the pairs is not of the form:
```βRAiαR⇒αγβ
```
It boils to to arguing that there is a PDA which accepts a substring
```x⇒y
```
if it is not of the above form. This is a fairly straightforward argument and it clarifies exactly why we need to reverse every other element so that we can stack and match off. ■