Undecidability for unrestricted grammars
We've shown that unrestricted, type-0 grammars are equivalent to TMs in the sense that a language L is accepted by a TM if and only if it can be generated by an unrestricted grammar. Therefore the following theorem is no surprise: Theorem 5.5.1 Each of these are undecidable:- Given a grammar G and string w, is w ∈ L(G)?
- Given a grammar G, is ε ∈ L(G)?
- Given a grammar G, is L(G) = ∅?
- Given a grammar G, is L(G) = Σ*?
- Given grammars G1, G2, is L(G1) = L(G2)?
f( (M,w) ) = (G,w)where G is the grammar equivalent to M. The conclusion is obvious.
Problems about context-free grammars
Recall that these problems are decidable:- Given a context-free grammar G and string w, is w ∈ L(G)? Convert to CNF, use CYK algorithm.
- Given a context-free grammar G, is ε ∈ L(G)? Proof: Remove useless productions, are there any remaining ε rules.
- Given a context-free grammar G, is L(G) = ∅? Proof: Remove useless productions. Are there any productions remaining?
- Given a context-free grammar G, is L(G) = Σ*?
- Given context-free grammars G1, G2, is L(G1) = L(G2)?
- Given context-free grammars G1, G2, is L(G1) ∩ L(G2) = ∅?
- Given PDAs M1, M2, is L(M1) = L(M2)?
- Find a mimimal PDA (in terms of number of states) for a context-free language.
Decidability issues for all grammars
The following chart is taken from Introduction to Automata Theory, Languages and Computation, by Hopcroft & Ullman. The key is:
D = Decidable
U = Undecidable
? = Open question
U = Undecidable
? = Open question
w ∈ L? | L = ∅? | L = Σ*? | L1 ⊆ L2? | L1 = L2? | L1 ∩ L2 = ∅? | |
---|---|---|---|---|---|---|
Regular (type 3) | D | D | D | D | D | D |
Det. Context Free | D | D | D | U | ? | U |
Context Free (type 2) | D | D | U | U | U | U |
Context Sensitive (type 1) | D | U | U | U | U | U |
Recursive | D | U | U | U | U | U |
Recursively Enumerable (type 0) | U | U | U | U | U | U |
Undecidability of L(G) = EveryThing
We want to prove is the following reduction: Given a general grammar G, find a context-free language D, such that
L(G) = ∅ if and only if
D = Σ*
This language D, as you may expect, is rather convoluted.
The idea is that its complement, D, somehow represents derivations in G. To say L(G) is empty
is equivalent to saying D is empty, or that
D = Σ*.
The point is that, miraculously,
D is context-free!
The language D
For every rule α → β, introduce a new non-terminal symbol A and replace this rule by two rules:α → A A → βFor example, for the anbncn grammar:
S → aSBC ⇒ S → A1, A1 → aSBC S → ε ⇒ S → A2, A2 → ε CB → BC ⇒ CB → A3, A3 → BC aB → ab ⇒ aB → A4, A4 → ab bB → bb ⇒ bB → A5, A5 → bb bC → bc ⇒ bC → A6, A6 → bc cC → cc ⇒ cC → A7, A7 → ccConsider the derivation of, say, "aabbcc" with this modified grammar:
S⇒A1⇒aSBC⇒aA1BC⇒aaSBCBC⇒aaA2BCBC⇒aaBCBC⇒aaBA3C⇒aaBBCC ⇒aA4BCC⇒aabBCC⇒aaA5CC⇒aabbCC⇒aabA6C⇒aabbcC⇒aabbA7⇒aabbccWe've written this derivation smashed together for a reason. We can think of this derivation as a string in the language
Σ∪{⇒}written:
x0 = S⇒x1⇒x2⇒x3⇒...⇒xnThe characteristics of this string are:
- starts with S, ends with a string in Σ*
- The number of xi's, and hence the length of the derivation must be even.
- each special non-terminal appears in each odd xi and not in any even xi
S⇒x1R⇒x2⇒x3R⇒...⇒xnThe odd positions, with the special non-terminals are reversed. This is how a string in the language D is constructed. For example, the string in D corresponding to the above derivation example is:
S⇒A1⇒aSBC⇒CBA1a⇒aaSBCBC⇒CBCBA2aa⇒aaBCBC⇒CA3Baa⇒aaBBCC ⇒CCBA4a⇒aabBCC⇒CCA5aa⇒aabbCC⇒CA6baa⇒aabbcC⇒A7bbaa⇒aabbccThis string in D can be characterized in this way:
- The string is of the form (V*⇒)+V*
- Starts with "S⇒"
- Ends with "⇒w", where w ∈ Σ*
- Contains an even number of ⇒ symbols.
- Counting strings separating ⇒ occurrences,
the odd-even portions are of this form:
βRAiαR⇒αγβ
using the rule Ai → γ - Counting strings separating ⇒ occurrences, the even-odd portions are of this form:
αγβ⇒βRAiαR
using the rule Ai → γ
L(G) = ∅ if and only if D = ∅
And so,
L(G) = ∅ if and only if D = Σ*
To be in D means it has to violate one of the
characterizations above, and so
D =
"not-satisfy-0" ∪ "not-satisfy-1" ∪ "not-satisfy-2" ∪ "not-satisfy-3" ∪ "not-satisfy-4" ∪ "not-satisfy-5"
The languages
"satisfy-0", "satisfy-1", "satisfy-2", "satisfy-3" are all regular,
and so these as well as their complements
"not-satisfy-0", "not-satisfy-1", "not-satisfy-2", "not-satisfy-3". What remains of the proof is
to show that these are context-free:
"satisfy-0,1,2,3-but-not-4" and "satisfy-0,1,2,3-but-not-5"
We use a PDA to show that "satisfy-0,1,2,3-but-not-4" is context-free;
the proof for
"satisfy-0,1,2,3-but-not-5"
is similar.
The idea is to look at each odd-even portion (which can be easily controlled by the PDA) and accept the string if at least one of the pairs is not of the form:
βRAiαR⇒αγβIt boils to to arguing that there is a PDA which accepts a substring
x⇒yif it is not of the above form. This is a fairly straightforward argument and it clarifies exactly why we need to reverse every other element so that we can stack and match off. ■