Undecidability for unrestricted grammars
We've shown that unrestricted, type-0 grammars are equivalent to TMs in the sense that a language L is accepted by a TM if and only if it can be generated by an unrestricted grammar. Therefore the following theorem is no surprise: Theorem 5.5.1 Each of these are undecidable:- Given a grammar G and string w, is w ∈ L(G)?
- Given a grammar G, is ε ∈ L(G)?
- Given a grammar G, is L(G) = ∅?
- Given a grammar G, is L(G) = Σ^{*}?
- Given grammars G_{1}, G_{2}, is L(G_{1}) = L(G_{2})?
f( (M,w) ) = (G,w)where G is the grammar equivalent to M. The conclusion is obvious.
Problems about context-free grammars
Recall that these problems are decidable:- Given a context-free grammar G and string w, is w ∈ L(G)? Convert to CNF, use CYK algorithm.
- Given a context-free grammar G, is ε ∈ L(G)? Proof: Remove useless productions, are there any remaining ε rules.
- Given a context-free grammar G, is L(G) = ∅? Proof: Remove useless productions. Are there any productions remaining?
- Given a context-free grammar G, is L(G) = Σ^{*}?
- Given context-free grammars G_{1}, G_{2}, is L(G_{1}) = L(G_{2})?
- Given context-free grammars G_{1}, G_{2}, is L(G_{1}) ∩ L(G_{2}) = ∅?
- Given PDAs M_{1}, M_{2}, is L(M_{1}) = L(M_{2})?
- Find a mimimal PDA (in terms of number of states) for a context-free language.
Decidability issues for all grammars
The following chart is taken from Introduction to Automata Theory, Languages and Computation, by Hopcroft & Ullman. The key is:
D = Decidable
U = Undecidable
? = Open question
U = Undecidable
? = Open question
w ∈ L? | L = ∅? | L = Σ*? | L_{1} ⊆ L_{2}? | L_{1} = L_{2}? | L_{1} ∩ L_{2} = ∅? | |
---|---|---|---|---|---|---|
Regular (type 3) | D | D | D | D | D | D |
Det. Context Free | D | D | D | U | ? | U |
Context Free (type 2) | D | D | U | U | U | U |
Context Sensitive (type 1) | D | U | U | U | U | U |
Recursive | D | U | U | U | U | U |
Recursively Enumerable (type 0) | U | U | U | U | U | U |
Undecidability of L(G) = EveryThing
We want to prove is the following reduction: Given a general grammar G, find a context-free language D, such that
L(G) = ∅ if and only if
D = Σ^{*}
This language D, as you may expect, is rather convoluted.
The idea is that its complement, D, somehow represents derivations in G. To say L(G) is empty
is equivalent to saying D is empty, or that
D = Σ*.
The point is that, miraculously,
D is context-free!
The language D
For every rule α → β, introduce a new non-terminal symbol A and replace this rule by two rules:α → A A → βFor example, for the a^{n}b^{n}c^{n} grammar:
S → aSBC ⇒ S → A_{1}, A_{1} → aSBC S → ε ⇒ S → A_{2}, A_{2} → ε CB → BC ⇒ CB → A_{3}, A_{3} → BC aB → ab ⇒ aB → A_{4}, A_{4} → ab bB → bb ⇒ bB → A_{5}, A_{5} → bb bC → bc ⇒ bC → A_{6}, A_{6} → bc cC → cc ⇒ cC → A_{7}, A_{7} → ccConsider the derivation of, say, "aabbcc" with this modified grammar:
S⇒A_{1}⇒aSBC⇒aA_{1}BC⇒aaSBCBC⇒aaA_{2}BCBC⇒aaBCBC⇒aaBA_{3}C⇒aaBBCC ⇒aA_{4}BCC⇒aabBCC⇒aaA_{5}CC⇒aabbCC⇒aabA_{6}C⇒aabbcC⇒aabbA_{7}⇒aabbccWe've written this derivation smashed together for a reason. We can think of this derivation as a string in the language
Σ∪{⇒}written:
x_{0} = S⇒x_{1}⇒x_{2}⇒x_{3}⇒...⇒x_{n}The characteristics of this string are:
- starts with S, ends with a string in Σ*
- The number of x_{i}'s, and hence the length of the derivation must be even.
- each special non-terminal appears in each odd x_{i} and not in any even x_{i}
S⇒x_{1}^{R}⇒x_{2}⇒x_{3}^{R}⇒...⇒x_{n}The odd positions, with the special non-terminals are reversed. This is how a string in the language D is constructed. For example, the string in D corresponding to the above derivation example is:
S⇒A_{1}⇒aSBC⇒CBA_{1}a⇒aaSBCBC⇒CBCBA_{2}aa⇒aaBCBC⇒CA_{3}Baa⇒aaBBCC ⇒CCBA_{4}a⇒aabBCC⇒CCA_{5}aa⇒aabbCC⇒CA_{6}baa⇒aabbcC⇒A_{7}bbaa⇒aabbccThis string in D can be characterized in this way:
- The string is of the form (V^{*}⇒)^{+}V*
- Starts with "S⇒"
- Ends with "⇒w", where w ∈ Σ*
- Contains an even number of ⇒ symbols.
- Counting strings separating ⇒ occurrences,
the odd-even portions are of this form:
β^{R}A_{i}α^{R}⇒αγβ
using the rule A_{i} → γ - Counting strings separating ⇒ occurrences, the even-odd portions are of this form:
αγβ⇒β^{R}A_{i}α^{R}
using the rule A_{i} → γ
L(G) = ∅ if and only if D = ∅
And so,
L(G) = ∅ if and only if D = Σ^{*}
To be in D means it has to violate one of the
characterizations above, and so
D =
"not-satisfy-0" ∪ "not-satisfy-1" ∪ "not-satisfy-2" ∪ "not-satisfy-3" ∪ "not-satisfy-4" ∪ "not-satisfy-5"
The languages
"satisfy-0", "satisfy-1", "satisfy-2", "satisfy-3" are all regular,
and so these as well as their complements
"not-satisfy-0", "not-satisfy-1", "not-satisfy-2", "not-satisfy-3". What remains of the proof is
to show that these are context-free:
"satisfy-0,1,2,3-but-not-4" and "satisfy-0,1,2,3-but-not-5"
We use a PDA to show that "satisfy-0,1,2,3-but-not-4" is context-free;
the proof for
"satisfy-0,1,2,3-but-not-5"
is similar.
The idea is to look at each odd-even portion (which can be easily controlled by the PDA) and accept the string if at least one of the pairs is not of the form:
β^{R}A_{i}α^{R}⇒αγβIt boils to to arguing that there is a PDA which accepts a substring
x⇒yif it is not of the above form. This is a fairly straightforward argument and it clarifies exactly why we need to reverse every other element so that we can stack and match off. ■