Context-Free Grammar
A context-free grammar is G = (V,Σ,R,S), where
 V = an alphabet containing all grammar symbols
 Σ = a subset of V, called the terminal symbols (V−Σ are the nonterminals)
 R ⊆ (V−Σ)×V* is the set of substitution rules, or productions
 S ∈ V−Σ is the start symbol
It is standard to write "A ⇾ α" when (A,α) ∈ R.
Notational Conventions
It is common in practice to express a grammar simply in terms of its productions, as follows:
 The full symbol set V consists only of symbols that appear in the grammar.
 All uppercase letters are nonterminals and everything else is a terminal symbol.
 The start symbol is the nonterminal on the left-hand side of the first production rule listed.
 Rules with common left-hand sides are combined, with right-hand sides separated by "|".
For example:
S ⇾ aSb | ε
The implication is that V = {S,a,b}, Σ = {a,b}, the start symbol is S, and the rules are:
S ⇾ aSb
S ⇾ ε
Derivation
A derivation is a sequence of steps which begins with the start symbol, uses the rules to do replacements, and ends with a terminal string (from which no further steps can be taken). Formally, one-step derivation, ⇒, is a binary relation on V*. When (α,β) belongs to this relation, we write α ⇒ β. The definition is:
α ⇒ β if and only if α = α_{1}Aα_{2} and β = α_{1}γα_{2} where A ⇾ γ is a rule.
The derives relation, ⇒^{*}, is the usual reflexive, transitive closure of ⇒. We write L(G) for the language generated by grammar G: L(G) = { w ∈ Σ^{*} : S ⇒^{*} w }.
The intermediate forms appearing in derivations are strings in V^{*} containing both terminals and nonterminals. These are called sentential forms.
A simple derivation using the above grammar is:
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb = a^{3}b^{3}
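This derivation pattern is mechanical enough to script. Below is a small sketch (not part of the notes; the function name is mine) that builds the whole derivation sequence for a^{n}b^{n} by applying S ⇾ aSb n times and then S ⇾ ε once:

```python
def derive_anbn(n):
    """Return the derivation S => aSb => ... => a^n b^n as a list of
    sentential forms, for the grammar S -> aSb | eps."""
    steps = ["S"]
    cur = "S"
    for _ in range(n):
        cur = cur.replace("S", "aSb", 1)   # apply S -> aSb
        steps.append(cur)
    cur = cur.replace("S", "", 1)          # apply S -> eps
    steps.append(cur)
    return steps

# derive_anbn(3) ends with "aaabbb" = a^3 b^3
```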
Leftmost and Rightmost Derivations
Two particularly important derivations are these:
 leftmost: always replace the leftmost nonterminal
 rightmost: always replace the rightmost nonterminal
Equal a's and b's
The language of equal a's and b's and its variants are our main examples of non-regular languages. As a simple example, we'll prove that the language generated by the grammar G
S ⇾ aSb | ε
is L = { a^{n}b^{n} : n ≥ 0 }. We have to show two inclusions:
 L ⊆ L(G): Everything in the language can be generated by the grammar
 L(G) ⊆ L: Everything generated by the grammar is in the language.

(A) Everything in the language is generated by a derivation in the grammar from S.
This is usually proved by induction on the length of strings in the language. In this case, take w = a^{n}b^{n} and prove by induction on n that S ⇒* w. For n = 0, the one-step derivation S ⇒ ε suffices. Otherwise, assume the claim holds for n, i.e.,
S ⇒* a^{n}b^{n}
Then
S ⇒ aSb ⇒* aa^{n}b^{n}b = a^{n+1}b^{n+1}
and thus S ⇒* a^{n+1}b^{n+1}. 
(B) Everything derived by the grammar from S is in the language.
Usually, for these kinds of proofs, you have to say something about all the sentential forms derivable from all (useful) symbols. Fortunately, there's only one symbol in this simple grammar.
Claim: If S ⇒* α then either α = a^{i}b^{i} for some i ≥ 0, or α = a^{i}Sb^{i} for some i ≥ 0.
Proof by induction on the length of the derivation. For length 0, α = S = a^{0}Sb^{0}, so the claim holds. Now for n ≥ 0, take a derivation of length n+1:
S ⇒^{n} α ⇒ β
By induction, and since one more step is possible, α = a^{i}Sb^{i} for some i ≥ 0. Thus for either replacement, S ⇾ aSb or S ⇾ ε, β again has the required form.
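The Claim can also be sanity-checked by brute force: enumerate every sentential form reachable in a bounded number of steps and test that each has the shape a^{i}Sb^{i} or a^{i}b^{i}. A sketch of mine (not from the notes), for the grammar S ⇾ aSb | ε:

```python
RULES = {"S": ["aSb", ""]}  # the grammar S -> aSb | eps

def step(form):
    """All sentential forms reachable from `form` in one derivation step."""
    out = set()
    for i, sym in enumerate(form):
        if sym in RULES:
            for rhs in RULES[sym]:
                out.add(form[:i] + rhs + form[i + 1:])
    return out

# Close {"S"} under => for 8 steps, then check the Claim on every form seen.
seen, frontier = {"S"}, {"S"}
for _ in range(8):
    frontier = set().union(*map(step, frontier)) - seen
    seen |= frontier
for form in seen:
    i = form.count("a")
    assert form in ("a" * i + "S" + "b" * i, "a" * i + "b" * i)
```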
Equal a's and b's in general
More difficult is generating the language of strings with an equal number of a's and b's in any arrangement:
L = { w ∈ {a,b}* : a(w) = b(w) }
We're using the notation:
a(w) = number of a's in the string w
b(w) = number of b's in the string w
This language can be generated by the following grammar:
S ⇾ aSbS | bSaS | ε
Of the two proofs (A) and (B), (B) is simpler. We simply argue by induction that:
If S ⇒* α, then a(α) = b(α) (ignoring interspersed S's, if any);
and so L(G) ⊆ L.
The harder part is showing that any string in L can be derived by this grammar.
Consider the derivation of the string aababb:
S ⇒ aSbS ⇒ aSb ⇒ aaSbSb ⇒ aabSb ⇒ aabaSbSb ⇒^{2} aababb
The proof by induction on the string length is left as an exercise. It is interesting to compare this grammar to the grammar from problem 3.1.5, which also generates the same language.
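As a sanity check on the claimed equality (a brute-force sketch of mine, not from the notes), we can collect every terminal string the grammar derives up to a length bound and compare with the strings having equal counts of a's and b's:

```python
from itertools import product

RULES = {"S": ["aSbS", "bSaS", ""]}
LIMIT = 6  # compare the two languages on all strings of length <= LIMIT

# Close {"S"} under leftmost derivation steps, pruning any sentential form
# whose terminal part already exceeds LIMIT (terminals are never erased,
# so such a form cannot derive a short string).
todo, done, derived = {"S"}, set(), set()
while todo:
    form = todo.pop()
    done.add(form)
    i = form.find("S")
    if i < 0:
        derived.add(form)          # a fully terminal string
        continue
    for rhs in RULES["S"]:
        new = form[:i] + rhs + form[i + 1:]
        if len(new.replace("S", "")) <= LIMIT and new not in done:
            todo.add(new)

equal = {"".join(p) for n in range(LIMIT + 1)
         for p in product("ab", repeat=n)
         if p.count("a") == p.count("b")}
assert derived == equal
```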
More Examples
Arithmetic expressions. The terminals are {+,−,*,(,),a}. The symbol a represents a token that we identify by other means. It could be a number, variable, etc.
E ⇾ E + T | E − T | T   (E is Expression)
T ⇾ T * F | F   (T is Term)
F ⇾ a | ( E )   (F is Factor)
When we represent grammars related to programming languages, it is assumed that the symbols are taken from a tokenized stream. Thus "x * ( 3 + z)" and "x*(3+z)", though different as strings, are the same expression. These rules dictate two major features of arithmetic expressions:
 The +, − and * operations group their operands in a left-associative manner, i.e., x − y − z is effectively (x − y) − z. The left associativity disambiguates a sequence of equal-precedence operators from a grammar point of view.
 The * operator has higher precedence than + and −, i.e., x + y * z is effectively x + (y * z).
Consider an example of two different derivations of the expression x + y * z:
E ⇒ E + T ⇒ T + T ⇒ F + T ⇒ x + T ⇒ x + T * F ⇒ x + F * F ⇒ x + y * F ⇒ x + y * z 
E ⇒ E + T ⇒ E + T * F ⇒ E + F * F ⇒ T + F * F ⇒ T + F * z ⇒ F + F * z ⇒ F + y * z ⇒ x + y * z 
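These two grammar features carry over directly to parsing. Below is a recursive-descent sketch (my own code, not from the notes) for the E/T/F grammar; the left recursion in E ⇾ E + T and T ⇾ T * F is rewritten as loops, since naive recursive descent would loop forever on left recursion. It returns a fully parenthesized string, which makes the grouping visible:

```python
def parse(tokens):
    """Fully parenthesize `tokens` according to the E/T/F grammar.
    The left recursion (E -> E + T, T -> T * F) is handled with loops,
    which is also what yields the left-associative grouping."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():                    # E -> E + T | E - T | T
        node = term()
        while peek() in ("+", "-"):
            op = eat()
            node = "(" + node + op + term() + ")"
        return node

    def term():                    # T -> T * F | F
        node = factor()
        while peek() == "*":
            eat()
            node = "(" + node + "*" + factor() + ")"
        return node

    def factor():                  # F -> a | ( E )
        if peek() == "(":
            eat()
            node = expr()
            eat()                  # the closing ")"
            return node
        return eat()               # an identifier/number token

    return expr()

# parse(["x", "+", "y", "*", "z"]) -> "(x+(y*z))"   (precedence)
# parse(["x", "-", "y", "-", "z"]) -> "((x-y)-z)"   (left associativity)
```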
Ambiguous Arithmetic Expression Grammar
Compare the above expression grammar to a simpler one:
E ⇾ E + E | E − E | E * E | a | (E)
Why not use this simpler one? It generates the expression language, but it is ambiguous. The following is a proof of ambiguity, once we know more about what that means:
E ⇒ E + E (E is a sum of expressions) ⇒ x + E ⇒ x + E * E (the second summand is a product) ⇒ x + y * E ⇒ x + y * z
E ⇒ E * E (E is a product of expressions) ⇒ E + E * E (the first factor is a sum) ⇒ x + E * E ⇒ x + y * E ⇒ x + y * z
Balanced Parentheses
The grammar of balanced parentheses can be obtained by taking the ambiguous expression grammar and removing +, −, * and a by setting them to ε. What's left boils down to this:
S ⇾ S S | (S) | ε
A sample derivation is this:
S ⇒ ( S ) ⇒ ( S S ) ⇒ ( ( S ) S ) ⇒ ( ( S ) ( S ) ) ⇒ ( ( ) ( S ) ) ⇒ ( ( ) ( ) )
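This language has a well-known equivalent characterization: a string of parentheses is balanced iff no prefix has more ')' than '(' and the totals are equal. A small checker based on that standard fact (a sketch of mine; the equivalence with the grammar is stated here without proof):

```python
def balanced(w):
    """True iff w is generated by S -> S S | ( S ) | eps.
    Uses the counter characterization: the running depth never
    goes negative and ends at zero."""
    depth = 0
    for c in w:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False          # a prefix with too many ")"
    return depth == 0
```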
Regular Expression Grammars
See problem 3.1.4. Regular expressions over Σ are actually defined by a CFG. We use the additional terminals {(,),*,∪,∅}. According to the textbook's notion of fully parenthesized regular expressions, as defined in section 1.8, the associated grammar is this:
R ⇾ (R∪R) | (RR) | R* | ∅ | σ   (for σ ∈ Σ)
Without the full parenthesization, a grammar capturing the usual precedence is:
R ⇾ R∪B | B
B ⇾ BQ | Q
Q ⇾ A* | A
A ⇾ (R) | ∅ | σ   (for σ ∈ Σ)
Recall that B is Branch, Q is Quantified atom and
A is Atom.
Notice the similarity with the arithmetic expression grammars in how operator
precedence is expressed. In this case there are three operators:
∪, juxtaposition, *.
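As with the arithmetic grammar, the R/B/Q/A grammar can drive a recursive-descent parser once its left recursion (R ⇾ R∪B, B ⇾ BQ) is rewritten as loops. A sketch of mine (not from the notes) that re-inserts the full parenthesization of section 1.8, making the precedence visible:

```python
def parse_regex(s):
    """Parse regex s per R -> R∪B | B, B -> BQ | Q, Q -> A* | A,
    A -> (R) | ∅ | σ, returning the fully parenthesized form."""
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def eat():
        nonlocal pos
        c = s[pos]
        pos += 1
        return c

    def R():                          # union; left recursion -> loop
        node = B()
        while peek() == "∪":
            eat()
            node = "(" + node + "∪" + B() + ")"
        return node

    def B():                          # juxtaposition (concatenation)
        node = Q()
        while peek() is not None and peek() not in "∪)":
            node = "(" + node + Q() + ")"
        return node

    def Q():                          # Q -> A* | A
        node = A()
        if peek() == "*":
            eat()
            node += "*"
        return node

    def A():                          # A -> (R) | ∅ | σ
        if peek() == "(":
            eat()
            node = R()
            eat()                     # the ")"
            return node
        return eat()                  # ∅ or a symbol of Σ

    return R()

# parse_regex("ab∪b*") -> "((ab)∪b*)"
```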
Keep in mind the relationship between regular expressions and regular languages. Every regular expression denotes a regular language, but the language of all regular expressions is a context-free language and is not regular.
Ambiguous if-else grammar
S ⇾ if (E) S | if (E) S else S
S ⇾ other
E ⇾ expr
Find two leftmost derivations for:
if (expr) if (expr) other else other
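Rather than writing the two derivations out by hand, we can count them. The sketch below is my own (not from the notes); it relies on the standard fact that leftmost derivations correspond one-to-one with parse trees, and counts the parse trees of a token string by memoized splitting:

```python
from functools import lru_cache

RULES = {
    "S": (("if", "(", "E", ")", "S"),
          ("if", "(", "E", ")", "S", "else", "S"),
          ("other",)),
    "E": (("expr",),),
}

def count_parses(nt, toks):
    """Number of parse trees (= leftmost derivations) deriving toks from nt."""
    toks = tuple(toks)

    @lru_cache(maxsize=None)
    def ways(syms, lo, hi):
        """Ways the symbol sequence `syms` derives toks[lo:hi]."""
        if not syms:
            return 1 if lo == hi else 0
        head, rest = syms[0], syms[1:]
        total = 0
        if head in RULES:                       # nonterminal: pick rule + split
            for mid in range(lo, hi + 1):
                here = sum(ways(rhs, lo, mid) for rhs in RULES[head])
                if here:
                    total += here * ways(rest, mid, hi)
        elif lo < hi and toks[lo] == head:      # terminal must match the input
            total = ways(rest, lo + 1, hi)
        return total

    return ways((nt,), 0, len(toks))

# The dangling-else string has exactly two parses:
# count_parses("S", "if ( expr ) if ( expr ) other else other".split()) -> 2
```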
#'s of a's and b's
More a's:
U ⇾ TaU | TaT
T ⇾ aTbT | bTaT | ε
More b's:
V ⇾ TbV | TbT
T ⇾ aTbT | bTaT | ε
Unequal:
S ⇾ U | V
U ⇾ TaU | TaT
V ⇾ TbV | TbT
T ⇾ aTbT | bTaT | ε
LISP-like S-expression grammar
S ⇾ a | ( L )
L ⇾ S L | ε
Example: derive ( ( a a ) ( a a ) () )
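This grammar is not left-recursive, so it translates directly into a recursive-descent parser. A sketch of mine (it assumes the input is already tokenized) that returns the parse as nested Python lists:

```python
def parse_sexpr(tokens):
    """Recursive descent for  S -> a | ( L )   and   L -> S L | eps.
    Assumes `tokens` is a well-formed, already tokenized input;
    returns atoms as strings and parenthesized lists as Python lists."""
    pos = 0

    def S():                         # S -> a | ( L )
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                 # consume "("
            items = L()
            pos += 1                 # consume ")"
            return items
        atom = tokens[pos]           # the terminal a
        pos += 1
        return atom

    def L():                         # L -> S L | eps
        if pos < len(tokens) and tokens[pos] != ")":
            first = S()
            return [first] + L()
        return []

    return S()

# parse_sexpr on the tokens of ( ( a a ) ( a a ) ( ) )
# yields [["a", "a"], ["a", "a"], []]
```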
What makes expression languages not regular
Take the ambiguous grammar G for arithmetic expressions:
E ⇾ E+E | E−E | E*E | a | (E)
What about L(G) makes it non-regular? It's the parentheses. Take the subset:
L(G) ∩ (^{*}a)^{*} = { (^{n}a)^{n} : n ≥ 0 }
Without the rule E ⇾ (E), L(G) would be regular:
(a(+∪−∪*))*a
Regular Languages are Context Free
S ⇾ aS | ε
(For example, this grammar generates the regular language a*.) This theorem gives us our first result in the Chomsky Hierarchy: Regular Languages correspond to certain types of Context-Free Grammars. Proof: Take an automaton (NFA or DFA) M = (K,Σ,δ,s,F) for the language. The grammar, G, is constructed as follows:
 nonterminals = K
 start symbol = s
 productions:
p ⇾ σq if δ(p,σ) = q
f ⇾ ε if f ∈ F
The construction is correct because, for all states p, q and every string w:
(p,w) ⊢* (q,ε) ⇔ p ⇒* wq
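The construction is easy to carry out in code. Below is a sketch of mine covering the DFA case; it uses single-character state names so a sentential form can be a plain string, and then checks that the grammar and the automaton agree on all short strings:

```python
from itertools import product

def dfa_to_grammar(delta, finals):
    """Right-linear grammar from a DFA, per the construction above.
    delta maps (state, symbol) -> state; states double as nonterminals."""
    rules = [(p, sym + q) for (p, sym), q in delta.items()]  # p -> sigma q
    rules += [(f, "") for f in finals]                       # f -> eps
    return rules

def derives(rules, nt, w):
    """Check nt =>* w in a right-linear grammar (each rhs is "" or symbol+state)."""
    if w == "":
        return (nt, "") in rules
    return any(len(rhs) == 2 and rhs[0] == w[0] and derives(rules, rhs[1], w[1:])
               for lhs, rhs in rules if lhs == nt)

# Example DFA over {a,b} accepting strings with an even number of a's:
# states E (even, start, final) and O (odd).
delta = {("E", "a"): "O", ("O", "a"): "E",
         ("E", "b"): "E", ("O", "b"): "O"}
rules = dfa_to_grammar(delta, {"E"})

# The grammar and the DFA agree on every string of length <= 5.
for n in range(6):
    for w in map("".join, product("ab", repeat=n)):
        q = "E"
        for c in w:
            q = delta[(q, c)]
        assert derives(rules, "E", w) == (q == "E")
```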