Parse Trees
A parse tree is an entity which represents the structure of the derivation of
a terminal string from some nonterminal (not necessarily the start symbol).
The definition is as in the book. Key features to define are the
root ∈ V
and
yield ∈ Σ^{*} of each tree.

For each σ ∈ Σ, there is a tree
with root σ and no children; its yield is σ

For each rule A → ε, there is a tree with
root A and one child ε; its yield is ε

If t_{1}, t_{2}, ..., t_{n}
are parse trees with roots
r_{1}, r_{2}, ..., r_{n}
and respective yields
y_{1}, y_{2}, ..., y_{n},
and A → r_{1}r_{2}...r_{n} is a production,
then there is a parse tree with root A whose children are
t_{1}, t_{2}, ..., t_{n}.
Its root is A and its yield is y_{1}y_{2}...y_{n}
Observe that parse trees are constructed from
bottom up, not top down.
The actual construction of "adding children" should be made more precise,
but we intuitively know what's going on.
As an example, here are all the parse (sub) trees used to build the parse tree
for the arithmetic expression
4 + 3 * 2
using the expression grammar
E → E + T  E  T  T
T → T * F  F
F → a  ( E )
where
a represents an operand of some type, be it a number or variable.
The trees are grouped by height.
Syntactic Expressions for Trees
It's useful to have a nongraphical, syntactic way to express trees
as lists.
Think of the recursive construction of a tree with its subtrees like this:
(root subtree_{1} ... subtree_{n})
We want to create a grammar for trees (potential parse trees)
for a given grammar
G=(V,ΣR,S).
The tree grammar uses these symbols:

terminals: V ∪ Σ ∪ { (, ) }, ∪ { 'ε' },
where 'ε' means the symbol ε
treated literally

nonterminals:
T,
S,
L,
N,
R
The rules for creating trees are these:
T → S  L a Tree is a terminal Symbol or a List
L → ( N R ) the first element of a List is a Nonterminal (the root)
R → T R  T  'ε' following N, one or more Trees or ε (literal), R serves to unRoll the Trees
N → A ∈ V – Σ the Nonterminals from V are terminals for this tree grammar.
S → σ ∈ Σ
Not every tree constructed in this way can be a parse tree for
the grammar, but this construction gives us all possible parse trees.
Let's look at how it would work out for the expression grammar
above using the sample expression.
These are the list expressions of the above trees:
4 + 2 * 3
(F 4) (F 2) (F 3)
(T (F 4)) (T (F 2))
(E (T (F 4)))
(T (T (F 2)) * (F 3))
(E (E (T (F 4))) + (T (T (F 2)) * (F 3)))
Abstract Syntax Tree
A computer language is basically a contextfree language.
The symbols are tokens without any particular semantic meaning,
namely, all numbers are the same, perhaps even all literal things (strings, numbers, etc)
are regarded equally. All variables are regarded equally, etc. So the point is that
we have a finite symbol set.
The first step of a compiler is to create a parse tree of
the program, and the second phase is to assign meaning, or semantics to the entities in
the tree.
In reality, you create an
abstract syntax tree of the the program. For example,
considering the parse tree for
4 + 2 * 3 above, an abstract syntax tree for this expression
would look like this:
Parse Trees and Derivations
Effectively, a derivation done in reverse represents the construction of a parse tree.
An intermediate stage of this reverse construction presents a ordered list of parse
trees.
Each step backwards creates another ordered list.
The construction starts with an ordered list of the singleton
trees of the terminal symbols in a target string and
ends up being the list with one tree
whose root is the a single nonterminal symbol.
[ 4, +, 2, *, 3 ]
[ 4, +, 2, *, (F 3) ]
[ 4, +, (F 2), *, (F 3) ]
[ 4, +, (T (F 2)), *, (F 3) ]
[ 4, +, (T (T (F 2)) * (F 3)) ]
[ (F 4), +, (T (T (F 2)) * (F 3)) ]
[ (T (F 4)), +, (T (T (F 2)) * (F 3)) ]
[ (E (T (F 4))), +, (T (T (F 2)) * (F 3)) ]
[ (E (E (T (F 4)) + (T (T (F 2)) * (F 3))) ]
In this way, a parse tree represents many possible
derivations. We want to characterize
two derivations "coming from the same parse tree." Technically speaking, we
want to characterize a parse tree as an equivalence class of derivations.
Although intuitively obvious, as you know, nothing is easy when you
try to make it precise.
The first step is to define the relation "more leftoriented at one step".
Assume we have two equal length derivations of length
n > 2:
D: x_{1} ⇒ x_{2} ⇒ ... ⇒ x_{n}
D′: x_{1}′ ⇒ x_{2}′ ⇒ ... ⇒ x_{n}′
where
x_{1} = x_{1}′ is a nonterminal
and
x_{n} = x_{n}′ ∈ Σ^{*}
Namely they start at the same nonterminal and end at the same terminal string and have at
least two intermediate steps.
We want to say
D < D′ if the two
derivations differ in only one step in which
there are 2 nonterminals such that
D replaces the left one before the right one and
D′
does the opposite. Formally:
D < D′ if there exists k, 1 < k < n such that
x_{i} = x_{i}′ for all i ≠ k
x_{k1} = x_{k1}′ = uAvBw, for u, v, w ∈ V*
x_{k} = uyvBw, for production A → y
x_{k}′ = uAvzw, for production B → z
Say that two derivations are
similar if they belong to the
reflexive, symmetric, transitive closure of
<.
Therefore, two derivations are similar if one can be transformed to the
other by a sequence of switching of the order in which rules are applied.
Examples
Consider the ambiguous arithmetic expression grammar:
E → E+E  EE  E*E  (E)  a
Consider these sample derivations:
D_{1}: E ⇒ E+E ⇒ EE+E ⇒ xE+E ⇒ xy+E ⇒ xy+z
D_{2}: E ⇒ E+E ⇒ EE+E ⇒ Ey+E ⇒ xy+E ⇒ xy+z
D_{3}: E ⇒ E+E ⇒ EE+E ⇒ Ey+E ⇒ Ey+z ⇒ xy+z
We see that
D_{1} < D_{2} < D_{3} and therefore
these three are mutually similar. Then consider:
D_{4}: E ⇒ EE ⇒ EE+E ⇒ xE+E ⇒ xy+E ⇒ xy+z
Although
D_{4} is identical to
D_{1} except for the first step, it is
not similar to
D_{1}. This is because
D_{1} and
D_{4} come from different parse trees.
Similarity and parse trees
We start off by claiming that
D < D′, then both derivations
construct the same parse tree in reverse. This is pretty obvious since the tree
constructions differ only in two steps, one where
A is created before
B and
the other viceversa. From this, we can deduce that any two similar derivations
create the same parse tree.
Going the other way, we want to say that if two derivations construct
the same parse tree, they must be similar. This is an inductive argument.
Consider the root of the parse tree and its children:
(S t_{1} t_{2} ... t_{n})
Inductively, we assume the two derivations are similar for all subtrees:
t_{1},
t_{2}, ...,
t_{n}.
It follows immediately by the transitivity and symmetry
of the similarity relation
that the two derivations are similar for the whole tree.
Leftmost and rightmost derivations
The similarity relation, being an equivalence relation, partitions the
set of all derivations of a given string.
These partitions are finite, and we can argue that,
within a partition, there is a unique one
D_{L} such that
D_{L} < D, for all D != D_{L}, D is similar to D_{L}
D_{L} is said to be a
leftmost derivation.
It is easy to convince yourself that any derivation of one or
two steps must be leftmost.
Similarly we can define and argue the existence of
a
rightmost derivation.
Summary
Theorem 3.2.1 in the textbook summarizes the various equivalent ways on understanding
strings derived from a grammar. For any nonterminal A, and string w ∈ Σ*
 There is a derivation from A to w
 There is a parse tree with root A and yield w
 There is a leftmost derivation from A to w
 There is a rightmost derivation from A to w
A grammar
G is ambiguous if there is a string in
L(G)
for which
 there are two distinct parse trees, or
 there two distinct leftmost derivations, or
 there are two distinct rightmost derivations
This expression grammar is ambiguous:
E → E + E  E * E  a  ( E )
Of course, it's not the fault of the language, since we can disambiguate
it by incorporating the associativity and operator precedence.
E → E + T  T
T → T * F  F
F → a  ( E )
It's actually very hard to prove that this grammar is not ambiguous!
We will talk about it later.
A contextfree language is called
inherently ambiguous
if
all possible grammars which generate it are ambiguous.
It turns out that such languages do exist!
The ambiguous ifelse grammar
Given the grammar:
S → if ( E ) S  if ( E ) S else S
S → other
E → expr
The string used to illustrate ambiguity is this:
if ( expr ) if ( expr ) other else other
Here are two parse trees for it:
We'll show two different rightmost derivations:
S ⟹ if ( E ) S
⟹ if ( E ) if ( E ) S else S
⟹^{2} if ( E ) if ( E ) other else other
⟹^{2} if ( expr ) if ( expr ) other else other
S ⟹ if ( E ) S else S
⟹ if ( E ) S else other
⟹ if ( E ) if ( E ) S else other
⟹ if ( E ) if ( E ) other else other
⟹^{2} if ( expr ) if ( expr ) other else other