Description
A pushdown automaton (PDA) is a finite state machine which has an
additional stack storage. The transitions a machine makes are based not
only on the input and current state, but also on the stack.
The formal definition (in our textbook) is that a PDA is this:
M = (K,Σ,Γ,Δ,s,F)
where
 K = finite state set
 Σ = finite input alphabet
 Γ = finite stack alphabet
 s ∈ K: start state
 F ⊆ K: final states
 The transition relation, Δ is a finite subset of
(K×(Σ∪{ε})×Γ^{*}) × (K×Γ^{*})
We have to have the
finite qualifier because the full subset is infinite
by virtue of the
Γ* component.
The meaning of the transition relation is that,
for
σ ∈ Σ, if
((p,σ,α),(q,β)) ∈ Δ:
 the current state is p
 the current input symbol is σ
 the string at the top of the stack is α
then:
 the new state is q
 replace α on the top of the stack by β
(pop the α and push the β)
Otherwise, if
((p,ε,α),(q,β)) ∈ Δ, this means that if
 the current state is p
 the string at the top of the stack is α
then (not consulting the input symbol), we can
 change the state is q
 replace α on the top of the stack by β
Machine Configuration, yields, acceptance
A machine configuration is an element of
K×Σ^{*}×Γ^{*}.
(p,w,z) = (current state, unprocessed input, stack content)
We define the usual
yields in one step relation:
(p,σw,αz) ⊢ (q,w,βz) if ((p,σ,α),(q,β)) ∈ Δ
or
(p,w,αz) ⊢ (q,w,βz) if ((p,ε,α),(q,β)) ∈ Δ
As expected, the
yields relation,
⊢*,
is the reflexive, transitive closure of
⊢.
A string
w is
accepted by the PDA if
(s,w,ε) ⊢* (f,ε,ε)
Namely, from the start state with empty stack, we

process the entire string,

end in a final state

end with an empty stack.
The language accepted by a PDA M, L(M), is the set of all accepted strings.
The empty stack is our key new requirement relative to
finite state machines.
The examples that we generate have very few states;
in general, there is so much more control from using the stack memory.
Acceptance by empty stack only or final state only is addressed
in problems 3.3.3 and 3.3.4.
Graphical Representation and εtransition
The book does not indicate so, but there is a graphical representation of PDAs. A transition
((p,x,α),(q,β)) where x = ε or x ∈ Σ
would be depicted like this (respectively):

or 

The stack usage represented by
α/β
represents these actions:
 the top of the stack must match α
 if we make the transition, pop α and push β
A PDA is nondeterministic.
There are several forms on nondeterminism in the description:
 Δ is a relation
 there are εtransitions in terms of the input
 there are εtransitions in terms of the stack contents
The true PDA
εtransition, in the sense of being equivalent to the
NFA
εtransition
is this:
because it consults neither the input, nor the stack and
will leave the previous configuration intact.
Palindrome examples
These are examples 3.3.1 and 3.3.2 in the textbook. The first is this:
{x ∈ {a,b,c}* : x = wcw^{R} for w ∈ {a,b}*} 

The machine pushes
a's and
b's in state
s, makes a transition to
f when it sees the middle marker,
c, and then
matches input symbols with those on the stack and pops the stack symbol. Nonaccepting string examples are these:
ε in state s
ab in state s with nonempty stack
abcab in state f with unconsumed input and nonempty stack
abcb in state f with nonempty stack
abcbab in state f with unconsumed input and empty stack
Observe that this PDA is deterministic in the sense that there are no choices in transitions.
The second example is:
{x ∈ {a,b}* : x = ww^{R} for w ∈ {a,b}*} 

This PDA is identical to the previous one except for the εtransition 

Nevertheless, there is a significant difference in that this PDA must
guess when to stop pushing
symbols, jump to the final state and start matching off of the stack. Therefore
this machine is decidedly nondeterministic.
In a general programming model (like Turing Machines), we have the luxury of preprocessing the string
to determine its length and thereby knowing when the middle is coming.
The a^{n}b^{n} language
The language is
L = { w ∈ {a,b}* : w = a^{n}b^{n}, n ≥ 0 }.
Here are two PDAs for L:

and 

The idea in both of these machines is to stack the
a's and match off the
b's.
The first one is nondeterministic in the sense that it could prematurely guess that the
a's are done
and start matching off
b's. The second version is deterministic in that the first
b acts as a trigger
to start matching off. Note that we have to make both states final in the second version in order
to accept
ε.
The equal a's and b's language
This is example 3.3.3 in the textbook.
Let us use the convenience notation:
#σ(w) = the number of occurrences of σ in w
The language is
L = {w ∈ {a,b}*: #a(w) = #b(w) }.
Here is the PDA:
As you can see, most of the activity surrounds the behavior in state
q. The idea is have the stack maintain
the
excess of one symbol over the other. In order to achieve our goal, we must know when the stack is empty.
Empty Stack Knowledge
There is no mechanism built into a PDA to determine whether the stack
is empty or not. It's important to realize that the transition:

x = σ ∈ Σ or ε 
means to do so
without consulting the stack;
it says nothing about whether the stack is empty or not.
Nevertheless, one can maintain knowledge of an
empty stack by using a dedicated stack symbol,
c,
representing the "stack bottom"
with the property that
it is pushed onto an empty stack by a transition from the
start state with no other outgoing or incoming transitions.
Behavior of PDA
The three groups of loop transitions in state
q represent these
respective functions:
 input a with no b's on the stack: push a
 input b with no a's on the stack: push b
 input a with b's on the stack: pop b; or,
input b with a's on the stack: pop a
For example if we have seen 5
b's and 3
a's in any order,
then the stack should be "
bbc". The transition
to the final state represents the only nondeterminism in the PDA in that it must guess when the
input is empty in order to pop off the stack bottom.
DPDA/DCFL
The textbook defines DPDAs (Deterministic PDAs) and DCFLs (Deterministic CFLs)
in the introductory part of section 3.7.
According to the textbook's definition, a DPDA is a PDA in which
no state
p has two different outgoing transitions
((p,x,α),(q,β)) and ((p,x′,α′),(q′,β′))
which are
compatible in the sense that both could be applied.
A DCFL is basically a language which accepted by a DPDA,
but we need to qualify this further.
We want to argue that the language
L = { w ∈ {a,b}* : #a(w) = #b(w)}
is deterministic context free in the sense there is DPDA which accepts it.
In the above PDA, the only nondeterminism is the issue of guessing the
end of input; however this form of nondeterminism is considered artificial.
When one considers whether a language
L supports a DPDA or not, a
dedicated
endofinput symbol is always added to strings in the language.
Formally, a language
L over
Σ is
deterministic context free, or
L is a DCFL , if
L$ is accepted by a DPDA M
where
$ is a dedicated symbol not belonging to
Σ.
The significance is that we can make intelligent usage of the knowledge
of the end of input to decide what to do about the stack.
In our case, we would simply replace the transition into the final
state by:
With this change, our PDA is now a DPDA:
a*b* examples
Two common variations on
a's followed by
b's. When they're equal,
no stack bottom is necessary. When they're unequal, you have to be prepared
to recognize that the stacked
a's have been completely matched or not.
a. { a^{n}b^{n} : n ≥ 0 }
b. { a^{m}b^{n} : 0 ≤ m < n }
Let's look at a few sample runs of (b). The idea is that you cannot
enter the final state with an "
a" still on the stack. Once
you get to the final state, you can consume remaining
b's and
end marker.
We can start from state 1 with the stack bottom pushed on:
success:
abb
state  input  stack 
1 
abb$ 
c 
1 
bb$ 
ac 
2 
b$ 
ac 
2 
$ 
c 
3 
ε 
ε 
success:
abbbb
state  input  stack 
1 
abbbb$ 
c 
1 
bbbb$ 
ac 
2 
bbb$ 
ac 
2 
bb$ 
c 
3 
b$ 
ε 
3 
$ 
ε 
3 
ε 
ε 
(fail:
ab)
state  input  stack 
1 
ab$ 
c 
1 
b$ 
ac 
2 
$ 
ac 
(fail:
ba)
state  input  stack 
1 
ba$ 
c 
2 
a$ 
c 
Observe that a string like abbba also fails due to
the inability to consume the very last a.
CFL/PDA Equivalence
The goal of section 3.4 in the textbook is to prove that CFLs and PDAs
are equivalent, namely,
for every CFL,
G, there is a PDA
M
such that
L(G) = L(M) and vice versa.
The author splits this into two lemmas:
 Lemma 3.4.1: given a grammar, construct a PDA and show the equivalence
 Lemma 3.4.2: given a PDA, construct a grammar and show the equivalence
Of the two, the first uses a very simple, intuitive construction to achieve a
mimick a leftmost derivation by using a PDA.
The converse stepis far more complicated than the first. The construction in
the first is quite simple. The construction in the second involves two steps:
Grammar to PDA construction
This construction is quite simple. Given
G = (V,Σ,R,S),
what the PDA will do is effect a
leftmost derivation of a string
w ∈ L(G).
The PDA is
M = ( {0,1}, Σ, V, Δ, 0, {1} )
Namely, there are only two states
The PDA moves immediately to the final state is
1 with the start
symbol on the stack and then stays in this state.
These are the transitions in
Δ:
1: ( (0,ε,ε), (1,S) )
2: ( (1,ε,A), (1,α) ) for A → α
3: ( (1,σ,σ), (1,ε) ) for σ ∈ Σ
This constructed PDA is inherently nondeterministic; if there is a choice of rules to apply
to a nonterminal, then there is a nondeterministic choice of processing steps.
Graphically the presentation is this:
To say that
G and
M are equivalent means
that
L(M) = L(G), or, considering an arbitrary string
w ∈ Σ^{*}:
S ⇒* w ⇔ (0,w,ε) ⊢* (1,ε,ε)
The grammar for a^{n}b^{n}
Use the grammar:
S ⟶ ε  aSb
Here is the PDA:
A simple run is:
⊢ (0, aabb, ε)
⊢ (1, aabb, S)
⊢ (1, aabb, aSb)
⊢ (1, abb, Sb)
⊢ (1, abb, aSbb)
⊢ (1, bb, Sbb)
⊢ (1, bb, bb)
⊢ (1, b, b)
⊢ (1, ε, ε)
Expression Grammar
The most informative examples are those in which there exist the possibility of not
using a leftmost derivation such as our expression grammar:
E → E + T  T
T → T * F  F
F → ( E )  a
We can readily match up a leftmost derivation of
a + a * a with the corresponding
machine configuration processing as follows:
⊢ (0, a + a * a, ε)
⊢ (1, a + a * a, E)
⊢ (1, a + a * a, E + T) E ⇒ E + T
⊢ (1, a + a * a, T + T) ⇒ T + T
⊢ (1, a + a * a, F + T) ⇒ F + T
⊢ (1, a + a * a, a + T) ⇒ a + T
⊢ (1, + a * a, + T)
⊢ (1, a * a, T)
⊢ (1, a * a, T * F) ⇒ a + T * F
⊢ (1, a * a, F * F) ⇒ a + F * F
⊢ (1, a * a, a * F) ⇒ a + a * F
⊢ (1, * a, * F)
⊢ (1, a, F)
⊢ (1, a, a) ⇒ a + a * a
⊢ (1, ε, ε)
Grammar to PDA proof
For simplicity, or simplicity, assume that
⇒
means a leftmost derivation step.
Claim: For
w ∈ Σ*,
α ∈ (VΣ)V* ∪ {ε}:
S ⇒* wα ⇔ (1,w,S) —* (1,ε,α)
(Proof ⇒):
Induction on the length of the leftmost derivation.
S ⇒^{n} α′ ⇒ wα
Then, since the last step was leftmost, we can write:
α′ = xAβ for x ∈ Σ*
and then
xzβ = wα for A → z (A)
By induction, since
S ⇒^{n} xAβ:
(1,x,S) —* (1,ε,Aβ)
Furthermore, applying the transition of type 2, we have:
(1,ε,Aβ) — (1,ε,zβ)
Putting these two together:
(1,x,S) —* (1,ε,zβ) (B)
Looking at
(A) we see that
the string
x must be a prefix of
w because
α
begins with a nonterminal, or is empty. Write
w = xy and therefore, zβ = yα (C)
As a consequence of
(B) we get:
(1,xy,S) —* (1,y,zβ) (D)
Combine
(C) and
(D) to get:
(1,w,S) —* (1,y,yα) (E)
Apply
y transitions of type 3 to get
(1,y,yα) —* (1,ε,α) (F)
Combine
(E) and
(F) to get the desired result:
(1,w,S) —* (1,ε,α)
(Proof ⇐):
The proof going this direction is by induction on the
number of
type2 steps in the derivation.
This restriction makes the entire proof simpler than the converse
that we just proved.
We'll proceed to the induction step.
Assume true for up to
n steps and prove true for
n+1 type2 steps.
Write:
(1,w,S) —* (1,y,Aβ) — (1,y,zβ) —* (1,ε,α)
where the use of the rule
A ⇒ z
represents the final type2 step and the
last part of the chain is type3 steps only.
The string
y must be a suffix of
w, and
so writing
w = xy
we have:
(1,xy,S) —* (1,y,Aβ)
and therefore,
(1,x,S) —* (1,ε,Aβ)
Therefore by induction:
S ⇒* xAβ
and consequently,
S ⇒* xzβ (A)
Now if we look at the last part:
(1,y,zβ) —* (1,ε,α)
We observe, that since this consists of
only type3
transitions, it must be that
yα = zβ (B)
and so, putting
(A) and
(B) together, we get:
S ⇒* xyα
Knowing that
w = xy gives us the result we want:
S ⇒* wα