The SetMap Application
The examples in this and other documents in this section are all part of the Java application SetMap. Download the source archive SetMap.zip and install it as a Java Application with Existing Sources. See the Using NetBeans document for details. Like the others, the SetMap project has multiple main classes intended to illustrate various independent features. The simplest way to run the various classes is to right-click the file and select Run File, either in the file itself or from the listing in the Projects window. The Java files from the project which are referenced in this document are:

adapter/
  SetAdapter
util/
  AVLTreeSet
demo/
  AVLDemo
Binary Search Tree Path Lengths
We consistently cite the heuristic that search-related operations on a binary tree are O(log(n)) on average. This does not mean that, for all binary trees, the average search time is O(log(n)); what it means is that the average search time is O(log(n)) for an average tree. This figure is an illustration of such a 500-node random tree:
[Figure 4.26 from the Weiss textbook: a random 500-node binary search tree]
Average case
The search time for a single operation is effectively the path length to a node, so we must somehow compute the total path length over all nodes in all trees and average it. The internal path length of a binary tree is defined as:
sum of the depths of all nodes in the tree
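This definition translates directly into a recursive computation. The following is a minimal illustrative sketch (the Node class and method names here are stand-ins, not the SetMap project's code):

```java
// Sketch: internal path length = sum of the depths of all nodes.
// This Node class is a minimal stand-in, not the project's Node.
class Node {
    int data;
    Node left, right;
    Node(int data, Node left, Node right) {
        this.data = data; this.left = left; this.right = right;
    }
}

public class PathLength {
    // Sum of the depths of all nodes in the subtree rooted at n,
    // where n itself sits at the given depth.
    static int internalPathLength(Node n, int depth) {
        if (n == null) return 0;
        return depth
             + internalPathLength(n.left, depth + 1)
             + internalPathLength(n.right, depth + 1);
    }

    public static void main(String[] args) {
        // A balanced 3-node tree: depths are 0, 1, 1, so IPL = 2.
        Node root = new Node(2, new Node(1, null, null), new Node(3, null, null));
        System.out.println(internalPathLength(root, 0)); // prints 2
    }
}
```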
What the textbook is doing is deriving a recurrence equation for

D(n) = average internal path length of an n-node tree

If you take an arbitrary rooted tree whose left-hand side has i nodes and whose right-hand side has n−i−1 nodes, then for that particular split we write:

D(n) = D(i) + D(n−i−1) + n−1

The final n−1 term comes from considering all n−1 non-root nodes and adding 1 to each of their path lengths in the subtrees to get their path lengths in the full tree. Averaging over all possible splits gives the recurrence:
           n−1
D(n) = 2/n * Σ D(i) + n−1
           i=0

This is a much more complex recurrence than we've seen, but the solution is:

D(n) = O(n * log(n))

The solution falls along the lines of the more familiar recurrence: D(n) = 2 * D(n/2) + n−1.
The average path length in a tree is simply:

internal path length / n

and so it can then be argued that:
average path length in an average tree with n nodes
= D(n)/n
= O(n*log(n))/n
= O(log(n))
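The recurrence above can also be checked numerically. This hypothetical snippet (not part of the SetMap project) tabulates D(n) directly from the averaged recurrence and compares it to n·log₂(n); the ratio slowly approaches a constant, consistent with D(n) = O(n·log(n)):

```java
// Numerical check of the average internal path length recurrence:
//   D(n) = (2/n) * sum_{i=0}^{n-1} D(i) + (n - 1)
// Hypothetical sketch, not project code.
public class Recurrence {
    // Tabulate D(0..N) using a running prefix sum: O(N) total work.
    static double[] table(int N) {
        double[] d = new double[N + 1];
        double prefix = 0.0;               // running sum of d[0..n-1]
        for (int n = 1; n <= N; n++) {
            prefix += d[n - 1];
            d[n] = 2.0 * prefix / n + (n - 1);
        }
        return d;
    }

    public static void main(String[] args) {
        int N = 4096;
        double[] d = table(N);
        // D(n) / (n * log2(n)) slowly approaches 2*ln(2) ~ 1.386.
        double ratio = d[N] / (N * (Math.log(N) / Math.log(2)));
        System.out.printf("D(%d) = %.1f, ratio = %.3f%n", N, d[N], ratio);
    }
}
```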
Worst case
In contrast, the worst case is encountered if the insertion sequence of inputs is sorted or reverse-sorted; the outcome is a tree skewed completely right or left, respectively. For example, inserting the sequence 10, 20, 30, 40, 50 produces a search tree skewed completely right. The internal path length of such a tree is

0 + 1 + 2 + … + (n−1) = (n−1)*n/2, i.e., O(n^2)

and the average path length is O(n).
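The quadratic internal path length of a skewed tree is easy to verify directly. Here is a hypothetical sketch using a bare-bones (non-rebalancing) BST insert:

```java
// Sketch: sorted insertion into a plain (non-rebalancing) BST yields a
// right-skewed tree with internal path length (n-1)*n/2.
public class SkewDemo {
    static final class Node {
        int data; Node left, right;
        Node(int data) { this.data = data; }
    }

    static Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.data) n.left = insert(n.left, key);
        else              n.right = insert(n.right, key);
        return n;
    }

    // Internal path length: sum of the depths of all nodes.
    static int ipl(Node n, int depth) {
        if (n == null) return 0;
        return depth + ipl(n.left, depth + 1) + ipl(n.right, depth + 1);
    }

    public static void main(String[] args) {
        Node root = null;
        for (int k = 1; k <= 100; k++) root = insert(root, k); // sorted input
        System.out.println(ipl(root, 0)); // prints 4950 = 99*100/2
    }
}
```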
The goal of advanced tree data structures is to ameliorate this worst-case scenario by having their operations "rebalance" the tree under certain circumstances so that the O(n) depths are avoided or "fixed" during tree usage.
AVL Search Trees
An AVL (Adelson-Velskii/Landis) tree is a binary search tree which maintains the following height-balanced "AVL property" at each node in the tree:

abs( (height of left subtree) − (height of right subtree) ) ≤ 1

Namely, the left and right subtrees are of equal height, or their heights differ by 1. Recall that the height of a tree is the maximum depth (from the root) of any node. A tree with one node has height 0. We will say that the empty (null) tree has height −1. It can be proved that an AVL tree with n nodes has height O(log(n)), and so any sequence of n search/insert/delete operations has a worst-case cost of O(log(n)) per operation. The key idea behind the AVL tree is how a subtree is rebalanced when a node insertion or removal causes the AVL property to fail. Like the textbook, we will consider only insertions.
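The AVL property can be stated as a predicate over nodes. The following is a hypothetical checker (names and the Node class are stand-ins, not the project's code), using the convention that the empty tree has height −1:

```java
// Sketch: checking the AVL property at every node of a tree.
public class AvlCheck {
    static final class Node {
        int data; Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
        }
    }

    // Convention: empty tree has height -1, a single-node tree has height 0.
    static int height(Node n) {
        return (n == null) ? -1 : 1 + Math.max(height(n.left), height(n.right));
    }

    // AVL property: at every node, subtree heights differ by at most 1.
    static boolean isAvl(Node n) {
        if (n == null) return true;
        return Math.abs(height(n.left) - height(n.right)) <= 1
            && isAvl(n.left) && isAvl(n.right);
    }

    public static void main(String[] args) {
        Node balanced = new Node(2, new Node(1, null, null), new Node(3, null, null));
        Node skewed = new Node(1, null, new Node(2, null, new Node(3, null, null)));
        System.out.println(isAvl(balanced) + " " + isAvl(skewed)); // true false
    }
}
```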
Rebalancing Strategies
Suppose that a node satisfies the AVL property and that an add goes into the left subtree. There are two separate cases to consider:

- the add goes into the left-left subtree ⇒ single rotate (from the) left
- the add goes into the left-right subtree ⇒ double rotate (from the) left

(The two right-subtree cases are handled by the mirror-image rotations.)
AVL Rebalancing Operations
The recursive add operation goes down the tree to the insertion point. Any AVL out-of-balance problem will be "discovered" and dealt with as you come out of the recursive call. Consider the first such "out-of-balance" discovery and subsequent rebalancing operation. We want to argue two things:

- the rebalanced tree satisfies the AVL property at each node, and
- the height of the rebalanced tree is the height of the tree prior to insertion.
single rotate (from the) left, on a left-left insertion
This diagram depicts the rebalance operation after an unbalancing left-left insertion.

Proof: Let x, y, z be the heights of the trees X, Y, Z, respectively, after insertion. Assume that the insertion into X caused an unbalance at k_{2} (but not at k_{1} or at any node in X). We gather this information:
- since the insertion into X caused an unbalancing height change for the full tree, the full tree's height after insertion must be x + 2, and thus x + 1 before insertion
- x must be ≥ y + 1, otherwise the height of the left subtree at k_{2} would not have increased
- x cannot be greater than y + 1, otherwise the "AVL problem" would have been observed at k_{1}; conclude that x = y + 1
- x + 1 must be ≥ z + 2, otherwise there would be no problem at k_{2}
- x + 1 cannot be greater than z + 2, otherwise the left side height prior to insertion (x) would be greater than z + 1, meaning that k_{2} was unbalanced prior to insertion; conclude that x + 1 = z + 2, i.e., x = z + 1
So x = y + 1, x = z + 1, and therefore y = z.

For the rebalanced tree:

- at k_{2}, left side height = y = z = right side height
- at k_{1}, left side height = x = y + 1 = right side height
- height of the tree = x + 1 = height before insertion
double rotate (from the) left, on a left-right insertion
This diagram depicts the rebalance operation after an unbalancing left-right insertion.

Proof: Let a, b, c, d be the heights of the trees A, B, C, D, respectively, after insertion. There are 3 cases to consider:

- the inserted element is k_{2}
- the inserted element goes into B
- the inserted element goes into C

We argue the case where the inserted element goes into B; the other cases are similar.
- since the insertion into B caused an unbalancing height change of the full tree, the full tree's height after insertion must be b + 3, and thus b + 2 before insertion
- b must be > c, otherwise the left subtree at k_{3} would not have changed height
- b cannot be > c + 1, otherwise there would be a problem at k_{2}; conclude that b = c + 1
- b + 1 must be > a, otherwise the left subtree at k_{3} would not have changed height
- b + 1 cannot be > a + 1, otherwise there would be a problem at k_{1}; conclude that b + 1 = a + 1, i.e., b = a
- in order for there to be a "problem" at k_{3}, we must have left side height ≥ right side height + 2, i.e., b + 2 ≥ d + 2
- we cannot have b + 2 > d + 2, otherwise b + 1 = left side height before insertion ≥ d + 2 = right side height before insertion + 2, meaning that the tree would have been unbalanced before insertion; conclude that b + 2 = d + 2, i.e., b = d
So d = a = b = c + 1.

For the rebalanced tree:
- at k_{1}, left side height = a = b = right side height
- at k_{3}, left side height = c, right side height = d = c + 1
- at k_{2}, left side height = b + 1, right side height = d + 1 = b + 1
- height of the tree = b + 2 = height before insertion
Examples
An important example of AVL trees is the behavior on a worst-case add sequence for regular binary trees:

1, 2, 3, 4, 5, 6, 7

All insertions are right-right, and so the rotations are all single rotate from the right. All but two insertions require rebalancing:
(tree diagrams omitted) single rotate from the right at 1, then at 3, then at 2, then at 5
A second example (tree diagrams omitted):

- single rotate left at 50
- double rotate left at 10
- single rotate left at 25
- add(30), add(20), add(8) need no rebalancing
- double rotate right at 7
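The 1 through 7 example can be replayed with a minimal AVL insert. The following is a self-contained sketch (rotation and helper names are illustrative, not the project's AVLTreeSet code); after all seven adds, the tree is perfectly balanced with root 4 and height 2:

```java
// Sketch: AVL insertion with single/double rotations, replaying 1..7.
public class AvlDemo {
    static final class Node {
        int data, height;           // a new leaf has height 0
        Node left, right;
        Node(int data) { this.data = data; }
    }

    static int h(Node n) { return n == null ? -1 : n.height; }
    static void fix(Node n) { n.height = 1 + Math.max(h(n.left), h(n.right)); }

    static Node rotateRight(Node k2) {   // left child rotates up
        Node k1 = k2.left;
        k2.left = k1.right; k1.right = k2;
        fix(k2); fix(k1);
        return k1;
    }

    static Node rotateLeft(Node k1) {    // mirror image: right child rotates up
        Node k2 = k1.right;
        k1.right = k2.left; k2.left = k1;
        fix(k1); fix(k2);
        return k2;
    }

    static Node add(int key, Node n) {
        if (n == null) return new Node(key);
        if (key < n.data) {
            n.left = add(key, n.left);
            if (h(n.left) - h(n.right) == 2) {
                if (key < n.left.data) n = rotateRight(n);   // left-left
                else { n.left = rotateLeft(n.left); n = rotateRight(n); } // left-right
            }
        } else {
            n.right = add(key, n.right);
            if (h(n.right) - h(n.left) == 2) {
                if (key > n.right.data) n = rotateLeft(n);   // right-right
                else { n.right = rotateRight(n.right); n = rotateLeft(n); } // right-left
            }
        }
        fix(n);
        return n;
    }

    public static void main(String[] args) {
        Node root = null;
        for (int k = 1; k <= 7; k++) root = add(k, root);
        System.out.println(root.data + " " + root.height); // prints "4 2"
    }
}
```

Rebalancing fires exactly where the diagrams above indicate: at node 1 (adding 3), at 3 (adding 5), at 2 (adding 6), and at 5 (adding 7); adding 2 and 4 needs no rebalancing.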
AVL Tree Game
This "game" is just a way of having you guess the outcomes of a sequence of insertions into an AVL tree. The AVL trees are displayed graphically. Download the JAR file via: On Windows you can run it directly by double-clicking. On Mac, you have to right-click (the first time) and select "Open with ⇾ Jar Launcher."

Demo program
The class we wish to create, AVLTreeSet, is an improvement of the SearchTreeSet class in the sense of always maintaining an O(log(n)) search time.

AVL Tree Code
In order to keep track of height efficiently, the Node class is expanded to add a height field:

private static class Node<E> {
    E data;
    Node<E> left, right;
    int height;

    Node(E data, Node<E> left, Node<E> right, int height) {
        this.data = data;
        this.left = left;
        this.right = right;
        this.height = height;
    }

    Node(E data, Node<E> left, Node<E> right) {
        this(data, left, right, 0);
    }
}
private int height(Node<E> p) {
    return (p == null) ? -1 : p.height;
}
int comp = myCompare(elt, n.data);
if (comp < 0) {          // insertion goes left
    n.left = add(elt, n.left);
    ...
if (height(n.left) - height(n.right) == 2) {
    int comp1 = myCompare(elt, n.left.data);
    if (comp1 < 0) {
        return s_rotate_left(n);   // left-left insertion
    } else {
        return d_rotate_left(n);   // left-right insertion
    }
private Node<E> s_rotate_left(Node<E> n) {
    Node<E> k2 = n;
    Node<E> k1 = k2.left;
    int x = k1.left.height;   // k1.left cannot be null
    k2.left = k1.right;
    k1.right = k2;
    k2.height = x;
    k1.height = x + 1;
    return k1;
}
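The double rotation can be expressed as two single rotations: first rotate k_{1}'s right child up, then single rotate the result up to the root. The following is a hypothetical, self-contained sketch of this idea (the Node class, height maintenance, and the mirror-image s_rotate_right helper are stand-ins, not the project's code):

```java
// Sketch: d_rotate_left expressed as two single rotations.
public class DoubleRotate {
    static final class Node {
        int data, height;
        Node left, right;
        Node(int data, Node left, Node right) {
            this.data = data; this.left = left; this.right = right;
            fix(this);
        }
    }

    static int height(Node n) { return n == null ? -1 : n.height; }
    static void fix(Node n) { n.height = 1 + Math.max(height(n.left), height(n.right)); }

    static Node s_rotate_left(Node k2) {   // left child rotates up
        Node k1 = k2.left;
        k2.left = k1.right; k1.right = k2;
        fix(k2); fix(k1);
        return k1;
    }

    static Node s_rotate_right(Node k1) {  // mirror image: right child rotates up
        Node k2 = k1.right;
        k1.right = k2.left; k2.left = k1;
        fix(k1); fix(k2);
        return k2;
    }

    // double rotate (from the) left: rotate the left child's right
    // subtree up, then single rotate the new left child up to the root.
    static Node d_rotate_left(Node k3) {
        k3.left = s_rotate_right(k3.left);
        return s_rotate_left(k3);
    }

    public static void main(String[] args) {
        // Left-right shape: 5 with left child 3, whose right child is 4.
        Node k3 = new Node(5, new Node(3, null, new Node(4, null, null)), null);
        Node root = d_rotate_left(k3);
        System.out.println(root.data); // prints 4 (children 3 and 5)
    }
}
```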
n = add(elt, n);
n = s_rotate_left(n);