AVL Trees

The SetMap Application

The examples in this and other documents in this section all part of the Java Application SetMap. Download the source archive SetMap.zip and install it as a Java Application with Existing Sources. See the Using NetBeans document for details.

Like others, the SetMap project has multiple Main Classes intended to illustrate various independent features. The simplest way to run the various classes is to right-click the file and select Run File either in the file itself or from the listing in the Projects window.

The Java files from the project which are referenced in this document are:
adapter/
  SetAdapter
util/
  AVLTreeSet
demo/
  AVLDemo

Binary Search Tree Path Lengths

We consistently cite the heuristic that search-related operations on a binary tree are O(log(n)) on the average. This does not mean that, for all binary trees, the average search time is O(log(n)); what it means is that the average search time is O(log(n)) for an average tree.

This figure is an illustration of such a 500-node random tree:
Figure 4.26 from
Weiss textbook

Average case

The search time for a single operation is effectively the path length to a node. So we have to somehow compute the total path length to all nodes in all trees and average that.

The internal path length of a binary tree is defined as:
sum of the depths of all nodes in the tree
What the textbook is doing is deriving a recurrence equation for
D(n) = average internal path length of an n-node tree
If you take an arbitrary rooted tree whose left-hand size has i nodes and whose right-hand side has n-i-1 nodes, then
for that particular split, we write:
D(n) = D(i) + D(n-i-1) + n-1 

The final n-1 term comes from considering all n-1 non-root nodes and adding 1 to their path length in the subtrees to get the path length in the full tree.

Average over all possible splits gives the recurrence:
                 n-1
D(n) = 2 * 1/n * Σ D(i)  +  n-1
                 i=0
This is a much more complex recurrence than we've seen, but the solution is:
D(n) = O(n * log(n))
The solution falls along the lines of the more familiar recurrence: D(n) = 2 * D(n/2) + n-1

The average path length in a tree is simply:
internal path length / n
and so it can then be argued that:
average path length in an average tree with n nodes
= D(n)/n 
= O(n*log(n))/n 
= O(log(n))

Worst case

In contrast the worst case is encountered if the insertion sequence of inputs is sorted or reverse-sorted; the outcome is a tree skewed completely right or left, respectively. For example, inserting the sequence 10, 20, 30, 40, 50 produces the search tree:
In this case the internal path length with n nodes is:
0 + 1 + 2 + … + (n-1) = (n-1)*n/2,  i.e.,  O(n2). 
and the average path length is O(n)

The goal of advanced tree data structures is to ameliorate this worst-case scenario by having its operations "re-balance" the tree under certain circumstances so that the O(n) depths are avoided or "fixed" during tree usage.

AVL Search Trees

An AVL (Adelson-Velski/Landis) tree is a binary search tree which maintains the following height-balanced "AVL property" at each node in the tree:
abs( (height of left subtree) – (height of right subtree) ) ≤ 1
Namely, the left and right subtrees are of equal height, or their heights differ by 1. Recall that the height of a tree is the maximum depth (from the root) of any node. A tree with one node has height 0. We will say that the empty (null) tree has height -1.

It can be proved that an AVL tree with n nodes has height O(log(n)), and so any n search/insert/delete operations ensuring worst-case search cost of O(log(n)). The key idea behind the AVL tree is how a subtree is re-balanced when a node insertion or removal causes the AVL property to fail. Like the textbook, we will consider only insertions.

Re-balancing Strategies

Suppose that a node satisfies the AVL property and that an add goes into the left subtree. There will be two separate cases to consider: If either of these causes the left subtree to increase in height so that the increased height causes the AVL property to fail, we perform one of the following two respective rotation operations: In either case, the new tree will be balanced (i.e., the AVL property satisfied) and will have height equal to the height of the tree prior to insertion. This means that an add will only ever cause one re-balancing rotation to reestablish the AVL property for the entire tree.

AVL Re-balancing Operations

The recursive add operation goes down the tree to the insertion point. Any AVL-out-of-balance problem will be "discovered" and dealt with as you come out of the recursive call. Consider the first such "out-of-balance" discovery and subsequent re-balancing operation. We want to argue two things: Thus as we proceed out of the recursive add calls, no other node will detect an "out-of-balance" state. In particular, only one re-balance ever needs to be done per add.

Aside from symmetric situations, there are two distinct cases to consider, so suppose the add "goes left." It's trivial to see that if the inserted node is the left child, then the new tree cannot be unbalanced. Therefore the cases are based on whether the insertion goes to the left or right of the left child.

single rotate (from the) left, on a left-left insertion

This diagram depicts the re-balance operation after an unbalancing left-left insertion.

Proof: Let x, y, z be the heights of the trees X, Y, Z, respectively after insertion. Assume that the insertion into X caused an unbalancing at k2 (but not at k1 or at any node in X). We gather this information:
  1. since the insertion into X caused an unbalancing height change for the full tree, the full tree's height after insertion must be x + 2, and thus x + 1 before insertion.
  2. x must be ≥ than y + 1, otherwise the height of the left subtree at k2 would not have increased
  3. x cannot be greater than y + 1, otherwise the "AVL problem" would have been observed at k1; conclude that x = y + 1
  4. x + 1 ≥ z + 2: otherwise there would be no problem at k2
  5. x + 1 cannot be greater than z + 2, otherwise the left side height prior to insertion (x) would be greater than z + 1, meaning that k2 was unbalanced prior to insertion; conclude that x + 1 = z + 2, i.e., x = z + 1
Putting these facts together:
x = y + 1, x = z + 1, and so y = z.
For the re-balanced tree:
  1. at k2, left side height = y = z = right side height
  2. at k1, left side height = x = y + 1 = right-side height
  3. height of the tree = x + 1 = height before insertion

double rotate (from the) left, on a left-right insertion

This diagram depicts the re-balance operation after an unbalancing left-right insertion.

Proof: Let a, b, c, d be the heights of the trees A, B, C, D, respectively after insertion. There are 3 cases to consider:
  1. the inserted element is k2
  2. the inserted element goes into B
  3. the inserted element goes into C
In the first case, with a bit of pondering, you can convince yourself that the tree can consist only of the three nodes k1, k2 and k3 with all other subtrees empty. We will show the proof for the second case where the insertion goes into the B subtree; the proof of insertion into the C subtree is symmetric.
  1. since the insertion into B caused an unbalancing height change of the full tree, the full tree's height after insertion must be b + 3, and thus b + 2 before insertion.
  2. b must be > c, otherwise the left subtree at k3 would not have changed height.
  3. b cannot be > c + 1, otherwise there would be a problem at k2; conclude that b = c + 1
  4. b + 1 must be > a, otherwise the left subtree at k3 would not have changed height.
  5. b + 1 cannot be > a + 1, otherwise there would be a problem at k1; conclude that b + 1 = a + 1, i.e., b = a
  6. in order for there to be a "problem" at k3, we must have left-side height ≥ right-side height + 2, i.e., b + 2 ≥ d + 2
  7. we cannot have b + 2 > d + 2, otherwise
    b + 1 = left side height before insertion ≥ d + 2 = right side height before insertion + 2,
    meaning that the tree would have been unbalanced before insertion; conclude that b + 2 = d + 2, i.e., b = d
Putting these facts together:
d = a = b = c + 1
For the re-balanced tree:
  1. at k1, left side height = a = b = right side height
  2. at k3, left side height = c, right-side height = d = c + 1
  3. at k2, left side height = b + 1 = right-side height = d + 1
  4. height of the tree = b + 2 = height before insertion

Examples

An important example of AVL trees is the behavior on a worst-case add sequence for regular binary trees:
1, 2, 3, 4, 5, 6, 7
All insertions are right-right and so rotations are all single rotate from the right. All but two insertions require re-balancing:
at 1 ⇒ at 3 ⇒
at 2 ⇒ at 5 ⇒

It can be shown that inserting the sequence 1,...,2h-1 will make a perfect tree of height h-1.

Here is another example. The insertion sequence is:   50, 25, 10, 5, 7, 3, 30, 20, 8, 15
single
rot. left
at 50 ⇒
double
rot. left
at 10 ⇒
single
rot. left
at 25 ⇒
add(30), add(20), add(8) need no re-balancing
double
rot. right
at 7 ⇒

AVL Tree Game

This "game" is just a way of having you guess the outcomes of a sequence of insertions into an AVL tree. The AVL trees are displayed graphically. Download the JAR file via:
AVLTreeGame.jar
On Windows you can run it directly by double-clicking. On MAC, you have to right-click (the first time) and select "Open with ⇾ Jar Launcher."

Demo program

The class we wish to create, AVLTreeSet is an improvement of the SearchTreeSet class in the sense of always maintaining an O(log(n)) search time. The basis is
adapter.SetAdapter  
and
adapter.NavSetAdapter  
We create a new Set class:
util.AVLTreeSet  
The following main class demonstrates the rotations used by insertions using both fixed and random insertion patterns
demo.AVLDemo  
As mentioned above, the above "AVL Tree Game" does something similar, but the presentation is much more sophisticated.

AVL Tree Code

In order to keep track of height efficiently, the Node class is expanded to add a height field:
private static class Node<E> {
  E data;
  Node<E> left, right;
  int height;
 
  Node(E data, Node<E> left, Node<E> right, int height) {
    this.data = data;
    this.left = left;
    this.right = right;
    this.height = height;
  }
 
  Node(E data, Node<E> left, Node<E> right) {
    this(data, left, right, 0);
  }
}
A private member function computes the height of node so that an empty tree's height can also be computed:
private int height(Node<E> p) { 
  return (p == null) ? -1 : p.height; 
}
Let's consider how the single rotate left will take place. We start with code like this:
int comp = myCompare(elt, n.data);
if (comp < 0) {                       // insertion goes left
  n.left = add(elt, n.left);
  ...
Next, add the test for being out of balance:
  if (height(n.left) - height(n.right) == 2) {
Assuming this is true, we decide between single-rotate-left and double-rotate-left operations:
  int comp1 = myCompare(elt, n.left.data);
  if (comp1 < 0) {
    return s_rotate_left(n); // left-left insertion
  } 
  else {
    return d_rotate_left(n); // left-right insertion
  }
We are, of course, assuming that n.left != null, but this is implied by the fact that height(n.left) - height(n.right) == 2. Suppose a left-left insertion took place, so that we call:
  return s_rotate_left(n); // left-left insertion
The function code implements the effects indicated in the diagram above.
private Node<E> s_rotate_left(Node<E> n) {
  Node<E> k2 = n;
  Node<E> k1 = k2.left;
 
  int x = k1.left.height;  // k1.left cannot be null
 
  k2.left = k1.right;
  k1.right = k2;
 
  k2.height = x;
  k1.height = x + 1;
 
  return k1;
}
The original call was:
n = add(n, elt);
The outcome would be:
n = s_rotate_left(n);
with the new Node holding elt would end up in the left subtree of n.


© Robert M. Kline