## AVL Trees

### The SetMap Application

The examples in this and other documents in this section all part of the Java Application SetMap. Download the source archive and install it as a Java Application with Existing Sources. See the Using NetBeans document for details.

Like others, the SetMap project has multiple Main Classes intended to illustrate various independent features. The simplest way to run the various classes is to right-click the file and select Run File either in the file itself or from the listing in the Projects window.

The Java files from the project which are referenced in this document are:
```adapter/
util/
AVLTreeSet
demo/
AVLDemo
```

### Search Tree Path Lengths

In section 4.3.5, the textbook sets up a recurrence equation for computing the average internal path length of a binary search tree with n nodes. The internal path length of any tree is defined as:
sum of the depths of all nodes
We consider all possible binary search trees. By generating and solving (later in the textbook) a recurrence equation, it can be shown that the average internal path length over all binary trees with n nodes is
D(n) = O(n*log(n))
So D(n) averages the lengths of all paths in all trees with n nodes. It turns out that this is the correct approach to generating a recurrence formula.

This figure from the textbook is an illustration of such a random tree:
Figure 4.26 from
Weiss textbook
It can then be argued that:
```(average total path length in an average tree with n nodes)/n
= D(n)/n = O(n*log(n))/n = O(log(n))
```
In contrast the worst case is encountered if the sequence of inputs is sorted or reverse-sorted; the outcome is a tree skewed completely right or left, respectively. For example, inserting the sequence 10, 20, 30, 40, 50 produces the search tree:
In this case the internal path length with n nodes is:
```0 + 1 + 2 + … + (n-1) = (n-1)*n/2,  i.e.,  O(n2).
```
and the average path length is O(n)

The goal of advanced tree data structures is to ameliorate this worst-case scenario by having its operations "re-balance" the tree under certain circumstances so that the O(n) depths are avoided or "fixed" during tree usage.

### AVL Search Trees

An AVL (Adelson-Velski/Landis) tree is a binary search tree which maintains the following height-balanced "AVL property" at each node in the tree:
```abs( (height of left subtree) – (height of right subtree) ) ≤ 1
```
Namely, the left and right subtrees are of equal height, or their heights differ by 1. Recall that the height of a tree is the maximum depth (from the root) of any node. A tree with one node has height 0. We will say that the empty (null) tree has height -1.

It can be proved that an AVL tree with n nodes has height O(log(n)), and so any n search/insert/delete operations ensuring worst-case search cost of O(log(n)). The key idea behind the AVL tree is how a subtree is re-balanced when a node insertion or removal causes the AVL property to fail. Like the textbook, we will consider only insertions.

#### Re-balancing Strategies

Suppose that a node satisfies the AVL property and that an add goes into the left subtree. There will be two separate cases to consider:
• the add goes into the left-left subtree
• the add goes into the left-right subtree
If either of these causes the left subtree to increase in height so that the increased height causes the AVL property to fail, we perform one of the following two respective rotation operations:
• single rotate (from the) left
• double rotate (from the) left
In either case, the new tree will be balanced (i.e., the AVL property satisfied) and will have height equal to the height of the tree prior to insertion. This means that an add will only ever cause one re-balancing rotation to reestablish the AVL property for the entire tree.

### AVL Re-balancing Operations

The recursive add operation goes down the tree to the insertion point. Any AVL-out-of-balance problem will be "discovered" and dealt with as you come out of the recursive call. Consider the first such "out-of-balance" discovery and subsequent re-balancing operation. We want to argue two things:
• the re-balanced tree satisfies the AVL property at each node, and
• the height of the re-balanced tree is the height of tree prior to insertion.
Thus as we proceed out of the recursive add calls, no other node will detect an "out-of-balance" state. In particular, only one re-balance ever needs to be done per add.

Aside from symmetric situations, there are two distinct cases to consider, so suppose the add "goes left." It's trivial to see that if the inserted node is the left child, then the new tree cannot be unbalanced. Therefore the cases are based on whether the insertion goes to the left or right of the left child.

#### single rotate (from the) left, on a left-left insertion

This diagram depicts the re-balance operation after an unbalancing left-left insertion.

Proof: Let x, y, z be the heights of the trees X, Y, Z, respectively after insertion. Assume that the insertion into X caused an unbalancing at k2 (but not at k1 or at any node in X). We gather this information:
1. since the insertion into X caused an unbalancing height change for the full tree, the full tree's height after insertion must be x + 2, and thus x + 1 before insertion.
2. x must be ≥ than y + 1, otherwise the height of the left subtree at k2 would not have increased
3. x cannot be greater than y + 1, otherwise the "AVL problem" would have been observed at k1; conclude that x = y + 1
4. x + 1 ≥ z + 2: otherwise there would be no problem at k2
5. x + 1 cannot be greater than z + 2, otherwise the left side height prior to insertion (x) would be greater than z + 1, meaning that k2 was unbalanced prior to insertion; conclude that x + 1 = z + 2, i.e., x = z + 1
Putting these facts together:
```x = y + 1, x = z + 1, and so y = z.
```
For the re-balanced tree:
1. at k2, left side height = y = z = right side height
2. at k1, left side height = x = y + 1 = right-side height
3. height of the tree = x + 1 = height before insertion

#### double rotate (from the) left, on a left-right insertion

This diagram depicts the re-balance operation after an unbalancing left-right insertion.

Proof: Let a, b, c, d be the heights of the trees A, B, C, D, respectively after insertion. There are 3 cases to consider:
1. the inserted element is k2
2. the inserted element goes into B
3. the inserted element goes into C
In the first case, with a bit of pondering, you can convince yourself that the tree can consist only of the three nodes k1, k2 and k3 with all other subtrees empty. We will show the proof for the second case where the insertion goes into the B subtree; the proof of insertion into the C subtree is symmetric.
1. since the insertion into B caused an unbalancing height change of the full tree, the full tree's height after insertion must be b + 3, and thus b + 2 before insertion.
2. b must be > c, otherwise the left subtree at k3 would not have changed height.
3. b cannot be > c + 1, otherwise there would be a problem at k2; conclude that b = c + 1
4. b + 1 must be > a, otherwise the left subtree at k3 would not have changed height.
5. b + 1 cannot be > a + 1, otherwise there would be a problem at k1; conclude that b + 1 = a + 1, i.e., b = a
6. in order for there to be a "problem" at k3, we must have left-side height ≥ right-side height + 2, i.e., b + 2 ≥ d + 2
7. we cannot have b + 2 > d + 2, otherwise
b + 1 = left side height before insertion ≥ d + 2 = right side height before insertion + 2,
meaning that the tree would have been unbalanced before insertion; conclude that b + 2 = d + 2, i.e., b = d
Putting these facts together:
```d = a = b = c + 1
```
For the re-balanced tree:
1. at k1, left side height = a = b = right side height
2. at k3, left side height = c, right-side height = d = c + 1
3. at k2, left side height = b + 1 = right-side height = d + 1
4. height of the tree = b + 2 = height before insertion

### Examples

An important example of AVL trees is the behavior on a worst-case add sequence for regular binary trees:
```1, 2, 3, 4, 5, 6, 7
```
All insertions are right-right and so rotations are all single rotate from the right. All but two insertions require re-balancing:
 at 1 ⇒ at 3 ⇒
 at 2 ⇒ at 5 ⇒

It can be shown that inserting the sequence 1,...,2h-1 will make a perfect tree of height h-1.

Here is another example. The insertion sequence is:   50, 25, 10, 5, 7, 3, 30, 20, 8, 15
 single rot. left at 50 ⇒
 double rot. left at 10 ⇒
 single rot. left at 25 ⇒
 double rot. right at 7 ⇒

### Demo program

The class we wish to create, AVLTreeSet is an improvement of the SearchTreeSet class in the sense of always maintaining an O(log(n)) search time.

The basis is
We create a new Set class: Then insert the following content
util.AVLTreeSet
The following main class demonstrates the rotations effected by insertions using both fixed and random insertion patterns
demo.AVLDemo

### AVL Tree Code

In order to keep track of height efficiently, the Node class is expanded to add a height field:
```private class Node<E> {
E data;
Node<E> left, right;
int height;

Node(E data, Node<E> left, Node<E> right, int height) {
this.data = data;
this.left = left;
this.right = right;
this.height = height;
}

Node(E data, Node<E> left, Node<E> right) {
this(data, left, right, 0);
}
}```
A private member function computes the height of node so that empty tree's height can also be computed:
```private int height(Node<E> p) {
return (p == null) ? -1 : p.height;
}```
Let's consider how the single rotate left will take place. Our code uses a private helper class
```private static class MutableBoolean {
boolean value;
}```
An object of this class is passed into the tree functions. The internal value is then set according to whenter the element was found or not. This makes everything more efficient by combining the tree restructuring operations with the test for containment.

```int comp = myCompare(elt, n.data);
if (comp < 0) {                       // insertion goes left
...```
where found is of type MutableBoolean. Next, add the test for being out of balance:
`  if (height(n.left) - height(n.right) == 2) {`
Assuming this is true, in which case found.value=false, we decide between single-rotate-left and double-rotate-left operations:
```  int comp1 = myCompare(elt, n.left.data);
if (comp1 < 0) {
return s_rotate_left(n); // left-left insertion
}
else {
return d_rotate_left(n); // left-right insertion
}```
We are, of course, assuming that n.left != null, but this is implied by the fact that height(n.left) - height(n.right) == 2. Suppose a left-left insertion took place, so that we call:
`  return s_rotate_left(n); // left-left insertion`
The function code implements the effects indicated in the diagram above.
```private Node<E> s_rotate_left(Node<E> n) {
Node<E> k2 = n;
Node<E> k1 = k2.left;

int x = k1.left.height;  // k1.left cannot be null

k2.left = k1.right;
k1.right = k2;

k2.height = x;
k1.height = x + 1;

return k1;
}```
The original call was:
`n = add(elt, n, found);`
The outcome would be:
`n = s_rotate_left(n);`
with the new Node holding elt would end up in the left subtree of n.