TreeSets + SearchTreeSet
— print (last updated: Oct 14, 2009) print

Select font size:

Sets

The basic notion of a Set<E> is that it is simply a Collection<E> which does not store duplicates. A Set<E> by itself has no particular ordering of the elements, there is no notion of position, no notion of first or last, etc. The key ingredient is that there are no "duplicate" elements in the sense of the equals operator. If an element is already in the set, the add operation, attempting to add a duplicate element, returns false and the duplicate is not added.

A stronger version of the Set<E> is called a SortedSet<E> which maintains an ordering of the elements. This interface provides functions for computing the first and last elements and can keep track of the range between any two elements.

Even stronger is the NavigableSet<E> in which we an find elements in the set which are "near to" some element. A NavigableSet<E> also allows the removal of the first and last elements.

Tree Implementation

The Set<E> implementation of interest in this document is the TreeSet<E>, which is a NavigableSet<E> whose elements are distributed in a search tree. The simplest form of a search tree is a binary search tree, which is a binary tree maintaining the following ordering property at each node:
data values
of all nodes in
the left subtree
of node n
< data value
at node n
< data values
of all nodes in
the right subtree
of node n
Here is an example of a tree with integer values:
There is a wide variety of advanced search tree implementations which enhance the binary model primarily by increasing the branching. A well-known example is the B-tree.

When working with trees, you should become familiar with the terminology:
root,
children,
leaf (node with no children),
interior node (non-leaf),
a node's level or depth,
a tree's height (maximum depth),
a nodes's height (height of the subtree at that node),
a perfect tree (all interior nodes have two children & all leaves at same level)
For example, some facts about the above tree:
node 20, the root, is at level 0, 
         it has height 3, 
         it has two children: nodes 15 and 30
node 25  is at level 2, 
         it has height 1, and is a perfect subtree
node 17  is a leaf at level 3 and has height 0

Comparability

Any SortedSet<E>, whatever the implementation may be, must somehow have the ability to determine an ordering of the elements. This usually means that the element type implements the Comparable<E> interface, implying that the compareTo function is defined:
                      < 0  means e1 is less that e2
e1.compareTo(e2)      = 0  means that e1 equals e2
                      > 0  means that e1 is greater than e2
Alternatively, a Comparator<E> object can be used to define the relation between two elements. A Comparator<E> object, cmp, is one which defines the compare function. Using such an object we can defined the relation between two elements:
                      < 0  means e1 is less that e2
cmp.compare(e1,e2)    = 0  means that e1 equals e2
                      > 0  means that e1 is greater than e2

TreeSet operation restrictions

Because of the need for comparability, the TreeSet<E> class does not support adding null — it throws an exception. This is not a restriction across all Set<E> implementations. For example, it is OK to store null into a HashSet<E>. Another restriction is that the remove(Object obj) function will fail if called with an object which is not of the element type in use.

Demo Project Setup

Do the following to create demos for trees: Make the code of Main be the content below. There are initially three main programs to test-run:
  1. Illustration of basic Set operations
  2. How to use a TreeSet to create an ordered list of "unique words". Make a comparison of times with this approach compared with how we might do so using Lists alone.
  3. Illustration of advanced Set operations used in the SortedSet and NavigableSet interfaces.
The main1 function makes use of a Comparator object defined like this:
Comparator<Integer> cmp = new Comparator<Integer>() {
  public int compare(Integer lhs, Integer rhs) {
    String lhs_str = lhs.toString();
    String rhs_str = rhs.toString();
    return lhs_str.compareTo(rhs_str);
  }
};
If you've never seen this construction before, the object cmp is created using an anonymous inner class. Comparator is an interface which requires the definition of the function compare. Java creates a class which implements Comparator on the fly and makes cmp an instance of this class. The class is local to the main1 function and is never given a name.

Here is the code:

tsetdemo.Main
package tsetdemo; import java.util.*; import util.WordProcess; public class Main { public static void main(String[] args) { main1(args); //main2(args); //main3(args); } public static void main1(String[] args) { // A set of Strings: Set<String> str_set = new TreeSet<String>(); System.out.println("add \"mm\": " + str_set.add("mm")); System.out.println("add \"mm\": " + str_set.add("mm")); for (String s: new String[] {"aa", "xx", "dd", "pp", "cc", "qq", "nn"} ) { str_set.add(s); } System.out.println(str_set); System.out.println("remove \"mm\": " + str_set.remove("mm")); System.out.println(str_set); System.out.println(); // A set of Integers: Set<Integer> int_set = new TreeSet<Integer>(); int[] samples = { 10, 5, 15, 6, 19, 33, 4, 2 }; for (int i: samples ) { int_set.add(i); } System.out.println("int_set: " + int_set); // A set of Integers with "lexicographic" comparison defined: Comparator<Integer> cmp = new Comparator<Integer>() { public int compare(Integer lhs, Integer rhs) { String lhs_str = lhs.toString(); String rhs_str = rhs.toString(); return lhs_str.compareTo(rhs_str); } }; Set<Integer> int_set_lex = new TreeSet<Integer>(cmp); for (int i: samples ) { int_set_lex.add(i); } System.out.println("int_set_lex: " + int_set_lex); } public static void main2(String[] args) { String url; url = "http://www.nytimes.com/"; url = "http://en.wikipedia.org/wiki/Jimi_Hendrix"; url = "http://en.wikipedia.org/wiki/Pink_Floyd"; List<String> allWords = new ArrayList<String>(); try { WordProcess.getWordsFromURL(allWords, url); // create a sorted list of unique words from allWords using an ArrayList List<String> uniqueWordList1 = new ArrayList<String>(); long use_list_start = System.currentTimeMillis(); for (String word : allWords) { if (!uniqueWordList1.contains(word)) uniqueWordList1.add(word); } Collections.sort(uniqueWordList1); long use_list_end = System.currentTimeMillis(); // create a sorted list of unique words from allWords using a TreeSet List<String> uniqueWordList2 = new LinkedList<String>(); Set<String> uniqueWordSet = new TreeSet<String>(); long use_set_start = System.currentTimeMillis(); for (String word : allWords) { uniqueWordSet.add(word); } for (String word: uniqueWordSet) { uniqueWordList2.add(word); } long use_set_end = System.currentTimeMillis(); System.out.println("using list: " + (use_list_end - use_list_start)); System.out.println("using set: " + (use_set_end - use_set_start)); boolean show_words = false; if (show_words) { System.out.println("---------- UNIQUE WORDS ----------"); WordProcess.printInRows(uniqueWordList1, 80); System.out.println("-------------------------------"); } } catch (Exception x) { x.printStackTrace(); } } public static void main3(String[] args) { SortedSet<String> sset = new TreeSet<String>(); for (String s: new String[] {"aa", "xx", "dd", "pp", "cc", "qq", "nn"} ) { sset.add(s); } System.out.println(sset); System.out.println("------ SortedSet properties:"); System.out.println("first= " + sset.first()); System.out.println("last= " + sset.last()); System.out.println("headSet(\"m\")= " + sset.headSet("m")); System.out.println("tailSet(\"m\")= " + sset.tailSet("m")); NavigableSet<String> nset = (NavigableSet<String>) sset; System.out.println("pollFirst = " + nset.pollFirst()); System.out.println(nset); System.out.println("------ NavigableSet properties:"); System.out.println("ceiling(\"m\") = " + nset.ceiling("m")); System.out.println("floor(\"m\") = " + nset.floor("m")); } }

Adapters

As with the AListAdapter, LListAdapter, and DequeAdapter we create a vacuous basis of the TreeSet class from which we will derive our user-defined version. The TreeSet effectively only needs to implement the NavigableSet interface, and so we call our class NavSetAdapter.

As a useful variation, we will write both a simpler SetAdapter class and make NavSetAdapter an extension. This will serve several purposes later down the line as well.

Create the Java Class:
Class Name: SetAdapter
package:    set
Then insert the following content ( click to show )

Create the Java Class:
Class Name: NavSetAdapter
package:    set
Then insert the following content ( click to show )

SearchTreeSet Implementation

Create a Java Class:
Class Name: SearchTreeSet
package:    tset
Then insert the following content ( click to show )

Initial test-drive of our implementation

To verify that SearchTreeSet "works" like TreeSet (to some degree):
  1. add this import statement to Main.java:
    import tset.SearchTreeSet;
    
  2. Replace each occurrence of TreeSet by SearchTreeSet.
Our SearchTreeSet does not implement all the functions necessary to execute main3 (but you can try it to see what happens).

SearchTreeSet Discussion

Our implementation uses a simple binary tree constructed with nodes defined by this inner class:
private class Node {
  E data;
  Node left, right;

  Node(E data, Node left, Node right) {
    this.data = data;
    this.left = left;
    this.right = right;
  }
}
The data members of our class are these:
  private Node root = null;
  private int size = 0;
  private Comparator<? super E> cmp = null;

Comparability

We remarked above that a key feature of a TreeSet is the ability to order the elements. Thus they must either be Comparable themselves, as in this instantiation:
new TreeSet<SomeComparableElementType>();
or a Comparator must be passed to the constructor as in:
Comparator<SomeElement> my_cmp 
  = new Comparator<SomeElement>() {
    public int compare(SomeElement lhs, SomeElement rhs) {
      //...
    }
}
new TreeSet<SomeElementType>(my_cmp);
In the former case the internal private data member cmp is the default null value. In the latter case, we would have
cmp = my_cmp
making cmp a non-null value. In either case, the private helper function, myCompare, does the right thing:
private int myCompare(E lhs, E rhs) {
  if (cmp == null) {
    // no Comparator passed in constructor,
    // assume element type is Comparable
    return ((Comparable)lhs).compareTo(rhs);
  } else {
    // use the Comparator to compare elements
    return cmp.compare(lhs, rhs);
  }
}

Textbook comparison

The textbook's class declaration of the search tree is like this:
public class SearchTreeSet<E extends Comparable<? super E>> ...
which implies that its element type must be Comparable! This implementation unnecessarily restricts the element types a search tree can hold, a restriction not made by the java.util.TreeSet class in which we can pass a Comparator for a non-Comparable class.

The add function

Our goal in this section is to detail the writing of the add function:
boolean add(E elt);
This is commonly implemented as a recursive helper function:
boolean add(Elt, Node n);
called at the root:
add(elt, root);
The recursive helper traverses the tree looking for an occurence of elt. If found, return true, otherwise, add a new Node to the tree containing elt and return false. At issue is the fact that a "found/not found" boolean value must be returned and pointer in the tree must be reset if not found.

Reference parameters

Other languages, like C++, provide reference parameters in functions which provide the ability to change the value of a specific variable passed into the function. Java only provides pass-by-value, meaning that the function only ever works with a copy of the variable, never permitting a change to the original variable's value. If we want a function to have an effect on a variable, our choices are limited to these:
  1. pass a variable to the function (i.e., pass its value); return an updated value and reset the variable from the return value.
  2. pass an object containing the variable as member data and reset the value using permitted member operations.
For example, consider the add function we need to write. It must have the ability to create a new Node in the tree, thus changing the value of a Node parameter. It also must return a boolean value indicating whether the element to be inserted was already present or not.

Consider this psuedo-code:
private boolean add(E elt, Node & n) // n is a "reference" parameter 
{
  if (n == null) {
    n = new Node(elt, null, null);
    return false;
  }
  int comp = myCompare(elt, n.data);
  if (comp < 0) {
    return add(elt, n.left);
  } else if (comp > 0) {
    return add(elt, n.right);
  } else {
    return true;
  }
}
If we had the ability to create the above function, we would simply call:
boolean found = add(elt, root);
If necessary, root would be reset by the call to this add function and the return value would indicate whether the inserted value was already found.

Unfortunately, since reference parameters do not exist in Java, we must work harder to create this effect. The first step is to ignore the boolean return value and concentrate getting the Node reference parameter replaced. We write (as in the textbook):
private Node add(E elt, Node n) {
  if (n == null) {
    return new Node(elt, null, null);
  }
  int comp = myCompare(elt, n.data);
  if (comp < 0) {
    n.left = add(elt, n.left);
  } else if (comp > 0) {
    n.right = add(elt, n.right);
  }
  return n;
}
Given this version of add, the call must be written:
root = add(elt, root);
Of the parameter-passing choices mentioned above, this is of the first style. You can see this style repeated in the statements:
n.left = add(elt, n.left);
n.right = add(elt, n.right);
return n;
The "proof" that this Java-based algorithm works is by induction. Assuming root is the top of a binary search tree, The claim is that the statement
n = add(elt, n);
correctly inserts e into the search tree rooted at Node n
Proof:
  if n == null, clearly it works

  if n != null, and elt is found at n, it works also
  
  otherwise, the algorithm correctly calls one of these two:
  
     n.left = add(elt, n.left)    or     n.right = add(elt, n.right)
    
  which, by the inductive assumption, must work, 
  in that case n itself is not changed.

boolean return value using a mutable boolean object

Going beyond what the textbook does, we also want to return whether the element was found or not. Since we do not have reference variables, the way to get around this problem uses the second alternative mentioned above: manipulate the boolean variable as a member of an object.

What we have in mind is to create a simple wrapper class around a boolean variable. The java.lang.Boolean class seems like a logical choice, but its instances are not mutable. This means that once the boolean member data is set in the constructor call:
Boolean bool = new Boolean(true);
the internal boolean value can never be changed. All the standard java.lang wrapper classes are immutable. Finding a suitable mutable boolean object is somewhat of a challenge. We could easily create what is needed ourselves like this:
public class MyBoolean {
  private boolean value;
  public MyBoolean( boolean value ) { this.value = value; } 
  public MyBoolean( ) { this(false); } 
  public boolean get() { return value; }
  public void set(boolean value) { this.value = value; } 
}
but we'd like to use something that is already built into Java. Given that constraint, the class
java.util.concurrent.atomic.AtomicBoolean;
suits our needs, although it is intended for other, more complex, situations involving multiple threads. Its member functions give us just what we need:
AtomicBoolean bool = new AtomicBoolean();  // initial value is optional
bool.set( either-true-or-false );        // set value to true or false
boolean b = bool.get()                     // retrieve the internal value
Using this, we can now rewrite add to achieve our goal:
private Node add(E elt, Node n, AtomicBoolean found) {
  if (n == null) {
    found.set(false);
    return new Node(elt, null, null);
  }
  int comp = myCompare(elt, n.data);
  if (comp < 0) {
    n.left = add(elt, n.left, found);
  } else if (comp > 0) {
    n.right = add(elt, n.right, found);
  } else {
    found.set(true);
  }
  return n;
}
The code of the public add function goes like this:
public boolean add(E elt) {
  AtomicBoolean found = new AtomicBoolean();
  root = add(elt, root, found);
  if (!found.get()) {
    ++size;
  }
  return !found.get();
}
In particular the public function's return value is the opposite of the private helper's boolean "return" value. The public function wants to say "true" if the element was not found (and therefore inserted), and "false" if the element was found (and not inserted).

Traversals / Displaying Structure

Obtaining information about the contents of a tree structure is usually accomplished by traversing the tree in one of several standard methods:
  1. preorder traversal (root, left, right):
     visit the root
     visit the left subtree recursively
     visit the right subtree recursively
    
  2. inorder traversal (left, root, right):
     visit the left subtree recursively
     visit the root
     visit the right subtree recursively
    
  3. postorder traversal (left, right, root):
     visit the left subtree recursively
     visit the right subtree recursively
     visit the root
    
  4. levelorder traversal (this is not a recursive)
     visit the root
     visit the children of the root
     visit the children's children
     ...
    
In the simplest form, each of the first three traversals is easy to implement. For example, an inorder printing would be this:
inorderOutput(root);

// using the function ...
void inorderOutput(Node n) {
  if (n == null)
    return;
  inorderOutput(n.left);     // left
  System.out.println(n.data);  // root
  inorderOutput(n.right);    // right
}
Consider the effect on the following binary search tree:
One can validate that using the above inorder algorithm would print out the elements in their correct comparable order (one per line):
inorder listing: 5 15 17 18 20 22 25 27 30
If you're trying to reveal the structure of the tree you also want to add indentation corresponding to a node's level. To do this, modify the code to the following:
inorderOutput(root,0);

// using the function ...
void inorderOutput(Node n, int level) {
  if (n == null)
    return;
  inorderOutput(n.left, level+1);
  // print indentation corresponding to current level
  System.out.println(n.data);
  inorderOutput(n.right, level+1);
}
Assuming that "print indentation corresponding to current level" means print 3*level blanks, the output would be this:
      5
   15
          17
      18
20
          22
      25
          27
   30
This yields a pretty good correspondence to the actual tree, but it is easier to "see" the tree if one used a reverse-inorder printing corresponding the "right, root, left:"
reverseInorderOutput(root,0);

// using the function ...
void reverseInorderOutput(Node n, int level) {
  if (n == null)
    return;
  reverseInorderOutput(n.right, level+1);
  // print indentation corresponding to current level
  System.out.println(n.data);
  reverseInorderOutput(n.left, level+1);
}
Compare the output of this function with the actual tree:
   30
          27
      25
          22
20
      18
          17
   15
      5
Although generally not as useful for visualization, the preorder and postorder traversal methods are nevertheless very important in tree-processing algorithms. Here are their listings of our sample tree:
preorder listing:  20 15 5 18 17 30 25 22 27
postorder listing: 5 17 18 15 22 27 25 30 20

Test program

The following main4 program creates a random binary search tree with up to 10 elements and then calls the implementation-dependent traverse-output functions
public String inorderOutput();
public String preorderOutput();
public String preorderOutput();
public String reverseInorderOutput();
Double-check that you have the import line:
import tset.SearchTreeSet;
Then add main4 to the Main class, and have main call it:
  public static void main4(String[] args) {
    SearchTreeSet<Integer> tree = new SearchTreeSet<Integer>();

    int num_samples = 10;
    Random rand = new Random();
    for (int i = 0; i < num_samples; ++i) {
      tree.add(rand.nextInt(2 * num_samples));
    }

    System.out.println("-------- reverse inorder --------");
    tree.reverseInorderOutput();

    System.out.println("-------- preorder --------");
    tree.preorderOutput();
    
    System.out.println("-------- postorder --------");
    tree.postorderOutput();
  }

The remove function

Removing an element from a search tree, although tricky, is conceptually straight-forward with one (common) exception: removing the element at a node which has both children. In this case, the solution is either: In either case the search tree's order structure is preserved.

RemoveMin

The idea of removeMin from a given node is simple: navigate to the leftmost node; capture the value, delete the node. Unfortuately, Java's lack of a reference parameters makes it tricky.

The first step is to understand the removal operation without returning the value. To do removeMin from the right subtree of Node n, we would execute:
n.right = removeMin( n.right );
using the recursive function:
Node removeMin( Node n ) {
  if (n.left == null)
    return n.right;
  else {
    n.left = removeMin( n.left ); 
    return n;
  }
}
In order to capture the minimum value, we could, as is done in the textbook, call two operations, one to find the minimum value, a second to remove it. This is probably acceptable; however it is inefficient because we are navigating down to the same point in the tree twice.

As an alternative, we will use a wrapper object around a generic type to capture the minimum value. As before, we could easily define such a class ourselves, but we prefer what is already written via the class:
java.util.concurrent.atomic.AtomicReference<E> 
The modified code should capture the value prior to deleting the node:
private Node removeMin(Node n, AtomicReference<E> save) {
  if (n.left == null) {
    save.set(n.data);
    return n.right;
  } else {
    n.left = removeMin(n.left, save);
    return n;
  }
}
After executing:
AtomicReference<E> save = new AtomicReference<E>();
n.right = removeMin( n.right, save );
we can obtain the captured value as save.get() and replace Node n's value with it:
n.data = saved.get();

The private (recursive) remove operation

Like the add operation, remove is called like this:
AtomicBoolean found = new AtomicBoolean();
root = remove( elt, root, found );
Like add, we navigate down from the root looking for an occurrence of elt. The base case is an empty tree in which we do nothing. If the element is found, then the "hard" case is when the node has both children. At this point we can choose to do either removeMin or removeMax. There are two reasonable ways to proceed: The textbook uses the former alternative and notes the potential "skewing" effect which can occur with certain add/remove operation sequences. The code we use effects the latter alternative.
private Random flipACoin = new Random();

private Node remove(E elt, Node n, AtomicBoolean found) {
  if (n == null) {
    found.set(false);
    return null;
  }
  int comp = myCompare(elt, n.data);
  if (comp < 0) {
    n.left = remove(elt, n.left, found);
    return n;
  } 
  if (comp > 0) {
    n.right = remove(elt, n.right, found);
    return n;
  } 
  found.set(true);
  if (n.left == null) {
    return n.right;
  } 
  if (n.right == null) {
    return n.left;
  } 
  // the most difficult situation is upon us!!
  AtomicReference<E> save = new AtomicReference<E>();
      
  boolean choose_min = (flipACoin.nextInt(2) == 1);
  
  if (choose_min) {
    n.right = removeMin(n.right, save);
  } 
  else {
    n.left = removeMax(n.left, save);
  }
  n.data = save.get();
  return n;
}
Other alternatives might be to try to make a "best" choice between removeMin or removeMax based on, say, the height of the left and right subtrees. To make such a modified algorithm efficient you should have an extra field in the Node class which keeps track of the height of the subtree at that node; furthermore, adds and removes would have to maintain this height field correctly.

The public remove operation

The public member function which uses has the following code:
AtomicBoolean found = new AtomicBoolean();
root = remove(elt, root, found);
if (found.get()) {
   --size;
}
return found.get();     // if found, a successful removal
Note the semantic difference between add and remove in that finding the element returns false for add and true for remove. The actual public remove operation in full is this:
  public boolean remove(Object obj) {
    if (isEmpty()) {
      return false;
    }
    E elt = (E) obj;
    AtomicBoolean found = new AtomicBoolean();
    root = remove(elt, root, found);
    if (found.get()) {
      --size;
    }
    return found.get();
  }
The prototype of remove(Object obj) forces us to cast obj to the generic type E. This which will generate an exception if obj is not of type E to begin with! Alternatively, you could test
if (obj instanceOf E)
and return false if this fails. However Java's java.util.TreeSet implementation appears to prefer the former approach of throwing an exception on type mismatch.

Test program

The following main5 program creates a "random" tree of up to num_samples integers and removes one of the integers at random. The tree is shown before and after removal. Add main5 to the Main class and have main call it.
  public static void main5(String[] args) {
    SearchTreeSet<Integer> tree = new SearchTreeSet<Integer>();
    List<Integer> list = new ArrayList<Integer>();

    int num_samples = 10;

    Random rand = new Random();
    for (int i = 0; i < num_samples; ++i) {
      int n = rand.nextInt(2 * num_samples);
      list.add(n);
      tree.add(n);
    }

    System.out.println("---------------");
    tree.reverseInorderOutput();
    System.out.println("---------------");

    int pos = rand.nextInt(num_samples);
    int num = list.get(pos);

    System.out.println("remove: " + num);
    tree.remove(num);

    System.out.println("---------------");
    tree.reverseInorderOutput();
    System.out.println("---------------");
  }


© Robert M. Kline