The SetMap Application
The examples in this and other documents in this section all part of the Java Application SetMap. Download the source archive SetMap.zip and install it as a Java Application with Existing Sources. See the Using NetBeans document for details. Like others, the SetMap project has multiple Main Classes intended to illustrate various independent features. The simplest way to run the various classes is to rightclick the file and select Run File either in the file itself or from the listing in the Projects window. The Java files from the project which are referenced in this document are:adapter/ SetAdapter NavSetAdapter util/ SearchTreeSet demo/ TreeDemo1 TreeDemo2 SearchTreeTraversals SearchTreeRemovals
Sets
The basic notion of a Set is that it is simply a Collection which does not store duplicates. A Collection has no notion of of position (like a List) and no notion of first and last (like a Deque). It only expresses aggregate properties with the operations add, remove, contains, size, isEmpty, etc.
There are three Set implementation classes of interest to us: TreeSet: An implementation of NavigableSet in which the sorted order of the elements is maintained.
 HashSet: An implementation of Set in which no particular ordering of elements is maintained.
 LinkedHashSet: An implementation of Set in which, like a LinkedList, the the entry order of elements is maintained.
 for TreeSets, equality is measured by comparability of the elements, or through an external comparator.
 for HashSets, equality is measured by the equals operator.
An ArrayList or a LinkedList could be used to implement a Set. The add(elt) member function would have to scan the list searching for to avoid adding a duplicate. If the list is an unordered ArrayList, or a LinkedList, the search for duplicates requires linear time, or O(n), time.
Alternatively, we could maintain an ArrayList with elements in sorted order. In this case the duplicate search cost is only O(log(n)) time, but it still takes O(n) data movements to actually add the element.
We will see that we can do much better, making add/remove/contains all work in sublinear time.
Ordered Set interfaces
A stronger version of the Set<E> interface is the SortedSet interface which assumes an ordering of the elements. This interface provides functions for computing the first and last elements and can keep track of the range between any two elements. Even stronger is the NavigableSet interface in which we can locate elements in the set which are "near to" some element. A NavigableSet<E> also allows the removal of the first and last elements.Simple TreeSet Implementation
The Set implementation of interest in this document is the TreeSet class which implements the NavigableSet interface. Its elements are distributed in a search tree structure of some kind which permits an efficient O(log(n)) search. The simplest form of a search tree is a binary search tree, which is a binary tree that maintains the following ordering property at each node:data values of all nodes in the left subtree of node n 
<  data value at node n 
<  data values of all nodes in the right subtree of node n 
root,For example, some facts about the above tree:
children,
a descendant of a node (a path exists from the node to it),
subtree (a node and all of its descendants),
leaf (a node with no children),
interior node (a nonleaf),
a node's level or depth (the pathdistance from the root),
the full tree's height (the maximum depth of any node),
a node's height (the height of the subtree at that node),
a perfect tree (all interior nodes have two children & all leaves at same level)
node 20, the root, is at level 0, it has height 3, it has two children: nodes 15 and 30 node 25 is at level 2, it has height 1, and is a perfect subtree node 17 is a leaf at level 3 and has height 0
Comparability
Any SortedSet<E>, whatever the implementation may be, must somehow have the ability to determine an ordering of the elements. This usually means that the element type implements the Comparable<E> interface, implying that the compareTo function is defined:< 0 means e1 is less that e2 e1.compareTo(e2) = 0 means that e1 equals e2 > 0 means that e1 is greater than e2Alternatively, a Comparator<E> object can be used to define the relation between two elements. A Comparator<E> object, cmp, is one which defines the compare function. Using such an object we can defined the relation between two elements:
< 0 means e1 is less that e2 cmp.compare(e1,e2) = 0 means that e1 equals e2 > 0 means that e1 is greater than e2
TreeSet operation restrictions
Because of the need for comparability, the TreeSet<E> class does not support adding null — it throws an exception. Disallowing null values is not a restriction across all Set<E> implementations. For example, it is OK to store null into a HashSet<E>. Another restriction is that the remove(Object obj) function will fail if called with an object which is not of the element type in use.Demo Project
Here are the relevant initial demo programs:Comparator<Integer> cmp = new Comparator<Integer>() { public int compare(Integer lhs, Integer rhs) { String lhs_str = lhs.toString(); String rhs_str = rhs.toString(); return lhs_str.compareTo(rhs_str); } };
TreeSet Implementation
We're going to use our own implementation of the TreeSet called SearchTreeSet. This is created is the same manner as in Array Lists As with the AListAdapter, LListAdapter, and DequeAdapter we create a vacuous basis of the TreeSet class from which we will derive our userdefined version. The TreeSet only needs to implement the NavigableSet interface, and so we call our class NavSetAdapter. As a useful variation, we will write both a simpler SetAdapter class and make NavSetAdapter an extension. This will serve several purposes later down the line as well.Initial testdrive
To verify that SearchTreeSet "works" like TreeSet (to some degree), replace the three occurrences of TreeSet by SearchTreeSet in the TreeDemo1 program:// Set<String> str_set = new TreeSet<>(); // Set<Integer> int_set = new TreeSet<>(); // Set<Integer> int_set_lex = new TreeSet<>(cmp); Set<Integer> int_set_lex = new SearchTreeSet<Integer>(cmp); Set<String> str_set = new SearchTreeSet<>(); Set<Integer> int_set = new SearchTreeSet<Integer>();
SearchTreeSet Discussion
Our implementation uses a simple binary tree constructed with nodes defined by this inner class:private static class Node<E> { E data; Node<E> left, right; Node(E data, Node<E> left, Node<E> right) { this.data = data; this.left = left; this.right = right; } }
private Node<E> root = null; private int size = 0; private Comparator<? super E> cmp = null; // passed into constructor
Comparability
We remarked above that a key feature of a TreeSet is the ability to order the elements. Thus they must either be
Comparable themselves, as in this instantiation:
new TreeSet<MyComparableClass>();

or a Comparator must be passed to the constructor as in:
Comparator<MyClass> my_cmp = new Comparator<MyClass>() { public int compare(MyClass lhs, MyClass rhs) { //... } } new TreeSet<MyClass>(my_cmp);
cmp = my_cmpmaking cmp a nonnull value. In either case, we use a private helper function, myCompare, to do the right thing:
private int myCompare(E lhs, E rhs) { if (cmp == null) { // no Comparator passed into constructor, // assume element type, E, is Comparable return ((Comparable)lhs).compareTo(rhs); } else { // use the Comparator to compare elements of type E return cmp.compare(lhs, rhs); } }
The textbook's approach
The textbook's class declaration of the search tree is like this:public class SearchTreeSet<E extends Comparable<? super E>> { ... }
The contains function
The contains function generalizes what we would do with a singlylinked list. Keep in mind that the Java API requires the parameter type to be Object even though we only ever should use the E type. We can write it iteratively:public boolean contains(Object obj) { E elt = (E) obj; Node<E> n = root; while (n != null) { int comp = myCompare(elt, n.data); if (comp < 0) { n = n.left; } else if (comp > 0) { n = n.right; } else { return true; } } return false; }
public boolean contains(Object obj) { E elt = (E) obj; return contains(root, elt); } private boolean contains(Node<E> n, E elt) { if (n == null) { return false; } int comp = myCompare(elt, n.data); if (comp < 0) { return contains(n.left, elt); } else if (comp > 0) { return contains(n.right, elt); } else { return true; } }
The add function
The add function:boolean add(E elt) { ... }
root = add(root, elt);
Reference parameters (nonexistent in Java)
Other languages like C++ have reference parameters for functions which provide the ability to modify a variable argument. If this existed in Java, we could write the recursive helper like this:boolean add(Node n, E elt);
boolean found = add(root, elt);
private boolean add(Node & n, E elt) // n is a "reference" parameter { if (n == null) { n = new Node(elt, null, null); // reset n's value return false; } int comp = myCompare(elt, n.data); if (comp < 0) { return add(n.left, elt); } else if (comp > 0) { return add(n.right, elt); } else { return true; } }
Simulate a reference parameter
The Java language only provides passbyvalue, meaning that the function only ever works with a copy of the variable, never permitting a change to the original variable's value. If we want a function to have an effect on a variable, we must, in effect, simulate a reference parameter. Our choices are these: Pass a variable to the function
(i.e., pass its value); return an updated value
and reset the variable from the return value. The call will look like this:
SomeType some_var = foo( some_var, ... );
The variable some_var passes its value into the function, the function computes and returns a new value, and then the assignment statement resets some_var to the new value.  Pass an object containing the variable as member data and reset the value using permitted member operations.
Javavalid binary search tree add
At first, we want to ignore the issue of reporting whether the element was found or not and concentrate on simulating the Node reference parameter. We write (as in the textbook):private Node<E> add(Node<E> n, E elt) { if (n == null) { return new Node<>(elt, null, null); } int comp = myCompare(elt, n.data); if (comp < 0) { // go left n.left = add(n.left, elt); // modify subtree on left return n; } else if (comp > 0) { // go right n.right = add(n.right, elt); // modify subtree on right return n; } else { // equal, do not add return n; // no changes to n } }
root = add(root, elt);You can see this style repeated in each of the recursive calls:
n.left = add(n.left, elt);and
n.right = add(n.right, elt);The proof that this Javabased algorithm works is by induction. Assuming root is the top of a binary search tree, the claim is that the statement
n = add(n, elt);correctly inserts elt into the search tree rooted at Node n
if n == null, clearly it works
if n != null, and elt is found at n, it works also
otherwise, the algorithm correctly calls one of these two:
if n != null, and elt is found at n, it works also
otherwise, the algorithm correctly calls one of these two:
n.left = add(n.left,elt) or n.right = add(n.right,elt)
which, by the inductive assumption, must work, and
in that case n itself is not changed.
Determining the add return value using contains
Java's public add function needs to know whether the private add function actually added the element or not because: if the element was found during the add it means it was not added, and so we do not increase size and return false to indicate a failed add.
 if the element was not found during the add, it means that it was added, and so we increase the size and return true to indicate a successful add
public boolean add(E elt) { if (contains(elt)) { return false; } root = add(root, elt); ++size; return true; // successful add if not found } private Node<E> add(Node<E> n, E elt) { if (n == null) { return new Node<>(elt, null, null); } int comp = myCompare(elt, n.data); if (comp < 0) { // go left n.left = add(n.left, elt); // modify subtree on left return n; } else if (comp > 0) { // go right n.right = add(n.right, elt); // modify subtree on right return n; } // we never see "equals" in this scenario }
Timing Considerations
The add function as well as contains are considered to be logarithmic time, i.e. O(log(n)). This of course, assumes that the tree is "well balanced" in some sense, however, in any realistic implementation, this will be the case. What can specifically be said is this: the average time to do add or contains in an average binary tree is O(log(n)).Traversals / Displaying Structure
Obtaining information about the contents of a tree structure is usually accomplished by traversing the tree in one of several standard methods:
preorder traversal (root, left, right):
visit the root visit the left subtree recursively visit the right subtree recursively

inorder traversal (left, root, right):
visit the left subtree recursively visit the root visit the right subtree recursively

postorder traversal (left, right, root):
visit the left subtree recursively visit the right subtree recursively visit the root

levelorder traversal (this is not recursive)
visit the root visit the children of the root visit the children's children ...
preorderOutput(root); // using the function ... private void preorderOutput(Node n) { if (n == null) { return; } System.out.println(n.data); // root preorderOutput(n.left); // left preorderOutput(n.right); // right }
inorderOutput(root); // using the function ... private void inorderOutput(Node n) { if (n == null) { return; } inorderOutput(n.left); // left System.out.println(n.data); // root inorderOutput(n.right); // right }
postorderOutput(root); // using the function ... private void postorderOutput(Node n) { if (n == null) { return; } postorderOutput(n.left); // left postorderOutput(n.right); // right System.out.println(n.data); // root }Consider the effect on the following binary search tree:
inorder listing: 5 15 17 18 20 22 25 27 30In order to reveal the structure of the tree you also want to add indentation corresponding to a node's level. To do this, modify the code as follows:
inorderOutput(root,0); // using the function ... private void inorderOutput(Node n, int level) { if (n == null) { return; } inorderOutput(n.left, level+1); // print indentation corresponding to current level System.out.println(n.data); inorderOutput(n.right, level+1); }Assuming that "print indentation corresponding to current level" means print 3*level blanks, the output would be this:
5 15 17 18 20 22 25 27 30This yields a pretty good correspondence to the actual tree, but it is easier to "see" the tree if one used a reverseinorder printing corresponding to a "right, root, left" visitation:
reverseInorderOutput(root,0); // using the function ... private void reverseInorderOutput(Node n, int level) { if (n == null) { return; } reverseInorderOutput(n.right, level+1); // print indentation corresponding to current level System.out.println(n.data); reverseInorderOutput(n.left, level+1); }Compare the output of this function with the actual tree:
30 27 25 22 20 18 17 15 5 
preorder listing: 20 15 5 18 17 30 25 22 27 postorder listing: 5 17 18 15 22 27 25 30 20
Test program
The following demo program creates a random binary search tree with up to 10 elements and then calls the implementationdependent traverse/output functionspublic String reverseInorderOutput(); public String preorderOutput(); public String preorderOutput();Here is the program:
Timing Considerations
For any of the traversals on a tree with N nodes, we can prove the following:
the number of function calls = 1 + 2*N
The proof is by induction. It is easy to verify for N=0
or N=1. Then for N ≥ 1 write:
N = 1+L+Rwhere
L = # nodes in left subtree R = # nodes in right subtreeConsidering the initial call to the node as 1, we can see that
total calls = 1 + total calls on left side + total calls on right side
By induction,
total calls on left = 1+2*L
total calls on right = 1+2*R
giving us:
total calls on right = 1+2*R
total calls = 1 + (1+2*L) + (1+2*R) = 1 + 2*(1+L+R) = 1 + 2*N
In particular, we say that traversals are linear time, or O(n) in terms of the number, n of nodes.
The remove function
Removing an element from a search tree, although tricky, is conceptually straightforward with one (common) exception: removing the element at a node with two nonnull children. In this case, the solution is either: findMin/removeMin: remove the minimum (leftmost) node from the right subtree and replace the root's value with the value of the removed node.
 findMax/removeMax: remove the maximum (rightmost) node from the left subtree and replace the root's value with the value of the removed node.
removeMin and removeMax
The idea of removeMin from a given node is simple: navigate to the leftmost node; capture the value, delete the node. Unfortunately, Java's lack of a reference parameters causes problems. The first step is to understand the removal operation without capturing the value. To do removeMin from the right subtree of Node n, we would execute:n.right = removeMin( n.right );
private Node<E> removeMin( Node<E> n ) { // assumes n != null if (n.left == null) { return n.right; } else { n.left = removeMin( n.left ); return n; } }
p.right = removeMin(p.right);
In order to capture the minimum value and remove the node containing that value, we can, as is done in the textbook, call two operations, one to find the minimum value, a second to remove it. Here is the findMin function:
private E findMin(Node<E> n) { // assumes n != null if (n.left == null) { return n.data; } else { return findMin(n.left); } }
The private remove helper function
Like the add operation, remove needs to know whether the element was found or not in order to provide the correct return value. As with add, we can use contains function to first discover if the remove operation is necessary. Like add, we navigate down from the root looking for an occurrence of elt. The base case is an empty tree in which we do nothing. If the element is found, then the "hard" case is when the node has both children. At this point we can choose to do either removeMin or removeMax. There are two reasonable ways to proceed: choose to do one of removeMin or removeMax and always make the same choice
 choose one of removeMin or removeMax based on a random value
private Node<E> remove(Node<E> n, E elt) { if (n == null) { // elt not in tree: you should never get here if you return null; // use contains prior to using this helper function } int comp = myCompare(elt, n.data); if (comp < 0) { n.left = remove(n.left, elt); // go left recursively return n; // n is unchanged } if (comp > 0) { n.right = remove(n.right, elt); // go right recursively return n; // n is unchanged } // else, comp == 0, Node n contains elt if (n.left == null) { return n.right; // remove the node containing elt } if (n.right == null) { return n.left; // remove the node containing elt } n.data = findMin(n.right); // replace n.data with value on right n.right = removeMin(n.right); // remove node with value on right return n; // meaning n (holding removed value) is still there }
private final Random flipACoin = new Random(); private Node<E> remove(Node<E> n, E elt) { if (n == null) { // elt not in tree: you should never get here if you return null; // use contains prior to using this helper function } int comp = myCompare(elt, n.data); if (comp < 0) { n.left = remove(n.left, elt); // go left recursively return n; // n is unchanged } if (comp > 0) { n.right = remove(n.right, elt); // go right recursively return n; // n is unchanged } // else, comp == 0, Node n contains elt if (n.left == null) { return n.right; // remove the node containing elt } if (n.right == null) { return n.left; // remove the node containing elt } // the node containing elt has 2 nonnull children boolean choose_min = (flipACoin.nextInt(2) == 1); if (choose_min) { n.data = findMin(n.right); // replace n.data with value on right n.right = removeMin(n.right); // remove node with value on right } else { n.data = findMax(n.left); // replace n.data with value on left n.left = removeMax(n.left); // remove node with value on left } return n; // meaning n (holding removed value) is still there }
Figure 4.27 from
Weiss textbook
Weiss textbook
The public remove operation
The public remove member function uses the following code:public boolean remove(Object obj) { E elt = (E) obj; if (contains(elt)) { root = remove(root, elt); size; return true; // successful remove if found } return false; }