Closed Hashing

The SetMap Application

The examples in this and other documents in this section are all part of the Java Application SetMap. Download the source archive SetMap.zip and install it as a Java Application with Existing Sources. See the Using NetBeans document for details.

Like the others, the SetMap project has multiple Main Classes intended to illustrate various independent features. The simplest way to run the various classes is to right-click the file and select Run File, either within the file's editor window or from the listing in the Projects window.

The Java files from the project which are referenced in this document are:
adapter/
  SetAdapter
util/
  ClosedHashSet
demo/
  ClosedHashDemo

Closed Hashing

A closed hashing implementation is one in which the elements stay in the array rather than being placed in an auxiliary collision set, such as a linked list. In this case, we would make our internal data elements be something like this:
private float loadFactor;
private int size = 0;
private E[] data;       // array of Elements, not of "collision chains"
The implication of closed hashing is that collisions must be resolved by finding alternative array locations. These alternative locations are obtained by "probing" a sequence of possible indices. Following this probe sequence, we look for an occurrence of our element or else null; encountering null means that we have reached the end of the sequence of possible locations and conclude that the element is not in the hash table.

Probing techniques

There are three standard alternatives for probing algorithms: linear, quadratic, and double-hash.

Linear probing

The simplest idea is called linear probing. Upon collision, look for an alternative position in the sequence of indices following the hashed index.
hash(obj) + 1
          + 2
          + 3
          + ... increase by adding 1, 
            mod by capacity=data.length
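The probe sequence above can be sketched as a small standalone method. This is a minimal illustration, not the project's ClosedHashSet code; the class name, the fixed capacity, and the hash helper are assumptions made for the example.

```java
// Minimal sketch of linear probing over a fixed-capacity table.
// linearProbe returns the index where obj sits, or the first null
// slot where it could be inserted.
public class LinearProbeDemo {
    static Object[] data = new Object[11];   // capacity 11 (a prime)

    static int hash(Object obj) {
        return Math.abs(obj.hashCode() % data.length);
    }

    static int linearProbe(Object obj) {
        int pos = hash(obj);
        while (data[pos] != null && !data[pos].equals(obj)) {
            pos = (pos + 1) % data.length;   // step by 1, wrap around
        }
        return pos;
    }

    public static void main(String[] args) {
        // 0 and 11 collide (both hash to 0 mod 11); 11 is pushed to slot 1
        data[linearProbe(0)] = 0;
        data[linearProbe(11)] = 11;
        System.out.println(linearProbe(0));   // 0
        System.out.println(linearProbe(11));  // 1
    }
}
```

Note how the second insertion lands in the slot immediately after its home index; any later key hashing to slot 1 must now walk the same extended path.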

Clustering Effects

There are two so-called clustering effects that come about from linear probing (our textbook doesn't discuss them):
  1. primary clustering: when a key hashes into the probe path of other collisions, it follows and extends that same path.
  2. secondary clustering: two or more colliding elements must follow the same probe path.
If you consider the open hashing model, it has no primary clustering, but it does have secondary clustering. So the term primary in some sense signifies "worse."

Quadratic probing

A technique which eliminates primary clustering is called quadratic probing in which the following sequence of alternative locations is explored on collision:
hash(obj) + 1  = 1²
          + 4  = 2²
          + 9  = 3²
          + 16 = 4²
          + ... increase by adding successive odd numbers, 
            mod by capacity=data.length
One issue which must be resolved is whether we will actually be able to find an open slot with this method under reasonable circumstances. A relatively easy number-theoretic result ensures that if the capacity is prime and the load is no more than 50%, then the probe sequence is guaranteed to succeed.
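Both points above can be seen in a small sketch. The helper below (an invented name, not project code) generates the quadratic probe sequence by adding successive odd numbers, and the main method checks the number-theoretic claim for a hypothetical prime capacity of 11: the first ⌈capacity/2⌉ probe positions are all distinct, so a table no more than half full always has a free slot on the path.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the quadratic probe sequence: hash, hash+1, hash+4, hash+9, ...
// mod capacity, computed by adding successive odd numbers (1+3+...+(2i-1) = i²).
public class QuadraticProbeDemo {
    static int[] probeSequence(int home, int capacity, int count) {
        int[] seq = new int[count];
        int pos = home;
        int odd = 1;                       // successive odds: 1, 3, 5, ...
        for (int i = 0; i < count; i++) {
            seq[i] = pos;
            pos = (pos + odd) % capacity;
            odd += 2;
        }
        return seq;
    }

    public static void main(String[] args) {
        int capacity = 11;                 // prime
        int[] seq = probeSequence(3, capacity, (capacity + 1) / 2);
        Set<Integer> distinct = new HashSet<>();
        for (int p : seq) distinct.add(p);
        // With a prime capacity, the first (capacity+1)/2 positions never repeat.
        System.out.println(distinct.size() == seq.length);  // true
    }
}
```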

Double-hash probing

A technique which eliminates both primary and secondary clustering is double-hashing. The idea is to compute a second hash value of the element to be inserted.
a = hash2(obj)
We want 0 < a < capacity. Using this value, search this sequence of alternative locations:
hash(obj) + a
          + 2 * a
          + 3 * a
          + ....   increase by adding a, 
                   mod by capacity=data.length
Because the "a" value is likely to differ for different elements, different probe sequences are used even when elements collide.

This second hash function needs to generate integers somewhere in the range from 1 (not 0) to capacity - 1. A likely choice might be:
max = prime2
hash2(obj) = max - abs(obj.hashCode() % max)
where prime2 is the first prime "down from" capacity (which itself is a prime). This is more-or-less the approach that the textbook uses. Nevertheless, it is not really necessary that max be a prime. Our approach is simpler:
max = capacity - 2;
hash2(obj) = max - abs(obj.hashCode() % max)
Since capacity is an odd prime, this choice makes max an odd number, though generally not a prime. Nevertheless, if the load factor is maintained correctly, hash2 will be called infrequently, making it unlikely that colliding elements duplicate each other's hash2 values.
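As a concrete check of the arithmetic, here is a small sketch with a hypothetical prime capacity of 11, so max = 9. The class name is invented for illustration; note that the result always falls in the range 1 to max, so the probe step can never be 0.

```java
// Worked example of the second hash function with capacity = 11, max = 9.
public class Hash2Demo {
    static final int CAPACITY = 11;       // assumed prime capacity

    static int hash2(Object obj) {
        int max = CAPACITY - 2;           // 9
        return max - Math.abs(obj.hashCode() % max);
    }

    public static void main(String[] args) {
        System.out.println(hash2(0));     // 9 - (0 % 9)  = 9
        System.out.println(hash2(22));    // 9 - (22 % 9) = 9 - 4 = 5
        System.out.println(hash2(9));     // 9 - (9 % 9)  = 9
    }
}
```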

Return value of probe functions

When we search for an element or key in a map, it is useful to be able to indicate the final index of the search. If the element was not found, this index is where we would insert it; if it is found, say in a map, the index gives us access to the value, which can then be altered. We can follow a logic similar to what happens in binary search (with the senses inverted), interpreting non-negative values as "not found" and negative values, encoded as -(pos+1), as "found at index pos". For example, here is how the double-hash probe might be written:
private int hash2(Object obj) {
  int max = data.length - 2;
  return max - Math.abs(obj.hashCode() % max);
}
 
private int double_hash_probe(Object obj) {
  int pos = hash(obj);
  int amt = hash2(obj);
  while (true) {
    if (data[pos] == null) {
      return pos;
    } else if (data[pos].equals(obj)) {
      return -(pos+1);
    } else {
      pos = (pos + amt) % data.length;
    }
  }
}
The contains and add operations would look like this:
private int probe(Object obj) { 
  return double_hash_probe(obj);
}
 
public boolean contains(Object obj) {
  return probe(obj) < 0;
}
 
public boolean add(E elt) {
  int pos = probe(elt);
  if (pos < 0) {
    return false;
  } else {
    ++size;
    data[pos] = elt;
    return true;
  }
}
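The probe's return convention mirrors, with the senses inverted, the encoding used by java.util.Arrays.binarySearch. A tiny sketch of the encoding and its inverse (class and method names are invented for illustration):

```java
// Encoding "found at pos" as a negative value, distinguishable from any
// legal insertion index (which is >= 0). Decoding inverts the mapping.
public class ProbeCodec {
    static int encodeFound(int pos) { return -(pos + 1); }
    static int decodeFound(int r)   { return -r - 1; }

    public static void main(String[] args) {
        int r = encodeFound(5);
        System.out.println(r);                // -6: negative => "found"
        System.out.println(decodeFound(r));   // 5: original index recovered
    }
}
```

Since -(pos+1) is at most -1 even when pos is 0, every "found" result is strictly negative, which is why contains can simply test probe(obj) < 0.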

Hash table load

Maintaining an acceptable load is even more critical in closed hashing than in open hashing, because there is an absolute limit on what can be put in the table, whereas with open hashing we would simply let the collision chains grow larger.

Deletion

Removal of elements is tricky in closed hashing because the removal of an element in the middle of a probe sequence would cause the search for elements further on in the probe sequence to be "lost."

Given this limitation, deletion is best done by the so-called lazy deletion method in which a deleted element is simply marked as removed. The best way to mark an object removed is to retain its value wrapped within a dedicated class used for marking. One possibility is this:
private static class Deleted<E> {
  E value;
  Deleted(E value) {
    this.value = value;
  }
}
private Object[] data;  // data can now be of type E or Deleted<E>
We must adapt the data array to reflect the new polymorphic nature of the elements. Thus, removing an element, elt, means finding it at some position, pos,
data[pos].equals(elt)
and then simply replacing this position by the wrapped element:
data[pos] = new Deleted<>(data[pos]);

Coding complications

The hard part is figuring out precisely how to use these deleted markers. In particular, when we probe on behalf of the add operation
add(elt);
we want a successful add always to reuse a deleted position if one was encountered, so as to avoid filling up the table with deleted elements. It is programmatically quite tricky to achieve all these ends, and the double_hash_probe function expands accordingly:
private int double_hash_probe(Object obj) {
  int pos = hash(obj);
  int amt = hash2(obj);
  int first_del_pos = -1; // first deleted position marker
 
  while (true) {
    if (data[pos] == null) {  // reached end of the probe path
      if (first_del_pos == -1) {
        return pos;               // no deleted elements discovered
      } else {
        return first_del_pos;     // found deleted element != obj
      }
    } 
 
    else if (data[pos].equals(obj)) { // obj found
      return -(pos+1);
    } 
 
    else if (data[pos] instanceof Deleted) {
      if (((Deleted<E>)data[pos]).value.equals(obj)) {  // prev. deleted
        return pos;
      } 
      else if (first_del_pos == -1) {  // first deleted position unset
        first_del_pos = pos;
      }
      pos = (pos + amt) % data.length;    // keep going
    } 
    else {                             // obj not found
      pos = (pos + amt) % data.length;
    }
  }
}
Note that the hash table should maintain information about the number of deleted elements as well as size and combine these counts when computing the load, because deleted elements contribute equally to filling up the table.
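Putting the pieces together, here is a minimal self-contained sketch of a closed-hash set with double hashing and lazy deletion. It is a simplification of what the project's ClosedHashSet does, not the actual class: the fixed prime capacity is assumed, and rehashing and load-factor maintenance are omitted.

```java
// Minimal closed-hash set: double hashing plus lazy deletion.
public class MiniClosedHashSet {
    private static class Deleted {
        final Object value;
        Deleted(Object value) { this.value = value; }
    }

    private final Object[] data = new Object[11];   // fixed prime capacity
    private int size = 0;

    private int hash(Object obj) {
        return Math.abs(obj.hashCode() % data.length);
    }

    private int hash2(Object obj) {
        int max = data.length - 2;
        return max - Math.abs(obj.hashCode() % max);
    }

    // Negative return => obj found at -(ret+1); non-negative => slot to insert.
    private int probe(Object obj) {
        int pos = hash(obj);
        int amt = hash2(obj);
        int firstDelPos = -1;
        while (true) {
            if (data[pos] == null) {                  // end of probe path
                return firstDelPos == -1 ? pos : firstDelPos;
            } else if (data[pos].equals(obj)) {       // obj found
                return -(pos + 1);
            } else if (data[pos] instanceof Deleted) {
                if (((Deleted) data[pos]).value.equals(obj)) {
                    return pos;                       // reuse obj's old slot
                }
                if (firstDelPos == -1) firstDelPos = pos;
                pos = (pos + amt) % data.length;      // keep going
            } else {
                pos = (pos + amt) % data.length;      // live non-match
            }
        }
    }

    public boolean contains(Object obj) { return probe(obj) < 0; }

    public boolean add(Object elt) {
        int pos = probe(elt);
        if (pos < 0) return false;                    // already present
        data[pos] = elt;
        ++size;
        return true;
    }

    public boolean remove(Object obj) {
        int pos = probe(obj);
        if (pos >= 0) return false;                   // not present
        pos = -(pos + 1);
        data[pos] = new Deleted(data[pos]);           // mark, don't null out
        --size;
        return true;
    }

    public int size() { return size; }
}
```

Removing an element and re-adding it exercises the deleted-slot reuse: the re-added element lands back in its original position rather than extending the probe path.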

Implementation and Demo

The implementation class is based on this adapter:
adapter.SetAdapter  
The ClosedHashSet implementation is:
util.ClosedHashSet  
The ClosedHashSet class implements closed hashing with all three probing versions possible. Choose a probing version by setting the private data member:
  private String whichProbe
    = "linear"
  //= "quadratic"
  //= "double-hash"
    ;
The following test function, mainClosedHashCompare, is designed to illustrate the clustering effects and thereby compare the differences among the three probing methods. Install the following main class into the SetMap project:
demo.ClosedHashDemo  
To make the comparison, run it three times using each of the three valid values of the private data member whichProbe.


© Robert M. Kline