## 14. Shellsort & Algorithmic Comparisons

### The Sorting Application

The examples in this document all part of the Java Application Sorting. Download the source archive and install it as a Java Application with Existing Sources. See the Using NetBeans document for details.

Like others, the Sorting project has multiple Main Classes intended to illustrate various independent features. The simplest way to run the various classes is to right-click the file and select Run File either in the file itself or from the listing in the Projects window.

### Algorithmic comparisons

Sorting algorithms provide the ability for one to impress another computer scientist with his or her knowledge of algorithmic understanding. First of all, O(n*log(n)) acts as a lower bound to how quickly we can sort using a comparison-based sorting algorithm (we can sort faster than that in certain special cases).
algorithm ¿stable? best time average time worst time extra memory
selectionsort no O(n2) O(n2) O(n2) O(1)
insertionsort yes O(n) O(n2) O(n2) O(1)
shellsort no O(n*log(n)) O(n1.25) O(n1.5) O(1)
quicksort no O(n*log(n)) O(n*log(n)) O(n2) O(log(n))
mergesort yes O(n*log(n)) O(n*log(n)) O(n*log(n)) O(n)
heapsort no O(n) O(n*log(n)) O(n*log(n)) O(1)
conjectured
Of the algorithms which share the same order class, a second consideration is then the value of the big-O order constant. Even though it can have quadratic worst-case time, Quicksort is often considered the fastest algorithm on random arrays; the implication is that it has the smaller order constant, than, say Heapsort.

The Shellsort algorithm discussed in this document uses the so-called Hibbard increments (see below). Although Shellsort is not theoretically as fast as others, it is still comparable in speed when arrays are not huge and enhanced sorting increments are employed.

### Shellsort

Shellsort, named after its inventor, Donald Shell, relies upon the fact that insertion sort does very well if the array is nearly sorted. Another way of saying this, is that insertion sort does well if it does not have to move each item "too far" in the sense of "too many comparisons."

The idea is to repeatedly do insertion sort on all elements at fixed, decreasing distances apart: hk, hk-1, ..., h1= 1. The fact that the last increment used should be 1 means that regular insertion sort is done at the last step, guaranteeing that the array will be sorted.

The choice of increments turns out to be crucial.

#### Shell's Original Increments

Donald Shell initially proposed these increments based on the size, N, of the array:
```N/2
N/4
N/8
...
1
```
For example, if N = 50, the increments would be
```25, 12, 6, 3, 1
```
The ShellSort example in the textbook uses these increments and notes that "better increments are possible."

#### Hibbard's Increments

It turns out that a better choice of increments are the so-called Hibbard increments:
```hk= 2k–1 < N
...
h3= 7
h2= 3
h1= 1
```
To be able to use the hk increment, you need an array of size at least hk+1.

#### Psuedo-code

The psuedo-code for shellSort using the Hibbard increments is as follows:
```find k0 so that 2k0 - 1 < size
for (k = k0; k > 0; --k) {  // from larger increments to smaller
inc = 2k- 1;              // get array increment
for (i = 0; i < inc; ++i) {
insertionSort( [ a[i], a[i+inc], a[i+2*inc], ... ] );
}
}
```
The fact that the last increment in the sequence is 1 means that regular insertion sort is done at the last step and therefore the array is guaranteed to be sorted by this procedure. The point is that when the increments are larger, there are fewer elements and they will be moved further than simply interchanging adjacent elements. At the last step, we do regular insertion sort and hopefully the array is "nearly sorted" which makes insertion sort come close to its best case behavior of running in linear time.

The notion that this is an speed improvement seems initially far-fetched. There are two enclosing for loops to get to an insertion sort, thus this algorithm has four enclosing loops.

#### Demo

The following is a demo of the sorting process of an array of size 11. Only 4 subarrays of 7 elements apart fit.
```   0  1  2  3  4  5  6  7  8  9  10
-  -  -  -  -  -  -  -  -  -  --
[6, 7, 8, 6, 9, 7, 2, 2, 2, 9, 8]
^                    ^
[2, 7, 8, 6, 9, 7, 2, 6, 2, 9, 8]
^                    ^
[2, 2, 8, 6, 9, 7, 2, 6, 7, 9, 8]
^                    ^
[2, 2, 8, 6, 9, 7, 2, 6, 7, 9, 8]
^                    ^

[2, 2, 8, 6, 9, 7, 2, 6, 7, 9, 8]  7's now sorted
^        ^        ^        ^      ---- 4 elements 3 apart
[2, 2, 8, 2, 9, 7, 6, 6, 7, 9, 8]
^        ^        ^        ^   ---- 4 elements 3 apart
[2, 2, 8, 2, 6, 7, 6, 8, 7, 9, 9]
^        ^        ^         ---- 3 elements 3 apart

[2, 2, 7, 2, 6, 7, 6, 8, 8, 9, 9]  3's now sorted

[2, 2, 2, 6, 6, 7, 7, 8, 8, 9, 9]  regular insertion sort
```

### ShellSort Code

Here is the code:
Shell Sort: Show Hide
```  // permit us to specify the shell sort increments externally

static void shellSort(int[] a, int from, int to) {
int size = to - from;

if (increments == null) {
// set increments to the default
for (int inc = 1; inc < size; inc = 2 * (inc + 1) - 1) {
}
}
java.util.Iterator<Integer> iter = increments.descendingIterator();

while (iter.hasNext()) {
int gap = iter.next();
for (int k = 0; k < gap; ++k) {
// insertionSort on ( a[fromIndex+k], a[fromIndex+k+gap], ...  )
for (int i = from + k + gap; i < to; i += gap) {
int insert_value = a[i];
int j = i;
for (; j >= from + k + gap; j -= gap) {

if (a[j - gap] > insert_value) {
a[j] = a[j - gap];
}
else {
break;
}
}
if (j != i) {
a[j] = insert_value;
}
}
}
}
}```
To get a sense of how it works, test it in ShowSorts by setting
```choice = "shell"
```
Our code permits the user to specify the increments which the algorithms uses. If none are specified, the Hibbard increments are used by default. You can compare the behavior using a different set of increments by creating your own LinkedList of increments. For example, to use Shell's original increments (1, 2, 4, 8, ...), add this code prior to the call:
```//------------- add this (and Fix Imports):
for (int inc = 1; inc < size; inc = 2 * inc) {
}
ShowAlgorithms.setShellSortIncrements(shellIncrements);
//------------- just before the actual sort is called

choices.get(choice).sort(A);
}```

### Empirical Comparisons

The two valuable empirical measures of an sorting algorithm are
• how long it takes to execute (real time).
• home many operations, i.e. comparisons and data movements, that it requires (abstract time).
We'll work with measuring the real time. There are two timing test programs, one for primitive types (int only), and the second for Object types (Integer only).
demo.TimeIntSorts
and
demo.TimeGenericSorts
You can compare against any of 8 algorithms, although you will see a striking difference between the first 3 (slow) sorts and the remaining 5.
```    choices.put("select", A -> Algorithms.selectionSort(A));
choices.put("insert", A -> Algorithms.insertionSort(A));
choices.put("bubble", A -> Algorithms.bubbleSort(A));   // int only

//    choices.put("shell", A -> Algorithms.shellSort(A));
//    choices.put("quick", A -> Algorithms.quickSort(A));
//    choices.put("merge", A -> Algorithms.mergeSort(A));
//    choices.put("heap", A -> Algorithms.heapSort(A));
//    choices.put("java", A -> Arrays.sort(A));```

### ShellSort Analysis

#### Stability

Shellsort (with Hibbard increments or others) is not stable. It can be readily demonstrated with an array of size 4 (the smallest possible). Instability is to be expected because the increment-based sorts move elements distances without examining of elements in between.

#### Shellsort has O(n*log(n)) best case time

The best case, like insertion sort, is when the array is already sorted. Then the number of comparisons for each of the increment-based insertion sorts is the length of the array. Therefore we can determine:
comparisons =
n, for 1 sort with elements 1-apart (last step)
+ 3 * n/3, for 3 sorts with elements 3-apart (next-to-last step)
+ 7 * n/7, for 7 sorts with elements 7-apart
+ 15 * n/15, for 15 sorts with elements 15-apart
+ ...
Each term is n. The question is how many terms are there? The number of terms is the value k such that
```2k - l  < n
```
So k < log(n+1), meaning that the sorting time in the best case is less than n * log(n+1) = O(n*log(n)).

#### Shellsort worst case time is no worse than quadratic

The argument is similar as previous, but with a different overall computation. In a very worst-case scenario (which doesn't exist), each sort would be quadratic time.
comparisons ≤
n2, for 1 sort with elements 1-apart (last step)
+ 3 * (n/3)2, for 3 sorts with elements 3-apart (next-to-last step)
+ 7 * (n/7)2, for 7 sorts with elements 7-apart
+ 15 * (n/15)2, for 15 sorts with elements 15-apart
+ ...
And so, with a bit of arithmetic, we can see that the number of comparisons is bounded by:
```  n2 * (1 + 1/3 + 1/7 + 1/15 + 1/31 + ...)
< n2 * (1 + 1/2 + 1/4 + 1/8 + 1/16 + ...)
= n2 * 2
```
The last step uses the sum of the geometric series.

#### Shellsort worst and average times

The point about shell sort is that the initial sorts, acting on elements at larger increments apart involve fewer elements, even considering a worst-case scenario. At the larger increments, "far-away" elements are swapped so that the number of inversions is dramatically reduced. At the later sorts using smaller increments, the behavior then comes closer to optimal.

It can be proved that the worst-case time is sub-quadratic at O(n3/2) = O(n1.5). As can be expected, the proof is quite difficult. The textbook remarks that the average case time is unknown although conjectured to be O(n5/4) = O(n1.25). The textbook also mentions other increment sequences which have been studied and seen to produce even better performance.