User-defined Sorting

The Sorting Application

The examples in this document all part of the Java Application Sorting. Download the source archive Sorting.zip and install it as a Java Application with Existing Sources. See the Using NetBeans document for details.

Like others, the Sorting project has multiple Main Classes intended to illustrate various independent features. The simplest way to run the various classes is to right-click the file and select Run File either in the file itself or from the listing in the Projects window.

Sorting Algorithms

This document introduces two sorting algorithms: selectionSort and insertionSort. It also introduces the following classes which provide the code basis for all sorts discussed:

The ShowSorts demo program

This program, using the sorting algorithms from sort.ShowAlgorithms gives demonstrations of the sorting algorithms written:
demo.ShowSorts  
There are 6 sorting algorithms:
selectionSort
insertionSort
shellSort
quickSort
mergeSort
heapSort
Choose an algorithm to demonstrate by altering these settings:
    String choice =  // choose one
        "select"
//        "insert"
//        "shell"
//        "quick"
//        "merge"
//        "heap"
;
...
    int size = 15;
...
  // choose the method of generating the array
  public static int[] getIntArray(int size) {
    Random r = new Random();
    int[] A = new int[size];
    for (int i = 0; i < A.length; ++i) {
      A[i] = 1 + r.nextInt(2 * size);
      //A[i] = 10;
      //A[i] = 10 + i;
      //A[i] = 2*NUM - i;
    }
    return A;
  }

Sorting Scenarios

When you analyze sorting algorithms, like any other algorithm, the three cases are of interest:
  1. best case
  2. worst case
  3. average case
The randomly-generated arrays usually give us a good idea of the average case. However another factor to consider is how the sorting algorithm behaves in certain "prepared" situations:
  1. the array is already sorted
  2. the array is reverse sorted
  3. all the elements in the array are equal
Although these might be considered unlikely, they do represent situations which, although not likely to occur in a strict sense, can occur in a looser sense, namely, an array may be "nearly" sorted, or may have "many equals." In a number of situations the best and worst cases will match up with one or more of the situations (a) – (c).

Selection Sort

We start with this algorithm because it is probably the simplest one. The idea is this: If we had an array, a[], of int elements, the code would look like this (using n = a.length):
for (int i = 0; i < n - 1; ++i) {
  // set selected_index so that a[selected_index] 
  // is smallest in a[i]..a[n-1]
  int selected_index = i;
  for (int j = i + 1; j < n; ++j) {
    if (a[j] < a[selected_index]) {
       selected_index = j;
    }
  }
  if (selected_index != i) {      
    // swap a[i] & a[selected_index]
    int tmp = a[selected_index];
    a[selected_index] = a[i];
    a[i] = tmp;
  }
}

Array-based sorting functions

Java treats sorting algorithms in two very different ways:
  1. Sorting primitive types. Java writes one algorithm in 7 different ways, one for each primitive type. Here stability is not an issue.
  2. Generic types. Here stability may matter. A single version of the algorithm can suffice.

Sorting generic (non-primitive) types

There are 4 generic array-based sort operations in Java:
1. public static <E> void somethingSort(
        E[] a, int fromIndex, int toIndex, Comparator<? super E> c);
2. public static <E> void somethingSort(E[] a, Comparator<? super E> c);
3. public static void somethingSort(Object[] a, int fromIndex, int toIndex);
4. public static void somethingSort(Object[] a);
Only the first of these contains actual sort code; the other 3 are derived:
2. public static <E> void somethingSort(E[] a, Comparator<? super E> c) {
     somethingSort(a, 0, a.length, c);
   }

3. public static void somethingSort(Object[] a, int fromIndex, int toIndex) {
     Comparator<Object> c = new Comparator<Object>() {
       @Override
       public int compare(Object lhs, Object rhs) {
         return ((Comparable)lhs).compareTo(rhs);
       }
     };
     somethingSort(a, fromIndex, toIndex, c);
   }

4. public static void somethingSort(Object[] a) {
     somethingSort(a, 0, a.length);
   }
In the case of selection sort, the primary generic sorting algorithm would be this:
public static <E> void selectionSort(
          E[] a, int from, int to, Comparator<? super E> c) 
{
  for (int i = from; i < to - 1; ++i) {
    int selected_index = i;
    for (int j = i + 1; j < to; ++j) {
      if (c.compare(a[j], a[selected_index]) < 0) {
        selected_index = j;
      }
    }
    if (selected_index != i) {
      swap(a, selected_index, i);
    }
  }
}

List-based sorting functions

Recall Java's sort function for List classes:
List L<SomeClass> = /* an ArrayList or LinkedList */;
Collections.sort(L);
Collections.sort(L, cmp);
If we were to write our own versions, there would be two generic list sort operations required:
public static <E> void somethingSort(List<E> list, Comparator<? super E> c);
public static <E extends Comparable<? super E>> void somethingSort(List<E> list);
Like arrays, only the most general version is written directly. In this case the latter version is derived from the former:
public static <E extends Comparable<? super E>> void somethingSort(List<E> list) {
  Comparator<E> c = new Comparator<E>() {
    @Override
    public int compare(E lhs, E rhs) {
      return lhs.compareTo(rhs);
    }
  };
  somethingSort(list, c);
}
The list usage introduces some overhead. What has to be done is to transfer the list conents into an array, sort the array and then write the sorted array back to the list.

Insertion Sort

The idea is like this: Moving the element into "correct position" could mean that if the left neighbor is less than the element, swap them and go down to the next position, repeat. The code could look something like this:
for (int i = 1; i < n; ++i) {
  for (int j = i; j > 0; --j) {
    if (a[j-1] > a[j]) {  // a[j-1] >  a[j]
      swap(a, j-1, j);
    } 
    else {     // a[j-1] <= a[j], and since the array up to j-1 is sorted
      break;   // we are done with this subarray
    }
  }
}
Each swap costs 3 data movement, but we can economize on this sequence by Here is the modified algorithm:
for (int i = 1; i < n; ++i) {
  int insert_value = a[i]; // pull the i-th value "out"
  int j;
  for (j = i; j > 0; --j) {         // j = "a first correct position"
    if (a[j-1] > insert_value) {    // a[j-1] is bigger
      a[j] = a[j-1];                // shift the value a[j-1] up
    } 
    else {
      break;
    }
  }
  if (j != i) {           // if j is not the same as i
    a[j] = insert_value;  // insert into the j-th position
  }
Here is the general form we'd use for an array:
public static <E> void insertionSort(
        E[] a, int from, int to, Comparator<? super E> c) 
{
  for (int i = from + 1; i < to; ++i) {
 
    E insert_value = a[i]; // pull the i-th value "out"
 
    int j;
    for (j = i; j > from; --j) {  // j = "a first correct position"
 
      if (c.compare(a[j - 1], insert_value) > 0) { // a[j-1] is bigger
 
        a[j] = a[j - 1];    // shift the value up
      } 
      else {
        break;
      }
    }
    if (j != i) {           // if j is not the same as i
      a[j] = insert_value;  // insert into the j-th position
    }
  }
}

Analysis and Comparison

When we consider timing analysis of sorting algorithms both comparisons and data movements count. We also want to consider the behavior of an array which is already sorted or nearly sorted, which is usually the best case in these algorithms.

For selection sort, the number of comparisons is always the same, regardless of the array input. The number is:
(n-1) + (n-2) + ... + 1  =  n*(n-1)/2
The number of data movements is at best 0 if the array is already sorted. The worst case is a reverse-sorted array in which case we must do a swap on each iteration. Counting a swap as 3 data movements we get a worst case of 3*(n-1).

For insertion sort, the number of comparisons and data movement may both vary. In the best case of a sorted array, there are n-1 data movements (to save the insert element) and n-1 comparisons. In the worst case of a reverse-sorted array there n*(n-1)/2 comparisons and the same number of shifts, each counting for 1 data movement.

A final table illustrates the analysis of the two algorithms:
best case
(sorted)
worst case
(reverse sorted)
comparisons data movements comparisons data movements
selection sort n*(n-1)/2 0 n*(n-1)/2 3*(n-1)
insertion sort n-1 n-1 n*(n-1)/2 2*(n-1) + n*(n-1)/2
In particular, if you know that your array is nearly sorted, insertion sort is close to an O(n) time. This fact about insertion sort is used in the shell sort algorithm to achieve a speed-up. Selection sort, although slow w.r.t comparisons, is fast w.r.t data movements. The heap sort algorithm uses the same selection idea but improves upon the manner of selection.

Stability

Selectsort is not stable. We can see this in a simple example:
[ 3, 3′, 2 ] ↠ [ 2, 3′, 3 ]
The "2" exchanges with the first "3" making "3" elements change order from the original array.

In contrast, insertion sort is stable. The element comparison is crucial for maintaining the stability:
if (c.compare(a[j–1], insert_value) > 0) {
  a[j] = a[j–1];
} 
If we were to change ">" to ">=", then equal elements would "slide" across each other and in the final array be reverse-ordered in the sorted output relative to the original array (this would also cause an unnecessary extra number of comparisons and data movements).

Inversions and Average time

In selection sort, the number of comparisons is always the same, n*(n–1)/2 n2/2, thus the average time is quadratic. In insertion sort, we see that the time can vary between linear (best) and quadratic (worst). To show that the average time is quadratic we count the average number of swaps of adjacent elements — recall that insertion sort doesn't actually swap the elements.

Consider all possible permutations of an array of n elements, assuming for simplicity that the elements are all distinct. Group each array with its reverse. The total number of inversions which appear in any array and its reverse is exactly n*(n–1)/2. The averaging, considering only this pair, is n*(n–1)/4. So, since this is true for every array/reverse pair, we conclude the average number of inversions in an array is n*(n–1)/4 n2/4.

Each swap in insertion sort removes one inversion, thus the average number of swaps ≈ n2/4. Since each swap was preceded by a comparison, this number also reflects the number of comparisons, yielding an total average cost of n2/2, which is quadratic of the same order constant as selection sort.

The preceding argument holds for any algorithm that sorts by swapping pairs of elements: it has a quadratic worst-case and average-case time. In particular, if you know about bubble sort, it sorts also by swapping pairs of elements. So how does bubble sort stack up against insertion sort: the answer is the insertion sort is the clear winner. Here are some excerpts from the Wikipedia article, http://en.wikipedia.org/wiki/Bubble_sort
Even among simple O(n2) sorting algorithms, algorithms like insertion sort are usually considerably more efficient [than bubble sort].

Due to its simplicity, bubble sort is often used to introduce the concept of an algorithm, or a sorting algorithm, to introductory computer science students. However, some researchers such as Owen Astrachan have gone to great lengths to disparage bubble sort and its continued popularity in computer science education, recommending that it no longer even be taught.

The Jargon file ... calls bubble sort "the generic bad algorithm". Donald Knuth, in his famous book The Art of Computer Programming, concluded that "the bubble sort seems to have nothing to recommend it, except a catchy name and the fact that it leads to some interesting theoretical problems", some of which he then discusses.
Despite this commentary, one thing you can say about bubble sort is that it is stable, like insertion sort.


© Robert M. Kline