User-defined Sorting
— print (last updated: Nov 18, 2009) print

Select font size:

NetBeans Installation

This document introduces two new sorting algorithms, selection sort and insertion sort. The support code will be used for other algorithms which we'll discuss later. We want to create a new Java project MySorts.

Click to download these two zipped packages:
show.zip     and     count.zip
Install them by extracting the contents into the src folder of the NetBeans MySorts project. Alternatively, you can create the files from the source code like this:
  1. Create the class:
    Class Name: Algorithms
    package:    show
    
    Then insert the following content ( click to show )
    This class illustrates the algorithm behavior visually in standard output. It is intended for usage only in small demonstration arrays.
  2. Create the class:
    Class Name: Algorithms
    package:    count
    
    Then insert the following content ( click to show )
    This class can be used to count the number of comparisons and data movements in sorting. The overhead is minimal, and so it can be used as the actual sorting algorithms.
  3. Create the class:
    Class Name: Counter
    package:    count
    
    Then insert the following content ( click to show )
    This is an auxiliary class used by count.Algorithms to provide external counters representing the comparisons and data movements which we want to count.

ShowSorts main program

Start by creating the following Main class with function mainShowSorts which serves for all future sorting algorithms. It gives demonstrations of five sorting algorithms: selectionSort, insertionSort, shellSort, quickSort, mergeSort. To choose one, simply set the WHICH value to one of the constants SELECT, INSERT, SHELL, QUICK, MERGE defined in the show.Algorithms class. Also set the NUM value (array size) appropriately to give a decent demonstration.

Here is the code: ( click to show )

Selection Sort

We start with this algorithm because it is probably the simplest one. The idea is this: If we had an array, a, of int elements, the code would look like this (using n = a.size):
for (int i = 0; i < n - 1; ++i) {
  // set selected_index so that a[selected_index] smallest in a[i]..a[n-1]
  int selected_index = i;
  for (int j = i + 1; j < n; ++j) {
    if (a[j] < a[selected_index]) {
       selected_index = j;
    }
  }
  if (selected_index != i) {      // swap a[i] & a[selected_index]
    int tmp = a[selected_index];
    a[selected_index] = a[i];
    a[i] = tmp;
  }
}

Array calls

There are 4 generic array sort operations in the Java scheme:
1. public static <E> void somethingSort(
        E[] a, int fromIndex, int toIndex, Comparator<? super E> c);
2. public static <E> void somethingSort(E[] a, Comparator<? super E> c);
3. public static void somethingSort(Object[] a, int fromIndex, int toIndex);
4. public static void somethingSort(Object[] a);
Only the first of these contains actual sort code; the other 3 are derived:
2. public static <E> void somethingSort(E[] a, Comparator<? super E> c) {
     somethingSort(a, 0, a.length, c);
   }

3. public static void somethingSort(Object[] a, int fromIndex, int toIndex) {
     Comparator<Object> c = new Comparator<Object>() {
       @Override
       public int compare(Object lhs, Object rhs) {
         return ((Comparable)lhs).compareTo(rhs);
       }
     };
     somethingSort(a, fromIndex, toIndex, c);
   }

4. public static void somethingSort(Object[] a) {
     somethingSort(a, 0, a.length);
   }
In the case of selection sort, the primary sorting algorithm is this:
  public static <E> void selectionSort(
          E[] a, int fromIndex, int toIndex, Comparator<? super E> c) {
    for (int i = fromIndex; i < toIndex - 1; ++i) {
      int selected_index = i;
      for (int j = i + 1; j < toIndex; ++j) {
        if (c.compare(a[j], a[selected_index]) < 0) {
          selected_index = j;
        }
      }
      if (selected_index != i) {
        swap(a, selected_index, i);
      }
    }
  }

List calls

There are 2 generic list sort operations in the Java scheme:
1. public static <E> void somethingSort(List<E> list, Comparator<? super E> c);
2. public static <E extends Comparable<? super E>> void somethingSort(List<E> list);
The second is derived from the first:
2. public static <E extends Comparable<? super E>> void somethingSort(List<E> list) {
     Comparator<E> c = new Comparator() {
       @Override
       public int compare(E lhs, E rhs) {
         return lhs.compareTo(rhs);
       }
     };
     somethingSort(list, c);
   }
There are two conceivable ways to write the primary list sorting alorithm:
  1. transfer the list to an array; sort the array; write the sorted array back to the list like this:
    public static <E> void selectionSort(List<E> list, Comparator<? super E> c) {
      E[] a = (E[]) list.toArray();
      selectionSort(a, c);
      for (int i=0; i < a.length; ++i)
        list.set(i, a[i]); 
    }
    
  2. sort the list in place, replacing element accesses of the form x = a[i] (read) by x = L.get(i) and a[i] = x (write) by L.set(i,x), like this:
    public static <E> void selectionSort(List<E> list, Comparator<? super E> c) {
      for (int i = 0; i < list.size() - 1; ++i) {
        int selected_index = i;
        for (int j = i + 1; j < toIndex; ++j) {
          if (c.compare(list.get(j), list.get(selected_index)) < 0) {
            selected_index = j;
          }
        }
        if (selected_index != i) {
          // this convenience operator is basically 3 set operations:
          Collections.swap(list, selected_index, i);
        }
      }
    }
    
Both ways involve overhead when compared to sorting an array, but it turns out that the first choice is favored. Consider the total overhead for the latter approach. The best sorting algorithms require O(n*log n) comparisons on average and each comparison requires 2 get operations, producing overhead of O(n*log n). In contrast, the former approach requires n get operations to read the list into an array and n sets to write the sorted array back to the list, thus a total of O(n) overhead. Perhaps the main downside of the first approach is the fact that we need extra space for the copied array, although the array elements themselves are the originals.

In the case of the list being a LinkedList, the latter approach is probably out of the question since get and set operations may not be O(1) time.

If the list were an ArrayList, the most efficient thing would be to sort the internal array directly, but there is apparently no way to do so, since the toArray() member returns a copy of the internal array.

ArrayList vs. LinkedList

Suitable coding should consider the target list class. The code above which rewrites the sorted array back to the list is this:
for (int i = 0; i < a.length; ++i)
  list.set(i, a[i]);
which suits a RandomAccess object like an ArrayList. In contrast, for a LinkedList object this code would be better:
list.clear();
for (int i = 0; i < a.length; ++i)
  list.add(a[i]);
This latter code, however, does not suit an Arrays.ArrayList object (as is returned by Arrays.asList(A)) which does not support removal needed the clear() function.

A compromise solution might be to write the code like this:
public static <E> void selectionSort(List<E> list, Comparator<? super E> c) {
  E[] a = (E[]) list.toArray();
  selectionSort(a, c);
  if (list instanceof RandomAccess) {
    for (int i = 0; i < a.length; ++i) {
      list.set(i, a[i]);
    }
  } else {
    list.clear();
    for (int i = 0; i < a.length; ++i)
      list.add(a[i]);
    }
  }
}

Insertion Sort

The idea is like this: Inserting into the correct position means examining each array element from the top, one by one, and shifting this element up if it is greater than then element to insert.

We could effectively write the code by having an element move down to its correct position by swapping with the one below and then moving down, etc. However, swapping is more costly than simply shifting an element (by a factor of 3) and so it is more efficient to think of shifting the element up.

Here is the code we'd use for an array:
  public static <E> void insertionSort(
          E[] a, int fromIndex, int toIndex, Comparator<? super E> c) {
    for (int i = fromIndex + 1; i < toIndex; ++i) {

      E insert_value = a[i]; // pull the i-th value "out"

      int j = i;
      for (; j > fromIndex; --j) {  // j = "a first correct position"

        if (c.compare(a[j - 1], insert_value) > 0) {

          a[j] = a[j - 1];    // shift the value up

        } else {
          break;
        }
      }
      if (j != i) {           // if j is not the same as i
        a[j] = insert_value;  // insert into the j-th position
      }
    }
  }
}

Analysis & Comparison

When we consider timing analysis of sorting algorithms both comparisons and data movements count. We also want to consider the behavior of an array which is already sorted or nearly sorted, which is usually the best case in these algorithms.

For selection sort, the number of comparisons is always the same, regardless of the array input. The number is:
(n-1) + (n-2) + ... + 1  =  n*(n-1)/2
The number of data movements is at best 0 if the array is already sorted. The worst case is a reverse-sorted array in which case we must do a swap on each iteration. Counting a swap as 3 data movements we get a worst case of 3*(n-1).

For insertion sort, the number of comparisons and data movement may both vary. In the best case of a sorted array, there are n-1 data movements (to save the insert element) and n-1 comparisons. In the worst case of a reverse-sorted array there n*(n-1)/2 comparisons and the same number of shifts, each counting for 1 data movement.

A final table illustrates the analysis of the two algorithms:
best case
(sorted)
worst case
(reverse sorted)
comparisons data movements comparisons data movements
selection sort n*(n-1)/2 0 n*(n-1)/2 3*(n-1)
insertion sort n-1 n-1 n*(n-1)/2 (n-1) + n*(n-1)/2
In particular, if you know that your array is nearly sorted, insertion sort is close to an O(n) time. This fact about insertion sort is used in the shell sort algorithm to achieve a speed-up. Selection sort, although slow w.r.t comparisons, is fast w.r.t data movements. The heap sort algorithm uses the same selection idea but improves upon the manner of selection.

Average time

In selection sort, the number of comparisons is always the same, n*(n–1)/2 n2/2, thus the average time is quadratic. In insertion sort, we see that the time can vary between linear (best) and quadratic (worst). To show that the average time is quadratic we count the average number of swaps of adjacent elements — recall that insertion sort doesn't actually swap the elements.

Consider all possible permutations of an array of n elements, assuming for simplicity that the elements are all distinct. Group each array with its reverse. The total number of inversions which appear in any array and its reverse is exactly n*(n–1)/2. The averaging, considering only this pair, is n*(n–1)/4. So, since this is true for every array/reverse pair, we conclude the average number of inversions in an array is n*(n–1)/4 n2/4.

Each swap in insertion sort removes one inversion, thus the average number of swaps ≈ n2/4. Since each swap was preceded by a comparison, this number also reflects the number of comparisons, yielding an total average cost of n2/2, which is quadratic of the same order constant as selection sort.

Stability

Both selection sort and insertion sort are stable sorting algorithms. The element comparison used in each is crucial for maintaining the stability: If we were to change "<" to "<=", or ">" to ">=", then equal elements would be reverse-ordered in the sorted output relative to the original array.

The other positive effect of stability is that the necessary usage of comparions makes the code slightly more efficent in selecttionSort and considerably more efficient in insertionSort. Changing ">" to ">=" in insertionSort would create more comparisons and data moves when the array has a significant number of "equals."

Counting Operations

Counting is done by inserting "counting code" into the regular sort routines. We need to count 1 for every comparison, 1 for each element assignment. In particular, count 3 for each swap. A Counter object is initialized inside the Main class:
Counter counter = new Counter();
and then made available for usage by the algorithm by the function:
count.setCounter( counter );
A single comparison is registered by the execution:
incComparisons(1);
A single data movement is registered by the execution:
incDataMoves(1);
Here are the details of the usage with counting code inserted into selectionSort and insertionSort:
( click to show )
The actual incrementing only takes place if counter is non-null, giving us the flexibility of easily turning counting off and on as desired. Additionally, we can move counter objects in and out to suit our needs.


© Robert M. Kline