Sorting algorithms every developer and CS student should know.

·

8 min read

Sorting algorithms

bubble sort

type: in-place complexity: worst case: O(n^2) comparisons, O(n^2) swaps.

  • Repeatedly looping through the list and comparing adjacent elements.
  • If the adjacent elements are unsorted, they are swapped with each other.
  • With each iteration, nth element is compared with (n+1)th element, if they are out of order, they are swapped.
  • Version 1: here the elements are compared even though the array is already sorted. Thus, the inner if statement would run (n-1) * approx(n - 1) times, no matter what.
  • Version 2: here we are keeping a variable to know whether we did any swapping inside the innermost loop. Only then, we go to next loop. If we didn't do any swapping in the innermost loop, it shows that the array is sorted, so there is no use going through remaining iterations.
//version 1 using bounds of lists : hi, lo
void bubblesort(int *arr, int lo, int hi) {
    //n - 1 pass
    for (int i = lo; i < hi; i++) {
        for (int j = lo; j < (hi - i); j++) {
            if (arr[j] > arr[j + 1]) {
                swap(&arr[j], &arr[j + 1]);
            }
        }
    }
}
//version 2 wikipedia version
void nbubble(int *arr, int size) {
    int swapped = 0;
    do {
        swapped = 0;
        for (int i = 1; i < size - 1; i++) {
            if (arr[i] > arr[i + 1]) {
                swap(&arr[i], &arr[i + 1]);
                swapped = 1;
            }
        }
        size--;
    } while (swapped);
}

selection sort

type: in-place complexity: worst-case: O(n^2) comparisons, O(n) swaps

  • The algorithm divides the input list into two parts: a sorted sublist of items which is built up from left to right at the front (left) of the list and a sublist of the remaining unsorted items that occupy the rest of the list.
  • Initially, the sorted sublist is empty and the unsorted sublist is the entire input list. The algorithm proceeds by finding the smallest (or largest, depending on sorting order) element in the unsorted sublist, exchanging (swapping) it with the leftmost unsorted element (putting it in sorted order), and moving the sublist boundaries one element to the right.
void selectionsort(int *arr, int lo, int hi) {
    int n = 0;
    //only n-1 passes
    for (int i = lo; i < hi; i++) {
        int minindex = i;
        for (int j = i + 1; j <= hi; j++) {
            if (arr[j] < arr[minindex]) {
                minindex = j;
            }
        }
        //swap only when the element is out of place
        if (minindex != i) {
            swap(&arr[i], &arr[minindex]);
            n++;
        }
    }
    printf("%d swaps\n", n);
}

insertion sort

type: in-place complexity: worst-case: O(n^2) comparisons, O(n^2) swaps

  • This is similar to selection sort, where it builds a sorted array from left till the end.
  • Insertion sort iterates, consuming one input element each repetition, and grows a sorted output list. At each iteration, insertion sort removes one element from the input data, finds the location it belongs within the sorted list, and inserts it there. It repeats until no input elements remain.
  • Sorting is done in-place, by iterating up the array, growing the sorted list behind it. At each array-position, it checks the value there against the largest value in the sorted list (which happens to be next to it, in the previous array-position checked). If larger, it leaves the element in place and moves to the next. If smaller, it finds the correct position within the sorted list, shifts all the larger values up to make a space, and inserts into that correct position.

Optimized version prevents one extra write each inner loop.

void insersion_sort(int *arr, int size) {
  for (int i = 1; i < size; i++) {
    for (int j = i; j > 0 && arr[j - 1] > arr[j]; j--) {
      swap(&arr[j - 1], &arr[j]);
    }
  }
}

void insersion_sort_opt(int *arr, int size) {
    for (int i = 1; i < size; i++) {
        int temp = arr[i], j = i - 1;
        for (; j >= 0 && arr[j] > temp; j--) {
            arr[j + 1] = arr[j];
        }
        arr[j + 1] = temp;
    }
}

quick sort

type: in-place, divide and conquer. complexity: worst case: O(n^2) Quick sort is a divide and conquer algorithm.

  • A pivot element is selected first. It can be any randomly chosen element. Most implementations use the last element of the subarray.
  • The array is divided into two subarrays, one containing elements less than pivot and the other containing elements greater than pivot; equal elements can be added to either side(the array is re-ordered in this way). Now, pivot will be at its sorted position, and index of pivot is returned.
  • The above process is called on subarrays from low → pivot - 1 and pivot + 1 → high. The partition procedure is implemented in different ways. The one shown below is less efficient Lumato's partition scheme, but easier to follow.
int partition(int *arr, int lo, int hi) {
    int i = lo - 1;
    //choosing the pivot element
    int pivot = arr[hi];
    for (int j = lo; j < hi; j++) {
        if (arr[j] < pivot) {
            i++;
            swap(&arr[i], &arr[j]);
        }
    }
    i++;
    swap(&arr[i], &arr[hi]);
    return i;
}

void quicksort(int *arr, int lo, int hi) {
    if (lo < hi) {
        int pivot = partition(arr, lo, hi);
        quicksort(arr, lo, pivot - 1);
        quicksort(arr, pivot + 1, hi);
    }
}

merge sort

type: not in-place(need O(n) space), divide and conquer. complexity: worst case: O(n logn)

  • Recursively divide the list into sublists. The list is divided until the sublists of length 1 is achieved.
  • Repeatedly merge sublists to produce new sorted sublists until there is only one sublist remaining, that is the original list but sorted.
  • The whole algorithm consists of two routines. An array dividing routine which recursively calls itself and calls the merging routine.
  • The merging routine is the heart of the algorithm. It merges two sub arrays into a sorted subarray.
  • The merge routine takes an array and the indexes of two sub arrays as its argument. Here, it takes the lower-bound(l), mid(m), upper-bound.
  • l->m is considered as one array, and rest from m+1→h is another subarray.
  • We need to keep in mind that those to sub arrays are merged in place. The end result is we get sorted subarray from l → h.
  • We can see in the code that those two sub-arrays are copied into two separate new arrays before and then are copied in sorted order to the main array.
  • We can see that the merge routine is at the end after two recursive calls to itself in the main msort routine. That is because the array is subdivided until the subarray length is one, and then the merge routine is called.
  • The merging routine deals with subarrays from length 1 to length n/2 at which it merges them to give a fully sorted array.
void merge(int *arr, int l, int m, int h) {
    //finding two temperory array sizes
    int ls = m - l + 1;
    int rs = h - m;
    //creating temp arrays
    int L[ls];
    int R[rs];
    //copying elements to subarrays
    for (int i = l, q = 0; i <= m; i++) {
        L[q++] = arr[i];
    }
    for (int i = m + 1, q = 0; i <= h; i++) {
        R[q++] = arr[i];
    }
    //merging arrays
    int v1 = 0, v2 = 0, k = l;
    //putting them back into original array by comparing elements
    while (v1 < ls && v2 < rs) {
        if (L[v1] <= R[v2]) {
            arr[k++] = L[v1++];
        } else {
            arr[k++] = R[v2++];
        }
    }
    //adding remaining elements if subarrays are of unequal size
    while (v1 < ls) {
        arr[k++] = L[v1++];
    }
    while (v2 < rs) {
        arr[k++] = R[v2++];
    }
}
void msort(int *arr, int l, int h) {
    if (l < h) {
        int mid = (l + h)/2;
        msort(arr, l, mid);
        msort(arr, mid+1, h);
        merge(arr, l, mid, h);
    }
}

shell sort

type: in-place complexity: depends on the choice of gap sequence

  • Shellsort is an optimization of insertion sort that allows the exchange of items that are far apart.
  • The idea is dividing a list into sublists whose elements are at a fixed distance from each other, and sorting them.
  • This fixed distance is reduced according to the sequence of your choice(ex: n/2, n/4, ... 1) till the distance is only 1, which reduces the sorting procedure to normal insertion sort.
  • This algorithm sorts faster than insertion sort because the far apart elements are sorted initially, leaving less work for lower distance passes.
  • Loop 1: for choosing sequences.
  • Loop 2: this is where shell sort seem a bit confusing. There is offset variable which increments by one each loop. In the next loop there is i variable which increments by the value of gap from the loop 1. The sorting has to run on elements offset, offset + gap, offset + 2*gap...up to ≤ size of list. In the next loop, offset is incremented by one, now it will be (offset+1)+gap, (offset+1)+2*gap .... up to ≤ size.
    • Thus, the loop 2 runs for gap number of times each gap.
  • Loop 3 and 4: they are loops for insertion sort on: offset, offset+gap, offset+2*gap... ≤ size elements. When the gap reaches 1, it is nothing more than a vanilla insertion sort, but there is less swapping since the elements are mostly sorted by previous gap passes.
  • The choice of gap sequence heavily affects the running time of the algorithm. The one chosen here is called Ciura gap sequence.
  • Any sequence can be chosen, provided the last element is 1.
int spseq[8] = {701, 301, 132, 57, 23, 10, 4, 1};
void shellsort(int *arr, int lo, int hi) {
    //for getting gaps
    //int siz = hi - lo + 1;
    //size/2, size/4, ... 1 sequence
    //for (int x = siz/2; x >=  1; x = x/2) {
    for (int x = 0; x <  8; x++) { //loop 1
        //takes up element group for a gap
        for (int offset = lo; offset < spseq[x]; offset++) {//loop 2
            //for jumping between element groups
            for (int i = offset + spseq[x]; i <= hi; i+= spseq[x]) {//loop 3
                int j = i - spseq[x];
                int temp = arr[i];
                for (; j >= lo && arr[j] > temp; j -= spseq[x]) { //loop 4
                    arr[j + spseq[x]] = arr[j];
                }
                arr[j + spseq[x]] = temp;
            }
        }
    }
}