Sorting Algorithms

Friday, May 8


In today's lecture we delve into sorting algorithms, a family of algorithms with wide-reaching, real-world applications across computer science.


Contents

1. Overview

2. Selection Sort and Insertion Sort

3. Comparison-Heavy vs. Swap-Heavy

4. Other Trade-offs

5. Merge Sort

6. Midpoint Formula (and Integer Overflow)

7. Some Downsides to Merge Sort

8. Implementations

9. Runtime Comparisons

10. Aside: Quicksort

11. What's next?

12. Exam Prep


Overview

Attachment: sorting-stuff.zip

We talked about three sorting algorithms today: selection sort, insertion sort, and merge sort. The slides and code for these sorting algorithms are included in the zip file attached above. A summary of key points is included below.


Selection Sort and Insertion Sort

The first two sorting algorithms we talked about today were selection sort and insertion sort. The slides attached above walk through how the algorithms work. Here are some key points I made about each algorithm along the way:

  • Selection Sort
    • Overview: Each pass "selects" the smallest element in the unsorted portion of the vector and then swaps it into the first position of that unsorted portion.
    • Worst-case runtime: O(n²). The first pass examines n elements, then performs a single swap, the next examines (n - 1) elements before performing a single swap, and so on. The number of comparisons is represented with the Gauss sum we keep seeing: 1 + 2 + ... + n = n * (n + 1) / 2, which is O(n²).
    • Best-case runtime: O(n²). Every call to selection sort follows the pattern described above for the worst-case runtime.
    • Operations: Heavy on comparisons, light on swaps. I described this as the "wait and see" algorithm. It spends a lot of time comparing elements in each pass before settling on which one to swap into the next target position.
  • Insertion Sort
    • Overview: Each pass takes the first element from the unsorted portion of the vector and "inserts" it into the sorted portion by dragging it down until it either encounters a smaller value to its left or reaches the beginning of the vector.
    • Worst-case runtime: O(n²). The first pass could move an element down one position, the second could move an element down two positions, the third could move an element down three positions, and so on. It's that Gauss sum once again. The last pass of insertion sort will only move an element down by (n - 1) positions at most, so the sum is slightly modified, but the big-oh result is the same: 1 + 2 + ... + (n - 1) = (n - 1) * n / 2, which is O(n²).
    • Best-case runtime: O(n). This is a vast improvement over selection sort's best-case runtime! Insertion sort encounters this runtime if it receives an already-sorted vector. In that case, each pass stops after a single comparison without having to do any swaps.
    • Operations: Heavy on swaps, potentially light on comparisons. I described this as the "let's gooooooooo!" algorithm. It picks up an element at the beginning of each pass and drags it down through the vector, performing a swap at each index it passes. (*swap!* *swap!* *swap!* *swap!*) A pass of insertion sort stops as soon as it encounters an element smaller than the one it's dragging down, though, meaning that it could potentially stop each pass early and end up doing fewer comparisons than selection sort. Neat.

Recall that even though these algorithms are not the fastest ones we'll see this quarter, one of the reasons we study them in CS106B is so you'll see lots of different algorithmic approaches to the same problem. The idea here is that the more algorithms you see, the more likely you are to ramp up your algorithmic thinking and come up with your own algorithms for new problems you encounter in the future. These algorithms also provided us with reasonable big-oh analysis exercises, and discussing their comparative advantages and disadvantages gave us an opportunity to explore the kind of comparative analysis we often do in CS when we have multiple solutions to the same problem.


Comparison-Heavy vs. Swap-Heavy

I gave a few examples in class of situations in which we might prefer one of these algorithms over the other. See the lecture slides for a recap. Specifically, we talked about the following examples:

  • Moving refrigerators: if we want to sort a bunch of refrigerators in a warehouse and line them up from lowest price to highest price, the comparison is really easy, but the actual movement is very costly.
  • Fastest runner: if we want to figure out who is the fastest runner within a group of people by having individuals race one another, the actual comparison is costly and exhausting for the runners involved, but swapping two people in rank is very easy in comparison.
  • Computational biology: costly comparisons with expensive simulations that we run on very large sets of genomic data, but perhaps our vector only holds a small, unique, numeric ID for each of those sequences, so the swaps are actually not costly at all.
  • Student records: low-cost comparisons between unique, numeric student IDs, but perhaps we have a ton of data associated with each student record, and so the write operations are somewhat intensive compared to our comparison operations.


Other Trade-offs

A few other trade-offs we discussed with these algorithms included:

  • Consistency vs. ultra-fast performance in a limited set of special cases.
    • If you're working on an application where you know you often encounter special cases that trigger a particular algorithm's best-case runtime, you might want to select that algorithm over other candidates. For example, if the vectors you receive are usually sorted already, you might want to use insertion sort over selection sort to capitalize on insertion sort's best-case runtime of O(n).
    • If you're working on an application where runtime predictability is key, or where you need to stay under a particular runtime for a certain operation, you might choose an algorithm whose runtime is very consistent over one that has an amazing best-case runtime in special cases but exceeds your runtime limits in others.
  • Ease of coding and debugging vs. performance.
    • Do you need to quickly code up a proof-of-concept that gets a task done before a meeting with your research advisor? You might choose an algorithm that is more straightforward to implement and whose logic is really easy to follow, rather than trying to code up a slightly faster algorithm that is really tricky and hard to understand.
    • Do you want to hyper-optimize a piece of code because you're working on an application that needs a particular task to be accomplished as quickly as possible? It might be worth it to spend time implementing and testing a really complicated algorithm if that leads to runtime gains compared to an algorithm that would have been easier to implement but slower to run.


Merge Sort

We then saw merge sort, whose O(n log n) runtime is a drastic improvement over the sorting algorithms above. For a review of how merge sort works, see the slides attached toward the top of today's lecture notes.

The runtime for merge sort is revealed in the tree diagram we used to trace through its recursive calls:

  • The runtime for the work being performed at each individual layer of the diagram is O(n). At the top layer, we have an O(n) merge operation to merge the left and right halves of the vector. At the bottom layer, we have n base cases, each of which takes O(1) time to execute -- so, a total runtime of O(n) for all of the base cases in that bottom layer. In the layers between, the runtimes of the repeated but smaller merge operations add up to O(n) for each layer.
  • The number of layers in the diagram is approximately log2(n) + 1, which is O(log n).
  • Multiplying the number of layers by the runtime associated with each layer gives us our overall runtime of O(n log n).

This O(n log n) runtime is awesome! If n ≈ 1,000,000,000 (that's 1 BILLION), the question of an O(n²) runtime vs. an O(n log n) runtime is a question of doing on the order of 1,000,000,000 * 1,000,000,000 = 10^18 steps of work or just 30 * 1,000,000,000 (30 billion) steps of work. That is an ENORMOUS difference.


Midpoint Formula (and Integer Overflow)
Not mentioned in class. Worth reviewing briefly. The material in this section won't be tested on the final exam, but if you're planning to go into software engineering as a career, this is very much a detail you want to be familiar with, as it could come up in technical interviews, and it could be relevant to applications you develop in the future.

While talking about merge sort, I also brought up that the midpoint formula mid = (hi + lo) / 2 is susceptible to integer overflow -- something that was mentioned earlier this quarter when talking about binary search. Recall the algebraically equivalent formula that is not susceptible to that problem: mid = lo + (hi - lo) / 2.

See the notes from the binary search lecture for more on integer overflow. This might seem a bit pedantic, but it has been known to come up in tech interviews, and this is an important detail to have in mind when developing software that has to process potentially very large integer values in the real world.

Note that we can observe the effects of integer overflow (and underflow) and get a program to spit out the max and min values for an int like so:

#include <iostream>
#include "console.h"
using namespace std;

int main()
{
    // 2 billion
    int i = 2000000000;

    while (i > 0)
    {
        i++;
    }

    // When we get here, we will have just overflowed. That means we expect i to be
    // the smallest integer C++ can store in an int type. (Strictly speaking, signed
    // overflow is undefined behavior in C++, but on typical systems ints wrap around.)
    cout << i << endl;

    // If we subtract 1 from i, we encounter integer underflow and expect to get the
    // largest integer C++ can store in an int type.
    cout << (i - 1) << endl;

    return 0;
}

output:

-2147483648
2147483647


Some Downsides to Merge Sort
Not mentioned in class. Worth reviewing briefly. The material in this section won't be tested on the final exam, but it's good to be aware of these tradeoffs.

So, that O(n log n) runtime is looking super awesome. Why, then, would we ever consider using selection or insertion sort over merge sort?! Well, merge sort does have a few drawbacks:

  • It takes up extra space for the auxiliary vector used when merging the two halves back together. Tight on space? Merge sort might not be the right algorithm for you. The recursive calls to merge sort also take up space in the call stack.
  • Speaking of the call stack, all the recursive calls we make to merge sort take time! We have to put a new call on our program stack, instantiate local variables, and then almost immediately do all of that over again to make two more recursive calls. The overhead involved with those recursive calls is significant enough that for small(ish) vectors, selection and insertion sort can actually end up out-performing merge sort.
  • As mentioned earlier, the ease of coding up and debugging an algorithm can also play a role in whether it's worthwhile to use it for a particular application. Sometimes when we're trying to rapidly prototype software or run a quick one-off experiment, we just need a quick and dirty solution to some sub-problem we've encountered, and it's a better use of our time to implement a slow but straightforward and easy-to-debug algorithm than one that is more complex, more difficult to implement, and more likely to have bugs in obscure corner cases that we don't think to test.


Implementations

Below are my implementations of these three sorting algorithms. See also the file attached above, which has complete implementations and some functions for testing.

Selection Sort:

void swap(Vector<int>& v, int i, int j);

void selectionSort(Vector<int>& v)
{
    for (int start = 0; start < v.size(); start++)
    {
        int indexOfMin = start;

        for (int j = start + 1; j < v.size(); j++)
        {
            if (v[j] < v[indexOfMin])
            {
                indexOfMin = j;
            }
        }

        swap(v, start, indexOfMin);
    }
}

// Swaps values at indices i and j in given vector. Assumes indices are valid.
void swap(Vector<int>& v, int i, int j)
{
    int temp = v[i];
    v[i] = v[j];
    v[j] = temp;
}

Insertion Sort:

void insertionSort(Vector<int>& v)
{
    for (int start = 1; start < v.size(); start++)
    {
        int peach = v[start];
        int gap;

        for (gap = start; gap > 0 && peach < v[gap - 1]; gap--)
        {
            v[gap] = v[gap - 1];
        }

        v[gap] = peach;
    }
}

Merge Sort:

void mergeSort(Vector<int>& v, int lo, int hi)
{
    // Base case: one (or fewer) elements in this portion of the vector.
    if (lo >= hi)
    {
        return;
    }

    int mid = lo + (hi - lo) / 2;

    // Sort left and right halves recursively.
    mergeSort(v, lo, mid);
    mergeSort(v, mid + 1, hi);

    // Merge results back together in auxiliary vector.
    Vector<int> aux;

    int i = lo;
    int j = mid + 1;

    while (i <= mid || j <= hi)
    {
        if (i > mid)
        {
            // If i is invalid, j must be valid.
            aux.add(v[j]);
            j++;
        }
        else if (j > hi)
        {
            // If j is invalid, i must be valid.
            aux.add(v[i]);
            i++;
        }
        else if (v[i] < v[j])
        {
            // If both i and j are valid, we check whether v[i] < v[j] or not.
            // We want the smaller value.
            aux.add(v[i]);
            i++;
        }
        else
        {
            // This is the case where i and j are valid, and v[j] <= v[i].
            aux.add(v[j]);
            j++;
        }
    }

    // Copy everything from the auxiliary vector back into the original.
    for (i = lo; i <= hi; i++)
    {
        v[i] = aux[i - lo];
    }
}


Runtime Comparisons

Check out this runtime comparison I performed, which shows both the dominance of insertion sort over merge sort for small values of n (≤ 100) and the extreme dominance of merge sort over both selection and insertion sort for larger values of n. These results are from randomized vectors and are averaged over several trials for each algorithm and each value of n. Recall that merge sort is a bit slower with very small vectors because of the overhead associated with all the recursive calls it has to make.

Average runtime per trial:

Vector Length    Selection Sort     Insertion Sort     Merge Sort
10                 0.001076 ms        0.000793 ms      0.001027 ms
100                0.014081 ms        0.007409 ms      0.007910 ms
200                0.048799 ms        0.024599 ms      0.016929 ms
1,000              1.042306 ms        0.555023 ms      0.101754 ms
5,000             25.808860 ms       13.696100 ms      0.592700 ms
10,000            99.094250 ms       54.985860 ms      1.275500 ms
50,000          2433.940600 ms     1375.351200 ms      7.449200 ms


Aside: Quicksort

Not mentioned in class. Worth reviewing briefly.

The merge sort algorithm we saw today is an "easy split, hard join" sorting algorithm. The split is really easy: we just make two recursive calls. Joining the two sorted halves together is the more complicated part of the algorithm.

There's another very popular sorting algorithm called quicksort that does rather the opposite: it's a "hard split, easy join" sorting algorithm. The split involves an O(n) partitioning algorithm that feels a bit unruly when you first see it, but the join is super simple: you just return from your recursive calls, and that's it; the join happens as a natural consequence of the recursive nature of the algorithm.

You will encounter quicksort a bit later in our curriculum if you end up taking CS161, but you could also read up on it in your spare time if you're interested in seeing it.


What's next?

Next week, we'll delve into a data structure that relies heavily on pointers and dynamic memory management: linked lists.


Exam Prep

1. Code up all the sorting algorithms we talked about today from scratch! There's some subtlety to each of those algorithms that might be worth confronting in code, and you have my implementations to refer to if you get stuck.

2. After implementing the above sorting algorithms (or downloading my implementations attached at the top of today's notes, if you prefer), use the TIME_OPERATION macro from our testing framework to get a handle on the runtimes for each of those algorithms on vectors of different sizes, and see if you can experimentally verify their O(n²) and O(n log n) runtimes.