Sorting Algorithms

Tuesday, July 15


In today's lecture we will delve into sorting algorithms, a group of real-world algorithms with wide-reaching applications across computer science.


Contents

1. Overview

2. Selection Sort and Insertion Sort

3. Comparison-Heavy vs. Swap-Heavy

4. Other Trade-offs

5. Merge Sort

6. Midpoint Formula (and Integer Overflow)

7. Some Downsides to Merge Sort

8. Implementations

9. Runtime Comparisons

10. Aside: Quicksort

11. What's next?

12. Exam Prep


Overview

Attachment: sorting-stuff.zip

We talked about three sorting algorithms today: selection sort, insertion sort, and merge sort. The slides and code for these sorting algorithms are included in the zip file attached above. A summary of key points is included below.


Selection Sort and Insertion Sort

The first two sorting algorithms we talked about today were selection sort and insertion sort. The slides attached above walk through how the algorithms work. Here are some key points I made about each algorithm along the way:

  • Selection Sort
    • Overview: Each pass "selects" the smallest element in the unsorted portion of the vector and then swaps it into the first position of that unsorted portion.
    • Worst-case runtime: O(n²). The first pass examines n elements, then performs a single swap; the next examines (n - 1) elements before performing a single swap, and so on. The number of comparisons is represented by the Gauss sum we keep seeing: 1 + 2 + ... + n = n * (n + 1) / 2, which is O(n²). (See the quick check after this list.)
    • Best-case runtime: O(n²). Every call to selection sort follows the pattern described above for the worst-case runtime.
    • Operations: Heavy on comparisons, light on swaps. I described this as the "wait and see" algorithm. It spends a lot of time comparing elements in each pass before settling on which one to swap into the next target position.
  • Insertion Sort
    • Overview: Each pass takes the first element from the unsorted portion of the vector and "inserts" it into the sorted portion by dragging it down until it either encounters a smaller value to its left or reaches the beginning of the vector.
    • Worst-case runtime: O(n²). The first pass could move an element down one position, the second could move an element down two positions, the third could move an element down three positions, and so on. It's that Gauss sum once again. The last pass of insertion sort will only move an element down by (n - 1) positions at most, so the sum is slightly modified, but the big-oh result is the same: 1 + 2 + ... + (n - 1) = (n - 1) * n / 2, which is O(n²).
    • Best-case runtime: O(n). This is a vast improvement over selection sort's best-case runtime! Insertion sort encounters this runtime if it receives an already-sorted vector. In that case, each pass stops after a single comparison without having to do any swaps.
    • Operations: Heavy on swaps, potentially light on comparisons. I described this as the "let's gooooooooo!" algorithm. It picks up an element at the beginning of each pass and drags it down through the vector, performing a swap at each index it passes. (*swap!* *swap!* *swap!* *swap!*) A pass of insertion sort stops as soon as it encounters an element smaller than the one it's dragging down, though, meaning that it could potentially stop each pass early and end up doing fewer comparisons than selection sort. Neat.
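
(Not mentioned in class, but here's a quick sanity check of that Gauss sum.) For a vector of n = 6 elements, selection sort's passes examine 6, 5, 4, 3, 2, and 1 elements, for a total of 6 + 5 + 4 + 3 + 2 + 1 = 21 comparisons, which matches n * (n + 1) / 2 = (6 * 7) / 2 = 21. Double n to 12 and you get (12 * 13) / 2 = 78 comparisons, roughly four times as many, which is exactly the quadratic growth the O(n²) bound captures.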

Recall that even though these algorithms are not the fastest ones we'll see this quarter, one of the reasons we study them in CS106B is so you'll see lots of different algorithmic approaches to the same problem. The idea here is that the more algorithms you see, the more likely you are to ramp up your algorithmic thinking and come up with your own algorithms to solve new problems you encounter in the future. These algorithms also provided us with reasonable big-oh analysis exercises, and discussing their comparative advantages and disadvantages gave us an opportunity to practice the kind of comparative analysis we often do when we have multiple solutions to the same problem in CS.


Comparison-Heavy vs. Swap-Heavy

I gave a few examples in class of situations in which we might prefer one of these algorithms over the other. See timestamp 36:05 through 49:15 in today's lecture video for a recap. There, we talked about:

  • Moving refrigerators: if we want to sort a bunch of refrigerators in a warehouse and line them up from lowest price to highest price, the comparison is really easy, but the actual movement is very costly.
  • Fastest runner: if we want to figure out who is the fastest runner within a group of people by having individuals race one another, the actual comparison is costly and exhausting for the runners involved, but swapping two people in rank is very easy in comparison.
  • Computational biology: costly comparisons with expensive simulations that we run on very large sets of genomic data, but perhaps our vector only holds a small, unique, numeric ID for each of those sequences, so the swaps are actually not costly at all.
  • Student records: low-cost comparisons between unique, numeric student IDs, but perhaps we have a ton of data associated with each student record, so the write operations are somewhat intensive compared to our comparison operations.


Other Trade-offs

A few other trade-offs we discussed with these algorithms included:

  • Consistency vs. ultra-fast performance in a limited set of special cases.
  • Ease of coding and debugging vs. performance.


Merge Sort

We then saw merge sort, whose O(n log n) runtime is a drastic improvement over the sorting algorithms above. For a review of how merge sort works and a discussion of the runtime, see the slides attached toward the top of today's lecture notes, as well as the segment starting at timestamp 56:35 in today's lecture. A discussion of the runtime begins at timestamp 1:05:35.


Midpoint Formula (and Integer Overflow)

While talking about merge sort, I also brought up that the midpoint formula mid = (hi + lo) / 2 is susceptible to integer overflow -- something we saw earlier this quarter when talking about binary search. Recall the algebraically equivalent formula that is not susceptible to that problem: mid = lo + (hi - lo) / 2.

See the notes from the binary search lecture for more on integer overflow. This might seem a bit pedantic, but it has been known to come up in tech interviews.

(Not mentioned in class. Worth reviewing briefly.) Note that we can observe the effects of integer overflow (and underflow) and get a program to spit out the max and min values for an int like so:

#include <iostream>
#include "console.h"
using namespace std;

int main()
{
    // 2 billion
    int i = 2000000000;

    // Keep incrementing until i wraps around and becomes negative. (Strictly
    // speaking, signed overflow is undefined behavior in C++, but on typical
    // platforms it wraps around, which is what we observe here.)
    while (i > 0)
    {
        i++;
    }

    // When we get here, we will have just overflowed. That means we expect i to be
    // the smallest integer C++ can store in an int type.
    cout << i << endl;

    // If we subtract 1 from i, we encounter integer underflow and expect to get the
    // largest integer C++ can store in an int type.
    cout << (i - 1) << endl;

    return 0;
}

output:

-2147483648
2147483647
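
(Also not mentioned in class.) To tie that back to the midpoint formula specifically, here's a small sketch showing how the two formulas diverge once lo and hi are large enough that their sum no longer fits in an int. As in the example above, signed overflow is technically undefined behavior, so the first value printed is whatever the wrap-around happens to produce; the second is the midpoint we actually wanted, 1750000000.

#include <iostream>
#include "console.h"
using namespace std;

int main()
{
    // Indices large enough that their sum exceeds the maximum value of an int.
    int lo = 1500000000;
    int hi = 2000000000;

    // Overflows: hi + lo is about 3.5 billion, which wraps around to a
    // negative number before the division ever happens.
    cout << (hi + lo) / 2 << endl;

    // Safe: hi - lo fits comfortably in an int.
    cout << lo + (hi - lo) / 2 << endl;

    return 0;
}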


Some Downsides to Merge Sort

Not all of these bullet points were mentioned in class. Worth reviewing briefly.

So, that O(n log n) runtime is looking super awesome. Why, then, would we ever consider using selection or insertion sort over merge sort?! Well, merge sort does have a few drawbacks:

  • It takes up extra space for the auxiliary vector used when merging the two halves back together. Tight on space? Merge sort might not be the right algorithm for you. The recursive calls to merge sort also take up space in the call stack.
  • Speaking of the call stack, all the recursive calls we make to merge sort take time! We have to put a new call on our program stack, instantiate local variables, and then almost immediately do all of that over again to make two more recursive calls. The overhead involved with those recursive calls is significant enough that for small(ish) vectors, selection and insertion sort can actually end up out-performing merge sort. (One common response to this is a hybrid approach; see the sketch after this list.)
  • As mentioned earlier, the ease of coding up and debugging an algorithm can also play a role in whether it's worthwhile to use it for a particular application. Sometimes when we're trying to rapidly prototype software or run a quick one-off experiment, we just need a quick and dirty solution to some sub-problem we've encountered, and it's a better use of our time to implement a slow but straightforward and easy-to-debug algorithm than one that is more complex, more difficult to implement, and more likely to have bugs in obscure corner cases that we don't think to test.
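
(Not mentioned in class. Worth reviewing briefly.) One common response to that recursive overhead is a hybrid sort: fall back to insertion sort once a recursive call's portion of the vector is small enough. Here's a minimal sketch of what that could look like. The insertionSortRange helper is a hypothetical variant of the insertionSort implementation below, modified to work on just the v[lo..hi] range, and the cutoff of 16 is an arbitrary illustrative choice (real libraries tune it experimentally). The merge step is the same one used in the mergeSort implementation in the Implementations section below, just written a bit more compactly:

// Hypothetical helper: insertion sort restricted to v[lo..hi] (inclusive).
void insertionSortRange(Vector<int>& v, int lo, int hi)
{
    for (int start = lo + 1; start <= hi; start++)
    {
        int value = v[start];
        int gap;

        for (gap = start; gap > lo && value < v[gap - 1]; gap--)
        {
            v[gap] = v[gap - 1];
        }

        v[gap] = value;
    }
}

void hybridMergeSort(Vector<int>& v, int lo, int hi)
{
    // New base case: small portions are handed off to insertion sort, whose
    // low per-call overhead beats the cost of more recursive calls.
    if (hi - lo < 16)
    {
        insertionSortRange(v, lo, hi);
        return;
    }

    int mid = lo + (hi - lo) / 2;
    hybridMergeSort(v, lo, mid);
    hybridMergeSort(v, mid + 1, hi);

    // Merge step, equivalent to the one in mergeSort below.
    Vector<int> aux;
    int i = lo;
    int j = mid + 1;

    while (i <= mid || j <= hi)
    {
        if (j > hi || (i <= mid && v[i] < v[j]))
        {
            aux.add(v[i]);
            i++;
        }
        else
        {
            aux.add(v[j]);
            j++;
        }
    }

    for (i = lo; i <= hi; i++)
    {
        v[i] = aux[i - lo];
    }
}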


Implementations

Below are my implementations of these three sorting algorithms. See also the file attached above, which has complete implementations and some functions for testing.

Selection Sort:

void swap(Vector<int>& v, int i, int j);

void selectionSort(Vector<int>& v)
{
    for (int start = 0; start < v.size(); start++)
    {
        int indexOfMin = start;

        for (int j = start + 1; j < v.size(); j++)
        {
            if (v[j] < v[indexOfMin])
            {
                indexOfMin = j;
            }
        }

        swap(v, start, indexOfMin);
    }
}

// Swaps values at indices i and j in given vector. Assumes indices are valid.
void swap(Vector<int>& v, int i, int j)
{
    int temp = v[i];
    v[i] = v[j];
    v[j] = temp;
}

Insertion Sort:

void insertionSort(Vector<int>& v)
{
    for (int start = 1; start < v.size(); start++)
    {
        // "peach" is the element we pick up and drag down into place this pass.
        int peach = v[start];
        int gap;

        // Shift larger elements up by one until we find where peach belongs.
        for (gap = start; gap > 0 && peach < v[gap - 1]; gap--)
        {
            v[gap] = v[gap - 1];
        }

        v[gap] = peach;
    }
}

Merge Sort:

void mergeSort(Vector<int>& v, int lo, int hi)
{
    // Base case: one (or fewer) elements in this portion of the vector.
    if (lo >= hi)
    {
        return;
    }

    int mid = lo + (hi - lo) / 2;

    // Sort left and right halves recursively.
    mergeSort(v, lo, mid);
    mergeSort(v, mid + 1, hi);

    // Merge results back together in auxiliary vector.
    Vector<int> aux;

    int i = lo;
    int j = mid + 1;

    while (i <= mid || j <= hi)
    {
        if (i > mid)
        {
            // If i is invalid, j must be valid.
            aux.add(v[j]);
            j++;
        }
        else if (j > hi)
        {
            // If j is invalid, i must be valid.
            aux.add(v[i]);
            i++;
        }
        else if (v[i] < v[j])
        {
            // If both i and j are valid, we check whether v[i] < v[j] or not.
            // We want the smaller value.
            aux.add(v[i]);
            i++;
        }
        else
        {
            // This is the case where i and j are valid, and v[j] <= v[i].
            aux.add(v[j]);
            j++;
        }
    }

    // Copy everything from the auxiliary vector back into the original.
    for (i = lo; i <= hi; i++)
    {
        v[i] = aux[i - lo];
    }
}
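
One small usage note, since mergeSort takes lo and hi indices rather than just a vector: to sort an entire vector, pass it the full index range. A quick sketch (not from the attached file; the expected output assumes the Stanford Vector's usual printed format):

Vector<int> v = {27, 4, 16, 8, 42, 15};
mergeSort(v, 0, v.size() - 1);
cout << v << endl;   // expected: {4, 8, 15, 16, 27, 42}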


Runtime Comparisons

Check out this runtime comparison I performed, which shows both insertion sort edging out merge sort for small values of n (≤ 100) and the extreme dominance of merge sort over both selection and insertion sort for larger values of n. These results are from randomized vectors and are averaged over several trials for each algorithm and each value of n:

Vector Length    Average Runtime per Trial
                 Selection Sort     Insertion Sort     Merge Sort
10               0.001076 ms        0.000793 ms        0.001027 ms
100              0.014081 ms        0.007409 ms        0.007910 ms
200              0.048799 ms        0.024599 ms        0.016929 ms
1,000            1.042306 ms        0.555023 ms        0.101754 ms
5,000            25.808860 ms       13.696100 ms       0.592700 ms
10,000           99.094250 ms       54.985860 ms       1.275500 ms
50,000           2433.940600 ms     1375.351200 ms     7.449200 ms


Aside: Quicksort

Not mentioned in class. Worth reviewing briefly.

The merge sort algorithm we saw today is an "easy split, hard join" sorting algorithm. The split is really easy: we just make two recursive calls. Joining the two sorted halves together is the more complicated part of the algorithm.

There's another very popular sorting algorithm called quicksort that does rather the opposite: it's a "hard split, easy join" sorting algorithm. The split involves an O(n) partitioning algorithm that feels a bit unruly when you first see it, but the join is super simple: you just return from your recursive calls, and that's it; the join happens as a natural consequence of the recursive nature of the algorithm.
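
(Also not mentioned in class.) Just so that "hard split, easy join" structure is concrete, here's a minimal sketch of one common way to write quicksort, using a simple Lomuto-style partition that takes the last element as the pivot and reusing the swap helper from the Implementations section above. This is just one of several partitioning schemes, and the version you write for the assignment may differ:

// Partition v[lo..hi] around the pivot (here, the last element). Afterwards
// the pivot sits at its final sorted position, everything smaller is to its
// left, and everything greater than or equal to it is to its right.
int partition(Vector<int>& v, int lo, int hi)
{
    int pivot = v[hi];
    int boundary = lo;   // First index of the "greater than or equal" region.

    for (int i = lo; i < hi; i++)
    {
        if (v[i] < pivot)
        {
            swap(v, i, boundary);
            boundary++;
        }
    }

    swap(v, boundary, hi);   // Drop the pivot between the two regions.
    return boundary;
}

void quickSort(Vector<int>& v, int lo, int hi)
{
    if (lo >= hi)
    {
        return;
    }

    // Hard split: O(n) partitioning work before the recursive calls.
    int pivotIndex = partition(v, lo, hi);

    // Easy join: once both sides are sorted in place, we're done.
    quickSort(v, lo, pivotIndex - 1);
    quickSort(v, pivotIndex + 1, hi);
}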

You will be implementing quicksort in an assignment later this quarter, and you'll encounter it again a bit later in our curriculum if you end up taking CS161.


What's next?

Tomorrow (Wednesday), we'll talk a bit about expectations for the midterm exam and possibly solve a few more problems to reinforce some of the topics we've been exploring this quarter.

There is no lecture Thursday, and your midterm exam is this Friday.


Exam Prep

1. Code up all the sorting algorithms we talked about today from scratch! There's some subtlety to each of those algorithms that might be worth confronting in code, and you have my implementations to refer to if you get stuck.

2. After implementing the above sorting algorithms (or downloading my implementations attached at the top of today's notes, if you prefer), use the TIME_OPERATION macro from our testing framework to get a handle on the runtimes for each of those algorithms on vectors of different sizes, and see if you can experimentally verify their O(n²) and O(n log n) runtimes.
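
(Not mentioned in class.) If you go the TIME_OPERATION route, here's a sketch of what that might look like using the SimpleTest framework's STUDENT_TEST and TIME_OPERATION macros and the Stanford random.h library. The randomVector helper and the specific sizes are just illustrative choices; you'd write that helper yourself:

#include "SimpleTest.h"
#include "random.h"
#include "vector.h"

// Hypothetical helper: builds a vector of n random values to sort.
Vector<int> randomVector(int n)
{
    Vector<int> v;
    for (int i = 0; i < n; i++)
    {
        v.add(randomInteger(0, 1000000));
    }
    return v;
}

STUDENT_TEST("Time selection sort on vectors of increasing size")
{
    for (int n = 1000; n <= 32000; n *= 2)
    {
        Vector<int> v = randomVector(n);
        TIME_OPERATION(n, selectionSort(v));
    }
}

STUDENT_TEST("Time merge sort on vectors of increasing size")
{
    for (int n = 1000; n <= 32000; n *= 2)
    {
        Vector<int> v = randomVector(n);
        TIME_OPERATION(n, mergeSort(v, 0, v.size() - 1));
    }
}

If selection sort really is O(n²), the reported time should roughly quadruple each time n doubles; for merge sort's O(n log n), it should go up by a bit more than a factor of two.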