Section 7. Hash Tables and Linked Lists

Section materials curated by Jonathan Coronado, drawing upon materials from previous quarters.

Grab a pencil and paper, or a whiteboard, or a reed and papyrus when you’re reading over these solutions. The pointer juggling here is intricate but beautiful, like a delicate flower. If you work through everything one line at a time, you’ll get a much deeper sense for what’s going on here. Simply reading over this code and trying to keep track of everything in your head is not likely to work out very well.

📦 Starter code

Hash Tables

Problem One: Linear Probing Review

In this exercise, we'll assume we're working with a linear probing hash table with ten slots that store ints. Our hash function will be to take the last digit of the number to hash. (This is a terrible hash function, by the way, but it'll make it easier to place things in the table.)

Insert the following ten values, in the order they're listed, into an initially-empty linear probing table with ten slots.

16, 44, 93, 40, 66, 84, 82, 26, 43, 64
Solution
The final table looks like this:
Value: 40 64 82 93 44 84 16 66 26 43 Index: 0 1 2 3 4 5 6 7 8 9

Show what the table looks like after deleting 16, 93, and 66.
Solution
Here, T denotes a tombstone.
Value: 40 64 82 T 44 84 T T 26 43 Index: 0 1 2 3 4 5 6 7 8 9

Show what the table looks like after inserting 72, 41, and 51, in that order.
Solution
Value: 40 64 82 72 44 84 41 51 26 43 Index: 0 1 2 3 4 5 6 7 8 9

Problem Two: Linear Probing Sleuthing

In this problem, we will be exploring a linear probing table of integers. The hash code for each integer is formed by taking its last digit; for example, the hash code of 137 is 7, and the hash code of 106 is 6. Empty slots in the table are represented as blank spots, filled slots with the number they contain, and tombstones with the 墓 symbol.

18	37	墓			95	16	5	56	39
[0]	[1]	[2]	[3]	[4]	[5]	[6]	[7]	[8]	[9]

We filled this table in by starting with an empty table and using the standard linear probing algorithm from lecture without any modifications. The specific sequence of commands we issued to the table is shown below in the order in which those commands were performed. As you can see, some of the arguments to the commands have not been given.

Fill in the blanks such that executing the sequence of commands shown below builds the linear probing table shown above. As a hint, you’ll never need to insert something twice, and you’ll never need to remove something that isn’t already in the table. No justification is needed. Incorrect answers will not receive points, but otherwise there is no penalty for an incorrect guess.

Insert 28.
Insert 32.
Insert __.
Remove __.
Insert 18.
Remove __.
Insert 95.
Insert __.
Insert __.
Insert __.
Insert __.

We recommend that you draw out an empty, ten-slot linear probing table like as you’re working through this problem.

Solution

Insert 28.
Insert 32.
Insert 39.
Remove 32.
Insert 18.
Remove 28.
Insert 95.
Insert 16.
Insert 5.
Insert 56.
Insert 37.

To see where this comes from, let’s begin with step 3. We see that, in step 5, we insert 18, and that element ends up at slot 0. The only way that can happen is if slots 8 and 9 are filled in, and so the only item we can insert that would end up in slot 9 is 39.

In step 4, we can similarly reason by a process of elimination. The only elements we could remove would be 32 and 28, since neither end up in the finished table. If we removed 28, then when we inserted 18 it would overwrite the tombstone rather than ending up in slot 0. Therefore, we had to have removed 32. That then means step 6 must have been to remove 28, since 28 doesn’t appear in the final table.

At the point at which we begin doing the insertions for steps 8 and onward, we know that we have to insert 37, 16, 5, and 56 in some order, since they’re the only elements not yet in the table. So what order do they go in? If we insert 37 now, it’ll end up in slot 7, which is the wrong place. Inserting 16 places it in the right place. Inserting 5 would put it into slot 6 (the wrong place), and inserting 56 would put it into slot 6 as well (also wrong). That means 16 has to go first.

Once we’ve done that, we’re left with 37, 5, and 56 to insert. Inserting 37 still will put 37 in the wrong place – for it to end up in slot 0, slots 7, 8, and 9 have to be filled first. If we insert 56 before 5, it’ll end up in the wrong slot. So that means we have to insert 5 to fill slot 6, then 56 to fill slot 7, and finally 37 to fill slot 0.

Pointers and Linked List Mechanics

Problem Three: Tracing Linked List Code

For each of the following diagrams, draw a picture of what the given nodes would look like after the given line of code executes. Does any memory get orphaned as a result of the operations? Assume the Cell type is the same as struct covered in lecture that stores integers.

Given this diagram:

What is the result of executing this code? Is anything orphaned?
```
list->next = new Node;
list->next->value = 3;
```
Solution

Memory now looks like this:

There is an orphaned node containing the value 2.

Given this diagram:

What is the result of executing this code? Is anything orphaned?
```
list->next->next = nullptr;
```
Solution

Memory now looks like this:

There is an orphaned node containing the value 3.

Problem Four: Rewiring Linked Lists

For each of the following diagrams, write the code that will produce the given "after" result from the given "before" starting point by modifying the links between the nodes shown and/or creating new nodes as needed. There may be more than one way to write the code, but do not change the data field of any existing nodes. If a variable doesn't appear in the "after" picture, it doesn't matter what value it has after changes are made.

Solution

Here's one possible solution for image a).

Cell* temp = list->next->next;
temp->next = list->next;
list->next->next = list;
list->next->next->next = nullptr;
list = temp;

Here's one possible solution for image b).

list->next->next->next = list;
list = list->next->next;
Cell* list2 = list->next->next;
list->next->next = nullptr;

Problem Five: Pointers by Reference

One of the trickier nuances of linked lists comes up when we start passing around pointers as parameters by reference. To better understand exactly what that’s all about, trace through the following code and show what it prints out. Also, identify any memory leaks that occur in the program.

void confuse(Cell* list) {
    list->value = 137;
}

void befuddle(Cell* list) {
   list = new Cell;
   list->value = 42;
   list->next = nullptr;
}

void confound(Cell* list) {
    list->next = new Cell;
    list->next->value = 2718;
    list->next->next = nullptr;
}

void bamboozle(Cell*& list) {
    list->value = 42;
}

void mystify(Cell*& list) {
    list = new Cell;
    list->value = 161;
    list->next = nullptr;
}

int main() {
    Cell* list = /* some logic to make the list 1 → 3 → 5 → null */

    confuse(list);
    printList(list); // from lecture

    befuddle(list);
    printList(list);

    confound(list);
    printList(list);

    bamboozle(list);
    printList(list);

    mystify(list);
    printList(list);

    freeList(list); // from lecture
    return 0;
}

Solution

Let’s go through this one step at a time.

The call to confuse updates the first element of the list to store 137, so the call to printList will print out 137, 3, 5.
The call to befuddle takes its argument by value. That means it’s working with a copy of the pointer to the first element of the list, so when we set list to be a new cell, it doesn’t change where the list variable back in main is pointing. The cell created here is leaked, and the next call to printList will print out 137, 3, 5.
The call to confound takes its argument by value. However, when it writes to list->next, it’s following the pointer to the first element of the linked list and changing the actual linked list cell it finds there. This means that the list is modified by dropping off the 3 and the 5 (that memory gets leaked) and replacing it with a cell containing 2718. Therefore, the next call to printList will print out 137, 2718.
The call to bamboozle takes its argument by reference, but notice that it never actually reassigns the next pointer. However, it does change the memory in the cell at the front of the list to hold 42, so the next call to printList will print 42, 2718.
The call to mystify takes its argument by reference and therefore when it reassigns list it really is changing where list back in main is pointing. This leaks the memory for the cells containing 42 and 2718. The variable list back in main is changed to point at a new cell containing 161, so the final call to printList prints 161.
Finally, we free that one-element list. Overall, we’ve leaked a lot of memory!

Linked Lists

Problem Six: Linked List Mechanics

This section handout is almost exclusively about linked lists, so before we jump into some of their applications, let’s start off by reviewing some of the basic mechanics about how they work!

To begin with, let’s imagine we have a linked list of integers. Go and define a Cell struct representing a single cell in the linked list. Then, write a function

int sumOfElementsIn(Cell* list);

that adds up the values of all the elements in the linked list. Write this function two ways – first, do it iteratively; then, do it recursively. Which one did you think was easier to write? Why?

Next, write a function

Cell* lastElementOf(Cell* list);

that returns a pointer to the last element of a linked list (and reports an error if the list is empty). Again, write this function two ways, iteratively and recursively. Which one did you think was easier to write?

Solution

First, we need a Cell structure! Here’s one possibility:

struct Cell {  
    int value;
    Cell* next;
};             

Most linked list cells look more or less the same – they have some data and a pointer to the next element in the list. Here’s two version of the code to sum up the elements of one of these lists:

/* Iterative version */
int sumOfElementsIn(Cell* list) {
    int result = 0;
    for (Cell* curr = list; curr != nullptr; curr = curr->next) {
        result += curr->value;
    }
    return result;
}

/* Recursive version. */
int sumOfElementsIn(Cell* list) {
    /* The sum of the elements in an empty list is zero. */
    if (list == nullptr) return 0;

    /* The sum of the elements in a nonempty list is the sum of the elements in
     * the first cell plus the sum of the remaining elements.
     */
    return list->value + sumOfElementsIn(list->next);
}

And two versions of a function to get the last element of a linked list.

/* Iterative version */
Cell* lastElementOf(Cell* list) {
    if (list == nullptr) error("Empty lists have no last element.");

    /* Loop forward until the current cell’s next pointer is null. That’s the
     * point where the list ends.
     */
    Cell* result = list;
    while (result->next != nullptr) {
        result = result->next;
    }
    return result;
}

/* Recursive version. */
Cell* lastElementOf(Cell* list) {
    /* Base Case 1: The empty list has no last element. */
    if (list == nullptr) error("Nothing can come from nothing.");

    /* Base Case 2: The only element of a one-element list is the last element. */
    if (list->next == nullptr) return list;

    /* Recursive Case: There’s at least two cells in this list. The last element
     * of the overall list is the last element of the list you get when you drop
     * off the first element.
     */
    return lastElementOf(list->next);
}

Problem Seven: Tail Pointers

In lecture, we wrote a function to read a list of values from the user and return a linked list containing those values (and we wrote it two different ways, too!) The iterative version of that function had the odd property that it returned the elements that were read in reverse order, which was a consequence of the fact that we kept adding elements at the front of the list that we’d made.

Write an iterative function

Cell* readList();

that reads a list of values from the user. It should return a linked list containing those values in the order in which they were entered. To make it run in time O(n), where n is the number of elements read, maintain a tail pointer keep track of the very last element in the list.

Solution

Here’s one option, which is closely related to our logic to get the queue to work in worst-case O(1) time. I’m assuming the list holds strings, but really this’ll work for any type as long as there’s a sentinel value:

Cell* readList() {
    Cell* head = nullptr;
    Cell* tail = nullptr; // There is no last element... at least, not yet. :-)

    while (true) {
        string line = getLine("Next entry: ");
        if (line == "") break;

        /* Get the cell basics set up. It always goes on the end, so its next
         * pointer is always null.
         */
        Cell* cell  = new Cell;
        cell->value = line;
        cell->next  = nullptr;

        /* If the list is empty, this is now both the head and the tail. */
        if (head == nullptr) {
            head = tail = cell;
        }
        /* Otherwise, splice this element in right after the tail. */
        else {
            tail->next = cell;
            tail = cell;
        }
    }
    return head;
}

Problem Eight: Double-Ended Queues

This problem concerns a data structure called a double-ended queue, or deque for short (it’s pronounced “deck,” as in a deck of cards). A deque is similar to a stack or queue in that it represents a sequence of elements, except that elements can be added or removed from both ends of the deque. Here is one possible interface for a Deque class:

class Deque {
public:
    Deque();
    ~Deque();

    /* Seems like all containers have the next two functions. :-) */
    bool isEmpty() const;
    int  size() const;

    /* Adds a value to the front or the back of the deque. */
    void pushFront(int value);
    void pushBack(int value);

    /* Looks at, but does not remove, the first/last element of the deque. */
    int peekFront() const;
    int peekBack() const;

    /* Returns and removes the first or last element of the deque. */
    int popFront();
    int popBack();
};

One efficient way of implementing a deque is as a doubly-linked list. The deque stores pointers to the head and the tail of the list to support fast access to both ends. Design the private section of the Deque class, then implement the above member functions using a doubly-linked list. As a hint, this is way easier to do using dummy nodes!

Solution

If you think about it, the logic from the previous problem really nicely lets us build a deque – we have the ability to splice things in anywhere we want and splice things out anywhere we want, which is precisely what we’d need to do here.

This solution uses a doubly-linked list with a dummy head and tail node, as described in Problem Eight.

class Deque {
public:
    Deque();
    ~Deque();

    /* Seems like all containers have the next two functions. :-) */
    bool isEmpty() const;
    int  size() const;

    /* Adds a value to the front or the back of the deque. */
    void pushFront(int value);
    void pushBack(int value);

    /* Looks at, but does not remove, the first/last element of the deque. */
    int peekFront() const;
    int peekBack() const;

    /* Returns and removes the first or last element of the deque. */
    int popFront();
    int popBack();

private:
    struct Cell {
        int value;
        Cell* next;
        Cell* prev;
    };
    Cell* head;
    Cell* tail;
    int   numElems; // Cache for efficiency; makes size() run in time O(1).

    /* Creates a new cell initialized to a given value, but whose next and prev
     * pointers are uninitialized. This cell is intended to be used inside the
     * linked list, and therefore the size field is adjusted appropriately.
     */
    Cell* makeCell(int value);

    /* Destroys the given cell, which is presumed to be in the linked list. */
    void destroy(Cell* toRemove);
};

Now, the .cpp file:

#include "error.h" // Because bad things happen.

Deque::Deque() {
    /* Set up the empty dummied, doubly-linked list. */
    head = new Cell;
    tail = new Cell;
    head->next = tail;
    tail->prev = head;
    head->prev = tail->next = nullptr;

    numElems = 0;
}

Deque::~Deque() {
    /* Delete all cells in the list, including the head and tail. */
    while (head != nullptr) {
        Cell* next = head->next;
        delete head;
        head = next;
    }
}

/* Because we’ve cached the size, we don’t need to scan through the list when
 * we want to determine how many elements there are.
 */
int Deque::size() const {
    return numElems;
}

/* Good programming exercise: suppose that we didn’t cache the number of elements
 * in the list and that size() had to scan over the entire list in time O(n). How
 * would you implement isEmpty() given that we have a head and tail pointer?
 */
bool Deque::isEmpty() const {
    return size() == 0;
}

/* Our helper function that makes a cell. We could have alternatively defined a
 * constructor on the Cell type (yes, you can do that!), but I chose to do things
 * this way to show off Yet Another Memorable Piece of C++ Syntax. Since the
 * Cell type is nested inside Deque, the return type of this function has to be
 * Deque::Cell*, indicating that Cell is defined inside of Deque.
 */
Deque::Cell* Deque::makeCell(int value) {
    Cell* result = new Cell;
    result->value = value;

    numElems++;
    return result;
}

/* pushFront is essentially insertAfter with the head pointer. */
void Deque::pushFront(int value) {
    Cell* cell = makeCell(value);
    cell->prev = head;
    cell->next = head->next;
    cell->prev->next = cell;
    cell->next->prev = cell;
}

/* pushBack is essentially insertBefore with the tail pointer. */
void Deque::pushBack(int value) {
    Cell* cell = makeCell(value);
    cell->next = tail;
    cell->prev = tail->prev;
    cell->prev->next = cell;
    cell->next->prev = cell;
}

/* To look at the front or back, we have to skip over the head or tail nodes,
 * since they’re dummies and don’t actually have any data in them.
 */
int Deque::peekFront() const {
    if (isEmpty()) error("This is why we can’t have nice things.");
    return head->next->value;
}

int Deque::peekBack() const {
    if (isEmpty()) error("This is why we can’t have nice things.");
    return tail->prev->value;
}

/* The destroy operation is essentially our remove function from earlier.
 * Notice that we do not need to use the full name Deque::Cell here because Cell
 * is an argument to a member function that’s part of the Deque type.
 */
void Deque::destroy(Cell* toRemove) {
    toRemove->next->prev = toRemove->prev;
    toRemove->prev->next = toRemove->next;

    /* At this point it’s spliced out of the list. */
    delete toRemove;
    numElems--;
}

/* popFront and popBack are essentially just wrapped splice-outs. */
int Deque::popFront() {
    int result = peekFront();
    destroy(head->next);
    return result;
}

int Deque::popBack() {
    int result = peekBack();
    destroy(tail->prev);
    return result;
}

Problem Nine: Complementary Strands

In humans, DNA strands are always paired with a complementary strand. This is a new strand of DNA linked alongside the original strand, except with different nucleotides. Specifically, any time there’s an A in the initial strand there’s a T in the complementary strand, and any time there’s a C in the initial strand there’s a G in the complementary strand (and vice-versa). Here’s an example of what this might look like:

Two strings of nucleotides, each represented as a doubly linked list. Each nucleotide in the top string is doubly linked to the nucleotide at the same position in the bottom string. The top string starts with G T C and the bottom string starts with C A G.

We can represent a double-stranded DNA sequence as a modified doubly-linked list of nucleotides, where each nucleotide links to the nucleotides before and after it in the sequence, as well as to the nucleotide across from it:

struct Nucleotide {                                         
    char value;         // 'A', 'C', 'G', or 'T'           
    Nucleotide* next;   // To the right                    
    Nucleotide* prev;   // To the left                     
    Nucleotide* across; // Vertically, to the other strand.
};                                                          

For example, in the picture shown above, the nucleotide G in the upper-left corner would have its prev pointer set to nullptr, its next pointer set to the T nucleotide next to it, and its across pointer set to the C beneath it. The C nucleotide beneath it would have its next pointer set to the A to its right, its prev pointer set to nullptr, and its across pointer back to the G in the top strand.

Write a function

void addComplementaryStrand(Nucleotide* dna);

that takes as input a single strand of DNA whose across pointers have not been initialized. The function then constructs the complementary strand, then links the two strands together by appropriately wiring the across pointers of the two strands.

Some notes on this problem:

The across pointers of each of the nucleotides in the input sequence have not been initialized when this function is called. You should not assume anything about where they point.
Your solution must run in time O(n), where n is the number of nucleotides in dna.
The input pointer will always point to the first nucleotide in the strand, or will be nullptr if the input DNA strand is empty.
You can assume all letters in the DNA strand are either A, C, G, or T.
You may not use any container types (e.g. Vector, Set, etc.) in the course of solving this problem. This includes the string type.

Solution

There are several options here. Here’s one:

/* Given a base, returns its complementary base. */
char complementOf(char base) {
    if (base == 'A') return 'T';
    if (base == 'C') return 'G';
    if (base == 'G') return 'C';
    if (base == 'T') return 'A';
    error("Unknown nucleotide.");
}

void addComplementaryStrand(Nucleotide* dna) {
    while (dna != nullptr) {
        Nucleotide* partner = new Nucleotide;

        /* Set the value. */
        partner->value = complementOf(dna->value);

        /* Wire the across pointers. */
        dna->across = partner;
        partner->across = dna;

        /* As of now, this cell has no next pointer, since we haven't built the
         * next cell yet.
         */
        partner->next = nullptr;

        /* There may be a previous cell, though. */
        if (dna->prev != nullptr) {
            partner->prev = dna->prev->across;
            dna->prev->across->next = partner;
        } else {
            partner->prev = nullptr;
        }

        /* Walk forward to the next cell. */
        dna = dna->next;
    }
}

Problem Ten: Thresholding

Write a function

void removeValuesAbove(Cell*& list, int value);

that modifies a list by removing all cells whose value is strictly greater than value. The other cells should remain in their same relative order. The removed cells should be deleted to avoid memory leaks.

Solution

void removeValuesAbove(Cell*& list, int value) {
    /* Keep track of the previous linked list cell so we can rewire its
     * next pointer if necessary.
     */
    Cell* prev = nullptr;

    /* Walk over the list, removing cells as necessary. */
    Cell* curr = list;
    while (curr != nullptr) {
        /* Do we need to remove this one? */
        if (curr->value > value) {
            /* If we have a previous pointer, wire it around this cell. */
            if (prev != nullptr) {
                prev->next = curr->next;
            }
            /* Otherwise, we are the head of the list. In that case, the
             * list head needs to change, since it's pointing to a cell
             * that will no longer exist.
             */
            else {
                list = list->next;
            }

            /* Remove this cell from the list. */
            Cell* next = curr->next;
            delete curr;
            curr = next;
        }
        /* Otherwise, we don't remove this cell. It's now the previous
         * cell.
         */
        else {
            prev = curr;
            curr = curr->next;
        }
    }
}