Section 8. Trees


Section materials curated by Neel Kishnani, drawing upon materials from previous quarters.

This week’s section exercises are all about trees, particularly binary search trees and common tree idioms and algorithms. Trees are yet another way to organize the way that data is stored, and they are perhaps one of the most powerful paradigms for data storage that we've encountered so far! Their recursive structure makes writing recursive functions very natural, so we will be using lots of recursion when working with trees. After you're done working through this section handout, you'll truly know what it means to party with trees!

📦 Starter code

In what follows, assume you have access to this Node type:

struct Node {
    int value;
    Node* left;
    Node* right;
};

Trees

Problem One: Tree-quality

Write a function

bool areEqual(Node* one, Node* two);

that take as input pointers to the roots of two binary trees (not necessarily binary search trees), then returns whether the two trees have the exact same shape and contents.

Let’s use the recursive definition of trees! The empty tree is only equal to the empty tree. A nonempty tree is only equal to another tree if that tree is nonempty, if the roots have the same values, and if the left and right subtrees of those roots are the same. That leads to this recursive algorithm:

bool areEqual(Node* one, Node* two) {
    /* Base Case: If either tree is empty, they had both better be empty. */
    if (one == nullptr || two == nullptr) {
        return one == two; // At least one is null
    }

    /* We now know both trees are nonempty. Confirm the root values match and
     * that the subtrees agree.
     */
    return one->value == two->value &&
           areEqual(one->left, two->left) &&
           areEqual(one->right, two->right);
}

For a Not At All Fun Or Exciting exercise, try writing this one iteratively!


Problem Two: A Problem of Great Depth and Complexity

Write a function

int heightOf(Node* root);

that returns the height of the given tree. By convention, an empty tree has height -1. Then talk about the big-O time complexity of your solution.

There are many ways to write this function. The easiest one that I know of is to use a nice recursive observation: the height of the empty tree is -1, and the height of a nonempty tree is always one plus the height of the larger of the heights of its two subtrees (do you see why?). To check this, think about what that says about the height of a tree with a single node, the height of a highly degenerate tree, etc. Here’s what this looks like:

int heightOf(Node* root) {
    if (root == nullptr) return -1;

    return 1 + max(heightOf(root->left), heightOf(root->right));
}

So how efficient is this code? Well, notice that it visits every node in the tree once and exactly once, doing O(1) work at each node. There are O(n) total nodes in the tree, so this does a total of O(n) work.


Problem Three: Palindromic Trees

​ A binary tree (not necessarily a binary search tree) is called a palindromic tree if it’s its own mirror image. For example, the tree on the left is a palindromic tree, but the tree on the right is not: The tree on the left is a palindromic tree. It looks like this: Node 3 on the first level, nodes 2 and 2 on the second level, nodes 5, 4 and 4, 5 on the third level, and nodes 1 & 2 under the left-most node 4 and nodes 2&1 under the right-most node 4. The tree on the right is not a palindromic tree. It looks like this: Node 3 on the first level, nodes 1 and 1 on the second level, nodes 1 & 2 as the children of the left-most 1 node on the third level, and nodes 1 & 2 as the children of the right-most node on the third level.

Write a function

bool isPalindromicTree(Node* root);

that takes in a pointer to the root of a binary tree and returns whether it’s a palindromic tree. ​

​ To solve this problem, we’ll solve a slightly more general problem: given two trees, are they mirrors of one another? We can then check if a tree is a palindrome by seeing whether that tree is a mirror of itself. ​

bool areMirrors(Node* root1, Node* root2) {
    /* If either tree is empty, both must be. */
    if (root1 == nullptr || root2 == nullptr) {
        return root1 == root2;
    }

    /* Neither tree is empty. The roots must have equal values. */
    if (root1->value != root2->value) {
        return false;
    }

    /* To see if they're mirrors, we need to check whether the left subtree of
     * the first tree mirrors the right subtree of the second tree and vice-versa.
     */
    return areMirrors(root1->left, root2->right) &&
           areMirrors(root1->right, root2->left);
}

bool isPalindromicTree(Node* root) {
    return areMirrors(root, root);
}

(Thanks to legendary SL Ali Malik for this code!) ​


Problem Four: The Ultimate and Penultimate Values

Write a function

Node* biggestNodeIn(Node* root);

that takes as input a pointer to the root of a (nonempty) binary search tree, then returns a pointer to the node containing the largest value in the BST. As usual, call error if the tree is empty. What is the runtime of your function if the tree is balanced? If it’s imbalanced? (Fun Fact: This first algorithm is how the Set<T>::first() function works.)

Then, write a function

Node* secondBiggestNodeIn(Node* root);

that takes as input a pointer to the root of a BST containing at least two nodes, then returns a pointer to the node containing the second-largest value in the BST. (If the BST is empty or only has one item in it, call error.) Then answer the same runtime questions posed in the first part of this problem.

We could solve this problem by writing a function that searches over the entire BST looking for the biggest value, but we can do a lot better than this! It turns out that the biggest value in a BST is always the one that you get to by starting at the root and walking to the right until it’s impossible to go any further. Here’s a recursive solution that shows off why this works:

Node* biggestNodeIn(Node* root) {
    if (root == nullptr) error("Nothing to see here, folks.");

    /* Base case: If the root of the tree has no right child, then the root node
     * holds the largest value because everything else is smaller than it.
     */
    if (root->right == nullptr) return root;

    /* Otherwise, the largest value in the tree is bigger than the root, so it’s
     * in the right subtree.
     */
    return biggestNodeIn(root->right);
}

And, of course, we should do this iteratively as well, just for funzies:

Node* biggestNodeIn(Node* root) {
    if (root == nullptr) error("Nothing to see here, folks.");

    while (root->right != nullptr) root = root->right;

    return root;
}

Getting the second-largest node is a bit trickier simply because there’s more places it can be. The good news is that it’s definitely going to be near the rightmost node – we just need to figure out exactly where.

There are two cases here. First, imagine that the rightmost node does not have a left child. In that case, the second-smallest value must be that node’s parent. Why? Well, its parent has a smaller value, and there are no values between the node and its parent in the tree (do you see why?) That means that the parent holds the second-smallest value. The other option is that the rightmost node does have a left child. The largest value in that subtree is then the second-largest value in the tree, since that’s the largest value smaller than the max. We can use this to write a nice iterative function for this problem that works by walking down the right spine of the tree (that’s the fancy term for the nodes you get by starting at the root and just walking right), tracking the current node and its parent node. Once we get to the largest node, we either go into its left subtree and take the largest value, or we return the parent, whichever is appropriate.

Node* secondBiggestNodeIn(Node* root) {
    if (root == nullptr) error("Nothing to see here, folks.");

    /* Walk down the right spine, tracking the node just above us. */
    Node* prev = nullptr;
    Node* curr = root;
    while (curr->right != nullptr) {
        prev = curr;
        curr = curr->right;
    }

    /* If we have a left child, find the maximum value there. */
    if (curr->left != nullptr) {
        return biggestNodeIn(curr->left);
    }

    /* Otherwise, the node above us is the second-biggest. */
    if (prev == nullptr) {
        error("Only one thing to see here, folks.");
    }
    return prev;
}

Notice that all three of these functions work by walking down the tree, doing a constant amount of work at each node. This means that the runtime is O(h), where h is the height of the tree. In a balanced tree that’s O(log n) work, and in an imbalanced tree that’s O(n) work in the worst-case.


Problem Five: Checking BST Validity

A binary search tree. The root node is 4, and has child nodes 3 on the left and 9 on the right. The node 3 has children 1 on the left and 2 on the right.

You are given a pointer to a Node that is the root of some type of binary tree. However, you are not sure whether or not it is a binary search tree. That is, you might get a tree like the one shown to the right, which is a binary tree but not a binary search tree. Write a function

bool isBST(Node* root);

that, given a pointer to the root of a tree, determines whether or not the tree is a legal binary search tree. You can assume that what you’re getting as input is actually a tree, so, for example, you won’t have a node that has multiple pointers into it, no node will point at itself, etc.

As a hint, think back to our recursive definition of what a binary search tree is. If you have a node in a binary tree, what properties must be true of its left and right subtrees for the overall tree to be a binary search tree? Consider writing a helper function that hands back all the relevant information you’ll need in order to answer this question.

There are a bunch of different ways that you could write this function. The one that we’ll use is based on recursive definition of a BST from lecture: a BST is either empty, or it’s a node x whose left subtree is a BST of values smaller than x and whose right subtree is a BST of values greater than x.

This solution works by walking down the tree, at each point keeping track of two pointers to nodes that delimit the range of values we need to stay within.

bool isBSTRec(Node* root, Node* lowerBound, Node* upperBound) {
    /* Base case: The empty tree is always valid.*/
    if (root == nullptr) return true;
    
    /* Otherwise, make sure this value is in the proper range. */
    if (lowerBound != nullptr && root->value <= lowerBound->value) return false;
    if (upperBound != nullptr && root->value >= upperBound->value) return false;
    
    /* Okay! We're in range. So now we just need to confirm that the left and
     * right subtrees are good as well. Notice how the range changes based on the
     * introduction of this node.
     */
    return isBSTRec(root->left,  lowerBound, root) &&
           isBSTRec(root->right, root,       upperBound);
}

bool isBST(Node* root) {
    return isBSTRec(root, nullptr, nullptr);
}

Problem Six: Deleting from a BST

We didn’t talk about how to delete nodes out of a BST, and that’s for a good reason – it’s surprisingly challenging! Let’s suppose you want to delete a node out of a BST. There are three cases to consider:

  1. The node is a leaf. In that case, it’s really easy to delete, so we just go and delete it.

  2. The node has exactly one child. In that case, we delete the node and “replace” it with that one child by updating the node’s parent so that it points directly at the single child.

  3. The node has two children. In that case, we do the following. Suppose we want to delete the node containing $x$. Find the node with the largest value in $x$’s left subtree. Copy the value from that node and overwrite the value $x$. Then, go and delete that new node instead of $x$.

A binary search tree. The root node is G, with children E on the left and N on the right. E has a child A on the left, and A has a child C on the right. N has children K on the left and Q on the right. K has children I on the left and M on the right.

Trace through this algorithm by hand on the tree above, deleting C, then E, then N, then M. Then, implement a function

void removeFrom(Node*& root, int value);

that removes the specified value from the given BST, if that value exists. Finally, discuss the runtime of the algorithm you implemented as a function of the height of the tree.

C is a leaf node, so we can delete it by just removing it from the tree:

The same binary search tree from before, with the leaf node C removed

The node E has only a single child, so we just remove it and replace it with its child to get this tree:

E is removed from the tree and replaced with its child, A. The resulting tree is as follows: root node G, with child A on the left and N on the right. N has children K on the left and Q on the right. K has children I on the left and M on the right.

The node N has two children. The largest value in its left subtree is M, so we replace M by N, then delete the node N. This is shown here:

The node N is deleted, and replaced with node M. The updated tree: Root node G, with child A on the left and M on the right.M has children K on the left and Q on the right. K has child I on the left.

The node M now has two children. To delete M, we start by finding the largest value in its left subtree (K) and replacing M by K. We then delete the node that formerly held K. Since that node has exactly one child, we just replace it with its one child. The result is shown here:

The node M is deleted, and replaced with node K. The updated tree: Root node G, with child A on the left and M on the right.M has children I on the left and Q on the right.

Here’s some code for how to do this deletion.

int removeLargestFrom(Node*& root);
void performDeletion(Node*& toRemove);

void removeFrom(Node*& root, int value) {
    /* If the tree is empty, there’s nothing to remove! */
    if (root == nullptr) return;
    /* If the node to delete is to the left, remove it from there. */
    else if (value < root->value) {
        removeFrom(root->left, value);
    }
    /* If the node to delete is to the right, remove from there. */
    else if (value > root->value) {
        removeFrom(root->right, value);
    }
    /* Otherwise, we’ve found the node to remove – so go remove it! */
    else {
        performDeletion(root);
    }
}
/* Actually does the deletion necessary to remove a node from the tree. */
void performDeletion(Node*& toRemove) {
    /* Case 1: The node is a leaf. Then we just delete it. */
    if (toRemove->left == nullptr && toRemove->right == nullptr) {
        delete toRemove;

        /* Inform whoever was pointing at us that we no longer exist. */
        toRemove = nullptr;
    }
    /* Case 2a: Only have a left child. */
    else if (toRemove->right == nullptr) {
        Node* replacement = toRemove->left;
        delete toRemove;
        toRemove = replacement;
    }
    /* Case 2b: Only have a right child. */
    else if (toRemove->left == nullptr) {
        Node* replacement = toRemove->right;
        delete toRemove;
        toRemove = replacement;
    }
    /* Case 3: Replace this node with the largest node in its left subtree. */
    else {
        toRemove->value = removeLargestFrom(toRemove->left);
    }
}

/* Deletes the largest value from the specified tree, returning that value. */
int removeLargestFrom(Node*& root) {
    if (root->right == nullptr) {
        int result = root->value;
        performDeletion(root);
        return result;
    }
    return removeLargestFrom(root->right);
}