Problem Set 5

Due Friday, November 1 at 1:00 pm Pacific

Solutions are available!

🔑 Solutions

This problem set – the last one purely on discrete mathematics – is designed as a cumulative review of the topics we’ve covered so far and a proving ground to try out your newfound skills with mathematical induction. The problems here span all sorts of topics and we hope that it serves as a fitting coda to our whirlwind tour of discrete math!

We recommend that you read the Guide to Induction before starting this problem set. It contains a lot of useful advice about how to approach problems inductively, how to structure inductive proofs, and how to not fall into common inductive traps. Additionally, before submitting, be sure to read over the Induction Proofwriting Checklist for a list of specific things to watch for in your solutions before submitting.

As a note on this problem set – normally, you're welcome to use any proof technique you'd like to prove results in this course. On this problem set, we've specifically asked on some problems that you prove a result inductively. For those problems, you should prove those results using induction or complete induction, even if there is another way to prove the result. (If you'd like to use induction in conjunction with other techniques like proof by contradiction or proof by contrapositive, that's perfectly fine.)

Good luck, and have fun!

Starter Files

This problem set does not have a coding component.

If you would like to type your solutions in $\LaTeX$, you may want to download this template file where you can place your answers:

🖋 PS5 $\LaTeX$ Template

Problem One: Induction Proof Critiques

Below are a collection of proofs by induction. For each proof, do the following:

Critique the proofwriting style in regards to the Induction Proofwriting Checklist and Guide to Induction.
Identify any logic errors in the execution of the proofs and explain why they are logic errors.
Determine whether the "theorem" being proved is actually true. If it's true, just tell us that it's true. If it's false, give a counterexample.

You do not need to rewrite these proofs – just give us your feedback on the items we mentioned by filling in this table:

Checklist Item	Violation?	If Yes, where is the error?
Make $P(n)$ a predicate, not a number or function.”	Yes/No
Watch your variable scoping in $P(n)\text.$	Yes/No
"Build up" if $P(n)$ is existentially quantified; "build down" if $P(n)$ is universally-quantified.	Yes/No
Choose the simplest base cases possible, and avoid redundant base cases.	Yes/No
Are there any logic errors? If so, where, and why are they logic errors?
Is the overall theorem true?

Critique the following proof about sums of natural numbers.

Theorem: The sum of the first $n$ natural numbers is $\frac{n(n - 1)}{2}$.

Proof: Let $P(n) = \frac{n(n - 1)}{2}$. We will prove by induction on $n$ that $P(n)$ holds for all $n \in \naturals$, from which theorem follows.

As our base cases, we prove $P(0)$ and $P(1)$. First we’ll prove $P(0)$, that the sum of the first zero natural numbers is $\frac{0(0 - 1)}{2}$. The sum of no numbers is the empty sum $(0)\text,$ and we see that $\frac{0(0 - 1)}{2} = 0$, so $P(0)$ is true. Next, we’ll prove $P(1)$. The sum of the first natural number is $0$ (since $0$ is the smallest natural number), and we note that $\frac{1(1 - 1)}{2} = 0$, so $P(1)$ is true.

For our inductive step, assume for all natural numbers $k$ that $P(k)$ is true. This means
\[0 + 1 + 2 + \dots + (k - 1) \quad = \quad \frac{k(k - 1)}{2}\text.\]
We will prove $P(k+1)$, that the sum of the first $k+1$ natural numbers is equal to $\frac{(k + 1)k}{2}$. Starting with the sum of the first $k+1$ natural numbers, we see that
\[\begin{array}{lcl} 0 + 1 + 2 + \dots + (k - 1) + k & = & \frac{(k+1)k}{2} \\ 0 + 1 + 2 + \dots + (k - 1) & = & \frac{(k + 1)k}{2} - k \\ 0 + 1 + 2 + \dots + (k - 1) & = & \frac{(k + 1)k}{2} - \frac{2k}{2}\\ 0 + 1 + 2 + \dots + (k - 1) & = & \frac{(k + 1 - 2)k}{2} \\ 0 + 1 + 2 + \dots + (k - 1) & = & \frac{k(k - 1)}{2} \text. \end{array}\]
We've reached our inductive hypothesis, so the equation is true. Thus $P(k+1)$ holds, completing the induction. $\qed$

Critique the following proof about directed graphs.

Theorem: No directed graphs have any cycles.

Proof: Let $P(n)$ be the statement “for any $n \in \naturals$, no directed graph with $n$ nodes contains a cycle.” We will prove by induction that $P(n)$ holds for all $n \in \naturals$, from which the theorem follows.

As a base case, we prove $P(0)$, that no directed graphs with $0$ nodes contain a cycle. To see this, consider any directed graph $G$ with no nodes. Since $G$ has no nodes, it has no closed walks, since all closed walks contain at least one node. Therefore, $G$ has no cycles, as required.

For our inductive step, assume for some arbitrary $k \in \naturals$ that P(k) is true and no directed graph with $k$ nodes has any cycles. We will prove that $P(k+1)$ is true, namely, that no directed graphs with $k+1$ nodes have any cycles.

Consider a directed graph $G$ with $k$ nodes. By our inductive hypothesis, we know that $G$ has no cycles. Now, consider the graph $G'$ formed by adding a new node $v$ to $G$, then adding edges from $v$ to each node in $G$. We claim that this directed graph has no cycles. To see this, note that any cycle in $G'$ must involve $v$, since $G$ has no cycles on its own. But this is impossible, since any walk that leaves $v$ can never return to it and therefore can't be a closed walk. Therefore, $G'$ has no cycles, so $P(k+1)$ holds, completing the induction. $\qed$

Problem Two: Recurrence Relations

A recurrence relation is a way of defining an infinitely long sequence (usually, a sequence of numbers). A recurrence relation specifies the value of the first term or terms of the sequence, then defines the remaining entries from the previous terms. For example, here’s a simple recurrence relation:

\[a_0 = 1 \qquad a_{n+1} = 2a_n\]

The first terms of this sequence are given as follows:

$a_0 = 1$, since that’s what the first rule says.
$a_1 = 2$, since the second rule says that $a_1 = 2a_0 = 2 \cdot 1 = 2\text.$
$a_2 = 4$, since the second rule says that $a_2 = 2a_1 = 2 \cdot 2 = 4\text.$
$a_3 = 8$, since the second rule says that $a_3 = 2a_2 = 2 \cdot 4 = 8\text.$

Extending further, this sequence starts off with the numbers

\[1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, \dots,\]

which all happen to be powers of two. It turns out that this isn't a coincidence – this recurrence relation perfectly describes the powers of two.

Fill in the blanks below to complete the proof by induction.

Theorem: For all natural numbers $n$, we have $a_n = 2^n$.

Proof: Let $P(n)$ be the statement "$a_n = 2^n$." We will prove by induction that $P(n)$ holds for all $n \in \naturals$, from which the theorem follows.

As our base case, we prove $P(\blank)$, that $\blank$. To see this, $\blank$.

For our inductive step, assume for some arbitrary $k \in \naturals$ that $P(k)$ is true, meaning that $\blank$. We need to show $P(\blank)$, meaning that $\blank$. To see this, note that

\[\begin{aligned} a_{k+1} &= \blank \\ &= \blank && \text{(by our IH)} \\ &= \blank \text. \end{aligned}\]

Therefore, we see that $\blank$, so $P(\blank)$ is true, completing the induction. $\qed$

The relative sizes of the blanks here do not necessarily indicate how much you need to write in each spot.

In case you’re wondering what you’re asked to prove here, you can think of this recurrence relation as a mathematical way of writing out this recursive function:

int a(int n) {
    if (n == 0) return 1;
    return 2 * a(n - 1);
}

For any $n \in \naturals$, you can compute a(n) by just running this code, and after doing some computation it will return the value of $a_n$. What we’re asking you to do is the mathematical equivalent of showing that the value returned by a(n) is always $2^n$. While it might help to think about things in terms of this analogy, your proof should not reference this code and should just use the definitions given in the problem statement.

Taking a recurrence relation like the relation $a_n$ here and finding a simple, non-recursive expression for it is called solving the recurrence. It's common in theoretical CS to need to solve recurrences. For example, when determining the efficiency of a recursive algorithm, you might first start by writing a recurrence relation for its runtime, then solve the recurrence to get a simpler, direct way of calculating its efficiency. Take CS161 to learn how to do this!

Problem Three: Square Triangular Numbers

As a refresher, a natural number $n$ is called a square number when there is a natural number $k$ such that $n = k^2\text.$ Similarly, a natural number $n$ is a triangular number when there is a natural number $k$ such that $n = \frac{k(k+1)}{2}\text.$

Some numbers are both square and triangular. For example, 36 is both square $(36 = 6^2)$ and triangular $(36 = \frac{8(8+1)}{2})\text.$ You can see this visually below; each figure is made of 36 squares.

There are infinitely many numbers that are both square and triangular. To see this, consider the following recurrence relation:

\[a_0 = 1 \qquad a_{n+1} = 4a_n \cdot (8a_n + 1)\text.\]

Prove by induction that $a_n$ is both square and triangular for all natural numbers $n\text.$

Problem Four: Presenting Presenters Presents

There is a conference for people who host events (emcees). Each emcee arrives with gifts for up to three other emcees. To avoid ruining surprises, the event organizer sets the following rule: no emcee is allowed to stay at the same hotel as a emcee they will give a gift to. The question then arises - how many hotels are necessary to house the emcees?

You might think that if there's a big group of emcees (say, 100 or so) and each one arrives with three gifts that you would need a lot of hotels to house everyone. But that's actually not the case. It turns out you never need more than seven hotels to hold everyone.

We’re going to ask you to prove this using a proof by induction with this predicate $P(n)$:

$P(n)$ is the predicate “for any group of $n$ emcees, each of whom has gifts for up to three other emcees, there is a way to house the emcees in seven hotels so that no emcee stays at a hotel with a emcee they have a gift for”

Before proceeding, let’s confirm you see the general structure of the proof.

Is $P(n)$ a universally-quantified statement or an existentially-quantified statement? Based on that, will you “induct up,” or will you “induct down?”

There are multiple quantifiers in $P(n)$. We care about the first one.

Prove by induction that $P(n)$ is true for all $n \in \naturals$.

It might be helpful to think about the emcee who receives the fewest number of gifts.

Problem Five: Regular Graphs

A graph $G = (V, E)$ is called $d$-regular if every node in $G$ has degree exactly $d$. (As a reminder, the degree of a node $v$ is the number of edges that have $v$ as an endpoint, or, equivalently, the number of nodes that $v$ is adjacent to.)

For example, here's a 3-regular graph with 6 nodes and a 4-regular graph with 7 nodes:

Prove by induction that for all natural numbers $n$ there exists an $n$-regular graph containing exactly $2^n$ nodes.

Is this a problem where you'll "induct up," or a problem where you'll "induct down?"

As a hint, moving from $2^k$ to $2^{k+1}$ doubles the number of nodes in the graph, and moving from $2^{k+1}$ to $2^k$ halves the number of nodes in the graph. If you think this is an "induct up" problem, try finding a natural way to "double" a graph. If you think this is an "induct down" problem, try finding a natural way to "halve" a graph.

Regular graphs often make appearances in combinatorics and theoretical computer science. Graphs with the specific property you're exploring in this problem - where there are a huge number of nodes of modest degree - have applications in parallel computing, network design, and error-correcting codes. Take CS144 (Computer Networks), CS149 (Parallel Computing) or CS250 (Algebraic Error-Correcting Codes) to learn more!

Problem Six: The Spanning Tree Protocol

In lecture, we discussed how local area networks must be designed as trees to avoid broadcast storms. However, this solution is less than ideal. First, if any single link fails, it disconnects the network (trees are minimally-connected graphs). Second, if someone accidentally plugs in a cable that links two already-connected devices, the resulting network will have a loop (trees are maximally-acyclic graphs).

Fortunately, there's a way to avoid this. We begin by removing most restrictions on the link structure of the network. Any device can be plugged into any other device, even if doing so creates cycles. The only restriction is that the resulting graph must be connected. Next, each computer on the network selectively disables some of its outgoing links so that the remaining links form a tree. The computers then communicate purely on the links making up this tree. This strategy ensures communication always proceeds along a tree (full connectivity and no broadcast storms) while mitigating the two weaknesses given above.

Let's formalize this idea. Mathematically, given the graph $G = (V, E)$ of the computers and links on the network, we want to find a subset of the edges in $E$ that form a tree. Such a tree is called spanning tree of $G\text.$ The algorithm network devices use to choose this spanning tree is called the Spaning Tree Protocol (or STP for short) and is an essential component of the internet.

Here is a slightly simplified description of STP. We assume the graph $G = (V, E)$ of computers $(V)$ and links $(E)$ is connected. First, the computers on the network choose a function $d : V \to \naturals$ with the following properties:

\[\begin{aligned} \text{(I)} & \quad \text{there is exactly one node } r \text{ where } d(r) = 0\text. \\ \text{(II)} & \quad \forall u \in V. \left(u \ne r \to \exists v \in V. \left( \set{u, v} \in E \land d(v) \lt d(u) \right) \right)\text. \end{aligned}\]

Next, the computers on the network agree on a function $p : V - \set{r} \to V$ with the following properties:

\[\begin{aligned} \text{(III)} & \quad \forall v \in V. \left(v \ne r \to \set{v, p(v)} \in E\right)\text. \\ \text{(IV)} & \quad \forall v \in V. \left(v \ne r \to d(p(v)) \lt d(v)\right)\text. \end{aligned}\]

Finally, the computers classify each edge as either enabled or disabled according to the following rule:

\[\begin{aligned} \text{(V)} & \quad \forall e \in E. (e \text{ is enabled } \leftrightarrow \ \exists v \in V. e = \set{v, p(v)}) \end{aligned}\]

There is a lot to unpack in the description of STP. You may find it helpful to answer the following questions to build an intuition for what's going on. (We won't collect these for credit.)

What does rule (II) say, in plain English?
What does rule (III) say, in plain English?
What does rule (IV) say, in plain English?
Why was $V - \set{r}$ chosen as the domain for $p$ instead of $V\text?$ Why is it not possible to choose $V$ as its domain?
Why was $V$ chosen as the codomain for $p$ instead of $V - \set{r}\text?$ Why is it not possible to choose $V - \set{r}$ as its codomain?

We'll begin with a warmup exercise to help you get adjusted to STP.

Consider the graph $G = (V, E)$ shown below:

Here, $V = \set{r, v_0, v_1, v_2, v_3, v_4, v_5, v_6}\text.$ We have annotated each node $v \in V$ with the value of $d(v)\text.$ For example, $d(r) = 0$ and $d(v_5) = 137\text.$ Fill in the blanks below to define a function $p : V - \set{r} \to V$ meeting the necessary requirements. No justification is necessary.
\[p(v) = \begin{cases} \blank & \text{if } v = v_0 \\ \blank & \text{if } v = v_1 \\ \blank & \text{if } v = v_2 \\ \blank & \text{if } v = v_3 \\ \blank & \text{if } v = v_4 \\ \blank & \text{if } v = v_5 \\ \blank & \text{if } v = v_6 \end{cases}\]

There are multiple correct answers here; you only need to give us one choice of $p$ that works.

Once you've come up with $p\text,$ we recommend you draw out the graph and note which edges are enabled. This will help you check that you indeed have a spanning tree.

Now that you've gotten a feel for how STP works, we'd like you to prove that the set of edges enabled by STP forms a spanning tree of $G\text.$ We have broken this down into three smaller pieces.

Let $G = (V, E)$ be an arbitrary graph (not necessarily the one from part (i)) on which we've run STP. Prove the following fact by induction: for all natural numbers $n$ and for each node $v \in V$ where $d(v) = n\text,$ there is a walk in $G$ from $v$ to $r$ consisting solely of enabled edges.

The $d$ values of the nodes on the walks you find will change as you move from $v$ to $r\text.$ Can you control how much $d$ will change by? Based on that, what style of induction is most appropriate here?

We are expecting you to make use of the formal definitions given in the description of STP, so your proof should discuss $r\text,$ $d\text,$ $p\text,$ and some of the numbered properties.

Using your result from part (ii), prove that for any two nodes $u, v \in V\text,$ there is a walk from $u$ to $v$ consisting solely of enabled edges.

You can use any proof technique you'd like, but we are expecting you to structure your proof around the formal definitions of the relevant terms.

Prove that $G$ contains no cycles made purely of enabled edges.

Proceed by contradiction. If there is a cycle, look at a node in that cycle whose $d$ value is the largest.

As above, we are expecting a formal proof that calls back to the relevant definitions.

Your results from parts (iii) and (iv) collectively establish that, purely using enabled edges, the network graph is connected and acyclc (a tree). Congratulations! You've just proved a result about a key technology underpinning computer networking.

STP is essential in keeping the internet running. Its inventor, Radia Perlman, is sometimes credited as the "Mother of the Internet." To learn more about STP, including the details of how $d$ and $p$ are chosen, take CS144!

Optional Fun Problem: Egyptian Fractions

Leonardo Fibonacci, an eleventh-century Italian mathematician, is credited with introducing Hindu-Arabic numerals (the number system we use today) to Europe in his book Liber Abaci. The book Liber Abaci is also the source of the Fibonacci sequence (a sequence that begins 0, 1 and where each successive term is the sum of the two previous terms).

Relevant for this problem, Liber Abaci also described a method of writing out fractions called Egyptian fractions, which has been employed since ancient times; the Rhind Mathematical Papyrus, composed about 3,500 years ago in Thebes, includes several tables of fractions written out this way.

An Egyptian fraction is a sum of distinct fractions whose numerators are all 1 (these fractions are called unit fractions). For example, here are some sample Egyptian fraction representations:

\[\begin{array}{c} \frac{2}{3} = \frac{1}{2} + \frac{1}{6} \\ \frac{2}{15} = \frac{1}{10} + \frac{1}{30} \\ \frac{7}{15} = \frac{1}{3} + \frac{1}{8} + \frac{1}{120} \\ \frac{2}{85} = \frac{1}{51} + \frac{1}{255} \end{array}\]

Egyptian fractions are useful for divvying up objects fairly. For example, suppose you have two cakes to distribute to fifteen people – that is, everyone should get a $\frac{2}{15}$ fraction of those cakes. Begin by slicing each cake into tenths and giving each person one ($\frac{1}{10}$). Now, take the remaining tenths you haven’t distributed and cut them into thirds, giving thirtieths of the original cake. Each person then takes one of those ($\frac{1}{30}$). Because $\frac{1}{10} + \frac{1}{30} = \frac{2}{15}$, everyone gets their fair share. Pretty cool, isn’t it?

It's not immediately obvious that every fraction between 0 and 1 can be written this way, but, surprisingly, that is indeed the case. One way of finding an Egyptian fraction representation of a number is to use a greedy algorithm that works by finding the largest unit fraction at any point that can be subtracted out from the rational number. For example, to compute an Egyptian fraction representation for $\frac{42}{137}$, we would start off by noting that $\frac{1}{4}$ is the largest unit fraction less than $\frac{42}{137}$. We then say that

\[\frac{42}{137} = \frac{1}{4} + (\frac{42}{137} - \frac{1}{4}) = \frac{1}{4} + \frac{31}{548} \text.\]

We then repeat this process by finding the largest unit fraction less than $\frac{31}{548}$ and subtracting it out. This number is $\frac{1}{18}$, so we get

\[\frac{42}{137} = \frac{1}{4} + \frac{1}{18} + (\frac{31}{548} - \frac{1}{18}) = \frac{1}{4} + \frac{1}{18} + \frac{5}{4,932} \text.\]

The largest unit fraction we can subtract from $\frac{5}{4932}$ is $\frac{1}{987}$:

\[\frac{42}{137} = \frac{1}{4} + \frac{1}{18} + \frac{1}{987} (\frac{5}{4,932} - \frac{1}{987}) = \frac{1}{4} + \frac{1}{18} + \frac{1}{987} + \frac{1}{1,622,628} \text.\]

And at this point we're done, because the leftover fraction is itself a unit fraction.

Prove that the greedy algorithm for Egyptian fractions always terminates for any rational number $r$ in the range $0 \lt r \lt 1$ and always produces a valid Egyptian fraction. (A rational number is a real number that can be written as $r = \frac{p}{q}$ for some integers $p$ and $q$ where $q \ne 0$.) That is, the sum of the unit fractions should be the original number, there should only be finitely many fractions, and no unit fraction should be repeated. This shows that every rational number in the range $0 \lt r \lt 1$ has at least one Egyptian fraction representation.