Problem Set 9

Due Friday, March 14 at 1:00 PM Pacific

What problems are beyond our capacity to solve? Why are they so hard? And why is anything that we've discussed this quarter at all practically relevant? In this problem set – the last one of the quarter! – you'll explore the absolute limits of computing power.

Before beginning, we recommend reading

the Guide to Self-Reference, which does a deep-dive into undecidability via self-reference, and
the Guide to the Lava Diagram, which explores how to classify languages.

As always, please feel free to drop by office hours or ask on Ed if you have any questions. We'd be happy to help out.

Although you have two weeks to finish this problem set, it's designed to only take one week to complete. We are not expecting you to work on this during the break.

Good luck, and have fun!

Starter Files

Here are the starter files for the problems you'll submit electronically:

📦 PS9 Starter Files

Unpack the files somewhere convenient and open up the bundled project.

If you would like to type your solutions in $\LaTeX$, you may want to download this template file where you can place your answers:

🖋 PS9 $\LaTeX$ Template

General Advice

First, read through the index of problems below to get an idea of time management for this assignment.
Next, glance through all the questions before you start working. This not only ensures you won't be caught off guard as you reach later problems, but also gives your brain a chance to start working on some of these as a background process while you do other things.
If you get stuck, consider moving on to another problem for a while! Changing things up is a great way to get out of a rut, make new connections between concepts, and spark creative insights!
If you realize you don't have the requisite knowledge to complete a problem, take a step back from it and review the relevant lecture materials. Skipping lecture and trying to extract the relevant tidbits for each problem will lead to a more frustrating experience than engaging fully with a lecture and approaching the problem sets afterward.

Problem One: This Program is Not Responding

Most operating systems provide some functionality to detect programs that are looping infinitely. Typically, they display a dialog box containing a message like these:

These messages give the user the option to terminate the program or to let the program keep running.

You'd think that a good OS would shut down any program that had gone into an infinite loop, since these programs just waste system resources (processor time, battery power, etc.) that could be better spent by other programs.

Since it makes more sense for the OS to automatically detect programs that have gone into an infinite loop, why does the OS have to ask the user whether to terminate the program or let it keep running? Briefly explain your answer; 2-3 sentences should be enough, and we are not expecting a formal proof.

Problem Two: Verifier Warmups

Verifiers can feel a bit weird the first time you encounter them. The definition is abstract, and it never feels like they're actually solving a problem. These questions are designed to get you playing around a bit more with verifiers so you can build a better intuitive feel for them.

We recommend pulling up the formal definition of a verifier before starting this problem. Remember that the certificate $c$ can be the encoding of an object or the encoding of a group of one or more objects.

Consider the language $L = \Set{ \langle n \rangle \suchthat n \in \naturals \land n \text{ is a perfect square}}\text.$ Below is a list of programs that purport to be verifiers for $L\text.$ Which are, and which aren't? Briefly justify your answers, but no formal proof is necessary.

bool checkSquare1(int n) {
    for (int i = 0; i <= n; i++) {
        if (i * i == n) return true;
    }
    return false;
}

bool checkSquare2(int n) {
    int i = 0;
    while (true) {
        if (i * i == n) return true;
        i++;
    }
}

bool checkSquare3(int n, string w) {
    return w == "n is a perfect square."
}

bool checkSquare4(int n, int k) {
    return k * k == n;
}

bool checkSquare5(int n, int k, int m) {
    return k * m == n;
}

bool checkSquare6(int n, int k) {
    for (int i = 0; i < k; i++) {
        if (i * i == n) return true;
    }
    return false;
}

bool checkSquare7(int n, int i) {
    while (true) {
        if (i * i == n) return true;
        i++;
    }
}

Briefly explain why the following is a verifier for $A_{TM}\text.$ Specifically, explain why this verifier halts on all inputs and why it's the case that $\langle M, w\rangle \in A_{TM}$ if and only if there is a certificate $n$ where the verifier accepts $\langle M, w, n\rangle\text.$ No formal proof is necessary.
```
bool checkAccepts(TM M, string w, int n) {
    set up a simulation of M running on w;
    for (int i = 0; i < n; i++) {
        simulate one more step of M running on w;
    }
    return whether M executed the command "Return True";
}
```

Problem Three: Isn’t Everything Undecidable?

(We recommend reading the Guide to Self-Reference before attempting this problem.)

In lecture, we proved that $A_{TM}$ and the halting problem are undecidable – that, in some sense, they’re beyond the reach of algorithmic problem-solving. The proofs we used involved the nuanced technique of self-reference, which can seem pretty jarring and weird the first time you run into it. The good news is that with practice, you’ll get the hang of the technique pretty quickly!

One of the most common questions we get about self-reference proofs is why you can’t just use a self-reference argument to prove that every language is undecidable. As is often the case in Theoryland, the best way to answer this question is to try looking at some of the ways you might use self-reference to prove that every language is undecidable, then see where those arguments break down.

To begin with, consider this proof:

(Incorrect!) Theorem: All languages are undecidable.

(Incorrect!) Proof: By contradiction; assume there is a decidable language $L$. Then there is a decider $D$ for $L$. We can represent $D$ as a function

bool inL(string w);

that takes in a string $w$, then returns true if $w \in L$ and false otherwise.

Now, consider the following program:

bool trickster(string w) {
    return !inL(w);
}

Since inL decides $L$, we know that

\[\mathtt{inL(w)}\text{ returns true} \qquad \text{if and only if} \qquad w \in L\text.\]

Given how trickster is written, we see that

\[\mathtt{inL(w)}\text{ returns true} \qquad \text{if and only if} \qquad \mathtt{trickster(w)}\text{ returns false.}\]

This means that

\[\mathtt{trickster(w)}\text{ returns false} \qquad \text{if and only if} \qquad w \in L\text.\]

This is impossible. We've reached a contradiction, so our assumption was wrong and $L$ is undecidable. $\qed$

This proof has to be wrong because we know of many decidable languages.

What’s wrong with this proof? To answer this question, give us a specific claim made in the proof that is incorrect, then explain why it's incorrect.

Go one sentence at a time. For each claim that’s made, ask yourself – why, specifically, is this statement correct? Pull up the proof that $A_{TM}$ is undecidable and compare this proof and that one side-by-side, going one sentence at a time if you need to.

Here’s another incorrect proof that all languages are undecidable:

(Incorrect!) Theorem: All languages are undecidable.

(Incorrect!) Proof: By contradiction; assume there is a decidable language $L$. Then there is a decider $D$ for $L$. We can represent $D$ as a function

bool willAccept(string function, string w);

that takes in the source code of a function function and a string $w$, then returns true if function(w) returns true and false otherwise.

Now, consider the following program:

bool trickster(string w) {
    string me = /* source code for trickster */;

    /* See whether we'll accept, then do the opposite. */
    return !willAccept(me, w);
}

Since willAccept decides $L$ and me holds the source code of trickster, we know that

\[\mathtt{willAccept(me, input)}\text{ returns true}\qquad \text{if and only if} \qquad \texttt{trickster(input)} \text{ returns true.}\]

Given how trickster is written, we see that

\[\mathtt{willAccept(me, input)}\text{ returns true}\qquad \text{if and only if} \qquad \texttt{trickster(input)} \text{ doesn't return true.}\]

This means that

\[\mathtt{trickster(input)}\text{ returns true}\qquad \text{if and only if} \qquad \texttt{trickster(input)} \text{ doesn't return true.}\]

This is impossible. We've reached a contradiction, so our assumption was wrong and $L$ is undecidable. $\qed$

It’s a nice read, but this proof isn’t correct.

What’s wrong with this proof? To answer this question, give us a specific claim made in the proof that is incorrect, then explain why it's incorrect.

Many of the examples we’ve seen of undecidable languages involve checking for properties of Turing machines or computer programs, which might give you the sense that every question you might want to ask about TMs or programs is undecidable. That isn’t the case, though, and this question explores why.

Consider the following language $L$:

\[L = \Set{ \encoded{P} \suchthat P \text{ is a syntactically valid C++ program} }\]

Below is a purported proof that $L$ is undecidable:

(Incorrect!) Theorem: The language $L$ is undecidable.

(Incorrect!) Proof: By contradiction; assume $L$ is decidable. Then there is a decider $D$ for $L$. We can represent $D$ as a function

bool isValid(string function);

that takes in the source code of a function function, then returns true if function is syntactically valid and false otherwise.

Now, consider the following program:

void trickster() {
    string me = /* source code for trickster */;

    /* Execute a line based on whether our syntax is right. */
    if (isValid(me)) {
        oops, this line of code isn't valid C++!
    } else {
        int num = 137; // Perfectly valid syntax!
    }
}

Since isValid decides $L$ and me holds the source of trickster, we know that

\[\mathtt{isValid(me)}\text{ returns true} \qquad \text{if and only if} \qquad \mathtt{trickster} \text{ is syntactically valid.}\]

Given how trickster is written, we see that

\[\mathtt{isValid(me)}\text{ returns true} \qquad \text{if and only if} \qquad \mathtt{trickster} \text{ is not syntactically valid.}\]

This means that

\[\mathtt{trickster} \text{ is syntactically valid} \qquad \text{if and only if} \qquad \mathtt{trickster} \text{ is syntactically invalid.}\]

This is impossible. We've reached a contradiction, so our assumption was wrong and $L$ is undecidable. $\qed$

This proof, unfortunately, is incorrect.

What’s wrong with this proof? To answer this question, give us a specific claim made in the proof that is incorrect, then explain why it's incorrect.

Problem Four: Double Verification

This problem explores the following beautiful and fundamental theorem about the relationship between the $\rlangs$ and $\relangs$ languages:

If $L$ is a language, then $L \in \rlangs$ if and only if $L \in \relangs$ and $\overline{L} \in \relangs$.

This theorem has a beautiful intuition: it says that a language $L$ is decidable $(L \in \rlangs)$ precisely if for every string in the language, it's possible to prove it's in the language $(L \in \relangs)$ and, simultaneously, for every string not in the language, it's possible to prove that the string is not in the language $(\overline{L} \in \relangs)$. In this problem, we're going to ask you to prove one of the two directions of this theorem.

Let $L$ be a language where $L \in \relangs$ and $\overline{L} \in \relangs$. This means that there's a verifier for $L$ and a verifier for $\overline{L}$. In software, you could imagine that these verifiers correspond to functions with these signatures:

/* Verifier for L. This function has these properties:
 *
 *    checkInL(w, c) always halts.
 *    for all strings w, (w in L if and only if there's a c
 *        where checkInL(w, c) returns true
 *    )
 */
bool checkInL(string w, string c);

/* Verifier for L^bar. This function has these properties:
 *
 *    checkInLBar(w, c) always halts.
 *    for all strings w, (w in L^bar if and only if there's a c
 *        where checkInLBar(w, c) returns true
 *    )
 */
bool checkInLBar(string w, string c); 

Show that $L \in \rlangs$ by writing pseudocode for a function

bool isInL(string w)

that accepts as input a string $w$, then returns true if $w \in L$ and returns false if $w \notin L$. You don't need to write much code here. If you find yourself writing ten or more lines of pseudocode, you're probably missing something. No justification is required.

Look at the appendix for "Unsolvable Problems, Part II" to see how to convert a verifier into a recognizer or vice-versa. You may find some of the ideas there useful here.

Problem Five: The Lava Diagram

(We recommend reading the Guide to the Lava Diagram before starting this problem.)

Download the starter files for Problem Set 9 and extract them somewhere convenient. Run the provided program and choose “The Lava Diagram.” You’ll be presented with a Venn diagram showing the overlap of different classes of languages we've studied so far. We have also provided you a list of twelve numbered languages. For each of those languages, draw where in the Venn diagram that language belongs. As an example, we've indicated where Language 1 and Language 2 should go. No proofs or justifications are necessary – the purpose of this problem is to help you build a better intuition for what makes a language regular, $\rlangs$, $\relangs$, or none of these.

Use the provided to test your solution locally, then submit the file res/LavaDiagram.answers to Gradescope when you’re done.

$\Sigma^\star$
$L_D$
$\Set{ \mathtt{a}^n \suchthat n \in \naturals }$
$\Set{ \mathtt{a}^n \suchthat n \in \naturals \text{ and } n \text{ is a multiple of } 137 }$
$\Set{ \mathtt{1}^m\mathtt{+1}^n\mathtt{=1}^{m+n} \suchthat m, n \in \naturals }$
$\Set{ \encoded{M} \suchthat M \text{ is a TM and } M \text{ is a not a recognizer for } \emptyset }$
$\Set{ \encoded{M} \suchthat M \text{ is a TM and } M \text{ is a recognizer for } \emptyset }$
$\Set{ \encoded{M} \suchthat M \text{ is a TM and } M \text{ is a recognizer for } L_D }$
$\Set{ \encoded{M, n} \suchthat M\text{ is a TM, } n \in \naturals, \text{ and } M \text{ accepts all strings in } \Set{\mathtt{a}, \mathtt{b}}^\star \text{ of length at most } n }$
$\Set{ \encoded{M, n} \suchthat M\text{ is a TM, } n \in \naturals, \text{ and } M \text{ rejects all strings in } \Set{\mathtt{a}, \mathtt{b}}^\star \text{ of length at most } n }$
$\Set{ \encoded{M, n} \suchthat M\text{ is a TM, } n \in \naturals, \text{ and } M \text{ loops on all strings in } \Set{\mathtt{a}, \mathtt{b}}^\star \text{ of length at most } n }$
{ $\encoded{M_1, M_2, M_3, w} \suchthat M_1, M_2,$ and $M_3$ are TMs, $w$ is a string, and at least two of $M_1$, $M_2$, and $M_3$ accept $w$. }

Problem Six: $\corelangs$ and Beyond

When we first saw that $A_{TM}$ is undecidable, we could at least take consolation in the fact that $A_{TM}$ is recognizable. Then we discovered that $L_D$ is unrecognizable, and up to now we haven’t had a comforting fact to fall back on. But there is something about $L_D$ we can console ourselves with: the complement of $L_D$, the language $\overline{L_D}$, is an $\relangs$ language. (Do you see why?) In other words, while there’s no general way to prove that some TM will not accept its own encoding, there is a general way to prove that some TM will accept its own encoding.

The fact that $L_D$ isn’t an $\relangs$ language while $\overline{L_D}$ is still in $\relangs$ suggests that there might be a bit more to the computability landscape than just $\rlangs$ and $\relangs\text.$ And indeed there is: the same way that $A_{TM}$ is the poster child of an $\relangs$ language, the language $L_D$ is the poster child of a $\corelangs$ language. Formally speaking, the class $\corelangs$ consists of all the languages whose complement is an $\relangs$ language:

\[\corelangs = \Set{ L \suchthat \overline{L} \in \relangs }\text.\]

Intuitively speaking, the $\corelangs$ languages are languages where if you have a string that isn’t in the language, there’s some easy way to prove that it isn’t in the language.

Prove that $A_{TM} \notin \corelangs\text.$

You haven't seen the term $\corelangs$ prior to this problem, but you have seen something somewhere before that would be helpful in solving this problem.

At this point, we have an $\relangs$ language that’s not in $\corelangs\, (A_{TM})$ and a $\corelangs$ language that’s not in $\relangs\, (L_D)\text.$ As a coda to our treatment of unsolvable problems, let’s see a language that’s neither in $\relangs$ nor $\corelangs\text.$ In other words, there’s no general way to prove that strings in that language are indeed in that language, nor is there way to prove that strings not in that language are indeed not in that language. Yikes!

In what follows, let's assume all languages are over the alphabet $\Sigma = \Set{\mathtt{0}, \mathtt{1}}\text.$ Given two languages $A$ and $B$ over $\Sigma\text,$ the disjoint union of $A$ and $B\text,$ denoted $A \uplus B\text,$ is the language

\[A \uplus B = \mathtt{0}A \cup \mathtt{1}B\text.\]

For example, if $A = \Set{ {\color{blue} \mathtt{1}, \mathtt{10}, \mathtt{100}, \mathtt{1000} }}$ and $B = \Set{ {\color{red}\varepsilon, \mathtt{0}, \mathtt{1}, \mathtt{00}, \mathtt{01}, \mathtt{10}, \mathtt{11}} }\text,$ then $A \uplus B$ is the language

\[A \uplus B = \Set{ \mathtt{0\color{blue}1}, \mathtt{0\color{blue}10}, \mathtt{0\color{blue}100}, \mathtt{0\color{blue}1000}, \mathtt{1\color{red}}, \mathtt{1\color{red}0}, \mathtt{1\color{red}1}, \mathtt{1\color{red}00}, \mathtt{1\color{red}01}, \mathtt{1\color{red}10}, \mathtt{1\color{red}11} }\text.\]

Notice how each string in $A \uplus B$ is tagged with which language it originated in. Any string that starts with $\mathtt{0}$ came from $A\text,$ and any string that starts with $\mathtt{1}$ came from $B\text.$

Here is a useful theorem about disjoint unions and $\relangs\text:$

Theorem: If $A \uplus B \in \relangs\text,$ then $A \in \relangs\text.$

To see why this is the case, let $A$ and $B$ be languages where $A \uplus B \in \relangs\text.$ This means that there is a recognizer for $A \uplus B\text,$ which we can represent in software as a function

bool isIn_A_uplus_B(string w);

that takes as input a string, then returns true if and only if that string is in $A \uplus B\text.$

Below is a partial implementation of a function that acts as a recognizer for the language $A\text.$ Fill in the blank with a single line of code to complete that implementation. Feel free to use the isIn_A_uplus_B function in your solution. No justification is necessary.
```
/* Given a string w, returns true if w in A, and
 * does not return true if w notin A.
 */
bool isIn_A(string w) {
    return ______________;
}
```

Here is another useful theorem about disjoint unions and $\corelangs\text:$

Theorem: If $A \uplus B \in \corelangs\text,$ then $B \in \corelangs\text.$

To see why this is the case, let $A$ and $B$ be languages where $A \uplus B \in \corelangs\text.$ This means that there is a recognizer for $\overline{A \uplus B}\text,$ which we can represent in software as a function

bool isIn_Complement_A_uplus_B(string w);

that takes as input a string, then returns true if and only if that string is in $\overline{A \uplus B}\text.$

Below is a partial implementation of a function that acts as a recognizer for the language $\overline{B}\text.$ Fill in the blank with a single line of code to complete that implementation. Feel free to use the isIn_Complement_A_uplus_B function in your solution. No justification is necessary.
```
/* Given a string w, returns true if w in complement(B), and
 * does not return true if w notin complement(B).
 */
bool isIn_Complement_B(string w) {
    return ______________;
}
```

Using the theorems from parts (ii) and (iii) of this problem, prove that $L_D \uplus A_{TM} \notin \relangs$ and that $L_D \uplus A_{TM} \notin \corelangs\text.$

Your result from part (iv) shows that, in a very real sense, the language $L_D \uplus A_{TM}$ is a "harder" problem than both $L_D$ and $A_{TM}\text;$ it's neither in $\relangs$ nor in $\corelangs\text!$ But even this nightmarish problem sits within a class of problems that can still be touched by computing power, if only in a very weak sense. Specifically, $L_D \uplus A_{TM}$ is what’s called a $\Delta^0_2$ language, where $\Delta^0_2$ is a class of languages that contains all of $\relangs$ and $\corelangs\text,$ plus a bunch of other problems. And $\Delta^0_2$ itself sits inside even larger classes of languages, stretching outward to infinity.

In a sense, the same way that Cantor's theorem gives a way of finding infinite sets of progressively larger sizes, it's possible to find undecidable problems of progressively "harder" difficulty. Want to learn more? Take Phil 152!

Optional Fun Problem: Undecidability Phase Transitions

Let $A_{TM}^{(n)}$ be the following language:

\[A_{TM}^{(n)} = \Set{ \langle M, w \rangle \suchthat M \text{ is a TM that is at most } n \text{ lines long, } w \text{ is a string, and } M \accepts w }.\]

The language $A_{TM}^{(0)}$ is the empty language because there are no 0-line TMs. As a result, $A_{TM}^{(0)}$ is decidable. Similarly, $A_{TM}^{(1)}$ is decidable - the only legal 1-line TM is the TM Start:, which implicitly rejects every string. $A_{TM}^{(2)}$ is also decidable, but it's a bit more interesting because there are two-line TMs that accept their inputs. More generally, as we increase $n\text,$ we start dealing with more and more complex TMs, and the language becomes more and more complex.

Prove that there exists an $n \in \naturals$ where $A_{TM}^{(n)}$ is undecidable. This shows that there is a "phase transition" in these languages - they begin decidable, are decidable for some range of values of $n\text,$ but then abruptly switch to undecidable.

Grand Challenge Problem: $\mathbf{P} \stackrel{?}{=} \mathbf{NP}$

(Worth an A+, $1,000,000, and a Ph.D)

Prove or disprove: $\mathbf{P} = \mathbf{NP}$.

Take fifteen minutes and try this. Seriously. And if you can’t crack this problem, feel free to submit your best effort, or the silliest answer you can think of.

Congratulations on finishing the last problem set of the quarter!

We’re extremely impressed with how much progress you’ve made since the start of the quarter.

Best of luck on the final exam!