Homework 4 - Cryptography

This program exercises strings, lists, and main(). There is also a short ethics question in the homework folder, and the lecture for that will be monday. All parts of HW4 are due Wed Oct 23rd at 11:55pm.

Warmup Functions

We have some warmup string and list functions to get started.

> strlisthw section on the server

Some problems say to solve with a particular loop form, so you get practice with both forms. As a reminder, here are the two forms:

Solve with for/i/range loop:

for i in range(len(s)):
    # use s[i]

Solve with for/ch/s loop: (Fri lecture)

for ch in s:
    # use ch

Turn In: Turn In HW4 Warmups to Paperless

Cryptography Introduction

Download the cryptography.zip and open the "cryptography" folder in PyCharm to get started.

The beginnings of Computer Science are deeply tied up with the famous Alan Turing "Enigma Code" cryptography work in the heat of World War II, so it's neat that we can go a little bit into the area with this project.

We'll start with a little terminology. In cryptography, we take the original "plaintext", and encrypt it under the control of a key word, yielding an unintelligible "ciphertext". Decryption goes in the opposite direction, using the key to recover the plaintext from the ciphertext. Anyone who intercepts the ciphertext cannot make any sense of it without the key.

alt: plaintext to ciphertext and back, under control of key

In fact these same elements describe how the encryption of the data on your phone works: There is a key stored in a chip in the phone, and it is only released by, say, the correct fingerprint. That key is used to encrypt and decrypt all the data stored on the phone. Without the key, nothing can be read back.

For this project you'll implement a simple cipher, where each alphabetic char is encrypted as a different alphabetic char — so for example each 'a' is encrypted as an 'x' and 'i' as a 'b', and so on. The cipher is simple but includes all the features of cryptography — plain and cipher texts, encryption and decryptions, all under the control of a key.

History aside: The Enigma Machine of World War II did character-to-character encryption like this, but shifting to a different mapping for each subsequent character. There's lots of interesting history and applied mathematics in cryptography. For a great introduction, see Simon Singh's "The Code Book".

Functions and Doctests

For this program, we are providing the function decomposition and Doctests. You can concentrate on writing all the code and seeing how the functions fit together.

a. compute_slug(key)

We will use an encryption scheme that is simple enough it can be done on paper, as was done throughout history up until world war II.

We'll say that a "slug" is a len-26 list of the 26 lowercase chars a-z in some order. In a later step, the slug will drive the encryption of each char. The problem at this stage is to compute a len-26 slug from an easily memorized key like 'Bananas!'. All key/slug formation is done with lowercase alphabetic chars. ("slug" is a borrowed typesetting term, referring to a line of letters in a row.)

Here is the algorithm to compute a slug from a key string: Use only the alphabetic chars in the key, and for each char, use its lowercase form. The first char is the first char of the slug, the second char is the second of the slug, and so on through all the chars of the key. However, if a char is already in the slug, skip it. A char can never be in the slug twice.

With the first few chars of the slug provide by the key, complete the slug by going through the chars of the regular alphabet a, b, c, d, .. and appending each char to the end of the slug if it is not already in the slug. Obtain the a, b, c, d .. chars from the ALPHABET constant, shown below. The resulting slug list will be exactly length 26 and contain every a-z char once at some position.

Here is the slug for the key 'Bananas!' You can see how the 'b', 'a', 'n', 's' go at the front of the slug, followed by the rest of the regular alphabet.

compute_slug('Bananas!') -> 
['b', 'a', 'n', 's', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'o', 'p', 'q', 'r', 't', 'u', 'v', 'w', 'x', 'y', 'z']

The starter code includes the definition of a constant ALPHABET which is the a-z lowercase letters in a list.

# provided ALPHABET constant - list of the regular alphabet
# in lowercase. Refer to this simply as ALPHABET in your code.
# This list should not be modified.
ALPHABET = ['a', 'b', 'c', 'd', ..., 'y', 'z']

You should refer to this constant as ALPHABET in your code at spots where you need the standard a-z chars. Do not change the ALPHABET list; constants like this are intended to be a read-only resource for the code, and it's a convention to give them upper-case names.

Implement the compute_slug() function. The 'z' key is so simple, its slug can be worked out by hand, which is how the Doctest was written.

compute_slug('z') -> 
['z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y']

b. encrypt_char(source, slug, ch)

The function encrypt_char(source, slug, ch) is the heart of the program, and it's very algorithmic bit of coding. The function takes in a "source" parameter which is a list of the lowercase characters that can be encrypted, a "slug" list of the same length which shows the encrypted char for each source char, and "ch" the char to encrypt.

To think about the encryption algorithm, here are source and slug lists just length 4, and some of the provided Doctests also use these length-4 lists. (This is the sort of little example case that comes up in office hours to work out an algorithm).

source = ['a', 'b', 'c', 'd']
slug   = ['d', 'c', 'b', 'a']

1. To encrypt the input char, find the lowercase version of that char in the source list. If it is in there, then the char at the same index in the slug list is its encrypted form.

e.g. with the lists above, the encrypt of 'a' is 'd', and the encrypt of 'c' is 'b'.

2. The output encrypted char should have the same upper/lower case as the input char.

e.g. the encrypt of 'A' is 'D', and the encrypt of 'C' is 'B'.

3. If the input char is not in the source list, the char itself is its encrypted form.

e.g. the encrypt of space ' ' and comma ',' are just ' ' and ','

Do not use the ALPHABET constant in this function. Use the "source" parameter - we let the caller of this function specify exactly the source/slug pair they want by passing them both in. Foreshadowing: this flexibility will enable a neat trick later.

The Doctests are provided. The first tests use the length-4 lists above. Later tests obtain a slug by calling your compute_slug() function from part (a), storing the slug in a variable named z_slug. The test then uses ALPHABET as the source and z_slug as the slug:

>>> z_slug = compute_slug('z')
>>> encrypt_char(ALPHABET, z_slug, 'A')
'Z'

This means compute_slug() should be working and tested before you can test encrypt_char().

Here are the chars that make up the 'z' slug for reference, so you can try encryptions by hand. For example 'a' should encrypt to 'z' and 'H' should encrypt to 'G'.

source: abcdefghijklmnopqrstuvwxyz
z-slug: zabcdefghijklmnopqrstuvwxy

c. encrypt_str(source, slug, s)

This function is just an application of your source/slug encryption to every char in a string. The test uses the last line of the poem The Eagle — 'And like a thunderbolt he falls.'

d. decrypt_str(source, slug, s)

Given a string which was encrypted with the given source/slug, return its decrypted form.

Here you will have a chance to appreciate a small, satisfying moment of decomposition. The first important feature of decomposition is the ability to proceed in steps smaller than the whole program, testing each step on its own. You've done that many times. Another key advantage of decomposition is re-use of code in multiple places.

The decrypt_str() function can be fully implemented with one line of code after its def. The test case here just reverses the earlier encryption of the The Eagle.

e. encrypt_file(), decrypt_file()

Implement these functions, which take in a filename and key, and print out either the encrypted or decrypted lines of the file. Use the standard file reading loop to read through all the lines of the file.

For reference, here is the standard file reading loop, as seen in the crazycat.zip lecture example.

def print_file_plain(filename):
    """
    Given a filename, read all its lines and print them out.
    This shows our standard file-reading loop.
    """
    with open(filename) as f:
        for line in f:
            line = line.strip()
            print(line)

Most of the time, functions return their results. These two functions use the the other, less common pattern, using print() to express their output as text written to standard output. Doctests are provided for these functions using the "z" key and the small files "test-plain.txt" and "test-crypt.txt" as input. Doctest trick: usually a Doctest looks at the result of a function. However, Doctests will also consider anything printed by the function as an additional sort of informal output. That's how these functions are tested.

These encrypt/decrypt file functions both take in a "key" parameter, e.g. 'Bananas'. Note that the helper functions like encrypt_str() do not take in a key, but instead take in a slug. Your code will need to solve that mismatch.

A minor style idea about function calls: code will often call a helper function to obtain a needed value. However think about what is happening if the call is done inside a loop. If the function call is computing the exact same value again and again, it's a little wasteful. In that case, you could call the function once outside the loop, storing its result in a variable for use in the loop.

f. main()

Now it's time to implement a Python main().

def main():
    args = sys.argv[1:]
    # args is the list of command line arguments
    # 2 commmand line argument patterns:
    # -encrypt key filename
    # -decrypt key filename
    # Call encrypt_file() or decrypt_file() based on the args.
    pass

The real algorithmic works is in your earlier functions. The main() is just a little wiring things together, looking at the command line args and calling your functions, passing each function the right value to get started. This program has two modes -encrypt and -decrypt, all controlled by the "args" list holding the argument strings the user typed on the command line.

Suppose we are using the key "zebra" to encrypt the file "poem.txt". The command line to encrypt is this:

$ python3 crypto.py -encrypt zebra poem.txt
Pmqaq Zpa Par
...

For the above command line, len(args) == 3, args[0] is '-encrypt', args[1] is the key 'zebra', and args[2] is the filename 'poem.txt'. Add code in main() so when the command line starts with '-encrypt', main() calls your encrypt_file() function with the appropriate parameters.

Similarly, this command line decrypts a file:

$ python3 crypto.py -decrypt key filename

...

Add code in main() so the above command line calls the decrypt_file() function with the appropriate parameters.

Running With Real Data

With the main() filled in, you can run from the command line with real data and see the output. Here the "cat" command is used in the terminal to show the plaintext first ("cat" works in the latest Windows PowerShell, the older equivalent Windows command is "type"). Remember to use the tab-key to fill out the filenames vs. typing them.

$ cat the-eagle.txt
He clasps the crag with crooked hands; 
Close to the sun in lonely lands, 
Ring'd with the azure world, he stands. 

The wrinkled sea beneath him crawls; 
He watches from his mountain walls, 
And like a thunderbolt he falls.

--Alfred, Lord Tennyson
$ 
$ # encrypt with the key "z"
$ python3 crypto.py -encrypt z the-eagle.txt
Gd bkzror sgd bqzf vhsg bqnnjdc gzmcr;
Bknrd sn sgd rtm hm knmdkx kzmcr,
Qhmf'c vhsg sgd zytqd vnqkc, gd rszmcr.

Sgd vqhmjkdc rdz admdzsg ghl bqzvkr;
Gd vzsbgdr eqnl ghr lntmszhm vzkkr,
Zmc khjd z sgtmcdqanks gd ezkkr.

--Zkeqdc, Knqc Sdmmxrnm

The file "independence-crypt.txt" contains part of the declaration of independence encrypted with the key "Bleh". Try decrypting it with your code.

$ cat independence-crypt.txt 
Wfan gn tfa Eoursa oc fumbn avants gt laeomas naeassbry cor ona paopka
to hgssokva tfa pokgtgebk lbnhs wfgef fbva eonnaetah tfam wgtf bnotfar
bnh to bssuma bmond tfa powars oc tfa abrtf, tfa sapbrbta bnh aqubk
stbtgon to wfgef tfa Kbws oc Nbtura bnh oc Nbtura's Doh antgtka tfam,
b haeant raspaet to tfa opgngons oc mbnjgnh raqugras tfbt tfay sfoukh
haekbra tfa ebusas wfgef gmpak tfam to tfa sapbrbtgon.

Wa fokh tfasa trutfs to la sakc-avghant, tfbt bkk man bra erabtah
aqubk, tfbt tfay bra anhowah ly tfagr Erabtor wgtf eartbgn unbkganblka
Rgdfts, tfbt bmond tfasa bra Kgca, Kglarty bnh tfa pursugt oc
Fbppgnass.
$
$ python3 crypto.py -decrypt Bleh independence-crypt.txt
When in the Course of human events it becomes necessary for one people
to dissolve the political bands which have connected them with another
and to assume among the powers of the earth, the separate and equal
station to which the Laws of Nature and of Nature's God entitle them,
a decent respect to the opinions of mankind requires that they should
declare the causes which impel them to the separation.

We hold these truths to be self-evident, that all men are created
equal, that they are endowed by their Creator with certain unalienable
Rights, that among these are Life, Liberty and the pursuit of
Happiness.

Look carefully at the cipher and plain texts, and you can see that the words and punctuation are all intact, just the letters have been shifted around by the encryption. The file "the-eagle-crypt.txt" has its encrypted form under the key "z".

All Done

This is a nice example of a real Python program - ripping through data files and applying a real computation. The functions build on each other to solve the whole thing.

When your code is running nicely and has cleaned up style, turn in your crypto.py file and your ethics-crypto.txt files on paperless as hw 4.2.

Mystery Crypt

The file mystery-crypt.txt contains this cipher text:

Wi'qi km stqckliqs tm hmvi
Ymu gkmw tbi quhis ckd sm dm E
C nuhh rmjjetjikt's wbct E'j tbekgekl mn
Ymu wmuhdk't lit tbes nqmj cky mtbiq luy
E fust wckkc tihh ymu bmw E'j niihekl
Lmttc jcgi ymu ukdiqstckd
Kiviq lmkkc levi ymu uo
Kiviq lmkkc hit ymu dmwk
Kiviq lmkkc quk cqmukd ckd disiqt ymu
Kiviq lmkkc jcgi ymu rqy
Kiviq lmkkc scy lmmdayi
Kiviq lmkkc tihh c hei ckd buqt ymu

The encryption key for this one is a nice color around Stanford. Just for fun, you can try decrypting with guesses to figure it out.

Addendum - Output Capture With >

This is just FYI, not part of the homework - how to capture the output of a program run in the terminal?

Normally when a program runs and does any printing, the text output appears in the terminal. There is a famous terminal option (works in Windows too) to put "> output.txt" at the end of the command line, and that captures the output text and saves it to the named text file. You can run your program a few different ways, seeing the output directly each time. When you have a run you like, run it with ">" to capture that output in a file. That's how the data professionals do it and it's a handy feature of the command line. This is how the files like the-eagle-crypt.txt were made:

$ python3 crypto.py -encrypt z the-eagle.txt > test-crypt.txt
$ cat test-crypt.txt
Gd bkzror sgd bqzf vhsg bqnnjdc gzmcr; 
Bknrd sn sgd rtm hm knmdkx kzmcr, 
Qhmf'c vhsg sgd zytqd vnqkc, gd rszmcr. 

Sgd vqhmjkdc rdz admdzsg ghl bqzvkr; 
Gd vzsbgdr eqnl ghr lntmszhm vzkkr, 
Zmc khjd z sgtmcdqanks gd ezkkr.

--Zkeqdc, Knqc Sdmmxrnm

Background and History

This assignment was created by Nick Parlante as part of the new Stanford CS106A in Python. The goal was a program that processed text out of a file, combined with a non-trivial computation built with several functions and Doctests.