Python Extras


Slide 1

Announcements

  • The final exam information has been posted: https://web.stanford.edu/class/archive/cs/cs106a/cs106a.1228/assessments/final-exam/
  • Tomorrow's lecture is cancelled. We will have lecture on Wednesday and Thursday.
  • Chris needs to move his office hours this week. I will have office hours during the following times:
    • Wednesday 11am-12pm
    • Thursday 10am-12pm
  • For Wednesday's lecture (the last one before our final lecture on Thursday), I'm taking suggestions for things to cover. We already have one suggestion: git and github. If you have other suggestions, which can be Python-related or not, I'll entertain the suggestions.

Slide 2

Today's Topics

  • Reading and writing binary data files
  • Dictionary Comprehensions
  • Sets
  • Queues
  • Multithreading
  • try/except
  • for/while else
  • enumerate

Slide 3

Reading and writing binary data files

  • Let's start by talking about binary files, or files that are not strictly text (e.g., images, binary programs, .zip files, etc.)
  • First, let's discuss writing to a regular text file. You open the file in "w" (write) mode, and then use the .write() method. E.g.,
    >>> with open("myfile.txt", "w") as f:
    ...   f.write("Here is some text\n")
    ...   f.write("We must manually put newlines because 'write' does not do it automatically.\n")
    ...
    18
    76
    >>>
    
  • The 18 and 76 are the return values from f.write and indicate how many characters were written (for plain ASCII text like this, that is also the number of bytes). myfile.txt now looks like this:
    Here is some text
    We must manually put newlines because 'write' does not do it automatically.
    
  • What if we wanted to open and use a file like a .jpg file? We can't simply do it with the regular 1-argument open:
    >>> with open("EarthFromSpace.jpg") as f:
    ...   data = f.read()
    ...
    Traceback (most recent call last):
    File "<stdin>", line 2, in <module>
    File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/codecs.py", line 322, in decode
      (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
    >>>
    
  • Instead, we have to open it in binary mode:
    >>> with open("EarthFromSpace.jpg", "rb") as f:
    ...   data = f.read()
    ...
    >>> type(data)
    <class 'bytes'>
    >>>
  • Reading binary files lets you read any file (even text files), but you get back a bytes type. It is like a string, but each byte is considered a simple value between 0 and 255 (an "8-bit" value, because 2**8 == 256).
  • Let's look at the first three bytes of the file we read:
    >>> data[:3]
    b'\xff\xd8\xff'
    >>>
    
  • The values are numbers in hexadecimal, or base-16. If we go look up the JPEG format (or just Google b'\xff\xd8\xff'), we find that these first three bytes indicate that the binary file we have read is a JPEG file (and it is).
  • We can actually convert the bytes to an image with our PIL library (and with some help from the io library):
    >>> import PIL.Image as Image
    >>> import io
    >>> image = Image.open(io.BytesIO(data))
    >>> image.show()
    >>>
    
  • If we wanted to write the raw bytes to another file, we can do that:
    >>> with open("newimage.jpg", "wb") as f:
    ...   f.write(data)
    ...
    124524
    >>> len(data)
    124524
  • So, we see that we wrote out all the bytes into the new image file.
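  • To make the bytes type more concrete, here is a minimal sketch that writes a few bytes to a temporary file and reads them back. The names JPEG_MAGIC, looks_like_jpeg, and fake_data are made up for illustration, and the file is a stand-in, not a real image:

```python
import tempfile
from pathlib import Path

JPEG_MAGIC = b"\xff\xd8\xff"  # the three leading bytes we saw in the JPEG file


def looks_like_jpeg(data):
    """Return True if the bytes start with the JPEG signature."""
    return data.startswith(JPEG_MAGIC)


# Stand-in for real file contents: the signature plus three filler bytes.
fake_data = JPEG_MAGIC + bytes([0, 16, 255])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "fake.jpg"
    with open(path, "wb") as f:
        written = f.write(fake_data)  # in binary mode, this is a byte count
    with open(path, "rb") as f:
        read_back = f.read()

print(written)                         # 6
print(read_back[0])                    # indexing gives an int: 255
print(read_back[:3] == JPEG_MAGIC)     # slicing gives bytes: True
print(looks_like_jpeg(read_back))      # True
print(looks_like_jpeg(b"plain text"))  # False
```

  • Note the difference between indexing and slicing: a single index gives you an int between 0 and 255, while a slice gives you another bytes object.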

Slide 4

Dictionary Comprehensions

  • We have covered list comprehensions already, but Python also has dictionary comprehensions, which, as you can imagine, perform a similar function on dictionaries: they process a dict and return a dict.
  • Here is a simple example:
    >>> numbers = {'one': 1, 'two': 2, 'three': 3, 'four': 4}
    >>> square_numbers = {k:v**2 for (k,v) in numbers.items()}
    >>> print(square_numbers)
    {'one': 1, 'two': 4, 'three': 9, 'four': 16}
    >>>
    
  • So, what is going on here? The general form is:
    dict_variable = {key:value for (key,value) in dictionary.items()}
    
  • In other words, the comprehension loops through all the elements in the dict and pulls out the key and value for each element, and then uses it to make some update so that a new dict can be formed.

  • Just like with list comprehensions, you can filter, as well:
    >>> state_details = {'state': 'CA', 'flower': 'poppy', 'bird': 'quail', 'marine mammal': 'California Gray Whale', 'motto': 'Eureka!'}
    >>> just_upper = {k:v for (k,v) in state_details.items() if v[0].isupper()}
    >>> print(just_upper)
    {'state': 'CA', 'marine mammal': 'California Gray Whale', 'motto': 'Eureka!'}
    >>>
    
  • Dictionary comprehensions can be useful if you want to make changes to a dictionary, or to filter it. I don't use them often in my own code, but they are there if you need them.
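  • A few more patterns, as a sketch rather than a complete list. A comprehension can swap keys and values, build a dict from a list, or transform and filter in one step. The variable names below are made up for illustration:

```python
numbers = {'one': 1, 'two': 2, 'three': 3, 'four': 4}

# Swap keys and values (this only works cleanly because every value is unique).
names_by_number = {v: k for (k, v) in numbers.items()}
print(names_by_number)  # {1: 'one', 2: 'two', 3: 'three', 4: 'four'}

# Build a dict from a list: map each word to its length.
words = ['poppy', 'quail', 'whale']
word_lengths = {word: len(word) for word in words}
print(word_lengths)     # {'poppy': 5, 'quail': 5, 'whale': 5}

# Transform and filter at once: square only the even values.
even_squares = {k: v**2 for (k, v) in numbers.items() if v % 2 == 0}
print(even_squares)     # {'two': 4, 'four': 16}
```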

Slide 5

Sets

  • Python has a built-in data type called a set. Sets are a common way to store unique elements, and they are fast, too. Python sets can also be used to get intersections, unions, and other math set-like output. Here is an example:
    >>> s1 = set()
    >>> s1.add(1)
    >>> s1.add(5)
    >>> s1.add(9)
    >>> s1.add(5)
    >>> print(s1)
    {1, 5, 9}
    
  • As you can see, when you add elements to a set, if the element is already in the set, it is ignored.
  • You can create a set all at once using curly braces, e.g., s2 = {5, 8, 2}.
  • Sets are not ordered in Python (some other languages offer sorted sets, though):
    >>> s2 = {5, 8, 2}
    >>> s2
    {8, 2, 5}
    
  • You can perform some neat operations using sets:
    >>> s1
    {1, 5, 9}
    >>> s2
    {8, 2, 5}
    >>> s1.union(s2)
    {1, 2, 5, 8, 9}
    >>> s1.intersection(s2)
    {5}
    >>> s1.isdisjoint(s2)
    False
    >>> s1.isdisjoint({2, 4, 8})
    True
    >>> s1.issubset({1, 2, 3, 4, 5, 6, 7, 8, 9})
    True
    
  • If you do want a sorted list of elements from a set, use sorted, which accepts a set directly and returns a list:
    >>> s2
    {8, 2, 5}
    >>> sorted(s2)
    [2, 5, 8]
    
  • If you want to find all the unique elements in a list, just convert to a set (but remember, it isn't sorted!):
    >>> set([1, 2, 20, 6, 210, 1, 5, 9, 8, 1, 3])
    {1, 2, 3, 5, 6, 8, 9, 210, 20}
    >>>
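  • The set methods above also have operator forms. This is a small sketch using the same s1 and s2; the results are printed with sorted so the output does not depend on set ordering:

```python
s1 = {1, 5, 9}
s2 = {5, 8, 2}

union = s1 | s2    # same as s1.union(s2)
common = s1 & s2   # same as s1.intersection(s2)
only_s1 = s1 - s2  # elements in s1 that are not in s2

print(sorted(union))    # [1, 2, 5, 8, 9]
print(sorted(common))   # [5]
print(sorted(only_s1))  # [1, 9]
print(5 in s1)          # membership test: True

# Unique elements of a list, in sorted order.
unique_sorted = sorted(set([1, 2, 20, 6, 210, 1, 5, 9, 8, 1, 3]))
print(unique_sorted)    # [1, 2, 3, 5, 6, 8, 9, 20, 210]
```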
    

Slide 6

Queues

  • A queue is a data structure that allows first-in-first-out access to elements. You can think of a queue as a line in a supermarket – the first person in the line is the first person handled, and the last person in line is the last person handled. If someone comes into the line, they go to the back of the line, and have to wait for all other people to be served first.
  • Python has a queue library (meaning that you need to import queue to use it). Here is an example:
    >>> import queue
    >>> supermarket_line = queue.Queue()
    >>> supermarket_line.put(4)
    >>> supermarket_line.put(8)
    >>> supermarket_line.put(12)
    >>> supermarket_line
    <queue.Queue object at 0x7f8632931d90>
    >>> supermarket_line.get()
    4
    >>> supermarket_line.get()
    8
    >>> supermarket_line.get()
    12
    >>> supermarket_line.get()
    (hangs!)
  • You can only get the next element in a queue – it has no random access (like a list).
  • In the example above, the .put() function is used to put a new element into the queue. When supermarket_line.get() is called, the first element in the queue is returned. Then, the next element in line is returned on the next supermarket_line.get().
  • It turns out that if you have an empty queue and try to get() from it, your program hangs (freezes)!
    • This is because queues are usually used in multithreaded programs – see below for an example of this. If another part of your program puts something into the queue, then your get() will return that value.
  • If you know that your queue might be empty, you should check for that before calling get():
    >>> supermarket_line = queue.Queue()
    >>> for n in (1, 3, 5, 7, 9):
    ...   supermarket_line.put(n)
    ...
    >>> while not supermarket_line.empty():
    ...   print(supermarket_line.get())
    ...
    1
    3
    5
    7
    9
    >>>
    
  • You can see the size of a queue using supermarket_line.qsize() (you cannot use len(supermarket_line)), but you should generally use the while not supermarket_line.empty() approach above if you are removing elements in a loop.
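  • Another way to avoid hanging on an empty queue, sketched here as one option: get_nowait() raises queue.Empty instead of waiting, and get(timeout=...) waits only for the given number of seconds before raising the same exception. This uses try/except, which is covered later in these notes. The list named served is made up for illustration:

```python
import queue

supermarket_line = queue.Queue()
for n in (4, 8, 12):
    supermarket_line.put(n)

served = []
while True:
    try:
        # get_nowait() never blocks; it raises queue.Empty when nothing is left.
        served.append(supermarket_line.get_nowait())
    except queue.Empty:
        break
print(served)  # [4, 8, 12]

try:
    # Wait at most 0.1 seconds for a new element, then give up.
    supermarket_line.get(timeout=0.1)
except queue.Empty:
    print("Still empty after waiting 0.1 seconds")
```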

Slide 7

Multithreading

  • Multithreading is the ability for your program to seem like two or more different parts of the program are running simultaneously. Python comes with a threading library for this purpose. The way Python is built, two parts of your program cannot literally be running at the same time (many other languages, like C and C++, do allow this), but it suits most purposes (e.g., it is used extensively for graphics in tkinter). If you take CS111, you will cover multithreading in detail.
  • Here is an example:
    • The count_up function will be run in its own thread, and every second it will add the next higher number to the count_queue.
    • The pull_from_queue function pulls numbers from the queue until the value it gets reaches MAX_VAL.
    • If there aren't any numbers in the queue, the function counts as fast as it can.
    • Then it gets the next number and prints it.
    • main() starts count_up in a thread, and then asks the user to hit <return> when ready.
import time
import threading
import queue

MAX_VAL = 10

def count_up(count_queue):
    for i in range(MAX_VAL + 1):
        time.sleep(1) # sleep for 1 second
        count_queue.put(i)

def pull_from_queue(count_queue):
    print(f'Current queue size: {count_queue.qsize()}')
    while True:
        next_value = count_queue.get()
        print(next_value)
        if next_value == MAX_VAL:
            break # stop the loop
        if count_queue.qsize() == 0:
            print("This thread can do stuff at the same time as the other thread.")
            local_count = 0
            while count_queue.qsize() == 0:
                local_count += 1
            print(f"I counted to {local_count} while waiting. Now I'll get the next value.")

def main():
    count_queue = queue.Queue()
    count_thread = threading.Thread(target=count_up, args=[count_queue])
    count_thread.start() # this runs count_up(count_queue) in its own thread
    input("When you are ready to begin, press <return>")
    pull_from_queue(count_queue)
    count_thread.join() # this cleans up the thread we set up

if __name__ == "__main__":
    main()

Let's assume the user waited about three seconds before hitting <return>. This might be the output:

% python3 thread_ex.py
When you are ready to begin, press <return>
Current queue size: 3
0
1
2
This thread can do stuff at the same time as the other thread.
I counted to 930072 while waiting. Now I'll get the next value.
3
This thread can do stuff at the same time as the other thread.
I counted to 1608413 while waiting. Now I'll get the next value.
4
This thread can do stuff at the same time as the other thread.
I counted to 1691159 while waiting. Now I'll get the next value.
5
This thread can do stuff at the same time as the other thread.
I counted to 1684888 while waiting. Now I'll get the next value.
6
This thread can do stuff at the same time as the other thread.
I counted to 1705034 while waiting. Now I'll get the next value.
7
This thread can do stuff at the same time as the other thread.
I counted to 1656617 while waiting. Now I'll get the next value.
8
This thread can do stuff at the same time as the other thread.
I counted to 1613523 while waiting. Now I'll get the next value.
9
This thread can do stuff at the same time as the other thread.
I counted to 1639027 while waiting. Now I'll get the next value.
10
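
The example above counts in a tight loop while it waits, to show that both threads are making progress. If the waiting thread has nothing else to do, a simpler variation is to let get() block until the next value arrives. Here is a minimal sketch of that idea. The DONE marker and the collect function are made up for illustration, and the sleep is shortened so the example finishes quickly:

```python
import queue
import threading
import time

DONE = None  # hypothetical marker meaning "no more values are coming"


def count_up(count_queue, max_val):
    for i in range(max_val + 1):
        time.sleep(0.01)  # short sleep so the example runs quickly
        count_queue.put(i)
    count_queue.put(DONE)  # tell the other thread we are finished


def collect(count_queue):
    values = []
    while True:
        next_value = count_queue.get()  # blocks until a value is available
        if next_value is DONE:
            break
        values.append(next_value)
    return values


count_queue = queue.Queue()
count_thread = threading.Thread(target=count_up, args=[count_queue, 5])
count_thread.start()
received = collect(count_queue)
count_thread.join()
print(received)  # [0, 1, 2, 3, 4, 5]
```

Because the queue is first-in-first-out and only one thread puts values into it, the values come out in the order they were put in.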

Slide 8

try/except

  • Python has the ability to catch errors that happen so your program doesn't crash. This is useful in many situations (though not all – sometimes you actually want your program to crash if the logic has gone awry).
  • For example, let's say you asked for your user to type a decimal number. You might have something like this:
    num = float(input("Please type a decimal number: "))
    print(f"Your number: {num}")
    
    % python3 ask_for_num.py
    Please type a decimal number: 4.3
    Your number: 4.3
    

    But, what if the user typed something that wasn't a number?

    % python3 ask_for_num.py
    Please type a decimal number: abc
    Traceback (most recent call last):
    File "/Users/tofer/GoogleDriveCG/cs106a-summer-2021/website/lectures/25-python-extras/ask_for_num.py", line 1, in <module>
      num = float(input("Please type a decimal number: "))
    ValueError: could not convert string to float: 'abc'
    

    Your program will crash! We use the try/except functionality to catch the problem when it happens:

    try:
      num = float(input("Please type a decimal number: "))
      print(f"Your number: {num}")
    except ValueError:
      print("You didn't type a number!")
    
    % python3 ask_for_num.py
    Please type a decimal number: abc
    You didn't type a number!
    
  • Here, we knew we might get a ValueError, so we had our program try a code block, and if that code block produces a ValueError, then the except block runs, and our program doesn't crash.
  • You can see the full list of Python's built-in exceptions in the "Built-in Exceptions" section of the official Python documentation.
  • You can also create your own exception types, but that is rarely necessary.
  • You can, if absolutely necessary, have an except without any specific exception (it would just be except:), but you want to avoid that as you won't be able to tell what caused the error.
  • Here is another example:
    def read_file(filename):
        try:
            with open(filename) as f:
                lines = f.readlines()
            for line in lines:
                print(line)
        except FileNotFoundError:
            print(f"The file '{filename}' was not found.")
    
    def main():
        read_file("somefile.txt")

    if __name__ == "__main__":
        main()
    
    % python3 ask_for_num.py
    The file 'somefile.txt' was not found.
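
  • A common pattern is to wrap the risky conversion in a small function, so the rest of the program can check the result without worrying about exceptions. This is a sketch of that idea. The function name parse_decimal is made up for illustration, and the strings in the list stand in for what a user might type:

```python
def parse_decimal(text):
    """Return text converted to a float, or None if it is not a number."""
    try:
        return float(text)
    except ValueError:
        return None


for typed in ["4.3", "abc", "  7 "]:
    num = parse_decimal(typed)
    if num is None:
        print(f"'{typed}' is not a number!")
    else:
        print(f"Your number: {num}")
```

  • Note that float() ignores leading and trailing spaces, so "  7 " converts to 7.0 without raising a ValueError.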
    

Slide 9

for/while else

  • The next Python feature is another one that I have rarely used, but it can make looping code easier.
  • Python is the only language I know of that has an else clause for both the for loop and the while loop. It is used if you want to do something if your loop exits normally (e.g., when the top-line condition causes the loop to stop). Here is an example:
    >>> a = 0
    >>> while a < 5:
    ...     print(a)
    ...     a += 1
    ... else:
    ...     print("The loop made it without breaking out")
    ...
    0
    1
    2
    3
    4
    The loop made it without breaking out
    >>>
    

    Here is a converse example:

    >>> a = 0
    >>> while a < 5:
    ...     print(a)
    ...     a += 1
    ...     if a == 2:
    ...         break
    ... else:
    ...     print("The loop made it without breaking out")
    ...
    0
    1
    >>>
    
  • You can also use a similar construct with a for loop, and it is somewhat more useful. For example, let's say you had a list and wanted to loop through it until you got to a particular value, but stop once you reach that value. If the value isn't in the list, you want to do something else. E.g.,
    lst = [1, 3, 5, 7, 9]
    found_val = False
    for val in lst:
      if val == 5:
          print("Found 5!")
          found_val = True
          break
    if not found_val:
      print("Did not find 5 :(")
    
  • This necessitates a boolean found_val, which is a bit ugly.
  • Instead, you could do the following:
    lst = [1, 3, 5, 7, 9]
    for val in lst:
      if val == 5:
          print("Found 5!")
          break
    else:
      print("Did not find 5 :(")
    
  • No more need for the boolean. If the loop exited normally (by going through all the values in lst), then the else block runs.
  • Note that the choice of the word else was probably a bad one (and the creator of Python has admitted as much). Once you understand it, it is useful, but seeing it in someone else's code (no pun intended) is often somewhat jarring.
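  • Here is the same search wrapped in a function, as a sketch (the name describe_search is made up for illustration). It also shows a case that is easy to miss: if the list is empty, the loop body never runs, no break happens, and so the else block still runs.

```python
def describe_search(lst, target):
    """Return a message saying whether target appears in lst."""
    for val in lst:
        if val == target:
            result = f"Found {target}!"
            break
    else:
        # Runs only when the loop finished without hitting break.
        result = f"Did not find {target} :("
    return result


print(describe_search([1, 3, 5, 7, 9], 5))  # Found 5!
print(describe_search([1, 3, 5, 7, 9], 4))  # Did not find 4 :(
print(describe_search([], 5))               # Did not find 5 :(
```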

One more thing: enumerate

  • The last topic we are going to cover today is one that I actually use quite frequently in my own code. Sometimes, you want to loop through some list or other iterable, but you want both the elements from the list and you want the index of the element you are on. We've often done this the following way – we've looped over a range and then extracted the value, e.g., for a string:
s = "hello"
for i in range(len(s)):
  c = s[i]
  print(i, c)

Output:

0 h
1 e
2 l
3 l
4 o
  • This is great, but there is an easier way, using enumerate:
    s = "hello"
    for i, c in enumerate(s):
      print(i, c)
    
  • This accomplishes the same thing, and we don't have to manually extract the character using the index (we also don't need a range).

  • We can also use enumerate on other data structures that don't index directly, like sets. Could we do this?
my_set = {'hello', 'goodbye', 'seeya', 'toodleoo'}
for i in range(len(my_set)):
    # get the string associated with i?
    s = my_set[i] # error!
  • Nope! There is no way to index into a set. But, if we cared to count along while we extracted elements from the set, we could use enumerate (but remember, sets are not ordered in Python!):
my_set = {'hello', 'goodbye', 'seeya', 'toodleoo'}
for i, s in enumerate(my_set):
    print(i, s)

Output:

0 seeya
1 goodbye
2 hello
3 toodleoo
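
  • Two more details about enumerate, shown as a short sketch. It accepts an optional start argument if you want the count to begin somewhere other than 0, and what it produces is a series of (index, element) pairs. The shopping_list data is made up for illustration:

```python
shopping_list = ["eggs", "milk", "bread"]

# start=1 makes the count begin at 1 instead of 0.
for item_number, item in enumerate(shopping_list, start=1):
    print(item_number, item)

# enumerate produces (index, element) pairs; list() makes them visible.
pairs = list(enumerate("abc"))
print(pairs)  # [(0, 'a'), (1, 'b'), (2, 'c')]
```

Output of the loop:

1 eggs
2 milk
3 bread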