Section 4. File Reading and Dictionaries


Section materials curated by our head TA Iddah Mlauzi, drawing upon materials from previous quarters.

This week, we'll be practicing two powerful skills in data processing: reading data in from text files, and storing that data in dictionaries. This will be great practice for your next homework ☺. We'll start with a review of dictionaries and then use nested dictionaries to represent complex information.

Here's the Section 4 starter code. You'll unzip/extract this file and open it in PyCharm just like you do with your assignment starter code.

Dictionary Review

Here's a problem you can work through in the Python interpreter to review some common dictionary operations. Let's say we have a dictionary classes where the keys are class names (strings) and the values are the unit count for that class (ints).

>>> classes = {'CS106A': 5, 'PSYCH1': 5, 'PWR1': 4}

Write a single line of code to:

  • Access the value for the key 'PSYCH1'.
  • Update the value for the key 'CS106A', changing it from 5 to 3.
  • Add a new key/value pair to the dictionary, 'AMSTUD107' with value 4.

>>> # Initialize dictionary
>>> classes = {'CS106A': 5, 'PSYCH1': 5, 'PWR1': 4}
>>>
>>> # Access the value for the key 'PSYCH1'
>>> classes['PSYCH1']           
5
>>>
>>> # Update the value for the key 'CS106A', changing it from 5 to 3
>>> classes['CS106A'] = 3      
>>> classes
{'CS106A': 3, 'PSYCH1': 5, 'PWR1': 4}
>>>
>>> # Add a new key/value pair to classes, 'AMSTUD107' with value 4
>>> classes['AMSTUD107'] = 4    
>>> classes
{'CS106A': 3, 'PSYCH1': 5, 'PWR1': 4, 'AMSTUD107': 4}
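
Two other operations tend to come up alongside the ones above: checking whether a key exists before reading it, and looping over the keys. The sketch below reuses the classes dictionary in its final state; the 'MATH51' key and the has_math and total_units variables are made up for illustration.

```python
classes = {'CS106A': 3, 'PSYCH1': 5, 'PWR1': 4, 'AMSTUD107': 4}

# Check for a key before reading it; classes['MATH51'] would raise a KeyError
has_math = 'MATH51' in classes

# Loop over the keys and add up the unit counts
total_units = 0
for name in classes:
    total_units += classes[name]

print(has_math)     # False
print(total_units)  # 16
```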

Grocery Lists Dictionaries

In this problem, you'll create a helpful grocery dictionary from your disorganized grocery list.

Your Data

Throughout the week, you make notes of ingredients, where to buy them from, and how many to buy. Unfortunately, you lazily jot down each item at the bottom of a long list, rather than checking if it was already in the list and keeping a tally. When it's time to go shopping, you've got a grocery list that looks like this:

safeway:eggs,1
costco:croissant,12
safeway:coconut milk,3
target:sugar cookies,1
safeway:flour,1
safeway:eggs,2
trader joes:pineapple,4
trader joes:grapes,1
costco:coffee,2
trader joes:kale,1

How do you know how many eggs you need to buy at Safeway? It's pretty tedious to scan through the entire list and tally each item in case of any duplicates. We're going to write a program that can help.

The groceries Dictionary

We want to build a dictionary, which we'll call groceries, that stores your items so that it's easy to see what you need from each store. Here's what the groceries dict would look like for the example list above:

groceries = {
    'safeway': {'eggs': 3, 'coconut milk': 3, 'flour': 1},
    'costco': {'croissant': 12, 'coffee': 2},
    'target': {'sugar cookies': 1},
    'trader joes': {'pineapple': 4, 'grapes': 1, 'kale': 1}
}

Now that our data's organized in a new way, it's easier to see that we need 3 cartons of eggs from Safeway. Our groceries dictionary has store names as the keys and then inner dictionaries as values. Within each store's inner dictionary, the keys are item names, and the value is the number of that item we need to buy this week.
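
To make the two levels concrete, here is a short sketch of reading values out of the groceries dict shown above. The variable names safeway_items, egg_count, and coffee_count are only illustrative.

```python
groceries = {
    'safeway': {'eggs': 3, 'coconut milk': 3, 'flour': 1},
    'costco': {'croissant': 12, 'coffee': 2},
    'target': {'sugar cookies': 1},
    'trader joes': {'pineapple': 4, 'grapes': 1, 'kale': 1}
}

# First index by store to get the inner dict, then by item to get the count
safeway_items = groceries['safeway']
egg_count = safeway_items['eggs']

# The two lookups can also be chained on one line
coffee_count = groceries['costco']['coffee']

print(egg_count)     # 3
print(coffee_count)  # 2
```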

Warmup: Navigating Dictionaries

Let's consider a smaller groceries dictionary, shown below, that we got after reading the first three lines of our grocery list and putting them into our dictionary. How would this dictionary look after reading in the following lines from our file? Hint: It's a good idea to write this out!

groceries = {
  'safeway': {'eggs': 1, 'coconut milk': 3}, 
  'costco': {'croissant': 12}
}

  1. target:sugar cookies,1
  2. safeway:flour,1
  3. safeway:eggs,2
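
Once you've written out your own answer, one way to check it is to apply each line by hand in Python. This is only a sketch of the three cases, written as direct dictionary updates rather than as a general function:

```python
groceries = {
    'safeway': {'eggs': 1, 'coconut milk': 3},
    'costco': {'croissant': 12}
}

# 1. target:sugar cookies,1 -> new store, so it needs a new inner dict
groceries['target'] = {'sugar cookies': 1}

# 2. safeway:flour,1 -> seen store, new item
groceries['safeway']['flour'] = 1

# 3. safeway:eggs,2 -> seen store, seen item, so add to the existing count
groceries['safeway']['eggs'] += 2

print(groceries)
```

Each line falls into a different case, and those cases are the ones a general solution has to handle.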

Adding Items

Now that you've got a feel for how to add a single item to groceries, let's implement this process in Python. We want to write the function add_item(groceries, store, item, num) which takes a groceries dict like we've seen above, and adds a new item to our dict as specified by the parameters store, item, num. If you check out the starter project, we've provided some code to help you get started.

Hint: Let the different cases from the warmup guide your code. What do we do if we haven't seen a store name before? What if we've seen both the store and item before?

Making the Grocery Dict

Now, you'll read your entire grocery list in from a text file and construct a groceries dictionary. To implement the function make_groceries(filename), call your add_item() helper function in a file reading loop to build up a groceries dictionary, then return groceries at the end of the function after reading all the lines of the file.

For a refresher on what these grocery files look like, check out short_list.txt or long_list.txt in the starter project. You can use .find() or .split() to separate out the important parts of each line.
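
As a rough illustration of the .split() route, the sketch below pulls the three parts out of one line. The helper name parse_line is hypothetical (it is not part of the starter code), and it assumes that store and item names never contain a ':' or ',' themselves.

```python
def parse_line(line):
    """Hypothetical helper: split 'store:item,num' into (store, item, num)."""
    line = line.strip()           # drop the trailing newline
    parts = line.split(':')       # e.g. ['trader joes', 'pineapple,4']
    store = parts[0]
    rest = parts[1].split(',')    # e.g. ['pineapple', '4']
    item = rest[0]
    num = int(rest[1])            # convert the count from str to int
    return store, item, num


print(parse_line('trader joes:pineapple,4\n'))
```

Spaces inside names such as 'trader joes' are preserved, because the line is only split on the colon and the comma.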

Running your program

You did it - you've got a program that can turn a messy grocery list into a helpful dictionary! To run your program from the terminal, run groceries.py with one argument (a text filename) to read in all the data from that file and print out a summary of your groceries. Remember to replace python3 with py or python on Windows computers.

$ python3 groceries.py long_list.txt
You need 3 eggs(s) from safeway
You need 3 coconut milk(s) from safeway
You need 1 flour(s) from safeway
You need 12 croissant(s) from costco
You need 2 coffee(s) from costco
You need 1 sugar cookies(s) from target
You need 4 pineapple(s) from trader joes
You need 1 grapes(s) from trader joes
You need 1 kale(s) from trader joes

import sys


def add_item(groceries, store, item, num):
    """
    Given a groceries dict which may
    already contain some data, update the
    dict to add new data for the given
    store, item, and num of that item.
    >>> add_item({}, 'safeway', 'eggs', 1) # new store, new item
    {'safeway': {'eggs': 1}}
    >>> add_item({'safeway': {'eggs': 1}}, 'costco', 'croissant', 12) # new store, new item
    {'safeway': {'eggs': 1}, 'costco': {'croissant': 12}}
    >>> add_item({'safeway': {'eggs': 1}, 'costco': {'croissant': 12}}, 'safeway', 'eggs', 2) # seen store, seen item
    {'safeway': {'eggs': 3}, 'costco': {'croissant': 12}}
    >>> add_item({'safeway': {'eggs': 3}, 'costco': {'croissant': 12}}, 'safeway', 'coconut milk', 3) # seen store, new item
    {'safeway': {'eggs': 3, 'coconut milk': 3}, 'costco': {'croissant': 12}}
    """
    if store not in groceries:
        groceries[store] = {}

    inner = groceries[store]
    if item not in inner:
        inner[item] = 0
    inner[item] += num
    '''
    can also do:
    if item not in inner:
        inner[item] = num
    else:
        inner[item] += num
    '''

    return groceries


def make_groceries(filename):
    """
    Given a grocery list file, where each
    line is in the format 'store:item,num'
    create and return the groceries dict
    made from this list.
    Hint: Use your helper function!
    >>> make_groceries('short_list.txt')
    {'safeway': {'eggs': 3, 'coconut milk': 3}, 'costco': {'croissant': 12}}
    """
    groceries = {}
    with open(filename) as f:
        for line in f:
            line = line.strip()
            colon = line.find(':')
            comma = line.find(',')
            store = line[:colon]
            item = line[colon + 1:comma]
            num = int(line[comma + 1:])
            add_item(groceries, store, item, num)
    return groceries


def print_groceries(groceries):
    """
    (provided)
    Prints contents of groceries dict.
    """
    for store in groceries:
        items = groceries[store]
        for item in items:
            count = items[item]
            print('You need ' + str(count) + ' ' + item + '(s) from ' + store)


def main():
    args = sys.argv[1:]
    # to run from terminal:
    # python3 groceries.py filename      # prints out all groceries
    if len(args) == 1:
        groceries = make_groceries(args[0])
        print_groceries(groceries)


if __name__ == '__main__':
    main()
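
One detail worth noticing in the solution above is that make_groceries ignores the value returned by add_item. That works because add_item modifies the dictionary it is given rather than building a new one. The sketch below repeats the add_item logic so that it runs on its own; the variable name result is only illustrative.

```python
def add_item(groceries, store, item, num):
    # Same logic as the solution above, repeated so this sketch is self-contained
    if store not in groceries:
        groceries[store] = {}
    inner = groceries[store]
    if item not in inner:
        inner[item] = 0
    inner[item] += num
    return groceries


groceries = {}
add_item(groceries, 'safeway', 'eggs', 1)            # return value ignored
result = add_item(groceries, 'safeway', 'eggs', 2)   # return value kept

print(groceries)            # {'safeway': {'eggs': 3}}
print(result is groceries)  # True: both names refer to the same dict
```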

Smallest Unique Positive Integer

In the file smallest_int.py, implement the function find_smallest_int(filename) that takes as a parameter a filename (which is just a string) that refers to a file containing a single integer on each line, and returns the smallest unique positive integer in the file. An integer is positive if it is greater than 0, and unique if it occurs exactly once in the file. For example, suppose numbers1.txt looks like this:

42
1
13
12
1
-8
20

Calling find_smallest_int('numbers1.txt') would return 12. You may assume that each line of the file contains exactly one integer (although it may not be positive) and that there is at least one positive integer in the file. To test your code, you can run python3 smallest_int.py numbers1.txt, replacing python3 with py or python on Windows computers. You can switch the filename to numbers2.txt to try out another text file, for which your function should return 7.

def find_smallest_int(filename):
    nums_so_far = []
    duplicates = []
    with open(filename) as f:
        for line in f:
            num = int(line)
            if num > 0:
                # if we've seen this number already
                if num in nums_so_far:
                    # record that it's a duplicate
                    duplicates.append(num)
                # note that we've seen this number
                nums_so_far.append(num)

    uniques = []
    for elem in nums_so_far:
        if elem not in duplicates:
            uniques.append(elem)
    return min(uniques)
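
Since this section is about dictionaries, here is a sketch of an alternative approach that counts occurrences in a dict instead of tracking two lists. It is not the starter code's solution: the function name smallest_unique_positive is hypothetical, and it takes an iterable of lines rather than a filename so it can be tried without a file. An open file could be passed in directly, since looping over a file yields its lines.

```python
def smallest_unique_positive(lines):
    """Hypothetical variant: lines is any iterable of strings, one integer each."""
    counts = {}
    for line in lines:
        num = int(line)           # int() tolerates the trailing newline
        if num > 0:
            if num not in counts:
                counts[num] = 0
            counts[num] += 1

    # Keep only the numbers that appeared exactly once
    uniques = []
    for num in counts:
        if counts[num] == 1:
            uniques.append(num)
    return min(uniques)


# The contents of numbers1.txt from above, written out as a list of lines
numbers1 = ['42\n', '1\n', '13\n', '12\n', '1\n', '-8\n', '20\n']
print(smallest_unique_positive(numbers1))  # 12
```

Like the list version, this sketch assumes at least one positive integer appears exactly once; otherwise min() is called on an empty list and raises a ValueError.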