Lecture 18: Dictionaries Part 1
July 21st, 2021
Today: new data type dict and the dict-count algorithm
Announcements
- Lecture recordings are posted on Canvas under the "Panopto Course Videos" tab
- There is a link to our Canvas page on the cs106a website
- HW5 comes out later today
Dict - Hash Table - Fast
- Python "dict" type
- A key/value "dictionary"
- In CS generally known as a "hash table"
Sounds like a real hacker thing
- Dict is a bit advanced, compared to basic string/list
- The defining feature of dicts:
Can store and then look up data, and be fast about it
Dict Story Arc
- Have: have some big data set, data is not organized, perhaps random order
- Dict:
Pick out data items we want
Store each item under a key in the dict
- Done: now the data is organized by key
- The dict is fast doing get/set by key, its defining superpower
- Job interview pattern:
Interview question has some messed up data
Best answer inevitably uses a dict to organize the data
Because the dict is powerful and fast...
Interviewers cannot resist using it
Dict Basics
See Python Guide for more detail: Python Dict
- Dictionaries assosciate a key with a value
- key is a unique identifier
- value is a value associated with that key
- In drawing with an arrow:
key -> value
- Examples in the real world:
- Phonebook
Keys: names
Values: phone numbers
- Dictionary
keys: words
Values: definitions
- Creating Dictiionaries
- Dictionaries start/end with curly braces { }
- Key:value pairs are separated by a colon
- Each pair is separated by a comma
squares = {2: 4, 3: 9, 4: 16, 5: 25}
ages = {'Greta': 18, 'Kamala': 56, 'Yaa': 32}
greek_alphabet = {'a': 'alpha', 'g': 'gamma', 'b': 'beta'}
Style: one space after colon and comma
Dict-1. Set key:value into Dict
- Create empty dict:
d = {}
- Set:
d[key] = value
- e.g.
d['a'] = 'alpha'
- Set creates that key entry in dict if needed
- Overwrites any previous value for that key
>>> d = {} # Start with empty dict {}
>>> d['a'] = 'alpha' # Set key/value
>>> d['g'] = 'gamma'
>>> d['b'] = 'beta'
>>> # Now we have built the picture above
>>> # Python can display dict with the literal syntax
>>> d
{'a': 'alpha', 'g': 'gamma', 'b': 'beta'}
>>>
Dict-2: Get value out of Dict
- Get a value out of a dict by key
- Get:
d[key]
- returns the value for that key
- e.g.
d['a']
returns 'alpha'
- Handy:
+=
Does a get/set series on the value
This will be a handy pattern
- e.g.
d['a'] += '!!!'
Equivalent to: d['a'] = d['a'] + '!!!'
Adds '!!!' to end of that value
- Can also do += if the value is an int/float
Increments the value by that amount
>>> d['g'] # Get by key
'gamma'
>>> d['b']
'beta'
>>> d['a'] = 'apple' # Overwrite 'a' key
>>> d['a']
'apple'
>>>
>>> # += modify value
>>> d['a'] += '!!!'
>>> d['a']
'apple!!!'
>>> # Dict literal format, curl-braces, key:value
>>> d
{'a': 'apple!!!', 'g': 'gamma', 'b': 'beta'}
>>>
>>>d = {}
>>>d['Greta'] = 18
>>>d['Greta'] += 1 # Increase Greta's age by one
>>>
Dict-3 Get Error / "in" Test
- There is one big catch
- Get [ ] a value out of a dict only works if the key is in the dict
- If the key is not in the dict, [ ] is an error
- Use in to check if a key is in the dict
- The in* check is for keys, not values
All important dict logic is with the key, the value is just stored, not looked at
- Before using [ ] to get a value, need in-check first to see that it is there
Sort of the "guard" pattern again
>>> # Can initialize dict with literal
>>> d = {'a': 'alpha', 'g': 'gamma', 'b': 'beta'}
>>>
>>> d['x'] # Key not in -> Error
Error:KeyError('x',)
>>>
>>> 'a' in d # "in" key tests
True
>>> 'x' in d
False
>>>
>>> # Guard pattern
>>> if 'a' in d:
val = d['a']
>>>
>>> # in uses key, not value, so this does not work:
>>> 'alpha' in d
False
>>>
Dict Memory Example - Meals
Use dict to remember that 'breakfast'
is 'apple'
and 'lunch'
is 'donut'
. Using 'breakfast'
and 'lunch'
as keys.
>>> meals = {}
>>> meals['breakfast'] = 'apple'
>>> meals['lunch'] = 'donut'
>>>
>>> # time passes, other lines run
>>>
>>> # what was lunch again?
>>> meals['lunch']
'donut'
>>>
>>> # did I have breakfast or dinner?
>>> 'breakfast' in meals
True
>>> 'dinner' in meals
False
>>>
Basic Dict Code Examples - Meals
Look at the dict1 "meals" exercises on the experimental server
> dict1 meals exercises
With the "meals" examples, the keys are 'breakfast', 'lunch', 'dinner'
and the values are like 'soup'
and 'bagel'
. A key like 'breakfast'
may or may not be in the dict, so need to "in" check first.
- bad_start() - check for bad breakfast - return True if no breakfast or if it is 'candy'
- candyish() - check for candy lunch or dinner
- enkale() - if
'candy'
for dinner, change it to 'kale'
Case to think bout: dict[key]
The code can only get the value for a key, if the key is in the dict. Otherwise it's an error. Therefore, need to structure the code with an "in" guard check or something to make sure the key is in the dict before trying to get its value.
bad_start()
>
bad_start()
bad_start(meals): Given a "meals" dict which contains key/value pairs like 'lunch' -> 'hot dog'. The possible keys are 'breakfast', 'lunch', 'dinner'. Return True if there is no 'breakfast' key in meals, or the value for 'breakfast' is 'candy'. Otherwise return False.
bad_start() Solution Code
Question: is the meals['breakfast'] == 'candy'
line safe? Yes. The if-statement guards the [ ].
def bad_start(meals):
if 'breakfast' not in meals:
return True
if meals['breakfast'] == 'candy':
return True
return False
# Can be written with "or" / short-circuiting avoids key-error
# if 'breakfast' not in meals or meals['breakfast'] == 'candy':
enkale()
>
enkale()
enkale(meals): Given a "meals" dict which contains key/value pairs like 'lunch' -> 'hot dog'. The possible keys are 'breakfast', 'lunch', 'dinner'. If the key 'dinner' is in the dict with the value 'candy', change the value to 'kale'. Otherwise leave the dict unchanged. Return the dict in all cases.
enkale() Solution Code
Demo: work out the code, see key error
Cannot access meals['dinner']
in the case that dinner is not in the dict, so need logic to avoid that case.
def enkale(meals):
if 'dinner' in meals and meals['dinner'] == 'candy':
meals['dinner'] = 'kale'
return meals
Typical pattern: "in" check guards the meals['dinner']
access, since the short-circuit and only proceeds when the first test is True. Could write it out in this longer form which is ok - works exactly the same as the above and/short-circuit form:
def enkale(meals):
if 'dinner' in meals:
if meals['dinner'] == 'candy':
meals['dinner'] = 'kale'
return meals
Exercise: is_boring()
>
is_boring()
is_boring(meals): Given a "meals" dict. We'll say the meals are boring if lunch and dinner are both present and are the same thing. Return True if the meals are boring, False otherwise.
Dict Observations
Dict Random Order
- Note: the order of the keys in the dict is kind of random
It is the order they were added
Simplest to think of it as random
- Key type is typically a str or int (immutable)
- Value type can be anything (str, list, ...)
- The get/set/in logic is on the keys
- The values are just payload - stored but can only lookup with the key
"in" Guard Pattern
- Very often see "in" checks just before key access
- Accessing
meals['dinner']
= an error if dinner not in the dict
- Therefore: check
dinner in meals
first
- Only access
meals['dinner']
when the key is present
- The
and
on the following line does this, only proceeding when in is True:
if 'dinner' in meals and meals['dinner'] == 'candy':
- Aka "short circuiting" of boolean expressions
Key and Value - Different Roles
- Note that get/set/in are all by key
- The key is the control, value is just payload
- Could say key/value are asymmetric, having specialized roles
- YES set by key:
d['a'] = 'alpha'
- YES get by key:
d['a'] -> 'alpha'
- YES in check of key:
'a' in d -> True
- NO, in check by value:
'alpha' in d
Does not work
- get/set/in all work with key
Dict vs. List - Keys
- Dict and List both remember things
- What's the difference?
- Keys!
- The "keys" for list are always index numbers
0, 1, 2, 3, ... len-1
- The "keys" for dict, you choose!
Any string or int etc. value is a key
Can set the values in any order
- Can install anything, make that key on the fly...
- e.g.
d['dinner'] = 'radish'
Dict-Count Algorithm
- Extremely important dict algorithm pattern
- (Read: we'll use it a lot)
- A "counts" dict:
We have some big data set
Store a key for each distinct value in the data
The value for each key is count of occurrences of that key in the data
- e.g. strs:
'a', 'b', 'a', 'c', 'b'
- creates "counts" dict:
{'a': 2, 'b': 2, 'c': 1}
Dict Count Code Examples
> dict2 Count exercises
Dict-Count Algorithm Steps
- 1. Start with empty dict
- 2. For each str, test: not seen before?
- 3. Not seen before: store key = str, value = 1
- 4. Seen before: key = str, value = value + 1
Dict-Count abacb
Go through these strs
strs = ['a', 'b', 'a', 'c', 'b']
Counts dict ends up as {'a': 2, 'b': 2, 'c': 1}
:
- strs:
'a', 'b', 'a', 'c', 'b'
- Each distinct str is a key in the dict
- The value for each key is the number of times it is seen
- Algorithm: loop through all s, update dict with counts as we go
- Each s: seen this before or not?
1. str-count1() - if/else
>
str_count1()
str_count1 demo, canonical dict-count algorithm
- Central test of this algorithm: not seen before?
- if/else solution
- Test: not seen before?
not seen before: counts[s] = 1
seen before: counts[s] += 1
- This if/else approach is fine, but we'll see another way below
str_count1() Solution
def str_count1(strs):
counts = {}
for s in strs:
# s not seen before?
if s not in counts:
counts[s] = 1 # first time
else:
counts[s] +=1 # every later time
return counts
2. str-count2() - Unified/Invariant Version, no else
>
str_count2()
- A slight "invariant" improvement on the above code
- Same central test: not seen s before?
- Fix the counts dict - if s not in, set it in with 0
- If not seen before:
counts[s] = 0
- With fix done, following line works for all cases:
-
counts[s] += 1
- "Invariant" - works for all cases
- I have a slight preference this version
It's one fewer lines and does not use else
- All counting goes through that one += 1 line
Standard Dict-Count Code - Unified/Invariant Version
def str_count2(strs):
counts = {}
for s in strs:
if s not in counts: # fix counts/s if not seen before
counts[s] = 0
# Invariant: now s is in counts one way or
# another, so can do next step unconditionally
counts[s] += 1
return counts
Int Count - Exercise
>
int_count()
Apply the dict-count algorithm to a list of int values, return a counts dict, counting how many times each int value appears in the list.
Char Count - Exercise
>
char_count()
Apply the dict-count algorithm to chars in a string. Build a counts dict of how many times each char, converted to lowercsae, appears in a string so 'Coffee'
returns {'c': 1, 'o': 1, 'f': 2, 'e': 2}
.