Slide 1
Today: new data type dict and the dict-count algorithm
Slide 2
Dict - Hash Table - Fast
- Python "dict" type
- A key/value "dictionary"
- In CS generally known as a "hash table"
Sounds like a real hacker thing
CS106B! - Dict is a bit advanced, compared to basic string/list
- The defining feature of dicts:
Can store and then look up data, and be fast about it
Slide 3
Dict Story Arc
- Have: have some big data set, data is not organized, perhaps random order
- Dict:
Pick out data items we want
Store each item under a key in the dict - Done: now the data is organized by key
- The dict is fast doing get/set by key, its defining superpower
- Job interview pattern:
Interview question has some messed up data
Best answer inevitably uses a dict to organize the data
Because the dict is powerful and fast...
Interviewers cannot resist using it
Slide 4
Dict = Memory
- Dict plays the role of memory
- Organized by key
- Suppose key is
'a'
- Your code can store at one time:
d['a'] = 12
- Later, code can lookup:
d['a']
- Get back the 12 stored earlier
- Speed: even if d contains 10 million keys, get/set/in by key is nearly instant
Slide 5
Dict Basics
See Python Guide for more detail: Python Dict
- Organize data around a key
- Each key stores one associated value
In drawing with an arrow:key -> value
- Literal syntax of a dict - curly braces, key:value
{'a': 'alpha', 'g': 'gamma', 'b': 'beta'}
Style: one space after colon and comma
Slide 6
Dict-1. Set key:value into Dict
- Create empty dict:
d = {}
- Set:
d[key] = value
- e.g.
d['a'] = 'alpha'
- Set creates that key entry in dict if needed
- Overwrites any previous value for that key
>>> d = {} # Start with empty dict {}
>>> d['a'] = 'alpha' # Set key/value
>>> d['g'] = 'gamma'
>>> d['b'] = 'beta'
>>> # Now we have built the picture above
>>> # Python can display dict with the literal syntax
>>> d
{'a': 'alpha', 'g': 'gamma', 'b': 'beta'}
>>>
Slide 7
Dict-2: Get value out of Dict
- Get a value out of a dict by key
- Get:
d[key]
- returns the value for that key - e.g.
d['a']
returns'alpha'
- Handy:
+=
Does a get/set series on the value
This will be a handy pattern - e.g.
d['a'] += '!!!'
Equivalent to:d['a'] = d['a'] + '!!!'
Adds '!!!' to end of that value
>>> d['g'] # Get by key
'gamma'
>>> d['b']
'beta'
>>> d['a'] = 'apple' # Overwrite 'a' key
>>> d['a']
'apple'
>>>
>>> # += modify value
>>> d['a'] += '!!!'
>>> d['a']
'apple!!!'
>>> # Dict literal format, curl-braces, key:value
>>> d
{'a': 'apple!!!', 'g': 'gamma', 'b': 'beta'}
>>>
Slide 8
Dict-3 Get Error / "in" Test
- There is one big catch
- Get [ ] a value out of a dict only works if the key is in the dict
- If the key is not in the dict, [ ] is an error
- Use in to check if a key is in the dict
- The in* check is for keys, not values
All important dict logic is with the key, the value is just stored, not looked at - Before using [ ] to get a value, need in-check first to see that
it is there
Sort of the "guard" pattern again
>>> # Can initialize dict with literal
>>> d = {'a': 'alpha', 'g': 'gamma', 'b': 'beta'}
>>>
>>> d['x'] # Key not in -> Error
Error:KeyError('x',)
>>>
>>> 'a' in d # "in" key tests
True
>>> 'x' in d
False
>>>
>>> # Guard pattern
>>> if 'a' in d:
val = d['a']
>>>
>>> # in uses key, not value, so this does not work:
>>> 'alpha' in d
False
>>>
Slide 9
Dict Memory Example - Meals
Use dict to remember that 'breakfast'
is 'apple'
and 'lunch'
is
'donut'
. Using 'breakfast'
and 'lunch'
as keys.
>>> meals = {}
>>> meals['breakfast'] = 'apple'
>>> meals['lunch'] = 'donut'
>>>
>>> # time passes, other lines run
>>>
>>> # what was lunch again?
>>> meals['lunch']
'donut'
>>>
>>> # did I have breakfast or dinner?
>>> 'breakfast' in meals
True
>>> 'dinner' in meals
False
>>>
Slide 10
Basic Dict Code Examples - Meals
Look at the dict1 "meals" exercises on the experimental server
> dict1 meals exercises
With the "meals" examples, the keys are
'breakfast', 'lunch', 'dinner'
and the values are like 'hot dot'
and
'bagel'
. A key like 'breakfast'
may or may not be in the dict, so
need to "in" check first.
- bad_start() - check for bad breakfast - return True if no breakfast or if it is 'candy'
- candyish() - check for candy lunch or dinner
- enkale() - if
'candy'
for dinner, change it to'kale'
Slide 11
Case to think about: dict[key]
The code can only get the value for a key, if the key is in the dict. Otherwise it's an error. Therefore, need to structure the code with an "in" guard check or something to make sure the key is in the dict before trying to get its value.
Slide 12
bad_start()
bad_start(meals): Given a "meals" dict which contains key/value pairs like 'lunch' -> 'hot dog'. The possible keys are 'breakfast', 'lunch', 'dinner'. Return True if there is no 'breakfast' key in meals, or the value for 'breakfast' is 'candy'. Otherwise return False.
Slide 13
bad_start() Solution Code
Question: is the meals['breakfast'] == 'candy'
line safe? Yes. The
if-statement guards the [ ].
def bad_start(meals):
if 'breakfast' not in meals:
return True
if meals['breakfast'] == 'candy':
return True
return False
# Can be written with "or" / short-circuiting avoids key-error
# if 'breakfast' not in meals or meals['breakfast'] == 'candy':
Slide 14
enkale()
> enkale()
enkale(meals): Given a "meals" dict which contains key/value pairs like 'lunch' -> 'hot dog'. The possible keys are 'breakfast', 'lunch', 'dinner'. If the key 'dinner' is in the dict with the value 'candy', change the value to 'kale'. Otherwise leave the dict unchanged. Return the dict in all cases.
Slide 15
enkale() Solution Code
Demo: work out the code, see key error
Cannot access meals['dinner']
in the case that dinner is not in the
dict, so need logic to avoid that case.
def enkale(meals):
if 'dinner' in meals and meals['dinner'] == 'candy':
meals['dinner'] = 'kale'
return meals
Typical pattern: "in" check guards the meals['dinner']
access, since
the short-circuit and only proceeds when the first test is True. Could
write it out in this longer form which is ok - works exactly the same as
the above and/short-circuit form:
def enkale(meals):
if 'dinner' in meals:
if meals['dinner'] == 'candy':
meals['dinner'] = 'kale'
return meals
Slide 16
Exercise: is_boring()
is_boring(meals): Given a "meals" dict. We'll say the meals are boring if lunch and dinner are both present and are the same thing. Return True if the meals are boring, False otherwise.
Slide 17
Dict Observations
Dict Random Order
- Note: the order of the keys in the dict is kind of random
It is the order they were added
Simplest to think of it as random - Key type is typically a str or int (immutable)
- Value type can be anything (str, list, ...)
- The get/set/in logic is on the keys
- The values are just dumb payload - stored but looked at
"in" Guard Pattern
- Very often see "in" checks just before key access
- Accessing
meals['dinner']
= an error if dinner not in the dict - Therefore: check
dinner in meals
first - Only access
meals['dinner']
when the key is present - The
and
on the following line does this, only proceeding when in is True:
if 'dinner' in meals and meals['dinner'] == 'candy':
- Aka "short circuiting" of boolean expressions
Key and Value - Different Roles
- Note that get/set/in are all by key
- The key is the control, value is just dumb payload
- Could say key/value are asymmetric, having specialized roles
- YES set by key:
d['a'] = 'alpha'
- YES get by key:
d['a'] -> 'alpha'
- YES in check of key:
'a' in d -> True
- NO, in check by value:
'alpha' in d
Does not work - get/set/in all work with key
Dict vs. List - Keys
- Dict and List both remember things
- What's the difference?
- Keys!
- The "keys" for list are always index numbers
0, 1, 2, 3, ... len-1
- The "keys" for dict, you choose!
Any string or int etc. value is a key
Can set the values in any order - Can install anything, make that key on the fly...
- e.g.
d['dinner'] = 'radish'
Slide 18
Dict-Count Algorithm
- Extremely important dict algorithm pattern
- (Read: we'll use it a lot)
- A "counts" dict:
We have some big data set
Store a key for each distinct value in the data
The value for each key is count of occurrences of that key in the data - e.g. strs:
'a', 'b', 'a', 'c', 'b'
- creates "counts" dict:
{'a': 2, 'b': 2, 'c': 1}
Slide 19
Dict Count Code Examples
> dict2 Count exercises
Slide 20
Dict-Count Algorithm Steps
- 1. Start with empty dict
- 2. For each str, test: not seen before?
- 3. Not seen before: store key = str, value = 1
- 4. Seen before: key = str, value = value + 1
Slide 21
Dict-Count abacb
Go through these strs
strs = ['a', 'b', 'a', 'c', 'b']
Sketch out counts dict here:
Counts dict ends up as {'a': 2, 'b': 2, 'c': 1}
:
- strs:
'a', 'b', 'a', 'c', 'b'
- Each distinct str is a key in the dict
- The value for each key is the number of times it is seen
- Algorithm: loop through all s, update dict with counts as we go
- Each s: seen this before or not?
Slide 22
1. str-count1() - if/else
str_count1 demo, canonical dict-count algorithm
- Central test of this algorithm: not seen before?
- if/else solution
- Test: not seen before?
not seen before:counts[s] = 1
seen before:counts[s] += 1
- This if/else approach is fine, but we'll see another way below
Slide 23
str_count1() Solution
def str_count1(strs):
counts = {}
for s in strs:
# s not seen before?
if s not in counts:
counts[s] = 1 # first time
else:
counts[s] +=1 # every later time
return counts
Slide 24
2. str-count2() - Unified/Invariant Version, no else
- A slight "invariant" improvement on the above code
- Same central test: not seen s before?
- Fix the counts dict - if s not in, set it in with 0
- If not seen before:
counts[s] = 0
- With fix done, following line works for all cases:
counts[s] += 1
- "Invariant" - works for all cases
- I have a slight preference this version
It's one fewer lines and does not use else - All counting goes through that one += 1 line
Slide 25
Standard Dict-Count Code - Unified/Invariant Version
def str_count2(strs):
counts = {}
for s in strs:
if s not in counts: # fix counts/s if not seen before
counts[s] = 0
# Invariant: now s is in counts one way or
# another, so can do next step unconditionally
counts[s] += 1
return counts
Slide 26
Int Count - Exercise
Apply the dict-count algorithm to a list of int values, return a counts dict, counting how many times each int value appears in the list.
Slide 27
Char Count - Exercise
Apply the dict-count algorithm to chars in a string. Build a counts dict
of how many times each char, converted to lowercsae, appears in a string
so 'Coffee'
returns {'c': 1, 'o': 1, 'f': 2, 'e': 2}
.