Slide 1

Today: new data type dict and the dict-count algorithm


Slide 2

Dict - Hash Table - Fast

  • Python "dict" type
  • A key/value "dictionary"
  • In CS generally known as a "hash table"
    Sounds like a real hacker thing
    CS106B!
  • Dict is a bit advanced, compared to basic string/list
  • The defining feature of dicts:
    Can store and then look up data, and be fast about it

Slide 3

Dict Story Arc

  • Have: have some big data set, data is not organized, perhaps random order
  • Dict:
    Pick out data items we want
    Store each item under a key in the dict
  • Done: now the data is organized by key
  • The dict is fast doing get/set by key, its defining superpower
  • Job interview pattern:
    Interview question has some messed up data
    Best answer inevitably uses a dict to organize the data
    Because the dict is powerful and fast...
    Interviewers cannot resist using it

Slide 4

Dict = Memory

  • Dict plays the role of memory
  • Organized by key
  • Suppose key is 'a'
  • Your code can store at one time: d['a'] = 12
  • Later, code can lookup: d['a']
  • Get back the 12 stored earlier
  • Speed: even if d contains 10 million keys, get/set/in by key is nearly instant

Slide 5

Dict Basics

See Python Guide for more detail: Python Dict

alt:python dict key/value pairs 'a'/'alpha' 'g'/'gamma'
'b'/'beta'

  • Organize data around a key
  • Each key stores one associated value
    In drawing with an arrow: key -> value
  • Literal syntax of a dict - curly braces, key:value
    {'a': 'alpha', 'g': 'gamma', 'b': 'beta'}
    Style: one space after colon and comma

Slide 6

Dict-1. Set key:value into Dict

  • Create empty dict: d = {}
  • Set: d[key] = value
  • e.g. d['a'] = 'alpha'
  • Set creates that key entry in dict if needed
  • Overwrites any previous value for that key
>>> d = {}             # Start with empty dict {}
>>> d['a'] = 'alpha'   # Set key/value
>>> d['g'] = 'gamma'
>>> d['b'] = 'beta'
>>> # Now we have built the picture above
>>> # Python can display dict with the literal syntax
>>> d
{'a': 'alpha', 'g': 'gamma', 'b': 'beta'}
>>>

Slide 7

Dict-2: Get value out of Dict

  • Get a value out of a dict by key
  • Get: d[key] - returns the value for that key
  • e.g. d['a'] returns 'alpha'
  • Handy: +=
    Does a get/set series on the value
    This will be a handy pattern
  • e.g. d['a'] += '!!!'
    Equivalent to: d['a'] = d['a'] + '!!!'
    Adds '!!!' to end of that value
>>> d['g']             # Get by key
'gamma'
>>> d['b']
'beta'
>>> d['a'] = 'apple'   # Overwrite 'a' key
>>> d['a']
'apple'
>>>
>>> # += modify value
>>> d['a'] += '!!!'
>>> d['a']
'apple!!!'
>>> # Dict literal format, curl-braces, key:value
>>> d
{'a': 'apple!!!', 'g': 'gamma', 'b': 'beta'}
>>>

Slide 8

Dict-3 Get Error / "in" Test

  • There is one big catch
  • Get [ ] a value out of a dict only works if the key is in the dict
  • If the key is not in the dict, [ ] is an error
  • Use in to check if a key is in the dict
  • The in* check is for keys, not values
    All important dict logic is with the key, the value is just stored, not looked at
  • Before using [ ] to get a value, need in-check first to see that it is there
    Sort of the "guard" pattern again
>>> # Can initialize dict with literal
>>> d = {'a': 'alpha', 'g': 'gamma', 'b': 'beta'}
>>>
>>> d['x']             # Key not in -> Error
Error:KeyError('x',)
>>>
>>> 'a' in d           # "in" key tests
True
>>> 'x' in d
False
>>> 
>>> # Guard pattern
>>> if 'a' in d:
      val = d['a']
>>>
>>> # in uses key, not value, so this does not work:
>>> 'alpha' in d 
False
>>>

Slide 9

Dict Memory Example - Meals

Use dict to remember that 'breakfast' is 'apple' and 'lunch' is 'donut'. Using 'breakfast' and 'lunch' as keys.

>>> meals = {}
>>> meals['breakfast'] = 'apple'
>>> meals['lunch'] = 'donut'
>>>
>>> # time passes, other lines run
>>>
>>> # what was lunch again?
>>> meals['lunch']
'donut'
>>> 
>>> # did I have breakfast or dinner?
>>> 'breakfast' in meals
True
>>> 'dinner' in meals
False
>>>

Slide 10

Basic Dict Code Examples - Meals

Look at the dict1 "meals" exercises on the experimental server

> dict1 meals exercises

With the "meals" examples, the keys are 'breakfast', 'lunch', 'dinner' and the values are like 'hot dot' and 'bagel'. A key like 'breakfast' may or may not be in the dict, so need to "in" check first.

  • bad_start() - check for bad breakfast - return True if no breakfast or if it is 'candy'
  • candyish() - check for candy lunch or dinner
  • enkale() - if 'candy' for dinner, change it to 'kale'

Slide 11

Case to think about: dict[key]

The code can only get the value for a key, if the key is in the dict. Otherwise it's an error. Therefore, need to structure the code with an "in" guard check or something to make sure the key is in the dict before trying to get its value.


Slide 12

bad_start()

> bad_start()

bad_start(meals): Given a "meals" dict which contains key/value pairs like 'lunch' -> 'hot dog'. The possible keys are 'breakfast', 'lunch', 'dinner'. Return True if there is no 'breakfast' key in meals, or the value for 'breakfast' is 'candy'. Otherwise return False.


Slide 13

bad_start() Solution Code

Question: is the meals['breakfast'] == 'candy' line safe? Yes. The if-statement guards the [ ].

def bad_start(meals):
    if 'breakfast' not in meals:
        return True
    if meals['breakfast'] == 'candy':
        return True
    return False
    # Can be written with "or" / short-circuiting avoids key-error
    # if 'breakfast' not in meals or meals['breakfast'] == 'candy':

Slide 14

enkale()

> enkale()

enkale(meals): Given a "meals" dict which contains key/value pairs like 'lunch' -> 'hot dog'. The possible keys are 'breakfast', 'lunch', 'dinner'. If the key 'dinner' is in the dict with the value 'candy', change the value to 'kale'. Otherwise leave the dict unchanged. Return the dict in all cases.


Slide 15

enkale() Solution Code

Demo: work out the code, see key error

Cannot access meals['dinner'] in the case that dinner is not in the dict, so need logic to avoid that case.

def enkale(meals):
    if 'dinner' in meals and meals['dinner'] == 'candy':
        meals['dinner'] = 'kale'
    return meals

Typical pattern: "in" check guards the meals['dinner'] access, since the short-circuit and only proceeds when the first test is True. Could write it out in this longer form which is ok - works exactly the same as the above and/short-circuit form:

def enkale(meals):
    if 'dinner' in meals:
        if meals['dinner'] == 'candy':
            meals['dinner'] = 'kale'
    return meals

Slide 16

Exercise: is_boring()

> is_boring()

is_boring(meals): Given a "meals" dict. We'll say the meals are boring if lunch and dinner are both present and are the same thing. Return True if the meals are boring, False otherwise.


Slide 17

Dict Observations

Dict Random Order

  • Note: the order of the keys in the dict is kind of random
    It is the order they were added
    Simplest to think of it as random
  • Key type is typically a str or int (immutable)
  • Value type can be anything (str, list, ...)
  • The get/set/in logic is on the keys
  • The values are just dumb payload - stored but looked at

"in" Guard Pattern

  • Very often see "in" checks just before key access
  • Accessing meals['dinner'] = an error if dinner not in the dict
  • Therefore: check dinner in meals first
  • Only access meals['dinner'] when the key is present
  • The and on the following line does this, only proceeding when in is True:
    if 'dinner' in meals and meals['dinner'] == 'candy':
  • Aka "short circuiting" of boolean expressions

Key and Value - Different Roles

  • Note that get/set/in are all by key
  • The key is the control, value is just dumb payload
  • Could say key/value are asymmetric, having specialized roles
  • YES set by key: d['a'] = 'alpha'
  • YES get by key: d['a'] -> 'alpha'
  • YES in check of key: 'a' in d -> True
  • NO, in check by value: 'alpha' in d
    Does not work
  • get/set/in all work with key

Dict vs. List - Keys

  • Dict and List both remember things
  • What's the difference?
  • Keys!
  • The "keys" for list are always index numbers
    0, 1, 2, 3, ... len-1
  • The "keys" for dict, you choose!
    Any string or int etc. value is a key
    Can set the values in any order
  • Can install anything, make that key on the fly...
  • e.g. d['dinner'] = 'radish'


Slide 18

Dict-Count Algorithm

  • Extremely important dict algorithm pattern
  • (Read: we'll use it a lot)
  • A "counts" dict:
    We have some big data set
    Store a key for each distinct value in the data
    The value for each key is count of occurrences of that key in the data
  • e.g. strs: 'a', 'b', 'a', 'c', 'b'
  • creates "counts" dict: {'a': 2, 'b': 2, 'c': 1}

Slide 19

Dict Count Code Examples

> dict2 Count exercises


Slide 20

Dict-Count Algorithm Steps

  • 1. Start with empty dict
  • 2. For each str, test: not seen before?
  • 3. Not seen before: store key = str, value = 1
  • 4. Seen before: key = str, value = value + 1

Slide 21

Dict-Count abacb

Go through these strs
strs = ['a', 'b', 'a',  'c',  'b']

Sketch out counts dict here:












Counts dict ends up as {'a': 2, 'b': 2, 'c': 1}:

alt: counts a 2 b 2 c 1

  • strs: 'a', 'b', 'a', 'c', 'b'
  • Each distinct str is a key in the dict
  • The value for each key is the number of times it is seen
  • Algorithm: loop through all s, update dict with counts as we go
  • Each s: seen this before or not?

Slide 22

1. str-count1() - if/else

> str_count1()

str_count1 demo, canonical dict-count algorithm

  • Central test of this algorithm: not seen before?
  • if/else solution
  • Test: not seen before?
    not seen before: counts[s] = 1
    seen before: counts[s] += 1
  • This if/else approach is fine, but we'll see another way below

Slide 23

str_count1() Solution

def str_count1(strs):
    counts = {}
    for s in strs:
        # s not seen before?
        if s not in counts:
            counts[s] = 1   # first time
        else:
            counts[s] +=1   # every later time
    return counts

Slide 24

2. str-count2() - Unified/Invariant Version, no else

> str_count2()

  • A slight "invariant" improvement on the above code
  • Same central test: not seen s before?
  • Fix the counts dict - if s not in, set it in with 0
  • If not seen before:
    counts[s] = 0
  • With fix done, following line works for all cases:
  • counts[s] += 1
  • "Invariant" - works for all cases
  • I have a slight preference this version
    It's one fewer lines and does not use else
  • All counting goes through that one += 1 line

Slide 25

Standard Dict-Count Code - Unified/Invariant Version

def str_count2(strs):
    counts = {}
    for s in strs:
        if s not in counts:  # fix counts/s if not seen before
            counts[s] = 0
        # Invariant: now s is in counts one way or
        # another, so can do next step unconditionally
        counts[s] += 1
    return counts

Slide 26

Int Count - Exercise

> int_count()

Apply the dict-count algorithm to a list of int values, return a counts dict, counting how many times each int value appears in the list.


Slide 27

Char Count - Exercise

> char_count()

Apply the dict-count algorithm to chars in a string. Build a counts dict of how many times each char, converted to lowercsae, appears in a string so 'Coffee' returns {'c': 1, 'o': 1, 'f': 2, 'e': 2}.