July 22nd, 2021
Today: code for dict-count algorithm, when does Python make a copy? More sophisticated nested-dict example, other ways to read a file
> dict2 Demos and Exercises
def str_count2(strs):
    counts = {}
    for s in strs:
        if s not in counts:  # fix counts/s if not seen before
            counts[s] = 0
        # Unified: now s is in counts one way or
        # another, so can do next step unconditionally
        counts[s] += 1
    return counts
Apply the dict-count algorithm to chars in a string. Build a counts dict of how many times each char appears in a string, counting in lowercase, so 'Coffee' returns {'c': 1, 'o': 1, 'f': 2, 'e': 2}.
Recall loop form: for ch in s:
def char_count(s):
    counts = {}
    for ch in s:
        # Do all computation in lowercase
        low = ch.lower()
        if low not in counts:
            counts[low] = 0
        counts[low] += 1
    return counts
For more detail see guide: Python Not Copying
When Python uses an assignment = with a data structure like a list or a dict, Python does not make a copy of the structure. Instead, there is just the one list or dict, and multiple pointers pointing to it.
Here is code that creates one list and one dict, each with a variable pointing to it.
>>> lst = [1, 2, 3]
>>> d = {}
>>> d['a'] = 1
Memory looks like:
>>> d['b'] = lst
What does this do? Key: the = does not make a copy of the list. Instead, it stores an additional reference to the one list inside the dict.
Memory looks like:
There is just one list, and there are two references to it. This is fine. What does the following code do?
>>> lst.append(4)
What does memory look like now? First, what does the list look like? Who is pointing to it?
Memory looks like:
What do these lines of code print now?
>>> lst
???
>>> d['b']
???
Answer
Both lst and d['b'] refer to the same list, which is now [1, 2, 3, 4].
Python does not copy a list or dict when it is used with, say, =. Instead, Python just spreads around more pointers to the one list. This is a normal way for Python programs to work: a few important lists or dicts, and pointers to those structures spread around in the code. This does not require any action on your part, just realize that there are no copies.
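One way to check the sharing for yourself is Python's `is` operator, which is True when two references point to the same object. This is a small sketch reusing the names from the example above, not part of the original demo.

```python
lst = [1, 2, 3]
d = {'a': 1}
d['b'] = lst           # stores a reference to the one list, not a copy

lst.append(4)          # change the list through the lst variable

print(d['b'])          # [1, 2, 3, 4] - the change shows up in the dict too
print(d['b'] is lst)   # True - both names refer to the same list object
```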
Suppose "x" holds the key we're counting...
if x not in counts:   # Fix so x is in there
    counts[x] = 0     # -Init
counts[x] += 1        # -Increment
High level: we have a big list of email addresses. We want to organize the data by host. For each host, build up a list of all the users for that host.
Given a list of email address strings. Each email address has one '@' in it, e.g. 'abby@foo.com', where 'abby' is the user, and 'foo.com' is the host.
Create a nested dict with a key for each host, and the value for that key is a list of all the users for that host, in the order they appear in the original list (repetition allowed).
emails:
  ['abby@foo.com', 'bob@bar.com', 'abe@foo.com']

returns hosts dict:
  {
    'foo.com': ['abby', 'abe'],
    'bar.com': ['bob']
  }
When working a nested dict problem, it's good to keep in mind the type of each key and its value. This info guides code that reads or writes in the dict: when do you do += and when do you do .append()? Here is what we have for this problem. We will refer to this when writing a key line of code:
1. Each key is a host string, e.g. 'foo.com'
2. The value for each key is a list of users for that host, e.g. ['abby', 'abe']
> email_hosts() - nested dict problem
Here is the code to start with. The "not in" structure still applies.
def email_hosts(emails):
    hosts = {}
    for email in emails:
        at = email.find('@')
        user = email[:at]
        host = email[at + 1:]
        # your code here
        pass
    return hosts
1. Think about the "increment" line first. What is the append line? Look above at the definition for each key and value.
2. Need to init for the not-in case. For counting the init was: 0. Now the init is: [].
hosts[host]
This expression is very hard to read on its own, like what on earth is it?
Recall that the hosts dict looks like:
{
  'foo.com': ['abby', 'abe'],
  'bar.com': ['bob']
}
In hosts dict, each key is a host, and each value is a list of user names.
Instead of using hosts[host] as is, put its value into a well-named var, spelling out what sort of data it holds. This is a big help and is how our solution is written. Note how the names in this line of code confirm that the logic is correct: users.append(user)
This depends on the "shallow" feature of Python data (above), e.g. hosts[host] returns a reference to the embedded list to us.
No:
hosts[host].append(user)
Yes:
users = hosts[host]
users.append(user)
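The "Yes" form only works because no copy is made: the variable refers to the same list that sits inside the dict. Here is a minimal sketch of that, using a small made-up hosts dict for illustration.

```python
hosts = {'foo.com': ['abby']}   # made-up starting data

users = hosts['foo.com']        # reference to the embedded list, not a copy
users.append('abe')             # appending via the var changes the list in the dict

print(hosts)                    # {'foo.com': ['abby', 'abe']}
print(users is hosts['foo.com'])  # True
```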
def email_hosts(emails):
    hosts = {}
    for email in emails:
        at = email.find('@')
        user = email[:at]
        host = email[at + 1:]
        # key algorithm: init/increment
        if host not in hosts:
            hosts[host] = []
        users = hosts[host]  # decomp by var
        users.append(user)
    return hosts
> food_ratings() - nested dict problem
food_ratings(ratings): Given a list of food survey rating strings like this 'donut:10'. Create and return a "foods" dict that has a key for each food, and its value is a list of all its rating numbers in int form. Use split() to divide each food rating into its parts. There's a lot of Python packed into this question!
Build dict with structure:
Key = one food string
Value = list of rating ints
split(':')
Nice technique: rating = 'donut:10'
Use split(): rating.split(':') -> ['donut', '10']
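Here is one possible sketch of food_ratings(), following the same init/append pattern as the email problem. It assumes each rating string has exactly one ':' with a valid int after it. The variable names and the sample input list are just illustrative choices.

```python
def food_ratings(ratings):
    foods = {}
    for rating in ratings:
        parts = rating.split(':')   # 'donut:10' -> ['donut', '10']
        food = parts[0]
        num = int(parts[1])         # convert str '10' to int 10
        # init/append: make sure food is in there, then append
        if food not in foods:
            foods[food] = []
        nums = foods[food]          # decomp by var: list of rating ints
        nums.append(num)
    return foods


print(food_ratings(['donut:10', 'apple:8', 'donut:9']))
# {'donut': [10, 9], 'apple': [8]}
```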
See guide for more details: File Reading and Writing
Standard "with" to open a text file for reading:
with open(filename) as f:
    # use f in here
The form below is equivalent to above since 'r' is the default, meaning read the file. 'w' means write the file from RAM to the file system. See the guide above for sample writing code.
with open(filename, 'r') as f:
    # use f in here
Can specify the encoding (the default depends on your machine / locale). Encoding 'utf-8' is what many files use. Try this if you get a UnicodeDecodeError.
with open(filename, encoding='utf-8') as f:
    # use f
Older way to open() a file (use in interpreter)
f = open(filename)
# use f
# f.close() when done
# "with" does the .close() automatically
Most common, process 1 line at a time. Uses the least memory.
for line in f:
    # process each line
f.readlines() - return list of line strings, can do slices etc. to access lines in a custom order. Each line has the '\n' at its end. Use str.strip() to strip off whitespace from the ends of a line.
>>> f = open('poem.txt')  # alternative to with, use in interpreter
>>> lines = f.readlines()
>>> lines
['Roses are red\n', 'Violets are blue\n', '"RED" BLUE.\n']
>>>
>>> line = lines[0]  # first line
>>> line
'Roses are red\n'
>>> line.strip()  # strip() - remove whitespace from ends
'Roses are red'
>>>
>>> lines[1:]  # slice to grab subset of lines
['Violets are blue\n', '"RED" BLUE.\n']
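The pieces above can be put together in a regular function: "with" to open the file, the for-line loop, and strip() on each line. This is just a sketch. The function name read_clean_lines and the little demo file it writes are made up so the example runs on its own.

```python
import os
import tempfile


def read_clean_lines(filename):
    """Return a list of the file's lines, whitespace stripped from the ends."""
    result = []
    with open(filename, encoding='utf-8') as f:
        for line in f:                  # one line at a time
            result.append(line.strip())
    return result


# Made-up demo file in a temp directory
path = os.path.join(tempfile.mkdtemp(), 'poem.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('Roses are red\nViolets are blue\n')

print(read_clean_lines(path))   # ['Roses are red', 'Violets are blue']
```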
read() - whole file into one string. Handy if you can process the whole thing at once, not needing to go line by line. Reading from a file "consumes" the data. Doing a second read returns the empty string.
>>> f = open('poem.txt')
>>> s = f.read()  # whole file in string
>>> s
'Roses are red\nViolets are blue\n"RED" BLUE.\n'
>>>
>>>
>>> f.read()  # reading again gets nothing
''
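As a sketch of "process the whole thing at once", read() can feed the dict-count algorithm from earlier in these notes: read the file into one string, split() it on whitespace, and count the words. The function name file_word_count and the demo file are made up for illustration.

```python
import os
import tempfile


def file_word_count(filename):
    """Return a counts dict of the lowercase words in the file."""
    with open(filename, encoding='utf-8') as f:
        text = f.read()                 # whole file as one string
    counts = {}
    for word in text.split():           # split on whitespace, including '\n'
        low = word.lower()
        if low not in counts:
            counts[low] = 0
        counts[low] += 1
    return counts


# Made-up demo file in a temp directory
path = os.path.join(tempfile.mkdtemp(), 'poem.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('Roses are red\nViolets are blue\n')

print(file_word_count(path))
# {'roses': 1, 'are': 2, 'red': 1, 'violets': 1, 'blue': 1}
```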