You can turn str, list, tuple, dict, and other iterables into iterators with the iter built-in. The next built-in will return the next item from the iterator:
vals = [1,2,3]
i = iter(vals)
next(i)
1
next(i)
2
next(i)
3
next(i)
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
As the StopIteration error shows, the iterator is "used up" when we've moved through all its members.
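There are two common ways to avoid hitting StopIteration yourself. The sketch below reuses vals and i from above; the other variable names are illustrative only. It assumes nothing beyond the built-ins: next accepts an optional default that is returned once the iterator is exhausted, and a for-loop handles StopIteration for you.

```python
vals = [1, 2, 3]
i = iter(vals)

# A for-loop calls next() behind the scenes and stops cleanly
# when the iterator raises StopIteration.
seen = []
for v in i:
    seen.append(v)

# The iterator is now used up. Passing a default to next()
# returns that default instead of raising StopIteration.
leftover = next(i, None)

# iter() works on other iterables too: a dict yields its keys,
# a str yields its characters.
keys = list(iter({"a": 1, "b": 2}))
chars = list(iter("abc"))
```

Note that the list vals itself is untouched; calling iter(vals) again gives a fresh iterator that starts from the beginning.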
You can also call next on file objects:
f = open("alice.txt")
next(f)
'\ufeffThe Project Gutenberg EBook of Alice in Wonderland, by Lewis Carroll\n'
next(f)
'\n'
When you use our standard idiom for opening a file and moving through its lines, you are relying on the fact that file objects are iterators:
with open(filename) as f:
    for line in f:
        # Do something with `line`
This is a memory-efficient way to read through the file, because it never loads the whole file into memory at once. It is therefore the preferred approach whenever line-by-line processing meets your needs.
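To make the contrast concrete, here is a minimal sketch that computes simple statistics by streaming, next to the eager alternative. The temporary file and the names sample_path, n_lines, longest, and all_lines are assumptions made up for this illustration.

```python
import os
import tempfile

# Write a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf8") as tmp:
    tmp.write("first line\nsecond line\nthird line\n")
    sample_path = tmp.name

# Streaming: we hold roughly one line (plus a read buffer) at a time.
n_lines = 0
longest = 0
with open(sample_path, encoding="utf8") as f:
    for line in f:
        n_lines += 1
        longest = max(longest, len(line.rstrip("\n")))

# Eager: readlines() builds a list of every line before we look at any.
with open(sample_path, encoding="utf8") as f:
    all_lines = f.readlines()

os.remove(sample_path)
```

Both approaches see the same lines. The difference only matters once the file is large relative to available memory, which is where the streaming form pays off.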
We often want to read in files line-by-line and do something with those lines. For example, with a CSV, you might want to process the column values in complex ways. Custom generators provide a scalable, intuitive way to do this. Here's a simple example:
def uppercase_reader(filename):
    with open(filename) as f:
        for line in f:
            line = line.upper()  # Presumably you would do something more interesting!
            yield line
The crucial piece is using yield rather than return. With the above, calling next will iteratively yield uppercased lines, and using a for-loop will loop over uppercased lines:
for line in uppercase_reader(filename):
    # Do something with `line`
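The following sketch shows the lazy behavior that yield gives you. It repeats uppercase_reader in condensed form so that it runs on its own; the temporary file and the names sample_path, reader, and fresh are assumptions for illustration.

```python
import itertools
import os
import tempfile

def uppercase_reader(filename):
    with open(filename) as f:
        for line in f:
            yield line.upper()

# Write a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    sample_path = tmp.name

# Calling the function only creates a generator object;
# the file is not opened until the first next().
reader = uppercase_reader(sample_path)
first = next(reader)   # runs the body up to the first yield
rest = list(reader)    # consumes the remaining lines
again = list(reader)   # the generator is now exhausted

# islice takes only the first few items without reading the rest.
fresh = uppercase_reader(sample_path)
first_two = list(itertools.islice(fresh, 2))
fresh.close()  # exits the with-block inside the generator, closing the file

os.remove(sample_path)
```

As with any iterator, a generator can be consumed only once. To read the file again, call uppercase_reader again to get a new generator.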
Here's a more realistic illustration, a Google Books file reader:
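import gzip

def googlebooks_reader(filename, gz=True):
    # Open in text mode either way, so that each line is a str.
    if gz:
        f = gzip.open(filename, mode='rt', encoding='utf8')
    else:
        f = open(filename, encoding='utf8')
    with f:
        for line in f:
            # Fields are tab-separated: word, year, match count, volume count.
            w, yr, mc, vc = line.split("\t")
            yr = int(yr)
            mc = int(mc)
            vc = int(vc)  # int() ignores the trailing newline
            yield {'word': w, 'year': yr, 'match_count': mc, 'volume_count': vc}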
This version turns each line into a dict with intuitive key names and the correct types for the values. This seems like a good generic format for doing analytic work with the file.
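As one possible use of this format, the sketch below sums match counts per year while streaming through the file. It repeats the reader in condensed form so that it runs on its own. The three sample rows are invented purely for illustration and are not real Google Books data; sample_path and counts_by_year are likewise hypothetical names.

```python
import gzip
import os
import tempfile
from collections import Counter

def googlebooks_reader(filename, gz=True):
    opener = gzip.open if gz else open
    with opener(filename, mode="rt", encoding="utf8") as f:
        for line in f:
            w, yr, mc, vc = line.rstrip("\n").split("\t")
            yield {"word": w, "year": int(yr),
                   "match_count": int(mc), "volume_count": int(vc)}

# Invented sample rows in the tab-separated layout the reader expects.
rows = ["apple\t1990\t5\t2\n", "apple\t1991\t7\t3\n", "pear\t1990\t1\t1\n"]

fd, sample_path = tempfile.mkstemp(suffix=".gz")
os.close(fd)
with gzip.open(sample_path, mode="wt", encoding="utf8") as out:
    out.writelines(rows)

# Aggregate lazily: only one row dict exists at a time.
counts_by_year = Counter()
for row in googlebooks_reader(sample_path):
    counts_by_year[row["year"]] += row["match_count"]

os.remove(sample_path)
```

Because each row arrives as a dict with typed values, the aggregation code can refer to fields by name and do arithmetic directly, without repeating any parsing logic.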