Linguist 278: Programming for Linguists
Stanford Linguistics, Fall 2021
Christopher Potts

Introduction to object-oriented programming in Python

This notebook introduces the core concepts of object-oriented programming (OOP) as it is implemented in Python. The guiding idea is that using OOP will make your code easier to maintain, extend, and debug.

In Python, the core notion is that of a class, which can have attributes and methods that constitute the functionality of the class.
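
As a minimal sketch of this terminology, the toy class below has two attributes (set in `__init__`) and one method. The `Phoneme` class and its names are hypothetical and are not part of the course materials.

```python
class Phoneme:
    """Hypothetical toy class, used only to illustrate the terminology."""

    def __init__(self, symbol, voiced):
        # Attributes: data stored on each instance.
        self.symbol = symbol
        self.voiced = voiced

    def describe(self):
        # Method: a function defined in the class that can use the attributes.
        voicing = "voiced" if self.voiced else "voiceless"
        return "/{}/ is {}".format(self.symbol, voicing)


b = Phoneme("b", voiced=True)
print(b.symbol)       # b
print(b.describe())   # /b/ is voiced
```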

If you create classes that are modular and self-contained, then they will prove to be effective building blocks for very large codebases.

We will, as always, have time only for the core concepts, but I've included a section called Other topics that at least rounds out the set of things that I find myself doing most often with Python OOP.

Set-up

In [ ]:
import csv
import os
import re
import requests
import string

Class initialization, attributes, and instantiation

In [ ]:
## TO BE COMPLETED ##
pass

Instantiation:

In [ ]:
## TO BE COMPLETED ##
pass

Accessing an attribute:

In [ ]:
## TO BE COMPLETED ##
pass

Changing the value of an attribute:

In [ ]:
## TO BE COMPLETED ##
pass
In [ ]:
## TO BE COMPLETED ##
pass

Adding a more complex attribute:

In [ ]:
class Linguist:
    def __init__(self, first, last):
        self.first_name = first.title()
        self.last_name = last.title()
        self.wikipedia_url = "https://en.wikipedia.org/wiki/{}_{}".format(
            self.first_name, self.last_name)
In [ ]:
## TO BE COMPLETED ##
pass
In [ ]:
## TO BE COMPLETED ##
pass
In [ ]:
## TO BE COMPLETED ##
pass
In [ ]:
## TO BE COMPLETED ##
pass

Exercise: ConcretenessEntry

For this first in-class exercise, assume you are writing a class to model the rows in the Concreteness lexicon that we've used a few times in the past.

The file looks like this (these are the first five lines of data):

Word Bigram Conc.M Conc.SD Unknown Total Percent_known SUBTLEX Dom_Pos
roadsweeper False 4.85 0.37 1 27 0.96 0 0
traindriver False 4.54 0.71 3 29 0.90 0 0
tush False 4.45 1.01 3 25 0.88 66 0
hairdress False 3.93 1.28 0 29 1.00 1 0
pharmaceutics False 3.77 1.41 4 26 0.85 0 0

Assume for now that the lexicon file is being processed by csv.DictReader, and that we want to create a ConcretenessEntry instance for each of those rows/dicts.

For now, just create attributes corresponding to the keys 'Word', 'Conc.M', and 'Conc.SD'. (One of the advantages of OOP is that it is easy to revisit this and add more attributes, no matter how complex the surrounding code becomes.)
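
As a reminder of what `csv.DictReader` hands you, here is a small sketch that reads from an in-memory string rather than the real file. The two-row excerpt and the comma delimiter are assumptions made for illustration. The point to notice is that every value arrives as a `str`, so numeric fields need an explicit conversion.

```python
import csv
import io

# Hypothetical excerpt; the real lexicon file may be laid out differently.
sample = io.StringIO(
    "Word,Conc.M,Conc.SD\n"
    "roadsweeper,4.85,0.37\n"
    "traindriver,4.54,0.71\n")

rows = list(csv.DictReader(sample))

first = rows[0]
print(first["Word"])            # roadsweeper
print(type(first["Conc.M"]))    # <class 'str'>: csv does no type conversion
print(float(first["Conc.M"]))   # 4.85
```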

In [ ]:
class ConcretenessEntry:
    """Class for rows in the Concreteness lexicon.

    Parameters
    ----------
    d : dict
        With at least the keys 'Word', 'Conc.M', and 'Conc.SD'

    Attributes
    ----------
    word : str
    conc_mean : float
    conc_sd : float
    """
    def __init__(self, d):
        ## TO BE COMPLETED ##
        pass

This should work:

In [ ]:
entry = ConcretenessEntry({'Word': 'roadsweeper', 'Conc.M': '4.85', 'Conc.SD': '0.37'})

And these tests should pass:

In [ ]:
assert entry.conc_mean == 4.85
In [ ]:
assert entry.conc_sd == 0.37

Class methods

Class methods are functions that are defined inside the class. As a rule, when defining methods, the first argument is self. This makes everything about the current object available to the method, allowing you to make use of its attributes and other methods.

In [ ]:
class WikipediaLinguist:
    def __init__(self, first, last):
        self.first_name = first.title()
        self.last_name = last.title()
        self.wikipedia_url = "https://en.wikipedia.org/wiki/{}_{}".format(
            self.first_name, self.last_name)

    def get_wikipedia_page_contents(self):
        """Gets the contents of the Wikipedia page at `self.wikipedia_url`

        Returns
        -------
        str
        """
        page = requests.get(self.wikipedia_url)
        page.encoding = "utf8"
        return page.text

    def count_wikipedia_matches(self, regex):
        """Uses `self.get_wikipedia_page_contents` to get the text of
        `self.wikipedia_url` and returns the number of matches of `regex`
        in that page.

        Parameters
        ----------
        regex : compiled regular expression

        Returns
        -------
        int
        """
        contents = self.get_wikipedia_page_contents()
        return len(regex.findall(contents))
In [ ]:
web_chomsky = WikipediaLinguist("noam", "chomsky")
In [ ]:
chomsky_str = web_chomsky.get_wikipedia_page_contents()
In [ ]:
# First few characters of the page contents (raw HTML):

print(chomsky_str[: 145])
In [ ]:
web_chomsky.count_wikipedia_matches(re.compile(r"government"))

Exercise: ConcretenessCorpus

This exercise asks you to write a ConcretenessCorpus that is an interface to the full Concreteness lexicon. You'll write one method called iter_entries that yields ConcretenessEntry instances and another method called concreteness_mean that uses iter_entries to get the mean of the Conc.M values in the lexicon.

In [ ]:
concreteness_filename = os.path.join(
    "..", "data", "Concreteness_ratings_Brysbaert_et_al_BRM.csv")
In [ ]:
class ConcretenessCorpus:
    """Interface to the Concreteness lexicon.

    Parameters
    ----------
    src_filename : str
        Full path to the CSV file.

    Attributes
    ----------
    src_filename : str
    """
    def __init__(self, src_filename):
        self.src_filename = src_filename

    def iter_entries(self):
        """Iterates through `self.src_filename` yielding `ConcretenessEntry`
        objects instantiated from the dict corresponding to each row,
        via `csv.DictReader`.

        Yields
        ------
        ConcretenessEntry
        """
        ## TO BE COMPLETED ##
        pass

    def concreteness_mean(self):
        """Gets the mean of the `conc_mean` values of each `ConcretenessEntry`
        in the corpus. Uses `self.iter_entries`.

        Returns
        -------
        float
        """
        ## TO BE COMPLETED ##
        pass
In [ ]:
corpus = ConcretenessCorpus(concreteness_filename)
In [ ]:
mu = corpus.concreteness_mean()

mu
In [ ]:
assert round(mu, 4) == 3.0363

Class variables

Class variables are seldom used in my experience, but they are still worth knowing about. They are somewhat like attributes, but they are defined in the class body, outside of the __init__ method, so they belong to the class itself and are shared by all of its instances. Redefining one on the class therefore changes it for every instance.

In [ ]:
class WikipediaLinguistWithClassURL:

    WIKIPEDIA_URL_PATTERN = "https://en.wikipedia.org/wiki/{}_{}"

    def __init__(self, first, last):
        self.first_name = first.title()
        self.last_name = last.title()
        self.wikipedia_url = self.WIKIPEDIA_URL_PATTERN.format(
            self.first_name, self.last_name)
In [ ]:
class_chomsky = WikipediaLinguistWithClassURL("Noam", "Chomsky")
In [ ]:
class_partee = WikipediaLinguistWithClassURL("Barbara", "Partee")
In [ ]:
class_chomsky.WIKIPEDIA_URL_PATTERN

Here, we perversely redefine WIKIPEDIA_URL_PATTERN:

In [ ]:
WikipediaLinguistWithClassURL.WIKIPEDIA_URL_PATTERN = "Problematic redefinition!"

And now it is different for both of the instances we created:

In [ ]:
class_chomsky.WIKIPEDIA_URL_PATTERN
In [ ]:
class_partee.WIKIPEDIA_URL_PATTERN
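
A related subtlety, sketched here with a hypothetical `Greeter` class: assigning to a class variable through an instance does not change the class. It creates an instance attribute of the same name that shadows the class variable for that one instance.

```python
class Greeter:
    """Hypothetical class with one class variable."""
    GREETING = "Hello"

    def __init__(self, name):
        self.name = name


a = Greeter("A")
b = Greeter("B")

# Assigning through an instance creates an instance attribute that
# shadows the class variable for that instance only.
a.GREETING = "Hi"

# Assigning through the class changes what all non-shadowing instances see.
Greeter.GREETING = "Good day"

print(a.GREETING)   # Hi
print(b.GREETING)   # Good day
```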

Compare the following: trying to access an instance attribute directly from the class, without instantiating the object, raises an AttributeError:

In [ ]:
WikipediaLinguistWithClassURL.first_name

Class inheritance

Python classes can be organized into a hierarchy, and subclasses inherit the attributes and methods of their superclasses (unless they redefine those attributes and methods).

The base class for our example:

In [ ]:
class Person:
    def __init__(self, first, last, profession):
        self.first = first
        self.last = last
        self.profession = profession

    def full_name(self):
        return "{} {}".format(self.first, self.last)

Two simple subclasses:

In [ ]:
class Linguist(Person):
    def __init__(self, first, last):
        super().__init__(first, last, profession="linguist")


class Sociologist(Person):
    def __init__(self, first, last):
        super().__init__(first, last, profession="sociologist")
In [ ]:
erving = Sociologist("Erving", "Goffman")
In [ ]:
erving.profession
In [ ]:
erving.full_name()

This subclass redefines the full_name method and adds a new attribute of its own:

In [ ]:
class Doctor(Person):
    def __init__(self, first, last, alive=True):
        super().__init__(first, last, profession="doctor")
        self.alive = alive

    def full_name(self):
        return "{} {}, M.D.".format(self.first, self.last)
In [ ]:
doc = Doctor("Louis", "Pasteur", alive=False)
In [ ]:
doc.full_name()
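
A subclass can also build on an inherited method instead of replacing it outright. The sketch below repeats `Person` so that it runs by itself; `DoctorViaSuper` is a hypothetical variant of `Doctor` that calls the parent's `full_name` via `super()`. It also shows `isinstance` and `issubclass`, which respect the class hierarchy.

```python
class Person:
    def __init__(self, first, last, profession):
        self.first = first
        self.last = last
        self.profession = profession

    def full_name(self):
        return "{} {}".format(self.first, self.last)


class DoctorViaSuper(Person):
    """Hypothetical variant of `Doctor` that reuses the parent's method."""
    def __init__(self, first, last):
        super().__init__(first, last, profession="doctor")

    def full_name(self):
        # Extend, rather than replace, the inherited behavior.
        return super().full_name() + ", M.D."


doc2 = DoctorViaSuper("Louis", "Pasteur")
print(doc2.full_name())                     # Louis Pasteur, M.D.
print(isinstance(doc2, Person))             # True
print(issubclass(DoctorViaSuper, Person))   # True
```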

Other topics

str methods

Adding a __str__ method means that your object instances will display in more helpful ways:

In [ ]:
str(chomsky)
In [ ]:
class PersonStr:
    def __init__(self, first, last, profession):
        self.first = first
        self.last = last
        self.profession = profession

    def __str__(self):
        return "{}, {}: {}".format(self.last, self.first, self.profession)
In [ ]:
str(PersonStr("Barack", "Obama", "politician"))

You can do something similar with len, where you decide on what the appropriate notion of length is. (More generally, for many built-in functions foo, such as len, str, repr, abs, hash, and float, you can implement a method __foo__. Of course, you want to do this only if it makes sense. For example, a __float__ method on Person would be conceptually confusing!)
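
Here is a minimal sketch of the `len` case. The `Sentence` class is hypothetical, and it takes "length" to mean the number of whitespace-separated tokens, which is just one possible choice.

```python
class Sentence:
    """Hypothetical class: length is defined as the number of tokens."""
    def __init__(self, text):
        self.text = text
        self.tokens = text.split()

    def __len__(self):
        # Called by the built-in `len`.
        return len(self.tokens)


s = Sentence("colorless green ideas sleep furiously")
print(len(s))   # 5
```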

repr methods

The idea behind a __repr__ method is that it returns the piece of code one would use to create the object itself. It's a bit of meta-programming, and it's often a very helpful representation for debugging and the like:

In [ ]:
class PersonRepr:
    def __init__(self, first, last):
        self.first = first
        self.last = last

    def __repr__(self):
        return 'PersonRepr("{}", "{}")'.format(self.first, self.last)
In [ ]:
repr_obama = PersonRepr("Barack", "Obama")
In [ ]:
repr(repr_obama)
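
One way to check that a `__repr__` really is "the code that creates the object" is to evaluate it and compare the result with the original. The sketch below repeats `PersonRepr` so that it runs by itself. Two caveats: `eval` should never be used on untrusted strings, and this particular `__repr__` would produce invalid code for names that contain a double quote.

```python
class PersonRepr:
    def __init__(self, first, last):
        self.first = first
        self.last = last

    def __repr__(self):
        return 'PersonRepr("{}", "{}")'.format(self.first, self.last)


original = PersonRepr("Barack", "Obama")
code = repr(original)

# Round trip: evaluating the repr string builds an equivalent object.
rebuilt = eval(code)

print(code)                          # PersonRepr("Barack", "Obama")
print(rebuilt.first, rebuilt.last)   # Barack Obama
```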

Implementing a theory of comparison

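By default, instances of custom classes are equal only to themselves (== falls back to object identity), and the ordering operators <, <=, >, and >= raise a TypeError. However, if you add just three methods (__eq__, __lt__, and __le__), then Python can evaluate the full set of comparisons, because != defaults to the negation of __eq__, and > and >= fall back to the other operand's __lt__ and __le__. This allows you to use the expected set of mathematical comparison operators: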

In [ ]:
class PersonCmp:
    def __init__(self, first, last):
        self.first = first
        self.last = last

    def __eq__(self, other):
        return self.first == other.first and self.last == other.last

    def __lt__(self, other):
        """Sort alphabetically by last name."""
        return self.last < other.last

    def __le__(self, other):
        return self == other or self < other
In [ ]:
partee = PersonCmp("Barbara", "Partee")
In [ ]:
chomsky = PersonCmp("Noam", "Chomsky")
In [ ]:
partee == chomsky
In [ ]:
partee != chomsky
In [ ]:
not partee == chomsky
In [ ]:
chomsky < partee
In [ ]:
chomsky > partee
In [ ]:
chomsky <= partee
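
As an alternative sketch, the standard library's `functools.total_ordering` decorator fills in the remaining ordering methods if you supply `__eq__` and one of `__lt__`, `__le__`, `__gt__`, or `__ge__`. The `PersonOrdered` class is hypothetical. Unlike `PersonCmp`, it breaks ties on first name so that `__eq__` and `__lt__` agree with each other, and it returns `NotImplemented` for objects of other types.

```python
from functools import total_ordering


@total_ordering
class PersonOrdered:
    """Hypothetical class ordered by (last name, first name)."""
    def __init__(self, first, last):
        self.first = first
        self.last = last

    def _key(self):
        return (self.last, self.first)

    def __eq__(self, other):
        if not isinstance(other, PersonOrdered):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, PersonOrdered):
            return NotImplemented
        return self._key() < other._key()


people = [PersonOrdered("Barbara", "Partee"), PersonOrdered("Noam", "Chomsky")]

print([p.last for p in sorted(people)])   # ['Chomsky', 'Partee']
print(people[0] >= people[1])             # True: >= is supplied by the decorator
```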

Hash methods, so your objects can be dictionary keys

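A class that defines __eq__ without also defining __hash__ cannot be used for dictionary keys. PersonCmp above is such a class, so this raises a TypeError: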

In [ ]:
{partee: True}

To allow this, you just need to add __eq__ and __hash__ methods, and best practices say that the two should reflect the same underlying notion of identity. I include __repr__ here to make the representations more transparent when printed:

In [ ]:
class PersonHashable:
    def __init__(self, first, last):
        self.first = first
        self.last = last

    def __eq__(self, other):
        return self.first == other.first and self.last == other.last

    def __hash__(self):
        return hash((self.first, self.last))

    def __repr__(self):
        return 'PersonHashable("{}", "{}")'.format(self.first, self.last)
In [ ]:
hashable_partee = PersonHashable("Barbara", "Partee")
In [ ]:
d = {hashable_partee: True}

d

Because the following object has the same __hash__ value as hashable_partee (and the two are equal by our definition), adding it as a key in effect overwrites hashable_partee:

In [ ]:
hashable_partee2 = PersonHashable("Barbara", "Partee")
In [ ]:
d[hashable_partee2] = False

d
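
The same behavior can be checked step by step. This sketch repeats `PersonHashable` so that it runs by itself; the variable names are illustrative. Two distinct objects that are equal and have equal hashes count as a single key, both in dicts and in sets.

```python
class PersonHashable:
    def __init__(self, first, last):
        self.first = first
        self.last = last

    def __eq__(self, other):
        return self.first == other.first and self.last == other.last

    def __hash__(self):
        return hash((self.first, self.last))


p1 = PersonHashable("Barbara", "Partee")
p2 = PersonHashable("Barbara", "Partee")

print(p1 is p2)               # False: two distinct objects
print(p1 == p2)               # True: equal by our definition
print(hash(p1) == hash(p2))   # True: so they land in the same slot
print(len({p1, p2}))          # 1: a set keeps only one of them

lookup = {p1: True}
lookup[p2] = False            # overwrites the value stored under p1
print(len(lookup), lookup[p1])   # 1 False
```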

Static methods

It can be convenient to have object methods that do not take self as a first argument. Such methods can be used even without instantiating the object itself – they are available directly from the class, like class variables. Such methods must have the decorator @staticmethod. They do not have access to self, and so are in principle definable outside of the object, but it might make sense to package them up inside a class. I find I most commonly do this with internal helper methods.

In [ ]:
class PageDownloader:
    def __init__(self, url):
        self.url = url

    def download(self):
        req = requests.get(self.url)
        text = self._process_download(req)
        return text

    @staticmethod
    def _process_download(req):
        req.encoding = 'utf8'
        return req.text
In [ ]:
downloader = PageDownloader("http://web.stanford.edu/class/linguist278/index.html")
In [ ]:
print(downloader.download()[:426])
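
To see the same pattern without a network connection, here is a sketch with a hypothetical `TextCleaner` class. Its static method can be called from the class directly or from an instance method via `self`.

```python
class TextCleaner:
    """Hypothetical class; no network access needed."""
    def __init__(self, text):
        self.text = text

    def clean(self):
        # Static methods are still reachable through `self`.
        return self._normalize(self.text)

    @staticmethod
    def _normalize(s):
        # No `self` here: the result depends only on the argument.
        return " ".join(s.lower().split())


# Called on the class, with no instance:
print(TextCleaner._normalize("  Hello   WORLD "))   # hello world

# Called indirectly through an instance method:
print(TextCleaner("Noam  Chomsky").clean())         # noam chomsky
```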

Getter and setter methods

Here is the first class we defined in this notebook, repeated for convenience:

In [ ]:
class BasicLinguist:
    def __init__(self, name):
        self.name = name.title()
In [ ]:
chris = BasicLinguist("chris")
In [ ]:
chris.name

If we redefine this value, we don't get the benefit of title being called to normalize the input:

In [ ]:
chris.name = 'christopher'

We can address this by instead defining getter and setter (property) methods for this attribute:

In [ ]:
class BasicLinguistGetSet:
    def __init__(self, name):
        self._name = name

    @property
    def name(self):
        return self._name.title()

    @name.setter
    def name(self, name):
        self._name = name.title()
In [ ]:
chris2 = BasicLinguistGetSet("chris")
In [ ]:
chris2.name
In [ ]:
chris2.name = 'christopher'
In [ ]:
chris2.name
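
Setters are also a natural place for validation. In the sketch below, the hypothetical `LinguistValidated` class assigns to `self.name` inside `__init__`, so even the initial value passes through the setter, and an empty name is rejected. The specific validation rule is an assumption chosen for illustration.

```python
class LinguistValidated:
    """Hypothetical class whose setter both validates and normalizes."""
    def __init__(self, name):
        self.name = name   # goes through the setter below

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        if not value.strip():
            raise ValueError("name must not be empty")
        self._name = value.strip().title()


chris3 = LinguistValidated("  chris ")
print(chris3.name)   # Chris

chris3.name = "christopher"
print(chris3.name)   # Christopher

try:
    chris3.name = "   "
except ValueError as err:
    print("rejected:", err)   # the old value is kept
```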

A common idiom with setattr

In [ ]:
class ConcretenessEntrySetAttr:
    def __init__(self, d):
        for key, val in d.items():
            key = key.lower().replace(".", "_")
            if key in {'conc_m', 'conc_sd'}:
                val = float(val)
            setattr(self, key, val)
In [ ]:
setattr_entry = ConcretenessEntrySetAttr(
    {'Word': 'roadsweeper', 'Conc.M': '4.85', 'Conc.SD': '0.37'})
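
The effect of this idiom is that each key in the dict becomes an attribute whose name is the lowercased key with periods replaced by underscores. The sketch below repeats the class so that it runs by itself and then inspects the result. Note that the attribute is called `conc_m` here, not `conc_mean` as in `ConcretenessEntry`, because the names are derived mechanically from the keys.

```python
class ConcretenessEntrySetAttr:
    def __init__(self, d):
        for key, val in d.items():
            key = key.lower().replace(".", "_")
            if key in {'conc_m', 'conc_sd'}:
                val = float(val)
            setattr(self, key, val)


example = ConcretenessEntrySetAttr(
    {'Word': 'roadsweeper', 'Conc.M': '4.85', 'Conc.SD': '0.37'})

print(example.word, example.conc_m, example.conc_sd)   # roadsweeper 4.85 0.37

# `vars` shows every attribute that was set dynamically:
print(sorted(vars(example)))   # ['conc_m', 'conc_sd', 'word']

# Keys missing from the input dict simply never become attributes:
print(getattr(example, "subtlex", None))   # None
```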

Making your class an iterator

In [ ]:
class ConcretenessCorpusIterator:
    def __init__(self, src_filename):
        self.src_filename = src_filename
        # Note: this file handle stays open for as long as the reader is in use.
        self.reader = csv.DictReader(open(self.src_filename))

    def __iter__(self):
        return self

    def __next__(self):
        return next(self.reader)
In [ ]:
corpus_iter = ConcretenessCorpusIterator(concreteness_filename)
In [ ]:
next(corpus_iter)
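
Because `__iter__` returns `self` and `__next__` is defined, instances also work in `for` loops and comprehensions, and they can be traversed only once. The sketch below shows this with a hypothetical `RowIterator` class that reads from an in-memory string, so that it runs without the lexicon file. The sample rows are made up for illustration.

```python
import csv
import io


class RowIterator:
    """Hypothetical variant that reads from any open file-like object."""
    def __init__(self, handle):
        self.reader = csv.DictReader(handle)

    def __iter__(self):
        return self

    def __next__(self):
        # `next` raises StopIteration when the reader runs out of rows,
        # which is the signal that a `for` loop uses to stop.
        return next(self.reader)


handle = io.StringIO("Word,Conc.M\nroadsweeper,4.85\ntush,4.45\n")
row_iter = RowIterator(handle)

words = [row["Word"] for row in row_iter]
print(words)            # ['roadsweeper', 'tush']

# The iterator is now exhausted; a second pass yields nothing.
print(list(row_iter))   # []
```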