The overarching philosophy is that even the most complex programs should consist of many simple functions that (1) perform conceptually natural tasks and (2) can be tested in isolation.
We're going to use the library pytest to write our tests. pytest is full-featured but requires very little in terms of additional code.
We'll cover just the features of pytest that I personally use the most; do check out the documentation to find out about its other features.
To write a basic pytest unit test, you just need your function name to begin with 'test', and it should contain an assert
statement. For example:
def palindrome_detector(s):
"""From assignment 1: Return True of s, once downcased and
with spaces removed, is a palindrome, else return False."""
s = s.lower().replace(' ', '')
return s == s[::-1]
def test_palindrome_detector():
example = 'Lisa Bonet ate no basil'
expected = True
result = palindrome_detector(example)
assert result == expected
Note: the only naming requirement is that the test begin with 'test'. I named mine in a way that reveals the relationship to the function being tested, but this not required. The real connection is in how I use the function palindrome_detector
in line 3 of the test.
Suppose the above code was in a file called palindromes.py
. To run it, you go into the terminal and type
py.test palindromes.py
This is short for
python -m pytest palindromes.py
which will also work.
I personally often like to add the -vv
flag to get an even fuller report:
py.test -vv palindromes.py
If you have lots of tests in palindromes.py
and want to run just one of them:
py.test -vv palindromes.py::test_palindrome_detector
The above version of test_palindrome_detector
hard-codes a single example. This will be cumbersome as we compile lots of test cases. A natural response to this would be to write a for-loop:
def test_palindrome_detector_looping():
examples = [
('deleveled', True),
('Malayalam', True),
('detartrated', True),
('a', True),
('repaper', True),
('Al lets Della call Ed Stella', True),
('Lisa Bonet ate no basil', True),
('Linguistics', False),
('Python', False),
('palindrome', False),
('an', False),
('re-paper', False)]
for example, expected in examples:
result = palindrome_detector(example)
assert result == expected
This is definitely an improvement, but it has a major drawback: the test will stop (and the test report "Failed") as soon as it hits a case where the assert
statement returns False. So we won't see how we did on later tests, which might cause us to fail to see the full scope of the problems.
To address this, pytest uses Python decorators. We haven't covered decorators in this class. There are good tutorials about them on the Net (here's a particularly accessible one). You can, though, just memorize the syntax for now:
import pytest
@pytest.mark.parametrize("example, expected", [
('deleveled', True),
('Malayalam', True),
('detartrated', True),
('a', True),
('repaper', True),
('Al lets Della call Ed Stella', True),
('Lisa Bonet ate no basil', True),
('Linguistics', False),
('Python', False),
('palindrome', False),
('an', False),
('re-paper', False)
])
def test_palindrome_detector_looping_(example, expected):
result = palindrome_detector(example)
assert result == expected
Things to notice:
parametrize
is spelled with only two e's.It's very common to have tests that are centered around a single resource. For example, if you've written a function that reads in a corpus, then you'll have a lot of tests for that function, since it will be so central to your project. It would be inefficient to read the corpus in for every test, and doing that would lead to code with a lot of redundancy. The `@pytest.fixture` decorator allows you to set up these resources once and use them across your tests.
import pytest
import pandas as pd
@pytest.fixture
def toy_csv_df():
link = "https://web.stanford.edu/class/linguist278/data/toy-csv.csv"
df = pd.read_csv(link, index_col=0)
return df
def test_height_mean(toy_csv_df):
expected = 68.9003
result = round(toy_csv_df['Height'].mean(), 4)
assert result == expected
def test_occupation_counts(toy_csv_df):
expected = pd.Series({'Psychologist': 7, 'Linguist': 3})
result = toy_csv_df['Occupation'].value_counts()
assert result.equals(expected)
@pytest.mark.parametrize("subject, expected", [
(1, "Psychologist"),
(5, "Linguist")
])
def test_subject_values(toy_csv_df, subject, expected):
result = toy_csv_df.loc[subject]['Occupation']
assert result == expected
The crucial thing is just that toy_csv_df
is defined as a fixture and then given as an argument to any tests that need it. pytest will run toy_csv_df
just once here.