Linguist 278: Programming for Linguists (Stanford Linguistics, Fall 2021)

Notes on final projects

Project components

  1. Code base (as a set of Python modules).
  2. A notebook that describes the project goals and illustrates the core functionality of the code base.

Goal

The goal for the final project is to get you to write some code that you will continue to use and improve indefinitely. The ideal project from my perspective is one that solves a problem you currently have in your life.

Logistics

Sample past projects

  1. Code for working with a dataset that was important to the student's research, with some basic analyses and visualizations.

  2. Code for scraping a website and processing the downloaded files in a way that supported subsequent analysis. The project included a few illustrations of such analyses.

  3. Code for finding all instances of the English dative alternation (hand the book to me / hand me the book) in a large corpus and extracting features from the examples for subsequent analysis.

  4. Code for creating a graph of the Enron corpus, with nodes as email addresses and weighted edges between nodes based on the number of messages exchanged. The project also visualized these graphs. The core library used was networkx.

  5. Code for working with the CMU Pronouncing Dictionary.

  6. Code for reading the CHILDES database, to facilitate research with it.

  7. A simple online experiment using CGI programming.

  8. A Naive Bayes classifier from scratch. This is an excellent first machine learning implementation.

  9. A good sentence tokenizer that handled sentence-medial punctuation (e.g., "Dr. Kim is here." would not be split at the first period) and other tricky edge cases.

  10. A command-line utility to help students make decisions about which Linguistics courses to take based on their goals and past courses.

  11. A command-line utility for set-theoretic closure operations. For example, running python closures.py "1 2 3" --powerset would print the powerset of {1, 2, 3}.
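
To make sample project 9 concrete, here is a minimal sketch of one way a sentence tokenizer might avoid splitting after abbreviations. It is not the past project's code: the names ABBREVIATIONS, BOUNDARY, and split_sentences are hypothetical, and the abbreviation list is deliberately tiny.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (an assumption for this
# sketch); a real tokenizer would need a much larger one.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "prof.", "e.g.", "i.e.", "etc."}

# Candidate boundary: whitespace that directly follows '.', '!' or '?'.
BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split text into sentences, skipping boundaries after known abbreviations."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        preceding = text[start:match.start()].split()
        last_token = preceding[-1].lower() if preceding else ""
        if last_token in ABBREVIATIONS:
            # The period belongs to an abbreviation, not a sentence end.
            continue
        sentences.append(text[start:match.start()])
        start = match.end()
    # Whatever remains after the last accepted boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Dr. Kim is here. She arrived today."))
# prints ['Dr. Kim is here.', 'She arrived today.']
```

One known limitation of this approach: a sentence that genuinely ends in an abbreviation (for example, one ending in "etc.") will not be split, which is exactly the kind of tricky edge case a full project would need to handle.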
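
The command-line interface described in sample project 11 could be sketched roughly as follows. This is only an illustration under assumptions: the argument names mirror the example invocation above, but the function names and the output format are hypothetical and not taken from the original closures.py.

```python
import argparse
from itertools import chain, combinations


def powerset(items):
    """Return every subset of items as a list of tuples, smallest subsets first."""
    elements = sorted(set(items))
    return list(chain.from_iterable(
        combinations(elements, size) for size in range(len(elements) + 1)
    ))


def main(argv=None):
    # With argv=None, argparse reads the real command line (sys.argv[1:]).
    parser = argparse.ArgumentParser(description="Set-theoretic closure operations.")
    parser.add_argument("elements", help='space-separated set members, e.g. "1 2 3"')
    parser.add_argument("--powerset", action="store_true", help="print the powerset")
    args = parser.parse_args(argv)
    if args.powerset:
        for subset in powerset(args.elements.split()):
            # Output format is an assumption: one subset per line, in braces.
            print("{" + ", ".join(subset) + "}")


# Demonstration call; in a real script this would be main() under an
# `if __name__ == "__main__":` guard so that the command line is parsed instead.
main(["1 2 3", "--powerset"])
```

Keeping the set logic in powerset and the argument handling in main means the same function can be imported and reused from a notebook, which fits the two project components listed at the top of these notes.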