CS 124: From Languages to Information

Winter 2015 Dan Jurafsky

The web is a vast world of unstructured information — text and speech in multiple languages, social networks, tags, and all sorts of human interactions. Learn how to make sense of it!

Online Offering

From Languages to Information has much of the material online.

What this means:


Week Date Homework Quiz In-class Video Lectures and Readings
1 Jan 6 and 8 - -
  • Tue: Intro Lecture* [pptx] [pdf]

  • Thurs: Group Work: Text Processing with Unix tools [pptx] [pdf]
Basic Text Processing [slides pptx] [slides pdf]
Edit Distance [slides pptx] [slides pdf]
2 Jan 13 and 15

Homework 1: Spamlord

Due Fri Jan 16, 5:00pm

Quiz 1: Text Processing/Edit Distance

Due Tue Jan 13, 11:59pm

    Tuesday: Dan Lecture on Language Modeling (same material as video)
No Class Thursday
Language Modeling [slides pptx] [slides pdf] (skip the video/slides on Good Turing Smoothing)
Spelling Correction and the Noisy Channel [slides pptx] [slides pdf]
3 Jan 20 and 22

Homework 2: AutoCorrect!

Due Fri Jan 23, 5:00pm

Quiz 2: Language Modeling

Due Tuesday Jan 20, 11:59pm

Naive Bayes and Text Classification [slides pptx] [slides pdf]
Sentiment Analysis [slides pptx] [slides pdf]
4 Jan 27 and 29

Homework 3: Thumbs up!

Due Fri Jan 30, 5:00pm

Quiz 3: Text Categorization and Naive Bayes

Due Tuesday Jan 27, 11:59pm

  • Thursday: Rob Munro Guest Lecture: "Artificial Intelligence for Social Good" [slides pptx] [slides pdf]
Information Retrieval (I) [slides pptx] [slides pdf]
  • MR+S Chapter 1: Boolean Retrieval (pages 1-17)
  • MR+S Chapter 2: Term vocabulary and postings lists (only pages 33-42)
Information Retrieval (II) [slides pptx] [slides pdf]
  • MR+S Chapter 6: Scoring, term weighting, and the vector space model (only pages 100 and 107-116)
  • MR+S Chapter 8: Evaluation in Information Retrieval (only pages 139-149)
5 Feb 3 and 5

Homework 4: Search!

Due Fri Feb 6, 5:00pm

Quiz 4: Information Retrieval

Due Tue Feb 3, 11:59pm

Tuesday Group Work on Information Retrieval [solutions]

Thursday: Dan Lecture on Relation Extraction and Question Answering (from same material as videos)

Relation Extraction [slides pptx] [slides pdf]
Question Answering [slides pptx] [slides pdf]
6 Feb 10 and 12

Homework 5: Jeopardy!

Due Fri Feb 13, 5:00pm

Quiz 5: Relation Extraction and Question Answering

Due Tue Feb 10, 11:59pm

Tuesday 3:15-3:45: Dan finishes lecture on Question Answering, focusing on Watson

Tuesday 3:45-4:30: Group Work on Question Answering in the Mobile Domain

Thursday: Dan lecture on Machine Translation (same material as parts of videos)

Machine Translation 1 [slides pptx] [slides pdf]
Machine Translation 2 [slides pptx] [slides pdf]
7 Feb 17 and 19 -

Quiz 6: Machine Translation

Due Tue Feb 17, 11:59pm

Tuesday: Dan Lecture on Social Meaning Extraction*

Thursday: 3:15-3:45 Dan Lecture on Part-of-Speech Tagging [slides pptx] [slides pdf] (The same material can be found by reading JM 3ed Chapter 8, pages 1-8)

Thursday: 3:45-4:30 PA 6 work time: Come with your groups to class to get started on PA 6.

Speech and Social Meaning Extraction [slides pptx] [slides pdf]
8 Feb 24 and 26

Homework 6: Translate!

Due Fri Feb 27, 5:00pm

No quiz this week

Tuesday: Guest Lecture: Andrew Maas, Stanford and ex-Coursera: "NLP for Online Education"*

Thursday: PA 6 work time again: Come with your groups to class to work on PA 6.

Web graphs, Links, and PageRank [slides pptx] [slides pdf]
9 Mar 3 and 5 -

Quiz 7: PageRank

Due Tue Mar 3, 11:59pm

Tuesday: Dan Lecture on PageRank (same material as videos)
Thursday: Dan's Lecture on Social Networks*
Social Networks [slides pptx] [slides pdf]
10 Mar 10 and 12 -

Quiz 8: Networks and Zipf's Law

Due Tue Mar 10, 11:59pm

Tuesday: Dan's Lecture: Extraction of Social Meaning from Everyday Language: Dating and Food*

Thursday: Course Review, Discussion of Practice Final and its Solutions

- Mar 17 and 18 - -
Final Exam

You can take the final exam at either the regular or the alternate time. You don't have to RSVP, just show up for one of the two. Obviously you can't take both. Note that the Tuesday alternate final is in our regular classroom but the Wednesday final is not! Check the room!

  • Tuesday Mar 17, 12:15pm-3:15pm, 420-040
  • Wednesday Mar 18, 12:15pm-3:15pm, Hewlett 200

We will be giving you a sample final.

Course Information


Instructor

Dan Jurafsky (jurafsky@stanford.edu)
Office: Margaret Jacks 117
Office Hours: Thursdays 2-3pm
Teaching Assistants

Adam Perelman (head TA)
Tulsee Doshi
Jon Gauthier
Vikesh Khanna
Gina Pai
Peng Qi
Pararth Paresh Shah
Jagadish Venkatraman

TA Office Hours
  • Wednesdays 7:00pm to 10:00pm in Huang 203 and 219
  • Tuesdays 1:15pm to 3:00pm in Huang B019
  • Thursdays 6:00pm to 8:00pm in Huang B019
Class Time

Tuesday and Thursday 3:15-4:30pm in 420-040


If you have a question that is not confidential or personal, post it on the Piazza forum; responses tend to be quicker and reach a wider audience. To contact the teaching staff directly, we strongly encourage you to come to office hours. If that is not possible, you can also email the course staff list (non-technical questions only) at cs124-win1415-staff@lists.stanford.edu. We cannot reply to email sent to individual staff members. If you have a matter to be discussed privately, please come to office hours, or use cs124-win1415-staff@lists.stanford.edu to make an appointment. For grading questions, please talk to us after class or during office hours.

We use the mailing list generated by Axess to convey messages to the class. We will assume that all students read these messages.

Honor Code

Since we occasionally reuse homeworks from previous years, we expect students not to copy, refer to, or look at the solutions in preparing their answers. It is an honor code violation to intentionally refer to a previous year's solutions. This applies both to the official solutions and to solutions that you or someone else may have written up in a previous year. It is also an honor code violation to find some way to look at the test set, to interfere in any way with programming assignment scoring, or to tamper with the submit script.

  • There is no required textbook, but I will expect you to know the material listed above, drawn from the textbooks and other readings. The material in the readings will be tested on the final exam. Different people may learn better from different combinations of videos/lectures, reading the chapters, or coming to the in-class group exercises. The best-prepared students, who do the best on the final exam, tend to do all three. But I won't take roll, and attendance is up to you.
    • New online chapters from Jurafsky and Martin, Speech and Language Processing, third edition (in progress). I will be giving you the PDFs.
    • Chapters from Manning, Raghavan, and Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press. You can buy the book or get it from the library; it is also available online.

Course Description

Extracting meaning, information, and structure from human language text, speech, web pages, genome sequences, social networks, or any less structured information. Methods include: string algorithms, edit distance, language modeling, naive Bayes, inverted indices, vector semantics. Applications such as information retrieval, question answering, text classification, social network models, machine translation, genomic sequence alignment, word meaning extraction.


Prerequisites: CS 103, CS 107, and CS 109.

Required Work

Video Lectures

Each week, we will ask you to watch a set of video lectures (2 to 2.5 hours total). The videos will have some in-video questions embedded in them, which you should answer. You are required to watch the videos (or in some cases, attend the lectures that cover the identical material) but the embedded quizzes are not counted toward the final grade.

Automated Review Quizzes

After you watch a week's video lectures, we will ask you to complete an open-notes, open-book review quiz (about 5 questions) on the content that you just learned. Each review quiz may be attempted several times, with a 10-minute wait between attempts. The questions, as well as the options for each question, are randomly selected from a larger pool each time you take a quiz. We will take the highest score over all attempts for each quiz. The first two attempts will not be penalized; subsequent attempts will incur a cumulative 20% penalty (e.g., the maximum score possible is 80% on the 3rd attempt and 60% on the 4th attempt). Review Quizzes for each week are due 11:59pm Tuesday of the following week. There are no late days for review quizzes.
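
The attempt penalty above can be sketched in a few lines of Python. This is only an illustration, not the course's grading script: the function names are hypothetical, and the sketch assumes the penalty scales the raw score of each attempt, which the policy text does not spell out.

```python
def max_possible_percent(attempt: int) -> int:
    """Highest score (in percent) reachable on the given 1-based attempt."""
    if attempt < 1:
        raise ValueError("attempts are numbered from 1")
    # Attempts 1 and 2 are free; each later attempt costs a further 20%.
    penalized_attempts = max(0, attempt - 2)
    return max(0, 100 - 20 * penalized_attempts)


def best_quiz_score(raw_percents: list[float]) -> float:
    """Best score over all attempts, in percent.

    Assumption: the penalty multiplies the raw score of that attempt.
    """
    return max(
        raw * max_possible_percent(attempt) / 100
        for attempt, raw in enumerate(raw_percents, start=1)
    )


if __name__ == "__main__":
    for attempt in range(1, 5):
        print(attempt, max_possible_percent(attempt))
    print(best_quiz_score([70, 80, 100]))
```

Under these assumptions, a perfect third attempt is worth no more than an 80% score on one of the first two attempts.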

Class Participation

Attendance is strongly recommended but optional, except for the first day of class and 4 other lectures: 2 guest lectures and my lectures on networks and social meaning extraction. I won't actually be taking attendance, but I'll be covering material that is not presented in the textbook or video lectures, and I will test this material on the final. Since lectures are online, the in-class sessions will be used mainly for group problem-solving, reviews, and occasional backup lectures re-covering the video material. You can get credit for class participation by giving helpful answers on the class forum, asking good questions of the invited speakers, helping out other students in office hours, etc.

Programming Assignments

There are 6 programming assignments (in Java or Python, your choice). Each assignment is due at 5:00pm on the Friday listed in the schedule.

Programming Assignment Collaboration: You may talk to anybody you want about the assignments and bounce ideas off each other. But you must write the actual programs yourself.

Late homeworks

You have 4 free late (calendar) days to use on programming assignments 1-5. For the group homework PA 6, the number of late days is the mean of the late days remaining for each person in your group, with fractions rounded up (e.g., if your 3 members have 0, 1, and 3 late days left, your team will have 4/3 = 1.3, rounded up to 2 late days). Once these are exhausted, any PA turned in late will be penalized 20% per late day. Each 24 hours or part thereof that a homework is late uses up one full late day. However, no assignment will be accepted more than four days after its due date.
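
The late-day arithmetic above can be sketched as follows. This is a rough illustration under stated assumptions, not the staff's actual tooling: the function names are hypothetical, lateness is measured in hours past the deadline, and "more than four days" is read as more than four late days used.

```python
import math
from typing import Optional


def team_late_days(remaining_per_member: list[int]) -> int:
    """Mean of the members' remaining late days, rounded up (PA 6 rule)."""
    total = sum(remaining_per_member)
    members = len(remaining_per_member)
    return -(-total // members)  # integer ceiling division


def late_days_used(hours_late: float) -> int:
    """Each 24 hours, or part thereof, counts as one full late day."""
    return max(0, math.ceil(hours_late / 24))


def penalty_percent(hours_late: float, free_days_left: int) -> Optional[int]:
    """Penalty in percent, or None if the submission is too late to accept."""
    days = late_days_used(hours_late)
    if days > 4:
        return None  # assumption: more than four late days is not accepted
    # Only late days beyond the free allowance are penalized, 20% each.
    return 20 * max(0, days - free_days_left)


if __name__ == "__main__":
    print(team_late_days([0, 1, 3]))  # the example from the policy text
    print(penalty_percent(30, 1))     # 2 late days used, 1 of them free
```

With the example from the policy, members holding 0, 1, and 3 late days give the team 2 late days.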


This class has a significant amount of textbook reading. Most weeks have around 30 textbook pages. The homeworks and exams will be based heavily on the readings.

Final exam:

Wednesday Mar 18, 12:15pm-3:15pm, Hewlett 200

Final grade
  • 57% homeworks
  • 29% final exam
  • 9% weekly review quizzes
  • 5% participation in forums and class
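
To see how the weights above combine, here is a small Python sketch. The weights come from the list above; the function name, the component keys, and the assumption that each component is first expressed as a percentage from 0 to 100 are illustrative only.

```python
# Weights in percent, taken from the list above.
WEIGHTS = {
    "homeworks": 57,
    "final_exam": 29,
    "review_quizzes": 9,
    "participation": 5,
}


def final_grade(component_percents: dict[str, float]) -> float:
    """Weighted average of component scores, each given in percent."""
    weighted_sum = sum(
        weight * component_percents[name] for name, weight in WEIGHTS.items()
    )
    return weighted_sum / 100


if __name__ == "__main__":
    example = {
        "homeworks": 90,
        "final_exam": 80,
        "review_quizzes": 100,
        "participation": 100,
    }
    print(final_grade(example))
```

In this hypothetical example, the homework average contributes more than half of the result, so a few points on the programming assignments move the total more than the same number of points on the final.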