CS 124: From Languages to Information

Winter 2016 Dan Jurafsky

The online world has a vast array of unstructured information in the form of language and social networks. Learn how to make sense of it and how to interact with humans via language, from answering questions to giving advice!



Schedule

Week Date Homework Quiz In-class Video Lectures and Readings
1 Jan 5 and 7 - -
  • Tue: Intro Lecture* [pptx] [pdf]

  • Thurs: Group Work: Text Processing with Unix tools (watch the 4 "Basic Text Processing" videos before class) [pptx] [pdf] [solutions]
Basic Text Processing (watch videos before Thursday's class) [slides pptx] [slides pdf]
Edit Distance [slides pptx] [slides pdf]
2 Jan 12 and 14

Homework 1: Spamlord

Due Fri Jan 15, 5:00pm

Quiz 1: Text Processing/Edit Distance

Due Tue Jan 12, 11:59pm

    Tuesday: Language Modeling (same material as video)
No Class Thursday
Language Modeling [slides pptx] [slides pdf]
Spelling Correction and the Noisy Channel [slides pptx] [slides pdf]
3 Jan 19 and 21

Homework 2: AutoCorrect!

Due Fri Jan 22, 5:00pm

Quiz 2: Language Modeling

Due Tuesday Jan 19, 11:59pm

Naive Bayes and Text Classification [slides pptx] [slides pdf]
Sentiment Analysis [slides pptx] [slides pdf]
4 Jan 26 and 28

Homework 3: Thumbs up!

Due Fri Jan 29, 5:00pm

Quiz 3: Text Categorization and Naive Bayes

Due Tuesday Jan 26, 11:59pm

Information Retrieval (I) [slides pptx] [slides pdf]
  • MR+S Chapter 1: Boolean Retrieval (pages 1-17)
  • MR+S Chapter 2: Term vocabulary and postings lists (only pages 33-42)
Information Retrieval (II) [slides pptx] [slides pdf]
  • MR+S Chapter 6: Scoring, term weighting, and the vector space model, (only pages 100 and 107-116)
  • MR+S Chapter 8: Evaluation in Information Retrieval (only pages 139-149)
5 Feb 2 and 4

Homework 4: Search!

Due Fri Feb 5, 5:00pm

Quiz 4: Information Retrieval

Due Tue Feb 2, 11:59pm

Tuesday: Group Work on Information Retrieval [answer key]



Thursday: Relation Extraction and Question Answering (from same material as videos)


Relation Extraction [slides pptx] [slides pdf]
Question Answering [slides pptx] [slides pdf]
6 Feb 9 and 11

Homework 5: Jeopardy!

Due Fri Feb 12, 5:00pm

Quiz 5: Relation Extraction and Question Answering

Due Tue Feb 9, 11:59pm

Tuesday: QA in Watson and Intro to Chatbots*

Thursday: Social Meaning Extraction*

Chat Bots [slides pptx] [slides pdf]
Optional advanced reading
Social Meaning: Extracting Emotion and Personality from Language [slides pptx] [slides pdf]
7 Feb 16 and 18 -

Quiz 6: Chatbots/Emotion Detection

Due Tue Feb 16, 11:59pm

Tuesday: Group Work on Smartphone Chatbots + Question Answering

Thursday: Recommender Systems and Vector Semantics*

Recommender systems (Collaborative Filtering) [slides pptx] [slides pdf]
Vector Semantics [slides pptx] [slides pdf]
8 Feb 23 and 25

Homework 6: Chat!

Due Fri Feb 26, 5:00pm

Quiz 7: Recommendation Systems and Vector Semantics

Due Tue Feb 23, 11:59pm

Tuesday: Natalia on Linguistics for Chatbots*

Thursday: PA 6 work time: Class time to work on PA6.

Linguistics for chatbots [slides pdf]
Web graphs, Links, and PageRank [slides pptx] [slides pdf]
  • MR+S Chapter 21: Link Analysis
9 Mar 1 and 3 -

Quiz 8: PageRank

Due Tue Mar 1, 11:59pm

Tuesday: Peer Grading in class of the Chatbots!
Thursday: Dan's Lecture on Social Networks*
Social Networks [slides pptx] [slides pdf]
10 Mar 8 and 10 -

Quiz 9: Networks and Zipf's Law

Due Tue Mar 8, 11:59pm

Tuesday: NLP Applied to Social and Humanistic Questions*

Thursday: Course Review, Discussion of sample final and its solutions*

[Sample final] [Sample final solutions]
- Mar 14 and 15 - -
Final Exam

The final is:

  • Tuesday Mar 15, 7:00pm-10:00pm, Mudd Chemistry Building AUD

The alternate final is:

  • Monday Mar 14, 3:30pm-6:30pm, Bishop Auditorium (in Lathrop Library)
You can take whichever final you prefer. You don't have to RSVP, just show up.

Course Information

Flipped Class

From Languages to Information is a semi-flipped class with much of the material online.

What this means:

  • Most of the lectures have been video-recorded, and you can watch them at home. The weekly quizzes and programming homeworks will be automatically uploaded and graded. Lectures, quizzes, and homeworks are available on Coursera.
  • 5 of the in-class sessions will be for group problem-solving activities. The remainder will be for some lectures on material not in the videos, a few redundant lectures by me (covering the same material as the videos), a few guest speakers, review sessions, and occasional presentations of state-of-the-art research.
  • Attendance is optional at all the in-class sessions except 8. Nonetheless, attendance is highly recommended. Previous students who did well in the class have reported that the in-class group exercises were extremely useful.
  • In other words: 8 of the in-class sessions are required. When I say required, I mean that this material is not in the videos or textbooks and will be tested on the final; I will not be taking attendance. These required lectures are marked with a * and in blue on the syllabus above.

Logistics

Instructor
Dan Jurafsky (jurafsky@stanford.edu)
Office: Margaret Jacks 117
Office Hours: Tuesday 2:00-2:50pm, Thursday 2:00-2:50pm (except no office hours Tuesday Jan 12)
Teaching Assistants

Jade Huang (head TA)
Naveen Arivazhagan
Ignacio Cases
Jennifer Lu
Brad Huang
Raghav Gupta
Brandon Garcia
Mikaela Grace
Kevin McKenzie
Natalia Silveira

TA Office Hours
  • Wednesdays 7:00pm to 10:00pm in Huang 203 and 219
  • Tuesdays 1:15pm to 3:00pm in Huang B020
  • Thursdays 6:00pm to 8:00pm in Huang 203
Class Time

Tuesday and Thursday 3:00-4:20pm in 420-40

Email

If you have a question that is not confidential or personal, post it on the Piazza forum; responses tend to be quicker and reach a wider audience. To contact the teaching staff directly, we strongly encourage you to come to office hours. If that is not possible, you can also email the course staff list (non-technical questions only) at cs124-win1516-staff@lists.stanford.edu. We cannot reply to email sent to individual staff members. If you have a matter to discuss privately, please come to office hours, or use cs124-win1516-staff@lists.stanford.edu to make an appointment. For grading questions, please talk to us after class or during office hours.

We use the mailing list generated by Axess to convey messages to the class. We will assume that all students read these messages.

Honor Code

Since we occasionally reuse homeworks from previous years, we expect students not to copy, refer to, or look at the solutions in preparing their answers. It is an honor code violation to intentionally refer to a previous year's solutions. This applies both to the official solutions and to solutions that you or someone else may have written up in a previous year. It is also an honor code violation to find some way to look at the test set, to interfere in any way with programming assignment scoring, or to tamper with the submit script.

Since quizzes are a form of assessment, students are not allowed to collaborate on completing quizzes. It is an honor code violation to discuss quiz questions with other students.

Textbooks
  • There is no required textbook, but I will expect you to know the material listed above, drawn from the textbooks and other readings. The material in the readings will be tested on the final exam. Different people may learn better from different combinations of videos/lectures, reading the chapters, or coming to the in-class group exercises. The best-prepared students, who do the best on the final exam, tend to do all three. But I won't take roll, and attendance is up to you.

Course Description

Extracting meaning, information, and structure from human language text, speech, web pages, genome sequences, social networks, or any less structured information. Methods include: string algorithms, edit distance, language modeling, naive Bayes, inverted indices, vector semantics. Applications such as information retrieval, question answering, text classification, social network models, chatbots, genomic sequence alignment, word meaning extraction, recommender systems.

Prerequisites

CS 103, CS 107 and CS 109.

Required Work

Video Lectures

Each week, we will ask you to watch a set of video lectures (2 to 2.5 hours total). The videos will have some in-video questions embedded in them, which you should answer. You are required to watch the videos (or in some cases, attend the lectures that cover the identical material) but the embedded quizzes are not counted toward the final grade.

Automated Review Quizzes

After watching a week's video lectures, we will ask you to answer an open-notes, open-book review quiz (about 5 questions) on the content that you just learned. Each review quiz may be attempted several times, with a 10-minute wait between attempts. The questions, as well as the options for each question, are randomly selected from a larger pool each time you take a quiz. We will take the highest score over all attempts for each quiz. The first two attempts will not be penalized; subsequent attempts will incur a cumulative 20% penalty (e.g., the maximum score possible is 80% on the 3rd attempt and 60% on the 4th attempt). Review quizzes for each week are due at 11:59pm on Tuesday of the following week. There are no late days for review quizzes.
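The attempt penalty above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the grader's actual code: the function name `max_quiz_percent` is hypothetical, and flooring the cap at 0% for very late attempts is an assumption, since the policy only gives the 3rd and 4th attempts as examples.

```python
FREE_ATTEMPTS = 2              # the first two attempts are not penalized
PENALTY_PERCENT_PER_ATTEMPT = 20  # cumulative penalty for each later attempt


def max_quiz_percent(attempt: int) -> int:
    """Highest score (in percent) reachable on a 1-based attempt number.

    Assumption: the cap never drops below 0%.
    """
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    penalized_attempts = max(0, attempt - FREE_ATTEMPTS)
    return max(0, 100 - PENALTY_PERCENT_PER_ATTEMPT * penalized_attempts)


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"attempt {n}: at most {max_quiz_percent(n)}%")
```

Working in whole percentage points keeps the arithmetic exact, so the 3rd and 4th attempts come out to 80 and 60 as in the example above.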

Class Participation

Attendance is strongly recommended but optional except for the 8 lectures in blue bold. Reminder: I won't actually be taking attendance, but I'll be covering material that is not presented in the textbook or video lectures, and I will test this material on the final. In addition, there will be 5 in-class sessions devoted to group problem-solving. You can get credit for class participation by: giving helpful answers on the class forum, asking good questions of the invited speakers, helping out other students in office hours, participating in the in-class group exercises, or being the first person to find typos in the textbook (not counting bugs in figure or chapter numbering).

Programming Assignments

6 programming assignments (in Java or Python, your choice). Each assignment is due at 5:00pm on the Friday listed in the schedule.

Programming Assignment Collaboration: You may talk to anybody you want about the assignments and bounce ideas off each other. But you must write the actual programs yourself.

Late homeworks

You have 4 free late (calendar) days to use on programming assignments 1-5. You cannot use late days on PA 6. Once late days are exhausted, any PA turned in late will be penalized 20% per late day. Each 24 hours or part thereof that a homework is late uses up one full late day. However, no assignment will be accepted more than four days after its due date.
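As a rough illustration of how this policy works out for PA 1-5, here is a minimal Python sketch. The function names are hypothetical, and it assumes that remaining free late days are applied automatically before any penalty; PA 6 is left out because late days cannot be used on it.

```python
import math

PENALTY_PERCENT_PER_DAY = 20  # applied once free late days are exhausted
MAX_DAYS_LATE = 4             # nothing is accepted after this many days


def late_days(hours_late: float) -> int:
    """Each 24 hours, or part thereof, counts as one full late day."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def late_penalty(hours_late: float, free_days_left: int):
    """Return (penalty_percent, free_days_left_after) for one of PA 1-5.

    Returns None if the submission is too late to be accepted.
    Assumption: free late days are consumed before penalties apply.
    """
    days = late_days(hours_late)
    if days > MAX_DAYS_LATE:
        return None
    free_used = min(days, free_days_left)
    penalized_days = days - free_used
    return PENALTY_PERCENT_PER_DAY * penalized_days, free_days_left - free_used


if __name__ == "__main__":
    print(late_penalty(30, 4))   # 30 hours late with all 4 free days left
    print(late_penalty(50, 1))   # 50 hours late with 1 free day left
    print(late_penalty(100, 4))  # more than four days late
```

Under these assumptions, a submission 30 hours late uses two late days, because the partial second day counts in full.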

Readings

This class has a significant amount of textbook reading. Most weeks have around 30 textbook pages. The homeworks and exams will be based heavily on the readings.

Final exam:

  • Tuesday Mar 15, 7:00pm-10:00pm, Mudd Chemistry Building AUD

The alternate final is:

  • Monday Mar 14, 3:30pm-6:30pm, Bishop Auditorium (in Lathrop Library)
You can take whichever final you prefer. You don't have to RSVP, just show up.

Final grade
  • 55% homeworks
  • 27% final exam
  • 11% weekly review quizzes
  • 7% participation in forums and class
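
The weights above combine as a simple weighted average. The following Python sketch is only an illustration: the function name, the dictionary keys, and the sample scores are made up, and it assumes each component is expressed as a percentage from 0 to 100.

```python
from typing import Mapping

# Component weights in percent, taken from the list above; they sum to 100.
WEIGHTS = {
    "homeworks": 55,
    "final_exam": 27,
    "quizzes": 11,
    "participation": 7,
}


def final_grade(scores: Mapping[str, float]) -> float:
    """Weighted average of component scores, each given in percent (0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    # Hypothetical student, for illustration only.
    example = {"homeworks": 90, "final_exam": 80, "quizzes": 100, "participation": 100}
    print(final_grade(example))
```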