CS 124: From Languages to Information

Winter 2013 · Chris Manning

The modern web world is a vast world of unstructured information... human language content, social networks, tags, etc. Learn how to make sense of it.


Online Offering

From Languages to Information is offered online, adopting the format used by CS145 and CS229A!

What this means:

Schedule

Week · Date · Homework · In-class · Video Lectures and Readings
1 Jan 8 and 10 -
Basic Text Processing [slides pptx] [slides pdf]
  • J+M Section 2.1 Regular Expressions (17-26)
  • J+M Section 3.9 Word and Sentence Tokenization (68-72)
  • MR+S Chapter 2: Term vocabulary and postings lists (Online version: 19-35, Paper version: 18-33)
  • Ken Church's tutorial Unix for Poets, at least pages 1-19
Edit Distance [slides pptx] [slides pdf]
  • J+M Section 3.11: Minimum Edit Distance (pages 72-77)
2 Jan 15 and 17

Homework 1: Spamlord

Due Fri Jan 18, 5:00pm

    Grep and Regular Expressions [pdf]
Language Modeling [slides pptx] [slides pdf]
  • J+M Chapter 4, N-grams
Spelling Correction and the Noisy Channel [slides pptx] [slides pdf]
3 Jan 22 and 24

Homework 2: AutoCorrect!

Due Fri Jan 25, 5:00pm

Naïve Bayes and Text Classification [slides pptx] [slides pdf]
  • MR+S Chapter 13: Text classification and Naïve Bayes (skip sections 13.3 and 13.5) (Online version: 253-270, Paper version: 234-250)
Sentiment Analysis [slides pptx] [slides pdf]
4 Jan 29 and 31

Homework 3: Thumbs up!

Due Fri Feb 1, 5:00pm

  • Naïve Bayes and Sentiment Analysis [Slides] [Starter Code]
  • On corn, copy /afs/ir.stanford.edu/class/cs124/sections/section3/starter_code
MaxEnt Classifiers [slides pptx] [slides pdf]
  • J+M Chapter 6: Logistic Regression and MaxEnt Models, pages 193-211 (=IE 227-245)
MEMM Sequence Models and Named Entity Tagging [slides pptx] [slides pdf]
  • J+M Chapter 22: Information Extraction, pages 727-734, 743-749 (=IE 761-768, 777-783)
5 Feb 5 and 7

Homework 4: Extract!

Due Fri Feb 8, 5:00pm

Named Entity Classification [slides pdf] [Starter code]
Information Retrieval (I) [slides pptx] [slides pdf]
  • MR+S Chapter 1: Boolean Retrieval
  • The rest of MR+S Chapter 2: Term vocabulary and postings lists
Information Retrieval (II) [slides pptx] [slides pdf]
  • MR+S Chapter 6: Scoring, term weighting, and the vector space model
  • MR+S Chapter 8: Evaluation in Information Retrieval
6 Feb 12 and 14

Homework 5: Search!

Due Fri Feb 15, 5:00pm

Information Retrieval [slides pdf]
Relation Extraction [slides pptx] [slides pdf]
  • J+M Chapter 22: Information Extraction, pages 734-762 (=IE 768-785)
XML: accessing structured information [slides pptx] [slides pdf]
  • The Wikipedia page on XML through the end of Section 6, Processing XML Files. (I.e., stop when you get to "History")
  • The Wikipedia page on DTD
  • XML in a Nutshell via Safari Tech books, Chapter 8 (XSLT)
  • XML in a Nutshell via Safari Tech books, Chapter 9 (XPath)

To get these, go to library.stanford.edu/ezproxy/, choose Safari Tech Books, and search for XML in a Nutshell.

7 Feb 19 and 21 - -
Word Meaning and Word Similarity [slides pptx] [slides pdf]
  • J+M Chapter 19: Lexical Semantics (pages 611-619 = IE 645-653)
  • J+M Chapter 20: Computational Lexical Semantics (pages 652-670 = IE 686-704)
Question Answering [slides pptx] [slides pdf]
8 Feb 26 and 28

Homework 6: Jeopardy!

Due Fri Mar 1, 5:00pm

XML, Relation Extraction & QA [starter code]
Machine Translation 1 [slides pptx] [slides pdf]
  • J+M Chapter 25: Machine Translation, pages 859-879 (=IE 895-915)
Machine Translation 2 [slides pptx] [slides pdf]
  • J+M Chapter 25: Machine Translation, pages 879-897 (=IE 915-933)
9 Mar 5 and 7

Homework 7: Translate!

Due Fri Mar 8, 5:00pm

Machine Translation [slides]
Web graphs, Links, and PageRank [slides pptx] [slides pdf]
10 Mar 12 and 14 - -
Social Networks [slides pptx] [slides pdf]
- Mar 22 - -
Final Exam

Friday March 22, 12:15pm-3:15pm, Location: Cubberley Auditorium

(Alternate) Tuesday Mar 19, 12:15pm-3:15pm, Location: Annenberg Auditorium

Course Information

Logistics

Instructor
Chris Manning (manning@cs.stanford.edu)
Office: Gates 158
Office Hours: Thu 5-6
Teaching Assistants

Leon Lin (Head TA), Mason Chua, Thomas Dimson, Milind Ganjoo, Kevin Nguyen and Rukmani Ravisundaram

TA Office Hours
  • Tuesdays 2:15 to 4:00 p.m.
  • Wednesdays 7:00 to 10:00 p.m. (Group Coding Session)
  • Thursdays 6:00 to 8:00 p.m.

Locations change, and will be updated on Piazza.

Class Time

Tuesday and Thursday 9:30-10:45am in 260-113

Portal

The portal for the online part of the class is available on Coursera (sign up).

Discussion

The class forum for all technical questions and bug reports is available on Piazza (sign up).

Email

Mail non-technical questions only to cs124-win1213-staff@lists.stanford.edu. We will not reply to email sent to individual staff members. If you have a matter to be discussed privately, please come to office hours, or use cs124-win1213-staff@lists.stanford.edu to make an appointment.

We prefer that most questions be posted on the Piazza forum; responses there tend to be quicker and reach a wider audience.

We use the mailing list generated by Axess to convey messages to the class. We will assume that all students read these messages.

Textbooks
  • Required: Jurafsky and Martin. 2009. Speech and Language Processing (2nd Edition). Pearson.
  • Recommended: Manning, Raghavan, and Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press.

Readings from MR+S are required, but the readings are available here (the published book).

Course Description

Extracting meaning, information, and structure from human language text, web pages, social networks, genome sequences, or any less structured information. Methods include: string algorithms, edit distance, naive Bayes and MaxEnt classifiers, language modeling, XML processing. Applications such as information retrieval, question answering, text classification, social network models, machine translation, genomic sequence alignment, word meaning extraction.

Prerequisites

CS 103, CS 107 and CS 109.

Required Work

Video Lectures

Each week, we will ask you to watch a set of video lectures (2 to 2.5 hours total). The videos will have some in-video questions embedded in them, which you should answer. You are required to watch the videos, but the embedded quizzes are not counted toward the final grade.

Automated Review Quizzes

After you watch a week's video lectures, we will ask you to complete an open-notes, open-book review quiz (about 5 questions) on the content that you just learned. Each review quiz may be attempted several times, with a time lag of 10 minutes between attempts. The questions, as well as the options for each question, are randomly selected from a larger pool each time you take a quiz. We will take the highest score over all attempts for each quiz. The first two attempts will not be penalized; subsequent attempts will incur a cumulative 20% penalty (e.g., the maximum score possible is 80% on the 3rd attempt and 60% on the 4th attempt). Review quizzes for each week are due at 11:59pm on Tuesday of the following week. There are no late days for review quizzes.
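The attempt penalty described above can be sketched as a small calculation. This is only an illustration, not the grading code used by the course: the function name `effective_quiz_score` is hypothetical, and it assumes the penalty scales the raw score (so a perfect 3rd attempt counts as 80%) rather than being subtracted from it.

```python
def effective_quiz_score(raw_scores):
    """Best penalized score over raw attempt scores (each 0-100), in the order taken.

    Assumption (not stated in the policy): the penalty scales the raw score.
    """
    best = 0.0
    for attempt, raw in enumerate(raw_scores, start=1):
        # Attempts 1 and 2 are free; each later attempt loses another 20%.
        penalty_steps = max(0, attempt - 2)
        cap = max(0, 100 - 20 * penalty_steps)  # maximum achievable percent
        best = max(best, raw * cap / 100)
    return best


if __name__ == "__main__":
    # Third attempt is perfect but capped at 80%; it still beats the first two.
    print(effective_quiz_score([60, 70, 100]))
```

Under this reading, a further attempt can raise the recorded score only if its raw score, scaled by the cap, exceeds the best of the earlier attempts.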

Class Participation

Since lectures are online, the in-class sessions on Tuesday and Thursday mornings will be used for problem-solving, reviews, discussions, guest speakers from industry, and presentation of state-of-the-art research. You can get extra credit for class participation by answering questions on the class forum.

Homeworks

Seven programming assignments (in Java or Python, your choice). Each assignment is due at 5:00pm on a Friday; the schedule lists the specific dates.

Homework Collaboration: You may talk to anybody you want about the assignments and bounce ideas off each other. But you must write the actual homeworks and programs yourself.

Late homeworks

You have 4 free late (calendar) days to use on the homeworks. Once these are exhausted, any homework turned in late will be penalized 20% per late day. Each 24 hours or part thereof that a homework is late uses up one full late day.
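As a rough sketch of how the late-day accounting above works out: the helper names `late_days` and `apply_late_policy` are hypothetical. The code assumes that free late days are consumed automatically in submission order and that the 20% penalty is expressed as percentage points per penalized day; the policy text does not spell out either detail.

```python
import math
from datetime import datetime

FREE_LATE_DAYS = 4     # from the policy above
PENALTY_PER_DAY = 20   # percent per late day once free days are exhausted


def late_days(due, submitted):
    """Each 24 hours or part thereof past the deadline counts as one full day."""
    seconds_late = (submitted - due).total_seconds()
    if seconds_late <= 0:
        return 0
    return math.ceil(seconds_late / 86400)


def apply_late_policy(submissions):
    """Return the penalty (in percent) for each (due, submitted) pair, in order.

    Assumption: free late days are used up in submission order.
    """
    free_left = FREE_LATE_DAYS
    penalties = []
    for due, submitted in submissions:
        days = late_days(due, submitted)
        covered = min(days, free_left)
        free_left -= covered
        penalties.append(min(100, PENALTY_PER_DAY * (days - covered)))
    return penalties


if __name__ == "__main__":
    due = datetime(2013, 1, 18, 17, 0)
    # One minute late already uses a full late day.
    print(late_days(due, datetime(2013, 1, 18, 17, 1)))
```

For example, a student who is three days late on one homework and two days late on the next would use all four free days and take one penalized day on the second homework.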

Readings

We will expect you to do a significant amount of textbook reading in this course.

Final exam

Friday Mar 22, 12:15pm-3:15pm in Cubberley Auditorium

(Alternate) Tuesday Mar 19, 12:15pm-3:15pm in Annenberg Auditorium

Final grade
  • 56% homeworks
  • 30% final exam
  • 9% weekly review quizzes
  • 5% attendance and participation
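The weighting above amounts to a weighted average. A minimal sketch, assuming each component is reported as a percentage from 0 to 100; the names `WEIGHTS` and `final_grade` are illustrative, and forum extra credit is not modeled.

```python
# Weights in percent, taken from the list above.
WEIGHTS = {
    "homeworks": 56,
    "final_exam": 30,
    "review_quizzes": 9,
    "participation": 5,
}


def final_grade(scores):
    """Weighted average of component scores, each given as a 0-100 percentage."""
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    example = {
        "homeworks": 90,
        "final_exam": 80,
        "review_quizzes": 100,
        "participation": 100,
    }
    print(final_grade(example))
```

Because homeworks carry 56% of the grade, a 10-point change in the homework average moves the final grade by 5.6 points, compared with 3 points for the same change on the final exam.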