Stanford University

CS276B / SYMBSYS 239J / LING 239J
Text Information Retrieval, Mining, and Exploitation
Winter 2003

Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze


Course Information

Lecture: 3 units, TuTh 4:15-5:30 Gates B08 [NB: Different room this quarter!]
TA: Teg Grenager
Staff e-mail: cs276b-win0203-staff@lists.stanford.edu

Course Description:

Document clustering, classification, routing, and recommendation systems. Machine learning methods. Information extraction methods: terminologies and ontology acquisition, named entity recognition, coreference resolution, web wrappers and web agents. Natural language processing techniques: summarization, cross-lingual retrieval, event tracking, question answering and text mining. Biomedical text: special constraints, knowledge discovery, improved performance from integrating textual information.

Prerequisites:

Either CS276A or a reasonable background in text and statistical machine learning techniques, such as from CS224N, CS229, or Stat315. (You are not required to have taken CS276A to take this course, and the focus is rather different. On the other hand, we will only very briefly review material covered there, so unless you already know the appropriate topics from CS276A, you will need to do additional outside reading.)

The course project will require extensive programming in Java, so previous object-oriented programming experience will be very helpful.

Textbooks:

There is no required or recommended text. We will distribute readings for each topic. Books containing considerable material relevant to the course, which you may wish to consult, include:

  1. Soumen Chakrabarti. 2003. Mining the Web: Discovering Knowledge from Hypertext Data. Amsterdam: Morgan Kaufmann.
  2. Christopher Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
  3. Tom Mitchell. 1997. Machine Learning. McGraw Hill.
  4. Ian Witten and Eibe Frank. 2000. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. San Francisco, CA: Morgan Kaufmann.

Grading Policy:

Midterm Exam: 20%
Final Exam: 40%
Project: 40% (divided as follows)
  Part 1A: 8%
  Part 1B: 8%
  Part 2: 24%

Staff Contact Information:

We request that you send questions to the staff mailing list at cs276b-win0203-staff@lists.stanford.edu when appropriate. There is also a course newsgroup at su.class.cs276b where students can help one another.

Professor: Christopher Manning
Office: Gates Bldg., Rm 418
Office Hours: F 10:00-12:00
E-mail: manning@cs.stanford.edu

Professor: Prabhakar Raghavan
Office:
Office Hours: By Appt.
E-mail: pragh@db.stanford.edu

Professor: Hinrich Schütze
Office:
Office Hours: By Appt.
E-mail: schuetze@csli.stanford.edu

TA: Teg Grenager
Office: Gates Bldg., Rm 454
Office Hours: Mon 2:00-3:00, Thurs 10:00-11:00
E-mail: teg@cs.stanford.edu


Last modified: Fri Mar 14 15:18:36 PST 2003