Stanford University

CS276B / SYMBSYS 239J / LING 239J
Text Information Retrieval, Mining, and Exploitation
Winter 2003

Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze


Provisional Course Syllabus



Date Topics Notes Who Readings Project
Tue Jan 7 Clustering I: Introduction to the problem. Partitioning: k-means/BFR.
Course administrivia.
[ powerpoint ]
[ pdf (large) ]
[ pdf (small) ]
PR paper to read
Scatter/Gather
Data Clustering Review
Thu Jan 9 Clustering II: hierarchical clustering.
Applications to text: features and details.
Course Overview
[ powerpoint ]
[ pdf (large) ]
[ pdf (small) ]
PR Initialization of iterative refinement clustering algorithms
Scaling Clustering Algorithms to Large Databases
Tue Jan 14 Discussion of Project
[ powerpoint ]
[ pdf (large) ]
[ pdf (small) ]
Project information handout
Project tools tutorial handout
Project part 1A assigned
Thu Jan 16 Clustering III
Link-Based Clustering
Enumerative clustering/trawling
Syntactic clustering of the web
[ powerpoint ]
[ pdf (large) ]
[ pdf (small) ]
PR http://citeseer.nj.nec.com/agrawal93mining.html
http://citeseer.nj.nec.com/agrawal94fast.html
http://citeseer.nj.nec.com/azar00spectral.html
http://citeseer.nj.nec.com/272770.html
http://citeseer.nj.nec.com/context/843212/0
http://citeseer.nj.nec.com/72529.html
Tue Jan 21 Text Classification I: Introduction
Naive Bayes methods
Spam Filtering
[ powerpoint ]
[ pdf (large) ]
[ pdf (small) ]
CM http://citeseer.nj.nec.com/mccallum98comparison.html
http://citeseer.nj.nec.com/yang99reexamination.html
A Plan for Spam, by Paul Graham.
Better Bayesian Filtering. Paul Graham. 2003 Spam Conference.
2003 Spam Conference proceedings
Thu Jan 23 Text Classification II
Features for text classification
Nearest-neighbor (kNN) approaches
Evaluation of Classification
[ powerpoint ]
[ pdf (large) ]
[ pdf (small) ]
HS http://citeseer.nj.nec.com/yang97comparative.html
http://citeseer.nj.nec.com/lewis95evaluating.html
Tue Jan 28 Information Extraction I
Introduction
Named entity recognition
FSA-based methods
[ powerpoint ]
[ pdf (large) ]
[ pdf (small) ]
CM readings
Project part 1A due Monday
Thu Jan 30 Project
[ powerpoint ]
[ pdf (large) ]
[ pdf (small) ]
http://www.ai.sri.com/~appelt/ie-tutorial
Kushmerick, Weld, Doorenbos. Wrapper induction for information extraction, IJCAI 1997.
http://citeseer.nj.nec.com/soderland99learning.html
Project part 1B assigned
Tue Feb 4 Information Extraction II
Learning information extractors
HMMs
Web wrappers and agents
[ powerpoint ]
[ pdf (large) ]
[ pdf (small) ]
CM http://citeseer.nj.nec.com/califf97relational.html
http://citeseer.nj.nec.com/leek97information.html
http://citeseer.nj.nec.com/bikel97nymble.html
http://citeseer.nj.nec.com/seymore99learning.html
http://citeseer.nj.nec.com/freitag00information.html
Thu Feb 6 Midterm to be held in class

Midterm answer key
Tue Feb 11 Text Classification III
Overview of other methods: Decision trees, Maximum Entropy/Logistic Regression, Meta tagging
[ powerpoint ]
[ pdf (large) ]
[ pdf (small) ]
CM Dumais, Platt, Heckerman, and Sahami. 1998. Inductive learning algorithms and representations for text categorization. CIKM 1998.
http://citeseer.nj.nec.com/zhang00text.html
Reuters dataset
Tim Berners-Lee on the Semantic Web
Resource Description Framework
Berkeley HMM Tutorial
Project part 1B due
Thu Feb 13 Text Classification IV
Even more methods: support vector machines, link-based methods, neural nets
Active Learning
Language ID
[ powerpoint ]
[ pdf (large) ]
[ pdf (small) ]
HS readings
Tue Feb 18 Recommendation Systems I
Collaborative Filtering
[ powerpoint ]
[ pdf (large) ]
[ pdf (small) ]
PR http://citeseer.nj.nec.com/resnick94grouplens.html
http://citeseer.nj.nec.com/shardanand95social.html
http://citeseer.nj.nec.com/sarwar01itembased.html
Project part 2 project plan due
Thu Feb 20 Recommendation Systems II
Contextualization
Personalization
Expert search
[ powerpoint ]
[ pdf (large) ]
[ pdf (small) ]
PR readings
Tue Feb 25 Text Mining I: What is it?
Terminology learning
Ontologies from/for IE Metadata
[ powerpoint ]
[ pdf (large) ]
[ pdf (small) ]
CM/HS readings
Thu Feb 27 Text Mining II
Coreference resolution
Topic Detection and Tracking
Summarization
[ powerpoint ]
[ pdf (large) ]
[ pdf (small) ]
HS readings
Tue Mar 4 Text Mining III
Question Answering
[ powerpoint ]
[ pdf (large) ]
[ pdf (small) ]
CM readings
Project part 2 checkpoint submission
Thu Mar 6 Bioinformatics
Special constraints in bioinformatics
IR with textual and non-textual data
[ pdf (large) ]
HS readings
Tue Mar 11 Bioinformatics
Text mining for bioinformatics: gene functions; gene-drug interactions
[ powerpoint ]
[ pdf (large) ]
[ pdf (small) ]
HS readings
Thu Mar 13 Presentation of Projects


Project part 2 due
March 21, 2003 Final Exam
12:15-3:15pm
Gates B08
Practice Questions



Last modified: Fri Mar 14 15:15:07 PST 2003