CS276B / SYMBSYS 239J / LING 239J |
Date | Topics | Notes | Who | Readings | Project |
---|---|---|---|---|---|
Tue Jan 7 |
Clustering I: Introduction to the problem. Partitioning: k-means/BFR. Course administrivia. |
[ powerpoint ]
[ pdf (large) ] [ pdf (small) ] |
PR |
paper to read
Scatter/Gather Data Clustering Review |
|
Thu Jan 9 |
Clustering II: hierarchical clustering. Applications to text: features and details. Course Overview |
[ powerpoint ]
[ pdf (large) ] [ pdf (small) ] |
PR |
Initialization of iterative refinement clustering algorithms Scaling Clustering Algorithms to Large Databases |
|
Tue Jan 14 | Discussion of Project |
[ powerpoint ]
[ pdf (large) ] [ pdf (small) ] |
Project information handout; Project tools tutorial handout |
Project part 1A assigned | |
Thu Jan 16 | Clustering III: Link-Based Clustering; Enumerative clustering/trawling; Syntactic clustering of the web |
[ powerpoint ]
[ pdf (large) ] [ pdf (small) ] |
PR |
http://citeseer.nj.nec.com/agrawal93mining.html http://citeseer.nj.nec.com/agrawal94fast.html http://citeseer.nj.nec.com/azar00spectral.html http://citeseer.nj.nec.com/272770.html http://citeseer.nj.nec.com/context/843212/0 http://citeseer.nj.nec.com/72529.html |
|
Tue Jan 21 |
Text Classification I: Introduction; Naive Bayes methods; Spam Filtering |
[ powerpoint ]
[ pdf (large) ] [ pdf (small) ] |
CM |
http://citeseer.nj.nec.com/mccallum98comparison.html http://citeseer.nj.nec.com/yang99reexamination.html A Plan for Spam, by Paul Graham. Better Bayesian Filtering. Paul Graham. 2003 Spam Conference 2003 Spam Conference proceedings |
|
Thu Jan 23 |
Text Classification II: Features for text classification; Nearest-neighbor (kNN) approaches; Evaluation of Classification |
[ powerpoint ]
[ pdf (large) ] [ pdf (small) ] |
HS |
http://citeseer.nj.nec.com/yang97comparative.html http://citeseer.nj.nec.com/lewis95evaluating.html |
|
Tue Jan 28 |
Information Extraction I: Introduction; Named entity recognition; FSA-based methods |
[ powerpoint ]
[ pdf (large) ] [ pdf (small) ] |
CM | readings | Project part 1A due Monday |
Thu Jan 30 | Project |
[ powerpoint ]
[ pdf (large) ] [ pdf (small) ] |
http://www.ai.sri.com/~appelt/ie-tutorial Kushmerick, Weld, Doorenbos. Wrapper induction for information extraction, IJCAI 1997. http://citeseer.nj.nec.com/soderland99learning.html |
Project part 1B assigned | |
Tue Feb 4 |
Information Extraction II: Learning information extractors; HMMs; Web wrappers and agents |
[ powerpoint ]
[ pdf (large) ] [ pdf (small) ] |
CM |
http://citeseer.nj.nec.com/califf97relational.html http://citeseer.nj.nec.com/leek97information.html http://citeseer.nj.nec.com/bikel97nymble.html http://citeseer.nj.nec.com/seymore99learning.html http://citeseer.nj.nec.com/freitag00information.html |
|
Thu Feb 6 | Midterm to be held in-class |
|
|
Midterm answer key | |
Tue Feb 11 |
Text Classification III: Overview of other methods: Decision trees, Maximum Entropy/Logistic Regression, Meta tagging |
[ powerpoint ]
[ pdf (large) ] [ pdf (small) ] |
CM |
Dumais, Platt, Heckerman, and Sahami. 1998. Inductive learning algorithms and representations for text categorization. CIKM 1998. http://citeseer.nj.nec.com/zhang00text.html Reuters dataset Tim Berners-Lee on semantic web Resource Description Framework Berkeley HMM Tutorial |
Project part 1B due |
Thu Feb 13 |
Text Classification IV: Even more methods: support vector machines, Link-based, neural nets; Active Learning; Language ID |
[ powerpoint ]
[ pdf (large) ] [ pdf (small) ] |
HS |
readings
|
|
Tue Feb 18 |
Recommendation Systems I: Collaborative Filtering |
[ powerpoint ]
[ pdf (large) ] [ pdf (small) ] |
PR |
http://citeseer.nj.nec.com/resnick94grouplens.html http://citeseer.nj.nec.com/shardanand95social.html http://citeseer.nj.nec.com/sarwar01itembased.html |
Project part 2 project plan due |
Thu Feb 20 |
Recommendation Systems II: Contextualization; Personalization; Expert search |
[ powerpoint ]
[ pdf (large) ] [ pdf (small) ] |
PR | readings | |
Tue Feb 25 |
Text Mining I: What is it? Terminology learning; Ontologies from/for IE; Metadata |
[ powerpoint ]
[ pdf (large) ] [ pdf (small) ] |
CM/HS | readings | |
Thu Feb 27 |
Text Mining II: Coreference resolution; Topic Detection and Tracking; Summarization |
[ powerpoint ]
[ pdf (large) ] [ pdf (small) ] |
HS | readings | |
Tue Mar 4 |
Text Mining III: Question Answering |
[ powerpoint ]
[ pdf (large) ] [ pdf (small) ] |
CM | readings | Project part 2 checkpoint submission |
Thu Mar 6 |
Bioinformatics: Special constraints in bioinformatics; IR with textual and non-textual data |
[ pdf (large) ] | HS | readings | |
Tue Mar 11 |
Bioinformatics: Text mining for bioinformatics: gene functions; gene-drug interactions |
[ powerpoint ]
[ pdf (large) ] [ pdf (small) ] |
HS | readings | |
Thu Mar 13 | Presentation of Projects |
|
|
|
Project part 2 due |
March 21, 2003 |
Final Exam 12:15-3:15pm Gates B08 |
Practice Questions
|
|
|