Stanford University

CS276B / SYMBSYS 239J / LING 239J
Text Information Retrieval, Mining, and Exploitation
Winter 2003

Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze


Grading Information

General Grading Policy:

Midterm Exam: 20%
Final Exam: 40%
Project: 40%, divided as follows:
- Part 1A: 8%
- Part 1B: 8%
- Part 2: 24%
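
As a rough illustration of how the weights above combine, here is a minimal sketch. It assumes each component is scored on a 0-100 scale and that the course score is a plain weighted sum. Neither assumption is stated in the policy, and the names `WEIGHTS` and `course_score` and the example scores are hypothetical.

```python
# Weights taken from the grading policy above, as fractions of the course grade.
WEIGHTS = {
    "midterm": 0.20,
    "final": 0.40,
    "project_1a": 0.08,
    "project_1b": 0.08,
    "project_2": 0.24,
}


def course_score(scores):
    """Return the weighted sum of component scores.

    Assumption: every component score is on the same 0-100 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {
    "midterm": 80,
    "final": 90,
    "project_1a": 100,
    "project_1b": 70,
    "project_2": 85,
}
print(round(course_score(example), 2))
```

With these made-up scores the weighted sum is 16 + 36 + 8 + 5.6 + 20.4 = 86. The three project parts together account for the full 40% project weight.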


Midterm Grading Details

The midterm is a one-hour exam held during class time on Thursday, February 6, 2003. It is open book and open notes; only networked devices are disallowed. It will cover the following topics:

Document Clustering
- Agglomerative clustering
- Hierarchical clustering
- k-means
- Selecting the number of clusters
- Term vs. document space
- Feature selection
- Clustering to speed up scoring
- Labeling clusters
- Clustering as dimensionality reduction
- Evaluation of text clustering
- Link-based clustering
- Enumerative clustering/trawling

Text Classification
- Methods
- Generative models
- Maximum likelihood
- Naive Bayes
- Multivariate binomial vs. multinomial
- Feature selection via mutual information
- Conditional independence assumption
- Relation to information extraction
- Feature selection
- Evaluating categorization methods

Information Extraction
- Hand coded wrappers
- Wrapper induction: LR, HLRT, BWI wrappers
- Named entity recognition
- FSA-based methods: FASTUS
- Learning information extractors
- HMMs for information extraction
- Web wrappers and agents

Project 1A Grading Details

Project part 1A submissions are due by Monday, January 27, 2003 at 11:59 pm. Your submission should include:

We will grade your project submissions using the following criteria:


Project 1B Grading Details

Project part 1B submissions are due by Tuesday, February 11, 2003 at 11:59 pm. Your submission should include:

We will grade your project submissions using the following criteria:


Last modified: Tue Feb 11 14:38:19 PST 2003