Stanford University

CS276B / LING 239J
Web Search and Mining
Winter 2005


Meeting Times and Locations

Lecture: TuTh 4:15-5:30 in Gates B12
Review sessions: TBD

Course Description:

From the bulletin: Advanced topics and project in information retrieval. Web search engines including crawling and indexing, link-based algorithms, and web metadata. Collaborative filtering and recommender systems. Text-centric XML indexing and ranked retrieval. User interfaces for IR. Students work in teams to implement a project of their choosing.

Staff Contact Information:

Students should post most questions on the course newsgroup, su.class.cs276b.
Send questions of an individual nature to the staff mailing list at cs276b-win0405-staff@lists.stanford.edu.

Professor: Christopher Manning
Office: Gates 158
Office Hours: Tue 3-4, Wed 2-3
E-mail: manning@cs.stanford.edu

Professor: Prabhakar Raghavan
Office: none
Office Hours: by appointment
E-mail: pragh@db.stanford.edu

TA: Louis Eisenberg
Office: Gates B26 (during office hours only)
Office Hours: Tuesday and Thursday 10:50-11:50 a.m., Thursday 3:10-4:10 p.m.
E-mail: tarheel@stanford.edu

Course admin: Sarah Weden
Office: Gates 419
E-mail: sweden@db.stanford.edu

Grading Policy:

Prerequisites:

Either CS276A or a reasonable background in text processing and statistical machine learning techniques, such as from CS224N, CS229, or Stat315. (You are not required to have taken CS276A to do this course, and the focus is rather different. On the other hand, we will review material covered there only very briefly, so unless you already know the relevant topics from CS276A, you will need to do additional outside reading.)

The course project will require extensive programming.

Textbooks:

There is no required or recommended text. We will distribute readings for each topic. Books that contain considerable material of relevance to the course that you may wish to look at include:

  1. Soumen Chakrabarti. 2003. Mining the Web: Discovering Knowledge from Hypertext Data. Amsterdam: Morgan Kaufmann.
  2. Pierre Baldi, Paolo Frasconi, and Padhraic Smyth. 2003. Modeling the Internet and the Web: Probabilistic Methods and Algorithms. John Wiley.
  3. Christopher Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
  4. Ian Witten and Eibe Frank. 2000. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. San Francisco, CA: Morgan Kaufmann.
  5. Peter Jackson and Isabelle Moulinier. 2002. Natural Language Processing for Online Applications: Text Retrieval, Extraction, and Categorization. John Benjamins.

Assignment Policies:

Honor Code:

All actual, detailed work on the solution of problem sets must be individual work. You are encouraged to discuss problem sets with each other in a general way, but if you do so, then you must acknowledge the people with whom you discussed the problem set at the top of your submission.

You should not look for problem answers elsewhere; but again, if material is taken from elsewhere, then you should acknowledge it. For practical exercises, you are not permitted to get programming help from people other than your partner. Normally, you are permitted to use pre-existing code, but you must acknowledge code that you have taken from other sources. In general, we will act and expect you to act according to the Stanford Honor Code.
