CS276B / LING 239J
Web Search and Mining
Winter 2005

Meeting Times and Locations

Lecture: TuTh 4:15-5:30 in Gates B12
Review sessions: TBD

Course Description:

From the bulletin: Advanced topics and project in information retrieval. Web search engines including crawling and indexing, link-based algorithms, and web metadata. Collaborative filtering and recommender systems. Text-centric XML indexing and ranked retrieval. User interfaces for IR. Students work in teams to implement a project of their choosing.

Staff Contact Information:

Students should post most questions on the course newsgroup, su.class.cs276b.
Send questions of an individual nature to the staff mailing list at cs276b-win0405-staff@lists.stanford.edu.

Professor: Christopher Manning
Office: Gates 158
Office Hours: Tue 3-4, Wed 2-3
E-mail: manning@cs.stanford.edu

Professor: Prabhakar Raghavan
Office: none
Office Hours: by appointment
E-mail: pragh@db.stanford.edu

TA: Louis Eisenberg
Office: Gates B26 (during office hours only)
Office Hours: Tuesday and Thursday 10:50-11:50 a.m., Thursday 3:10-4:10 p.m.
E-mail: tarheel@stanford.edu

Course admin: Sarah Weden
Office: Gates 419
Email: sweden@db.stanford.edu

Grading Policy:

Project: 50%
- initial proposal: 5%
- milestone #1: 7.5%
- milestone #2: 7.5%
- final submission: 30%
Midterm: 20%
Homework (2 problem sets): 20%
Research paper appraisal/evaluation: 10%

Prerequisites:

Either CS276A or a reasonable background in text processing and statistical machine learning techniques, such as from CS224N, CS229, or Stat315. (You are not required to have taken CS276A, and this course has a rather different focus. However, we will review material covered there only very briefly, so unless you already know the relevant topics from CS276A, you will need to do additional outside reading.)

The course project will require extensive programming.

Textbooks:

There is no required or recommended text. We will distribute readings for each topic. Books that contain considerable material of relevance to the course that you may wish to look at include:

Soumen Chakrabarti. 2003. Mining the Web: Discovering Knowledge from Hypertext Data. Amsterdam: Morgan Kaufmann.
Pierre Baldi, Paolo Frasconi, and Padhraic Smyth. 2003. Modeling the Internet and the Web: Probabilistic Methods and Algorithms. John Wiley.
Christopher Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
Ian Witten and Eibe Frank. 2000. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. San Francisco, CA: Morgan Kaufmann.
Peter Jackson and Isabelle Moulinier. 2002. Natural Language Processing for Online Applications: Text Retrieval, Extraction, and Categorization. John Benjamins.

Assignment Policies:

Delivery: Assignments must be submitted by 5:30 p.m. Pacific time on the due date. Problem sets should be handed to Louis in class or left in the box outside Professor Manning's office.
Late days: Each student has 5 late days to use at his or her discretion. Please reserve your late days for legitimate emergencies. Each late day constitutes a 24-hour extension; you cannot split late days into smaller increments. If project partners want to take a late day, each student must contribute a day from his or her allotment.
Late penalties: Once a student runs out of late days, any late submissions are penalized at a rate of 10% per day. No assignment may be handed in more than 5 days late.
Collaboration: You may talk to anybody you want about the problem sets, including working through problems together in groups. Indeed, we encourage you to work in groups, and to work with different people through the quarter. However:
- you must state on your written assignment the people you discussed problems with, and
- you may not take detailed notes during group sessions and reproduce them verbatim in your write-up. Each student must turn in written homework answers composed entirely by himself or herself.
Regrades: If you believe we made a mistake in grading one of your assignments, you can resubmit the assignment for a regrade. Please include a brief statement describing which portion(s) you would like us to review and why. Note that when you request a regrade, we reserve the right to review your entire assignment; that is, we may find errors in your work that we missed before.

Honor Code:

All actual, detailed work on the solution of problem sets must be individual work. You are encouraged to discuss problem sets with each other in a general way, but if you do so, then you must acknowledge the people with whom you discussed the problem set at the top of your submission.

You should not look for problem answers elsewhere, but again, if you do take material from elsewhere, you should acknowledge it. For practical exercises, you are not permitted to get programming help from anyone other than your partner. Normally, you may use pre-existing code, but you must acknowledge any code that you have taken from other sources. In general, we will act, and expect you to act, according to the Stanford Honor Code.

