MS&E 317: Algorithms for Modern Data Models
Spring 2014, Stanford University
Mon, Wed 2:15 PM - 3:30 PM at Meyer 143
Ashish Goel
We traditionally think of algorithms as running on data available in a single location, typically main memory. In many modern applications, including web analytics, search and data mining, computational biology, finance, and scientific computing, the data is often too large to reside in a single location, arrives incrementally over time, is noisy or uncertain, or all of the above.
Paradigms such as MapReduce, streaming, sketching, Distributed Hash Tables, Resilient Distributed Datasets, and random walks have proved useful for these applications. This course provides an introduction to the design and analysis of algorithms for these modern data models. The class is largely theoretical, but also involves at least two hands-on programming exercises.
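To give a flavor of the MapReduce paradigm mentioned above, here is a minimal, single-machine sketch of the programming model using the canonical word-count example. This is purely illustrative (the map and reduce phases here run locally, with no shuffle or distribution) and is not drawn from the course materials:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: sum the counts for each key (word)."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["the quick brown fox", "the lazy dog", "the fox"]
word_counts = reduce_phase(map_phase(docs))
```

In a real MapReduce system, the framework would partition the (word, 1) pairs by key across machines before the reduce phase; the combiner optimization discussed in Lecture 3 pre-aggregates counts on the map side.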
Prerequisites: The course targets doctoral students who have taken an algorithms course at the level of CS 261 and can program competently in a mainstream high-level language (C, C++, Ruby, Java, Scala, Python, Perl).
There will be three homework assignments, one scribed lecture, and a take-home final exam. Students may substitute a project for the take-home final. Students taking the class for credit/no credit rather than a letter grade may skip the final.
Students are strongly encouraged to attend the Spark Workshop.
Algorithm Design by Kleinberg and Tardos [KT]
Randomized Algorithms by Rajeev Motwani and Prabhakar Raghavan [MR]
Homework 1 [pdf]
Homework 2 [pdf]
Homework 3 [pdf]
Final Exam: due June 9th by midnight
Lectures and References
- Lecture 1: Introduction to Modern Data Models [Scribe Notes], MapReduce Paper
- Lecture 2: Triangle Counting: [Scribe Notes], Curse of the Last Reducer, Counting Triangles with Spark
- Lecture 3: Squaring Matrices on MapReduce [Scribe Notes], DISCO paper, DIMSUM paper, Tall and Skinny SVD, Combiners Description
- Lecture 4: Power iterations and PageRank [Scribe Notes] [slides], PageRank implementation
- Lecture 5: Sketches [Scribe Notes]
- Lecture 6: Applications of Sketches, MinHash [Scribe Notes]
- Lecture 7: P-Stability [Scribe Notes]
- Lecture 8: The Secretary Problem [Scribe Notes], Optimizing expected value for the Secretary problem
- Lecture 9: The Perceptron, Intro to Stochastic Gradient Descent [Scribe Notes], Parallel Stochastic Gradient Descent, SGD updates for many ML losses, Sorting/Shuffling on MapReduce
- Lecture 10: Sampling, Locality Sensitive Hashing [Scribe Notes]
- Lecture 11: Locality Sensitive Hashing [Scribe Notes]
- Lecture 12: Parallel SGD, Alternating Direction Method of Multipliers [Scribe Notes], Many Resources from Stephen Boyd, ADMM on Spark
- Lecture 13: Overview of Large-Scale Optimization. PLANET: Distributed Decision Tree Learning [Scribe Notes], PLANET paper
- Lecture 14: Approximating Cuts. Clustering. [Scribe Notes]
- Lecture 15: Incremental Facility Location. Implementation of LSH. [Scribe Notes]
- Lecture 16: Social Search [Scribe Notes], Sketch Paper
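As a companion to Lecture 2's triangle counting, here is a sequential sketch of the basic idea: for each vertex, check which pairs of its neighbors are themselves connected. Each triangle is discovered once per vertex, hence the division by 3. This toy version ignores the load-skew issue ("the curse of the last reducer") that the distributed algorithms in the readings are designed to handle:

```python
from itertools import combinations

def count_triangles(edges):
    """Count triangles in an undirected graph given as a list of edges.
    For each vertex, test whether pairs of its neighbors are adjacent;
    every triangle is found at each of its three vertices, so divide by 3."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total = 0
    for u, nbrs in adj.items():
        for v, w in combinations(nbrs, 2):
            if w in adj[v]:
                total += 1
    return total // 3

# The complete graph K4 contains exactly 4 triangles.
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
triangles = count_triangles(k4_edges)
```

The MapReduce versions in the readings improve on this by splitting the work for high-degree vertices, whose neighbor-pair lists would otherwise dominate a single reducer.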
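The power iteration behind Lecture 4's PageRank discussion can be sketched in a few lines. This is an illustrative in-memory version (it assumes every node has at least one out-link, sidestepping dangling-node handling), not the course's implementation:

```python
def pagerank(adj, d=0.85, iters=100):
    """Power iteration for PageRank on an adjacency list {node: [out-neighbors]}.
    Assumes every node has at least one out-link."""
    n = len(adj)
    rank = {u: 1.0 / n for u in adj}
    for _ in range(iters):
        # Each node keeps the teleport mass (1 - d)/n, plus shares of the
        # ranks of nodes linking to it.
        new = {u: (1 - d) / n for u in adj}
        for u, outs in adj.items():
            share = d * rank[u] / len(outs)
            for v in outs:
                new[v] += share
        rank = new
    return rank

# On a symmetric 3-node cycle, every node gets rank 1/3.
graph = {0: [1], 1: [2], 2: [0]}
pr = pagerank(graph)
```

Because each iteration redistributes one unit of total mass, the ranks always sum to 1, which makes a handy sanity check in larger runs.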
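For Lecture 5's sketches, the Count-Min sketch is a representative example: a small 2D array of counters supporting approximate point queries on a stream. The sketch below is a simplified illustration (parameters `w` and `d`, and the hashing scheme, are my own choices, not from the lecture notes); queries can only overestimate, never underestimate:

```python
import random

class CountMinSketch:
    """d rows of w counters, each row with its own salted hash. A point
    query returns the minimum counter over the rows, an overestimate of
    the true count (collisions only ever add to a counter)."""

    def __init__(self, w=1000, d=5, seed=0):
        rng = random.Random(seed)
        self.w, self.d = w, d
        self.salts = [rng.randrange(2**31) for _ in range(d)]
        self.table = [[0] * w for _ in range(d)]

    def _cells(self, item):
        for row, salt in enumerate(self.salts):
            yield row, hash((salt, item)) % self.w

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def query(self, item):
        return min(self.table[row][col] for row, col in self._cells(item))

cms = CountMinSketch()
for _ in range(42):
    cms.add("heavy")
cms.add("light")
```

With width w and depth d, the standard guarantee is that the overestimate exceeds the true count by more than (e/w) times the stream length with probability at most e^(-d).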
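Lecture 6's MinHash can be sketched with a family of random linear hash functions: the probability that two sets agree on the minimum of a random hash equals their Jaccard similarity, so the fraction of agreeing signature entries is an unbiased estimator. The hash family and parameters here are illustrative choices, not taken from the notes:

```python
import random

P = 2**61 - 1  # a Mersenne prime, used as the modulus of the hash family

def minhash_signature(items, hash_params):
    """One signature entry per hash h(x) = (a*x + b) mod P: the minimum
    hash value over the set."""
    return [min((a * x + b) % P for x in items) for a, b in hash_params]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of agreeing positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

rng = random.Random(0)
params = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(256)]

A = set(range(0, 100))
B = set(range(50, 150))   # true Jaccard(A, B) = 50 / 150 = 1/3
est = estimate_jaccard(minhash_signature(A, params), minhash_signature(B, params))
```

With k hash functions the estimator's standard deviation is about sqrt(J(1-J)/k), so 256 hashes put the estimate within a few percent of 1/3 with high probability.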
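The classical stopping rule for Lecture 8's secretary problem — observe the first n/e candidates, then hire the first one who beats them all — succeeds with probability approaching 1/e. A quick simulation (my own sketch, with arbitrary n and trial counts) illustrates this:

```python
import math
import random

def secretary_success_rate(n=50, trials=20000, seed=1):
    """Simulate the 1/e stopping rule: reject the first n/e candidates,
    then accept the first candidate better than all of them (or the last
    candidate if none is). Success means hiring the overall best."""
    random.seed(seed)
    cutoff = int(n / math.e)
    wins = 0
    for _ in range(trials):
        ranks = list(range(n))        # n - 1 is the best candidate
        random.shuffle(ranks)
        best_seen = max(ranks[:cutoff])
        chosen = next((r for r in ranks[cutoff:] if r > best_seen), ranks[-1])
        wins += (chosen == n - 1)
    return wins / trials

rate = secretary_success_rate()
```

The empirical rate should land near 1/e ≈ 0.368; the lecture's optional reading covers the related variant of optimizing the expected value of the hired candidate rather than the probability of hiring the best.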
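For Lecture 9, the perceptron is the simplest instance of the mistake-driven updates that lead into stochastic gradient descent: on each misclassified example, nudge the weights toward that example. The toy dataset below is my own illustration of the sequential rule, not the parallel SGD schemes from the readings:

```python
def perceptron(data, epochs=20):
    """Classic perceptron. data is a list of (x, y) pairs with x a tuple of
    floats and y in {-1, +1}. On a mistake, update w += y * x and b += y."""
    dim = len(data[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            # A mistake is a non-positive margin on the current weights.
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

# Linearly separable toy data: the label is the sign of x0 - x1.
train = [((2.0, 0.0), 1), ((0.0, 2.0), -1), ((3.0, 1.0), 1), ((1.0, 3.0), -1)]
w, b = perceptron(train)
```

On separable data with margin γ and radius R, the perceptron convergence theorem bounds the total number of mistakes by (R/γ)², so a handful of epochs suffices here.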
Ashish: ashishg at stanford.edu
Reza: rezab at stanford.edu