CME 323: Distributed Algorithms and Optimization
Spring 2016, Stanford University
Mon, Wed 1:30 PM - 2:50 PM at Braun Lecture Hall, Mudd Chemistry Building
Instructor: Reza Zadeh
The emergence of large distributed clusters of commodity machines
has brought with it a slew of new algorithms and tools.
Many fields such as Machine Learning and Optimization
have adapted their algorithms to handle such clusters.
The class will cover distributed algorithms
widely used in academia and industry.
The course will begin with an introduction
to fundamentals of parallel and distributed runtime analysis. Afterwards,
we will cover distributed algorithms for:
- Convex Optimization
- Matrix Factorization
- Machine Learning
- Neural Networks
- Numerical Linear Algebra
- Large Graph Analysis
- Streaming Algorithms
We will focus on the analysis of parallelism and distribution costs of algorithms.
Sometimes, topics will be illustrated with hands-on exercises
using Apache Spark.
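As a taste of those exercises, here is a minimal word count in the map/reduceByKey style used throughout the Spark labs. This is an illustrative sketch only: plain Python stands in for Spark's RDD API, and the input lines are made up.

```python
from collections import Counter

# Hypothetical mini-exercise: word count in the MapReduce pattern.
# Plain Python stands in for Spark's flatMap / map / reduceByKey here.
lines = ["to be or not to be", "to thine own self be true"]

# map phase: each line -> (word, 1) pairs
pairs = [(word, 1) for line in lines for word in line.split()]

# reduceByKey phase: sum the counts for each word
counts = Counter()
for word, one in pairs:
    counts[word] += one

print(counts["to"], counts["be"])  # 3 3
```

In Spark the same computation distributes across a cluster because the map phase is embarrassingly parallel and the per-key reduction is associative.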
Prerequisites: The course targets graduate students who have
taken Algorithms at the level of CME 305 or CS 261,
and who can program competently in any mainstream high-level language.
There will be homeworks, a midterm, one scribed lecture, and a project.
Homeworks and scribing: 40%
The midterm will be held in class on Monday, May 2nd.
Reference texts:
Parallel Algorithms by Guy E. Blelloch and Bruce M. Maggs [BB]
Models of Computation by John E. Savage [S]
Introduction to Algorithms by Cormen, Leiserson, Rivest, Stein [CLRS]
Learning Spark by Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia [KKWZ]
Convex Optimization by Boyd and Vandenberghe [BV]
Algorithm Design by Kleinberg and Tardos [KT]
Homeworks:
Homework 1 [pdf] [tex] Due April 11th. [Solution]
Homework 2 [pdf] [tex] Due April 27th. [Solution]
Homework 3 [pdf] [tex] Due May 18th. [Solution]
Homework 4 [pdf] [tex] Due May 25th. [Solution]
Lectures and References
- Lecture 1: Fundamentals of Distributed and Parallel algorithm analysis, Reading: BB Chapter 1
- Lecture 2: Caveats of Parallel Algorithms, the Master Theorem, and parallel matrix multiplication, Reading: KT Chapter 5, Strassen 1969
- Lecture 3: Parallel Strassen's Algorithm and Parallel Mergesort, Cole 1988
- Lecture 4: Parallel Mergesort, General Divide and Conquer, Parallel Selection, Parallel Quicksort, Blum et al. 1971, Prefix-scan QSort, Reading: BB Chapter 2.1, CLRS 27.3.
- Lecture 5: Minimum Spanning Trees, Boruvka's Algorithm, Boruvka (1926), Reading: KT Chapters 3, 4.5, 4.6.
- Lecture 6: Review of Master Theorem, Closest Pair Problem, Master Theorem Examples, Reading: KT 5.4.
- Lecture 7: Set Representation, Graph Contractions, Connectivity, AVL Trees, Red-Black Trees, Reading: Primer on BSTs (Sections 6, 7), CLRS Ch. 12, 13
- Lecture 8: Multicore Optimization (Separable Objective Functions), Job Scheduling. Hogwild!
- Lecture 9: Job Scheduling, Intro to Distributed Computing, Intro to Streaming Algorithms.
- Lecture 10: Intro to Distributed Computing, Communication Protocols, Midterm Review.
- Lecture 11: Midterm, Midterm Solutions
- Lecture 12: Intro to MapReduce, Spark, and other Distributed Computing Tools; Intro to Spark; Intro to Distributed Optimization.
- Lecture 13: Gradient Descent in Spark, Communication Patterns; Intro to Distributed Optimization.
- Lecture 14: Measures of Complexity, Triangle Counting.
- Lecture 15: Wrap-up of Node Iterator (Triangle Counting), Combiners, Broadcasting, Spark in other Programming Languages; Reading: Curse of the Last Reducer, Broadcasting in Spark.
- Lecture 16: Sorting, Partitioning for PageRank, Distributed Matrix Computations; Pregel Slides, PageRank Slides, Pregel: A System for Large-Scale Graph Processing, Scalability! But at what COST?
- Lecture 17: Covariance Matrices and All-Pairs Similarity, Matrix Computations, General Convex Optimization. ADMM, Dimension Independent Similarity Computation.
- Lecture 18: Matrix Multiplications, Singular Value Decomposition. Large Scale Distributed Deep Networks.
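Several of the lectures above (prefix-scan quicksort, parallel runtime analysis) lean on the prefix-scan primitive. Below is a minimal sketch of the classic Hillis-Steele inclusive scan, with the parallel rounds simulated sequentially in Python; the function name and structure are illustrative, not from the course materials.

```python
# Inclusive prefix sum via the Hillis-Steele scan.
# There are O(log n) rounds; within a round, every update is independent,
# so on a PRAM each round takes O(1) depth. Sequential Python simulates
# the rounds here for illustration.
def prefix_scan(xs):
    xs = list(xs)
    n, step = len(xs), 1
    while step < n:
        # all positions i >= step would update simultaneously in one round
        xs = [xs[i] + xs[i - step] if i >= step else xs[i] for i in range(n)]
        step *= 2
    return xs

print(prefix_scan([1, 2, 3, 4]))  # [1, 3, 6, 10]
```

The same pattern generalizes from sums to any associative operator, which is what makes scan a building block for parallel sorting and partitioning.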
Student projects:
Amy Shoemaker and Sagar Vare: Edmonds' Blossom algorithm. [slides][report][Github]
Sebastian Dubois and Sebastien Levy: Distributed Lasso. [slides][report]
Alex Adamson: GloVe on Spark. [slides][report][Github]
Wissam Baalbaki: Large-scale matrix factorization with distributed stochastic gradient descent: implementation in Spark and testing on Netflix movie recommendation. [slides][report]
Jan Bae: Distributed graph coloring. [slides][report]
Irwan Bello: Asynchronous lock-free parallel Deep Reinforcement Learning. [slides]
Max Bodoia: MapReduce algorithms for k-means clustering. [slides][report]
Erik Burton: Parallel Held-Karp algorithm for the Hamiltonian cycle problem. [slides][report]
Yi-Chun Chen and Yu-Sheng Chen: A distributed implementation for Reinforcement Learning. [slides][report]
Henry Ehrenberg: Gibbs PRAMpling. [slides][report][Github]
David Flatow and Daniel Penner: A distributed algorithm for global min-cut. [slides][report]
Rolland He: PARADIS: a parallel in-place radix sort algorithm. [slides][report][Github]
Vishakh Hedge and Sheema Usmani: Parallel and distributed learning. [slides][report]
Xin Jin: Parallel auction algorithm for linear assignment problem. [slides][report][Github]
Stephen Kline and Kevin Shaw: Distributed CUR decomposition for bi-clustering. [slides][report]
Christopher Kurrus and Henry Neeb: Distributed k-nearest neighbors. [slides][report]
Patrick Landreman: Alternating least squares in Spark. [slides][report]
Ting-Po Lee and Taman Narayan: Distributed language models using RNNs. [slides][report][Github]
Nikhil Parthasarathy and Pin Pin Tea-Mangkornpan: Low-rank matrix factorization using distributed SGD in Spark. [slides][report][Github]
Milind Rao: Distributed multi-armed bandits. [slides][report]
Jayanth Ramesh and Suhas Suresha: Convex hull - parallel and distributed algorithms. [slides][report]
Alfredo Lainez Rodrigo and Luke de Oliveira: Distributed Bayesian personalized ranking in Spark. [slides][report][Github]
Victor Storchan: Parallel sparse k-means for document clustering. [slides][report][Github]
Alex Williams: Initializing nonnegative matrix factorizations on distributed architectures. [slides][report]
Hao Wu: Generalized linear models in collaborative filtering. [slides][report]
Spring 2015: [class webpage]
Reza: rezab at stanford
Office hours: by appointment
Andreas Santucci: santucci at stanford
Office hours: Tuesdays 3:15 pm - 5:15 pm
Nolan Skochdopole: naskoch at stanford
Office hours: Mondays, 3:15 pm - 5:15 pm
TA office hours are held in the Huang Engineering Center basement
(in front of the ICME office).