CME 323: Distributed Algorithms and Optimization

Spring 2016, Stanford University
Mon, Wed 1:30 PM - 2:50 PM at Braun Lecture Hall, Mudd Chemistry Building

Instructor: Reza Zadeh

The emergence of large distributed clusters of commodity machines has brought with it a slew of new algorithms and tools. Many fields, such as Machine Learning and Optimization, have adapted their algorithms to handle such clusters. The class will cover distributed algorithms that are widely used in academia and industry.

The course will begin with an introduction to the fundamentals of parallel and distributed runtime analysis. Afterwards, we will cover distributed algorithms for:

  • Convex Optimization
  • Matrix Factorization
  • Machine Learning
  • Neural Networks
  • Numerical Linear Algebra
  • Large Graph Analysis
  • Streaming Algorithms

Class Format

We will focus on the analysis of parallelism and distribution costs of algorithms. Sometimes, topics will be illustrated with hands-on exercises using Apache Spark.
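As a taste of the hands-on exercises, here is a minimal sketch of the map/reduceByKey pattern that Spark's RDD API exposes, written in plain Python so it runs without a Spark installation. The stage names mirror Spark's operations for illustration only; this is not Spark itself.

```python
from collections import defaultdict

# Toy emulation of Spark's RDD word count: flatMap -> map -> reduceByKey.
# This runs sequentially in plain Python; in Spark, each stage would
# execute in parallel across the partitions of a distributed dataset.

def word_count(lines):
    # flatMap each line into words, then map each word to a (word, 1) pair
    pairs = ((word, 1) for line in lines for word in line.split())
    # reduceByKey: sum the counts for each distinct word
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

print(word_count(["to be or not to be"]))
# {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In Spark the same computation would be a one-liner over an RDD; the runtime-analysis question the course asks is how the shuffle in reduceByKey contributes to communication cost.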

Prerequisites: The course targets graduate students who have taken an algorithms course at the level of CME 305 or CS 261, and who can program competently in any mainstream high-level language. There will be homeworks, a midterm, one scribed lecture, and a project.

Grade Breakdown:
Homeworks and scribing: 40%
Midterm: 30%
Project: 30%

The midterm will be held in class on Monday, May 2nd.

Required textbook: Parallel Algorithms by Guy E. Blelloch and Bruce M. Maggs [BB]

Optional textbooks:
Models of Computation by John E. Savage [S]
Introduction to Algorithms by Cormen, Leiserson, Rivest, Stein [CLRS]
Learning Spark by Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia [KKWZ]
Convex Optimization by Boyd and Vandenberghe [BV]
Algorithm Design by Kleinberg and Tardos [KT]

Homework

Homework 1 [pdf] [tex] Due April 11th. [Solution]

Homework 2 [pdf] [tex] Due April 27th. [Solution]

Homework 3 [pdf] [tex] Due May 18th. [Solution]

Homework 4 [pdf] [tex] Due May 25th. [Solution]

Scribe template

Lectures and References

Projects

Amy Shoemaker and Sagar Vare: Edmonds' Blossom algorithm. [slides][report][Github]

Sebastian Dubois and Sebastien Levy: Distributed Lasso. [slides][report]

Alex Adamson: GloVe on Spark. [slides][report][Github]

Wissam Baalbaki: Large-scale matrix factorization with distributed stochastic gradient descent: implementation in Spark and testing on Netflix movie recommendation. [slides][report]

Jan Bae: Distributed graph coloring. [slides][report]

Irwan Bello: Asynchronous lock-free parallel Deep Reinforcement Learning. [slides]

Max Bodoia: MapReduce algorithms for k-means clustering. [slides][report]

Erik Burton: Parallel Held-Karp algorithm for the Hamiltonian cycle problem. [slides][report]

Yi-Chun Chen and Yu-Sheng Chen: A distributed implementation for Reinforcement Learning. [slides][report]

Henry Ehrenberg: Gibbs PRAMpling. [slides][report][Github]

David Flatow and Daniel Penner: A distributed algorithm for global min-cut. [slides][report]

Rolland He: PARADIS: a parallel in-place radix sort algorithm. [slides][report][Github]

Vishakh Hedge and Sheema Usmani: Parallel and distributed learning. [slides][report]

Xin Jin: Parallel auction algorithm for linear assignment problem. [slides][report][Github]

Stephen Kline and Kevin Shaw: Distributed CUR decomposition for bi-clustering. [slides][report]

Christopher Kurrus and Henry Neeb: Distributed k-nearest neighbors. [slides][report]

Patrick Landreman: Alternating least squares in Spark. [slides][report]

Ting-Po Lee and Taman Narayan: Distributed language models using RNNs. [slides][report][Github]

Nikhil Parthasarathy and Pin Pin Tea-Mangkornpan: Low-rank matrix factorization using distributed SGD in Spark. [slides][report][Github]

Milind Rao: Distributed multi-armed bandits. [slides][report]

Jayanth Ramesh and Suhas Suresha: Convex hull - parallel and distributed algorithms. [slides][report]

Alfredo Lainez Rodrigo and Luke de Oliveira: Distributed Bayesian personalized ranking in Spark. [slides][report][Github]

Victor Storchan: Parallel sparse k-means for document clustering. [slides][report][Github]

Alex Williams: Initializing nonnegative matrix factorizations on distributed architectures. [slides][report]

Hao Wu: Generalized linear models in collaborative filtering. [slides][report]

Previous Years

Spring 2015: [class webpage]

Contact

Reza: rezab at stanford
Office hours: by appointment

TA

Andreas Santucci: santucci at stanford
Office hours: Tuesdays 3:15 pm - 5:15 pm

Nolan Skochdopole: naskoch at stanford
Office hours: Mondays, 3:15 pm - 5:15 pm

TA office hours will be held in the Huang Engineering Center basement (in front of the ICME office)