CME 323: Distributed Algorithms and Optimization

Spring 2015, Stanford University
Mon, Wed 12:35 PM - 1:50 PM at 530-127

Instructor: Reza Zadeh

The emergence of large distributed clusters of commodity machines has brought with it a slew of new algorithms and tools. Many fields, such as Machine Learning and Optimization, have adapted their algorithms to run on such clusters. The class will cover distributed algorithms that are widely used in academia and industry.

We will cover distributed algorithms for:

  • Convex Optimization
  • Matrix Factorization
  • Machine Learning
  • Neural Networks
  • The Bootstrap
  • Numerical Linear Algebra
  • Large Graph analysis
  • Streaming and online algorithms

A shorter version of this class was given at Spark Summit 2015: [video] [slides]

Class Format

Throughout the class, topics will be illustrated with hands-on exercises using Spark, a high-speed cluster programming framework, with computing resources provided by the instructor. Designing distributed algorithms differs from designing traditional algorithms primarily in the need to account for communication cost, so the class will include analysis of communication cost.
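To give a flavor of what analyzing communication cost means, here is a minimal sketch in plain Python (not Spark) that counts messages for two ways of combining one partial result per machine: sending everything to a single coordinator, or combining pairwise in a binary tree. The function names and the cost model (one message per partial result sent) are assumptions made for illustration and are not taken from the course materials.

```python
# Illustrative cost model (an assumption, not from the course materials):
# every machine holds one partial result, and sending it costs one message.

def flat_reduce_cost(num_machines: int) -> dict:
    """All other machines send their partial result to one coordinator."""
    senders = max(num_machines - 1, 0)
    return {
        "rounds": 1 if senders > 0 else 0,
        "total_messages": senders,
        # The coordinator receives every message itself.
        "max_received_by_one_machine": senders,
    }


def tree_reduce_cost(num_machines: int) -> dict:
    """Machines pair up each round; half of them send and then drop out."""
    rounds = 0
    total_messages = 0
    active = num_machines
    while active > 1:
        senders = active // 2      # each sender ships one partial result
        total_messages += senders
        active -= senders          # senders are done after this round
        rounds += 1
    return {
        "rounds": rounds,
        "total_messages": total_messages,
        # A machine receives at most one message per round it stays active.
        "max_received_by_one_machine": rounds,
    }


if __name__ == "__main__":
    for p in (8, 64, 1000):
        print(p, flat_reduce_cost(p), tree_reduce_cost(p))
```

Under this model both strategies send the same total number of messages, but the tree spreads the receiving load so that no single machine handles more than roughly log2(p) messages, at the price of more communication rounds.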

Prerequisites: The class targets graduate students who have taken an algorithms course at the level of CME 305 or CS 261 and who can program competently in any mainstream high-level language.

There will be three homeworks, one scribed lecture, and a project. Students taking the class for credit/no credit instead of a letter grade can skip the project.

Optional textbooks:
Convex Optimization by Boyd and Vandenberghe [BV]
Randomized Algorithms by Rajeev Motwani and Prabhakar Raghavan [MR]
Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, Jerome Friedman [HTF]

Homework

Homework 1 [pdf] [tex] [solutions], Collected Monday April 20th in class
Homework 2 [pdf] [tex] [solutions], Collected Monday May 4th in class
Homework 3 [pdf] [tex] [solutions], Collected Monday May 18th in class

Scribe template

Lectures and References

Projects

Swaroop Indra Ramaswamy and Rohit Patki: Distributed minimum spanning trees. [slides] [report]

Carlos Riquelme, Lan Nguyen and Sven Schmit: Cascading vector machines. [slides] [report] [Github]

Benoit Dancoisne, Emilien Dupont and William Zhang: Distributed Max-Flow in Spark. [slides] [report] [Github]

Kevin Chavez, Hao Yi Ong and Augustus Hong: Distributed Deep Q-Learning. [slides] [report] [Github]

Zi Yin and Zhiang Hu (Harvy): Parallelized Union Find Set, with an Application in Finding Connected Components in a Graph. [slides] [report]

Charles Y. Zheng, Jingshu Wang and Arzav Jain: All-Pairs Shortest Paths in Spark. [slides] [report] [Github]

Haoming Li and Bangzheng He: A Distributed Solver for Kernelized SVM. [slides] [report]

Yilong Geng and Mingyu Gao: Distributed Stable Marriage with Incomplete List and Ties using Spark. [slides] [report] [Github]

David Daniels, Eric Liu and Charles Zhang: Distributed Structural Estimation of Graph Edge-Type Weights from Noisy PageRank Orders. [slides] [report] [Github]

Yifan Jin and Shaun Benjamin: Monte Carlo Tree Search. [slides] [report] [Code]

Orren Karniol-Tambour: Data Parallel EM for estimating the Genome Relative Abundance (GRA) in Metagenomic Samples. [slides] [report] [Github]

Supplementary Materials

Advanced Data Science on Spark: [slides]

Spark Intro Tutorial: [slides] [code and data - 1 GB]

Spark Devops Slides: Spark Summit slides

Tutorial: Stanford Spark Workshop Exercises

Tutorial: Movie Recommendation with MLlib

Tutorial: Graph Analytics with GraphX

Contact

Reza: rezab at stanford.edu
Office hours: by appointment

TA

Dieterich Lawson: jdlawson at stanford.edu
Office hours: Tuesdays 4-6pm

Simon Anastasiadis: simonsa at stanford.edu
Office hours: Wednesdays 2:15-4:15pm

TA office hours will be held in the Huang Engineering Center basement (in front of the ICME office).