CME 323: Distributed Algorithms and Optimization
Spring 2015, Stanford University
Mon, Wed 12:35 PM - 1:50 PM at 530-127
Instructor: Reza Zadeh
The emergence of large distributed clusters of commodity machines has brought with it a slew of new algorithms and tools. Many fields such as Machine Learning and Optimization have adapted their algorithms to handle such clusters. The class will cover widely used distributed algorithms in academia and industry. We will cover distributed algorithms for:
A shorter version of this class was given at Spark Summit 2015: [video] [slides]

Class Format
Throughout the class, topics will be illustrated with hands-on exercises using the high-speed cluster programming framework Spark, with computing resources provided by the instructor. The design of distributed algorithms differs from that of traditional algorithms primarily in the need to account for communication cost, so each algorithm will be analyzed for its communication cost.

Prerequisites: The class targets graduate students who have taken Algorithms at the level of CME 305 or CS 261 and who can program competently in any mainstream high-level language.

There will be 3 homeworks, one scribed lecture, and a project. Students taking the class for credit/no credit instead of a letter grade can skip the project.
Optional textbooks:

Homework
Homework 1 [pdf] [tex] [solutions], collected Monday, April 20th in class
Homework 2 [pdf] [tex] [solutions], collected Monday, May 4th in class
Homework 3 [pdf] [tex] [solutions], collected Monday, May 18th in class

Lectures and References
Projects
Swaroop Indra Ramaswamy and Rohit Patki: Distributed minimum spanning trees. [slides] [report]
Carlos Riquelme, Lan Nguyen and Sven Schmit: Cascading vector machines. [slides] [report] [Github]
Benoit Dancoisne, Emilien Dupont and William Zhang: Distributed Max-Flow in Spark. [slides] [report] [Github]
Kevin Chavez, Hao Yi Ong and Augustus Hong: Distributed Deep Q-Learning. [slides] [report] [Github]
Zi Yin and Zhiang Hu (Harvy): Parallelized Union Find Set, with an Application in Finding Connected Components in a Graph. [slides] [report]
Charles Y. Zheng, Jingshu Wang and Arzav Jain: All-Pairs Shortest Paths in Spark. [slides] [report] [Github]
Haoming Li and Bangzheng He: A Distributed Solver for Kernelized SVM. [slides] [report]
Yilong Geng and Mingyu Gao: Distributed Stable Marriage with Incomplete List and Ties using Spark. [slides] [report] [Github]
David Daniels, Eric Liu and Charles Zhang: Distributed Structural Estimation of Graph Edge-Type Weights from Noisy PageRank Orders. [slides] [report] [Github]
Yifan Jin and Shaun Benjamin: Monte Carlo Tree Search. [slides] [report] [Code]
Orren Karniol-Tambour: Data Parallel EM for estimating the Genome Relative Abundance (GRA) in Metagenomic Samples. [slides] [report] [Github]

Supplementary Materials
Advanced Data Science on Spark: [slides]
Spark Intro Tutorial: [slides] [code and data - 1 GB]
Spark Devops Slides: Spark Summit slides
Tutorial: Stanford Spark Workshop Exercises
Tutorial: Movie Recommendation with MLlib
Tutorial: Graph Analytics with GraphX
