CME 323: Distributed Algorithms and Optimization
Spring 2016, Stanford University
Mon, Wed 1:30 PM  2:50 PM at Braun Lecture Hall, Mudd Chemistry Building
Instructor: Reza Zadeh
The emergence of large distributed clusters of commodity machines
has brought with it a slew of new algorithms and tools.
Many fields such as Machine Learning and Optimization
have adapted their algorithms to handle such clusters.
The class will cover widely used
distributed algorithms in academia and industry.
The course will begin with an introduction
to fundamentals of parallel and distributed runtime analysis. Afterwards,
we will cover distributed algorithms for:
 Convex Optimization
 Matrix Factorization
 Machine Learning
 Neural Networks
 Numerical Linear Algebra
 Large Graph analysis
 Streaming algorithms
Class Format
We will focus on the analysis of parallelism and distribution costs of algorithms.
Sometimes, topics will be illustrated with handson exercises
using Apache Spark.
Prerequisites: Targeting graduate students having
taken Algorithms at the level of CME 305 or CS 261.
Being able to competently program in any mainstream high level language.
There will be homeworks, a midterm, one scribed lecture, and a project.
Grade Breakdown:
Homeworks and scribing: 40%
Midterm: 30%
Project: 30%
The midterm will be Monday May 2nd in class
Required textbook:
Parallel Algorithms
by Guy E. Blelloch and Bruce M. Maggs [BB]
Optional textbooks:
Models of Computation
by John E. Savage [S]
Introduction to Algorithms by Cormen, Leiserson, Rivest, Stein [CLRS]
Learning Spark
by Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia [KKWZ]
Convex Optimization
by Boyd and Vandenberghe [BV]
Algorithm Design, by Kleinberg and Tardos [KT]
Homework
Homework 1 [pdf] [tex] Due April 11th. [Solution]
Homework 2 [pdf] [tex] Due April 27th. [Solution]
Homework 3 [pdf] [tex] Due May 18th. [Solution]
Homework 4 [pdf] [tex] Due May 25th.
Scribe template
Lectures and References
 Lecture 1: Fundamentals of Distributed and Parallel algorithm analysis, Reading: BB Chapter 1, Notes
 Lecture 2: Caveats of Parallel Algorithms, the Master Theorem, and parallel matrix multiplication, Reading: KT Chapter 5, Notes, Strassen 1969
 Lecture 3: Parallel Strassen's Algorithm and Parallel Mergesort, Cole 1988, Notes
 Lecture 4: Parallel Mergesort, General Divide and Conquer, Parallel Selection, Parallel Quicksort, Blum et. al. 1971, Prefixscan QSort, Reading: BB Chapter 2.1, CLRS 27.3. Notes
 Lecture 5: Minimum Spanning Trees, Boruvka's Algorithm, Boruvka (1926), Reading: KT Chapters 3, 4.5, 4.6, Notes
 Lecture 6: Review of Master Theorem, Closest Pair Problem, Master Theorem Examples, KT 5.4, Notes
 Lecture 7: Set Representation, Graph Contractions, Connectivity AVL Trees , Red Black Trees, Reading: Primer on BST's (Sections 6, 7), CLRS Ch. 12, 13 Notes
 Lecture 8: Multicore Optimization (Seperable Objective Functions), Job Scheduling. Hogwild! Notes
 Lecture 9: Job Scheduling, Intro to Distributed Computing, Intro to Streaming Algorithms. Notes
 Lecture 10: Intro to Distributed Computing, Communication Protocols, Midterm Review. Notes
 Lecture 11: Midterm, Midterm Solutions
 Lecture 12: Intro to MapReduce, Spark and other Distributed Computing Tools Intro to Spark, Intro to Distributed Optimization, Notes.
 Lecture 13: Gradient Descent in Spark, Communication Patterns. Intro to Distributed Optimization, Communication Patterns, Notes.
 Lecture 14: Measures of Complexity, Triangle Counting. Notes.
 Lecture 15: WrapUp Node Iterator (Triangle Counting), Combiners, Broadcasting, Spark in other Programming Languages. Curse of the Last Reducer , Broadcasting in Spark, Notes.
 Lecture 16: Sorting, Partitioning for Page Rank, Distributed Matrix Computations. Pregel Slides, Page Rank Slides, Pregel: A System for Large Scale Graph Processing, Scaling! But at what COST?. Notes.
 Lecture 17: Covariance Matrices and AllPairs Similarity, Matrix Computations, General Convex Optimization. ADMM, Dimension Independent Similarity Computation. Notes.
Previous Years
Spring 2015: [class webpage]
Supplementary Materials
Advanced Data Science on Spark: [slides]
Spark Intro Tutorial: [slides] [code and data  1 GB]
Spark Devops Slides: Spark Summit slides



Contact
Reza: rezab at stanford
Office hours: by appointment
TA
Andreas Santucci: santucci at stanford
Office hours: Tuesdays 3:15 pm  5:15 pm
Nolan Skochdopole: naskoch at stanford
Office hours: Mondays, 3:15 pm  5:15 pm
TA office hours will be held in the Huang Engineering Center basement
(in front of the ICME office)

