CME 323: Distributed Algorithms and Optimization
Spring 2017, Stanford University
Mon, Wed 10:30 AM - 11:50 AM at 200-205
Instructor: Reza Zadeh
The emergence of large distributed clusters of commodity machines
has brought with it a slew of new algorithms and tools.
Many fields such as Machine Learning and Optimization
have adapted their algorithms to handle such clusters.
The class will cover distributed algorithms
widely used in academia and industry.
The course will begin with an introduction
to fundamentals of parallel and distributed runtime analysis. Afterwards,
we will cover parallel and distributed algorithms for:
- Convex Optimization
- Matrix Factorization
- Machine Learning
- Neural Networks
- Numerical Linear Algebra
- Large Graph Analysis
- Streaming Algorithms
We will focus on the analysis of parallelism and distribution costs of algorithms.
Some topics will be illustrated with hands-on exercises
using Apache Spark.
Prerequisites: The course targets graduate students who have
taken an algorithms course at the level of CME 305 or CS 261
and can program competently in any mainstream high-level language.
There will be homework assignments, a midterm, and a final.
The midterm will be in class on Monday, May 8th.
Parallel Algorithms by Guy E. Blelloch and Bruce M. Maggs [BB]
Models of Computation by John E. Savage [S]
Introduction to Algorithms by Cormen, Leiserson, Rivest, Stein [CLRS]
Learning Spark by Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia [KKWZ]
Convex Optimization by Boyd and Vandenberghe [BV]
Algorithm Design, by Kleinberg and Tardos [KT]
Homework 1 [pdf] [tex] Due April 19th. [soln]
Homework 2 [pdf] [tex] Due May 1st. [soln]
Homework 3 [pdf] [tex] Due May 22nd. [soln]
Homework 4 [pdf] [tex] Due June 7th.
Lectures and References
Midterm Practice Problems. [pdf] [sol hints]
- Lecture 1: Fundamentals of Distributed and Parallel algorithm analysis. Reading: BB Chapter 1.
- Lecture 2: Scalable algorithms, Scheduling. Reading: BB 5.
Handbook of Scheduling
- Lecture 3: Prefix Sum, Mergesort. Reading: KT 5, BB 8.
Cole's parallel merge sort (1988)
- Lecture 4: Parallel quick-select, quicksort.
Linear time bounds for median select,
Prefix scan qsort.
- Lecture 5: Quicksort, Strassen's Algorithm, Minimum Spanning Trees. Reading: KT 3, 4.5, 4.6.
- Lecture 6: Graph contraction, star contraction, MST algorithms. Reading: CLRS 12, 13.
- Lecture 7: (Stochastic) Gradient Descent, Parallel SGD (HOGWILD!). Reading: HOGWILD!.
- Lecture 8: Intro to distributed computing, sampling, communication patterns.
- Lecture 9: Network Topology and communication patterns. Distributed summation, and remarks on sorting.
Lecture Notes (draft).
- Lecture 10: Distributed sort, intro to map reduce, applications of map reduce.
- Lecture 11: Midterm, Solution.
- Lecture 12: Map Reduce (indexing), Sparse Matrix Multiplies using SQL, Joins using Map Reduce.
- Lecture 13: Joins using map reduce, measures of complexity, triangle counting. Curse of the Last Reducer.
Lecture Notes (a), Lecture Notes (b).
- Lecture 14: Triangle Counting in Map Reduce, matrix multiplies with a small matrix, optimization and gradient descent.
Lecture Notes (node iterator via map reduce), Lecture Notes (analysis of node iterator), Lecture Notes (matrix multiplies).
- Lecture 15: Data Flow Systems: Spark, MapReduce shortcomings.
Lecture Notes. Slides: Intro to DAO. Slides: Distributed Computations with MapReduce.
- Lecture 16: Optimization in Spark, Broadcasting, SGD on parameter servers.
Spring 2015: [class webpage]
Spring 2016: [class webpage]
Reza: rezab at stanford
Office hours: by appointment
Andreas Santucci: santucci at stanford
Office hours: Mondays 12-2, Wednesdays 12-1.
Wissam Baalbaki: baalbaki at stanford
Office hours: Tuesday 3:30-5:30, Thursday 3:30-4:30.
TA office hours will be held in the Huang Engineering Center basement
(in front of the ICME office).