CME 323: Distributed Algorithms and Optimization
Spring 2020, Stanford University
04/07/2020 - 06/10/2020
Lectures will be posted online (two per week)
Instructor: Reza Zadeh
Computer Science is evolving to utilize new hardware such as GPUs and TPUs, as well as large commodity clusters of CPUs.
Many subfields such as Machine Learning and Optimization have adapted their algorithms to handle such clusters.
Topics include distributed and parallel algorithms for: Optimization, Numerical Linear Algebra, Machine Learning, Graph analysis, Streaming algorithms, and other problems that are challenging to scale on a commodity cluster.
The class will focus on analyzing programs, with some implementation using Apache Spark and TensorFlow.
The course is split into two parts: first, an introduction to the fundamentals of parallel algorithms and runtime analysis on a single multicore machine; second, distributed algorithms running on a cluster of machines.
Class Format
We will focus on the analysis of parallelism and distribution costs of algorithms.
Some topics will be illustrated with exercises
using Apache Spark and TensorFlow.
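For a taste of what such an exercise might look like, here is a minimal PySpark sketch (the app name and data are illustrative, not from an actual assignment) that computes the mean of a distributed collection with a single map and reduce:

    # Minimal PySpark sketch: mean of a large collection in parallel.
    # Illustrative only; not an actual course assignment.
    from pyspark import SparkContext

    sc = SparkContext(appName="mean-example")

    # Distribute a million numbers across 8 partitions of the cluster.
    nums = sc.parallelize(range(1_000_000), numSlices=8)

    # Each element becomes a (value, count) pair; reduce sums both
    # components, first within partitions, then across the cluster.
    total, count = nums.map(lambda x: (x, 1.0)) \
                       .reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]))

    print(total / count)  # 499999.5

Because the reduce function is associative, Spark can aggregate locally inside each partition before combining results across machines.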
Prerequisites: The course targets graduate students who have taken an algorithms course at the level of CME 305 or CS 161, and who can program competently in any mainstream high-level language.
There will be homeworks, a midterm, and a final exam.
Grade Breakdown:
Homeworks: 40%
Midterm: 30%
Final: 30%
Textbooks:
Parallel Algorithms
by Guy E. Blelloch and Bruce M. Maggs [BB]
Introduction to Algorithms by Cormen, Leiserson, Rivest, Stein [CLRS]
Algorithm Design
by Jon Kleinberg and Éva Tardos [KT]
Learning Spark
by Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia [KKWZ]
TensorFlow for Deep Learning
by Bharath Ramsundar and Reza Zadeh [RZ]
Logistics
Homeworks will be assigned via Piazza and submitted on Gradescope.
Lecture videos will be posted under the Resources tab on Piazza.
We will host office hours via Zoom; however, we encourage students to post questions publicly on Piazza.
The midterm and final will be take-home (exact dates TBD).
Homework
Homework 1 [pdf] Due April 23rd.
Homework 2 [pdf] [code] Due May 7th.
Homework 3 [pdf] Due May 28th.
Homework 4 [pdf] Due June 9th.
Lectures and References
- Lecture 1 (4/7): Introduction to Parallel Algorithms (PRAM Model, Work + Depth, Computation DAGs, Brent's Theorem, Parallel Summation). A summation sketch appears after this list.
Lecture 1
- Lecture 2 (4/9): Scalability, Scheduling, All Prefix Sum. Reading: BB 5. A scan sketch appears after this list.
Lecture 2,
Handbook of Scheduling,
Graham's Algorithm,
TensorFlow Scheduling,
Better Bounds for Online Scheduling
- Lecture 3 (4/14): All Prefix Sum, Mergesort. Reading: KT 5, BB 8.
Lecture 3,
Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques,
Cole's parallel merge sort (1988)
- Lecture 4 (4/16): Divide and Conquer Algorithms, Master Theorem, Quickselect, Quicksort.
Lecture 4
- Lecture 5 (4/21): Quicksort, Matrix Multiplication (Strassen's Algorithm), Minimum Spanning Tree (Kruskal's Algorithm). Reading: KT 3, 4.5, 4.6.
Lecture 5,
Linear time bounds for median select,
Prefix scan qsort
- Lecture 6 (4/23): Minimum Spanning Tree (Boruvka's Algorithm). Reading: CLRS 12, 13.
Lecture 6,
Boruvka (1926)
- Lecture 7 (4/28): Solving Linear Systems, Intro to Optimization.
Lecture 7
- Lecture 8 (4/30): Optimization for Machine Learning, HOGWILD!.
Lecture 8,
HOGWILD!,
Omnivore.
- Lecture 9 (5/5): Midterm Review
- Lecture 10 (5/7): Midterm (take-home).
- Lecture 11 (5/12): Introduction to Distributed Algorithms
Lecture 11,
Intro to Spark,
Spark Cheat Sheet,
Communication Patterns
- Lecture 12 (5/14): Communication Networks, Cluster Computing, Broadcast Networks, and Communication Patterns
Lecture 12
- Lecture 13 (5/19): Distributed Summation, Simple Random Sampling, Distributed Sort, Introduction to MapReduce
Lecture 13
- Lecture 14 (5/21): Converting SQL to MapReduce, Matrix representations on a cluster, Matrix Computations in SQL and Spark. A sparse-matrix sketch appears after this list.
Lecture 14,
Matrix Computations and Optimization in Apache Spark,
Sparse matrix multiplication using SQL,
Sparse matrix multiplication in MapReduce.
- Lecture 15 (5/26): Partitioning for PageRank
Lecture 15,
Partitioning for PageRank.
- Lecture 16 (5/28): Complexity Measures for MapReduce, Triangle Counting in a Graph. A triangle-counting sketch appears after this list.
Lecture 16,
Counting Triangles and the Curse of the Last Reducer
- Lecture 17 (6/2): Singular Value Decomposition
Lecture 17,
Singular Value Decomposition.
- Lecture 18 (6/4): Covariance Matrices and All-pairs similarity
Lecture 18,
Covariance Matrices and All-pairs similarity,
DIMSUM
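The sketches below illustrate a few of the lecture topics above. Each is a minimal rendering for intuition, with illustrative names and toy data; none is actual course code.

For Lecture 1's parallel summation: the two recursive calls below are independent, so in the work-depth model a scheduler may run them in parallel, even though the sketch itself is plain sequential Python.

    # Divide-and-conquer summation (Lecture 1), written sequentially.
    # In the work-depth model the two recursive calls run in parallel:
    #   Work:  W(n) = 2 W(n/2) + O(1) = O(n)
    #   Depth: D(n) = D(n/2) + O(1)   = O(log n)
    # Brent's theorem then gives time O(W(n)/p + D(n)) on p processors.
    def psum(a, lo=0, hi=None):
        # Sums the nonempty slice a[lo:hi].
        if hi is None:
            hi = len(a)
        if hi - lo == 1:
            return a[lo]
        mid = (lo + hi) // 2
        left = psum(a, lo, mid)   # independent of the next call,
        right = psum(a, mid, hi)  # so the two can run in parallel
        return left + right

    assert psum(list(range(10))) == 45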
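For the all prefix sum of Lectures 2-3 [BB 5], a sequential rendering of the work-efficient exclusive scan: each inner loop touches disjoint indices, so every level can run in parallel, giving O(n) work and O(log n) depth. Restricting the input length to a power of two is a simplification.

    # Work-efficient exclusive scan (Blelloch): the up-sweep builds
    # partial sums up an implicit tree, the down-sweep pushes prefixes
    # back down. Assumes len(a) is a power of two.
    def exclusive_scan(a):
        a, n = list(a), len(a)
        d = 1
        while d < n:  # up-sweep (reduce phase)
            for i in range(0, n, 2 * d):  # independent iterations
                a[i + 2 * d - 1] += a[i + d - 1]
            d *= 2
        a[n - 1] = 0
        d = n // 2
        while d >= 1:  # down-sweep
            for i in range(0, n, 2 * d):  # independent iterations
                t = a[i + d - 1]
                a[i + d - 1] = a[i + 2 * d - 1]
                a[i + 2 * d - 1] += t
            d //= 2
        return a

    assert exclusive_scan([1, 2, 3, 4]) == [0, 1, 3, 6]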
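For Lecture 14, the join-then-reduce pattern behind sparse matrix multiplication in MapReduce and Spark; matrices are stored as coordinate lists (row, column, value), and the toy matrices here are illustrative.

    # Sparse matrix product C = A B: join on the shared dimension k,
    # then sum the partial products per output coordinate (i, j).
    from pyspark import SparkContext

    sc = SparkContext(appName="spmm-sketch")

    A = sc.parallelize([(0, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0)])  # (i, k, v)
    B = sc.parallelize([(0, 0, 4.0), (1, 0, 5.0), (1, 1, 6.0)])  # (k, j, v)

    C = (A.map(lambda t: (t[1], (t[0], t[2])))          # key by k: (i, a_ik)
          .join(B.map(lambda t: (t[0], (t[1], t[2]))))  # key by k: (j, b_kj)
          .map(lambda kv: ((kv[1][0][0], kv[1][1][0]),  # coordinate (i, j)
                           kv[1][0][1] * kv[1][1][1]))  # a_ik * b_kj
          .reduceByKey(lambda x, y: x + y))

    print(sorted(C.collect()))
    # [((0, 0), 14.0), ((0, 1), 12.0), ((1, 0), 15.0), ((1, 1), 18.0)]

This is the same plan the SQL formulation produces: join A and B on the shared index and GROUP BY the output coordinates.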
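For Lecture 16's triangle counting, a PySpark sketch of the wedge-then-close approach, orienting edges from lower to higher vertex id so each triangle is counted exactly once; the toy graph is illustrative.

    # Count triangles: form wedges (v, w) around each vertex u with
    # u < v < w, then keep the wedges whose closing edge (v, w) exists.
    from pyspark import SparkContext

    sc = SparkContext(appName="triangles-sketch")

    edges = sc.parallelize([(0, 1), (0, 2), (1, 2), (1, 3)])

    # Orient each undirected edge from lower to higher vertex id.
    directed = edges.map(lambda e: (min(e), max(e))).distinct()
    adj = directed.groupByKey().mapValues(set)

    # Wedges keyed by the edge that would close the triangle.
    wedges = adj.flatMap(
        lambda uv: [((v, w), 1) for v in uv[1] for w in uv[1] if v < w])
    closing = directed.map(lambda e: (e, 1))

    print(wedges.join(closing).count())  # 1 triangle: (0, 1, 2)

The skew in the wedge step, where high-degree vertices generate quadratically many wedges, is exactly the "curse of the last reducer" discussed in the reading.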
Previous Years
Spring 2015: [class webpage]
Spring 2016: [class webpage]
Spring 2017: [class webpage]
Spring 2018: [class webpage]
Contact
Reza: rezab at stanford
Office hours: by appointment
TA
Robin Brown: rabrown1 at stanford
Office hours: Time TBD.
Zoom link: On Piazza