CME 323: Distributed Algorithms and Optimization
Spring 2022, Stanford University
03/28/2022 - 06/10/2022
Lectures will be posted online (two per week)
Instructor: Reza Zadeh
Computer Science is evolving to utilize new hardware such as GPUs, TPUs, CPUs, and large commodity clusters thereof.
Many subfields such as Machine Learning and Optimization have adapted their algorithms to handle such clusters.
Topics include distributed and parallel algorithms for: Optimization, Numerical Linear Algebra, Machine Learning, Graph analysis, Streaming algorithms, and other problems that are challenging to scale on a commodity cluster.
The class will focus on analyzing programs, with some implementation using Apache Spark and TensorFlow.
The course is split into two parts: first, an introduction to the fundamentals of parallel algorithms and runtime analysis on a single multicore machine; second, distributed algorithms running on a cluster of machines.
Class Format
We will focus on the analysis of parallelism and distribution costs of algorithms.
Some topics will be illustrated with exercises using Apache Spark and TensorFlow.
Prerequisites: The course targets graduate students who have taken an algorithms course at the level of CME 305 or CS 161, and who can program competently in any mainstream high-level language.
There will be homeworks, a midterm, and a final exam.
Grade Breakdown:
Homeworks: 40%
Midterm: 30%
Final: 30%
Textbooks:
Parallel Algorithms
by Guy E. Blelloch and Bruce M. Maggs [BB]
Introduction to Algorithms by Cormen, Leiserson, Rivest, Stein [CLRS]
Learning Spark
by Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia [KKWZ]
TensorFlow for Deep Learning
by Bharath Ramsundar and Reza Zadeh [RZ]
Algorithm Design
by Kleinberg and Tardos [KT]
Logistics
Homeworks will be assigned via Ed and due on Gradescope.
Lecture videos will be available on the Resources tab in Ed.
We will host office hours via Zoom; however, we encourage students to post questions publicly on Ed.
The midterm and the final exam will be take-home and may be taken at any time within a 24-hour window.
Homework
Homework 1 [pdf] Due April 14th | [soln]
Homework 2 [pdf] Due April 28th | [soln]
Homework 3 [pdf] Due May 19th | [soln]
Homework 4 [pdf] Due June 1st | [soln]
Lectures and References
- Lecture 1 (3/29): Introduction to Parallel Algorithms (PRAM Model, Work + Depth, Computation DAGs, Brent's Theorem, Parallel Summation)
Lecture 1
- Lecture 2 (3/31): Scalability, Scheduling, All Prefix Sum. Reading: BB 5.
Lecture 2,
Handbook of Scheduling,
Graham's Algorithm,
TensorFlow Scheduling,
Better Bounds for Online Scheduling
- Lecture 3 (4/5): All Prefix Sum, Mergesort (sketch below). Reading: KT 5, BB 8.
Lecture 3,
Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques,
Cole's parallel merge sort (1988)
- Lecture 4 (4/7): Divide and Conquer Algorithms, Master Theorem, Quick Selection, Quick Sort.
Lecture 4,
Linear time bounds for median select,
Prefix scan qsort
- Lecture 5 (4/12): Matrix Multiplication (Strassen's Algorithm; sketch below).
Lecture 5
- Lecture 6 (4/14): Graph Algorithms, Star Contraction, Minimum Spanning Tree (Boruvka's Algorithm). Reading: KT 3, 4.5, 4.6, CLRS 12, 13.
Lecture 6,
Boruvka (1926)
- Lecture 7 (4/19): Solving Linear Systems
Lecture 7
- Lecture 8 (4/21): Optimization for Machine Learning, HOGWILD!
Lecture 8,
HOGWILD!,
Omnivore.
- Lecture 9 (4/26): Skipped for Midterm Review
- Lecture 10 (5/2): Midterm Exam
- Lecture 11 (5/3): Introduction to Distributed Algorithms and Spark
Lecture 11,
Intro to Spark,
Spark Cheat Sheet
- Lecture 12 (5/5): Communication Networks, Cluster Computing, Broadcast Networks, and Communication Patterns
Lecture 12,
Communication Patterns
- Lecture 13 (5/10): Distributed Summation, Simple Random Sampling, Distributed Sort, Introduction to MapReduce (sketch below)
Lecture 13
- Lecture 14 (5/12): Converting SQL to MapReduce, Matrix Representations on a Cluster, Matrix Computations in SQL and Spark
Lecture 14,
Matrix Computations and Optimization in Apache Spark,
Sparse matrix multiplication using SQL,
Sparse matrix multiplication in MapReduce.
- Lecture 15 (5/17): Partitioning for PageRank
Lecture 15,
Partitioning for PageRank.
- Lecture 16 (5/19): Complexity Measures for MapReduce, Triangle Counting in a Graph (sketch below)
Lecture 16,
Counting Triangles and the Curse of the Last Reducer
- Lecture 17 (5/24): Singular Value Decomposition
Lecture 17,
Singular Value Decomposition.
- Lecture 18 (5/26): Covariance Matrices and All-pairs similarity
Lecture 18,
Covariance Matrices and All-pairs similarity,
DIMSUM
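Illustrative Code Sketches
The sketches below are ours, not part of the official course materials; they use plain Python and are meant only to make a few of the lecture topics above concrete.
For Lectures 2-3: a minimal sketch of the work-efficient all-prefix-sums (exclusive scan) algorithm covered in BB 5, simulated sequentially. On a PRAM, each round of the upsweep and downsweep runs in constant depth, for O(n) work and O(log n) depth overall. The function name and the power-of-two restriction are our simplifications.

    # Work-efficient exclusive scan (Blelloch scan), simulated sequentially.
    def exclusive_scan(xs):
        """Exclusive prefix sums of xs; len(xs) must be a power of two."""
        a = list(xs)
        n = len(a)
        # Upsweep: build partial sums up a balanced binary tree.
        d = 1
        while d < n:
            for i in range(0, n, 2 * d):      # parallel for on a PRAM
                a[i + 2 * d - 1] += a[i + d - 1]
            d *= 2
        # Downsweep: push prefixes back down the tree.
        a[n - 1] = 0
        d = n // 2
        while d >= 1:
            for i in range(0, n, 2 * d):      # parallel for on a PRAM
                left = a[i + d - 1]
                a[i + d - 1] = a[i + 2 * d - 1]
                a[i + 2 * d - 1] += left
            d //= 2
        return a

    print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
    # -> [0, 3, 4, 11, 11, 15, 16, 22]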
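For Lecture 5: a sketch of Strassen's algorithm with NumPy, restricted for brevity to square matrices whose side is a power of two. Seven recursive products replace the naive eight, giving roughly O(n^2.81) work, and the seven subproblems are independent, so they could run in parallel. The cutoff value is an arbitrary choice of ours.

    import numpy as np

    def strassen(A, B, cutoff=64):
        """Strassen multiply for n x n matrices, n a power of two."""
        n = A.shape[0]
        if n <= cutoff:                   # fall back to ordinary multiply
            return A @ B
        h = n // 2
        A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
        B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
        # Seven independent products; the recursive calls can execute
        # in parallel on a multicore machine.
        M1 = strassen(A11 + A22, B11 + B22, cutoff)
        M2 = strassen(A21 + A22, B11, cutoff)
        M3 = strassen(A11, B12 - B22, cutoff)
        M4 = strassen(A22, B21 - B11, cutoff)
        M5 = strassen(A11 + A12, B22, cutoff)
        M6 = strassen(A21 - A11, B11 + B12, cutoff)
        M7 = strassen(A12 - A22, B21 + B22, cutoff)
        C = np.empty_like(A)
        C[:h, :h] = M1 + M4 - M5 + M7
        C[:h, h:] = M3 + M5
        C[h:, :h] = M2 + M4
        C[h:, h:] = M1 - M2 + M3 + M6
        return C

    A = np.random.rand(128, 128); B = np.random.rand(128, 128)
    assert np.allclose(strassen(A, B), A @ B)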
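For Lectures 13-14: a minimal sequential simulation of the MapReduce pattern (map, shuffle by key, reduce), instantiated as word count; distributed summation is the same pattern with a single key. The helper names (map_reduce, mapper, reducer) are ours; on a real cluster the same logic would be expressed with Spark's map and reduceByKey.

    from collections import defaultdict
    from functools import reduce

    def map_reduce(records, mapper, reducer):
        """Sequential simulation of MapReduce: map, shuffle, reduce."""
        # Map phase: each record yields (key, value) pairs; records are
        # independent, so on a cluster this phase is embarrassingly parallel.
        groups = defaultdict(list)
        for record in records:
            for key, value in mapper(record):
                groups[key].append(value)   # shuffle: group values by key
        # Reduce phase: one reducer per key, also parallel across keys.
        return {k: reduce(reducer, vs) for k, vs in groups.items()}

    docs = ["the quick brown fox", "the lazy dog", "the fox"]
    counts = map_reduce(docs,
                        mapper=lambda doc: [(w, 1) for w in doc.split()],
                        reducer=lambda a, b: a + b)
    print(counts)   # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, ...}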
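For Lecture 16: the sequential core of the triangle-counting scheme analyzed in "Counting Triangles and the Curse of the Last Reducer". Charging each triangle to its first vertex in a (degree, id) ordering keeps any one vertex's workload small even on skewed graphs; the function and variable names here are ours.

    from itertools import combinations

    def count_triangles(edges):
        """Count triangles, charging each to its lowest-rank vertex,
        where vertices are ordered by (degree, id)."""
        adj = {}
        for u, v in edges:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        rank = {v: (len(nbrs), v) for v, nbrs in adj.items()}
        count = 0
        for v, nbrs in adj.items():
            # Only consider neighbors after v in the ordering, so each
            # triangle is counted exactly once, at its lowest-rank vertex.
            higher = [u for u in nbrs if rank[u] > rank[v]]
            for a, b in combinations(higher, 2):
                if b in adj[a]:
                    count += 1
        return count

    edges = [(0, 1), (1, 2), (0, 2), (2, 3), (1, 3)]
    print(count_triangles(edges))   # 2 triangles: {0,1,2} and {1,2,3}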
Previous Years
Spring 2015: [class webpage]
Spring 2016: [class webpage]
Spring 2017: [class webpage]
Spring 2018: [class webpage]
Spring 2020: [class webpage]
Contact
Reza: rezab at stanford
Office hours: by appointment
TA
Anuj Nagpal: anujnag at stanford
Office hours: Tue-Wed 4-6 PM
Zoom link: Here