CME 323: Distributed Algorithms and Optimization
Spring 2024, Stanford University
04/01/2024 - 06/07/2024
Lectures: Tue, Thu 11:30 AM - 1:20 PM, Room 300-303
Instructor: Reza Zadeh
Computer Science is evolving to utilize new hardware such as GPUs, TPUs, CPUs, and large commodity clusters thereof.
Many subfields such as Machine Learning and Optimization have adapted their algorithms to handle such clusters.
Topics include distributed and parallel algorithms for: Optimization, Numerical Linear Algebra, Machine Learning, Graph Analysis, Streaming Algorithms, and other problems that are challenging to scale on a commodity cluster.
The class will focus on analyzing programs, with some implementation using Apache Spark and TensorFlow.
The course will be split into two parts: first, an introduction to the fundamentals of parallel algorithms and runtime analysis on a single multicore machine; second, distributed algorithms running on a cluster of machines.
Class Format
We will focus on the analysis of parallelism and distribution costs of algorithms.
Sometimes, topics will be illustrated with exercises using Apache Spark and TensorFlow.
Prerequisites: The course targets graduate students who have taken an algorithms course at the level of CME 305 or CS 161, and who can competently program in a mainstream high-level language.
There will be homeworks, a midterm, and a final exam.
Grade Breakdown:
Homeworks: 40%
Midterm: 30%
Final: 30%
Textbooks:
Parallel Algorithms
by Guy E. Blelloch and Bruce M. Maggs [BB]
Introduction to Algorithms by Cormen, Leiserson, Rivest, Stein [CLRS]
Learning Spark
by Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia [KKWZ]
TensorFlow for Deep Learning
by Bharath Ramsundar and Reza Zadeh [RZ]
Algorithm Design
by Kleinberg and Tardos [KT]
Logistics
Homeworks will be assigned via Ed and due on Gradescope.
We will host office hours in Huang B019 and simultaneously open a Zoom session for those unable to attend in person. However, we encourage students to post questions publicly on Ed.
The midterm and final exams will be held exclusively in class. If you encounter any scheduling conflicts, please reach out to the teaching staff to arrange accommodations.
Midterm Exam: May 7th, Tuesday, 11:30am - 1:20pm in 300-303.
Final Exam: June 12th, Wednesday, 12:15pm - 3:15pm in 300-303.
Homework
Homework 1 [pdf][tex] Due April 18th | [soln]
Homework 2 [pdf][tex] Due May 4th | [soln]
Homework 3 [pdf][tex] Due May 23rd | [soln]
Homework 4 [pdf][tex] Due June 6th | [soln]
Lectures and References
- Lecture 1 (4/2): Introduction to Parallel Algorithms (PRAM Model, Work + Depth, Computation DAGs, Brent's Theorem, Parallel Summation)
Lecture 1
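To make the work/depth accounting from Lecture 1 concrete: by Brent's theorem, a computation with work W and depth D runs on p processors in time at most W/p + D. Below is a minimal Python sketch (not from lecture) of tree-structured summation, simulated sequentially; every addition within one level is independent, which is what a PRAM would execute in O(1) depth per level.

    def parallel_sum(xs):
        # Pairwise reduction: O(n) work, O(log n) depth, since all
        # additions within a level are independent of one another.
        while len(xs) > 1:
            pairs = [xs[i] + xs[i + 1] for i in range(0, len(xs) - 1, 2)]
            if len(xs) % 2:                  # odd element carries over
                pairs.append(xs[-1])
            xs = pairs
        return xs[0] if xs else 0

    assert parallel_sum(list(range(100))) == sum(range(100))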
- Lecture 2 (4/4): Scalability, Scheduling, All Prefix Sum. Reading: BB 5.
Lecture 2,
Handbook on Scheduling,
Graham's Algorithm,
TensorFlow Scheduling,
Better Bounds for Online Scheduling
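As a pointer to the Graham reference above, here is a minimal sketch of greedy list scheduling on a hypothetical job list: assign each job to the machine that frees up earliest. Graham's bound says the resulting makespan is within a factor 2 - 1/p of optimal.

    import heapq

    def list_schedule(jobs, p):
        # Greedy list scheduling: give each job to the machine that
        # becomes free earliest; returns the makespan.
        machines = [0.0] * p
        heapq.heapify(machines)
        for t in jobs:
            finish = heapq.heappop(machines)
            heapq.heappush(machines, finish + t)
        return max(machines)

    print(list_schedule([3, 1, 4, 1, 5, 9, 2, 6], p=3))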
- Lecture 3 (4/9): All Prefix Sum, Mergesort. Reading: KT 5, BB 8.
Lecture 3,
Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques,
Cole's parallel merge sort (1988)
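A sequential simulation of the work-efficient (Blelloch) exclusive scan from the reading, assuming for simplicity that the input length is a power of two; each inner loop ranges over independent positions and is the part a PRAM would run in O(1) depth per level.

    def exclusive_scan(a):
        # Up-sweep/down-sweep scan: O(n) work, O(log n) depth.
        x = list(a)
        n = len(x)
        d = 1
        while d < n:                        # up-sweep: tree of partial sums
            for i in range(2 * d - 1, n, 2 * d):
                x[i] += x[i - d]
            d *= 2
        x[n - 1] = 0                        # identity element for +
        d = n // 2
        while d >= 1:                       # down-sweep: push prefixes down
            for i in range(2 * d - 1, n, 2 * d):
                x[i - d], x[i] = x[i], x[i] + x[i - d]
            d //= 2
        return x

    assert exclusive_scan([1, 2, 3, 4]) == [0, 1, 3, 6]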
- Lecture 4 (4/11): Divide and Conquer Algorithms, Master Theorem, Quick Selection, Quick Sort.
Lecture 4,
Linear time bounds for median select,
Prefix scan qsort
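A minimal sequential quickselect for orientation; the three-way partition below is what the prefix-scan qsort reference above expresses with parallel scans.

    import random

    def quickselect(xs, k):
        # k-th smallest element (0-indexed); expected O(n) work.
        pivot = random.choice(xs)
        lo = [x for x in xs if x < pivot]
        eq = [x for x in xs if x == pivot]
        if k < len(lo):
            return quickselect(lo, k)
        if k < len(lo) + len(eq):
            return pivot
        return quickselect([x for x in xs if x > pivot], k - len(lo) - len(eq))

    assert quickselect([5, 2, 8, 1, 9], 2) == 5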
- Lecture 5 (4/16): Matrix Multiplication (Strassen's Algorithm).
Lecture 5
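A hedged numpy sketch of Strassen's algorithm for n x n matrices with n a power of two (the base-case cutoff of 64 is an arbitrary illustrative choice): seven recursive multiplies instead of eight gives O(n^log2(7)) ≈ O(n^2.81) work.

    import numpy as np

    def strassen(A, B):
        n = A.shape[0]
        if n <= 64:                         # base case: ordinary multiply
            return A @ B
        k = n // 2
        A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
        B11, B12, B21, B22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]
        M1 = strassen(A11 + A22, B11 + B22)   # the seven products
        M2 = strassen(A21 + A22, B11)
        M3 = strassen(A11, B12 - B22)
        M4 = strassen(A22, B21 - B11)
        M5 = strassen(A11 + A12, B22)
        M6 = strassen(A21 - A11, B11 + B12)
        M7 = strassen(A12 - A22, B21 + B22)
        return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                         [M2 + M4, M1 - M2 + M3 + M6]])

    A = np.random.default_rng(0).standard_normal((128, 128))
    B = np.random.default_rng(1).standard_normal((128, 128))
    assert np.allclose(strassen(A, B), A @ B)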
- Lecture 6 (4/18): Graph Algorithms, Star Contraction, Minimum Spanning Tree (Sequential, Kruskal's Algorithm)
Lecture 6,
Boruvka (1926)
- Lecture 7 (4/23): Minimum Spanning Tree (Parallel, Boruvka's Algorithm). Reading: KT 3, 4.5, 4.6, CLRS 12, 13.
Lecture 7,
Boruvka (1926)
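A sequential sketch of Boruvka's algorithm as covered in lecture. The parallel appeal: every component picks its cheapest outgoing edge independently, and the number of components at least halves each round, so O(log V) rounds suffice. The union-find below is a sequential stand-in for the star contraction of Lecture 6.

    def boruvka_mst(n, edges):
        # edges: list of (weight, u, v) for an undirected graph on n vertices.
        parent = list(range(n))
        def find(x):                        # union-find with path halving
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        mst, components = [], n
        while components > 1:
            cheapest = [None] * n           # cheapest outgoing edge per component
            for w, u, v in edges:
                ru, rv = find(u), find(v)
                if ru == rv:
                    continue
                for r in (ru, rv):
                    if cheapest[r] is None or w < cheapest[r][0]:
                        cheapest[r] = (w, u, v)
            merged = False
            for e in cheapest:
                if e is None:
                    continue
                ru, rv = find(e[1]), find(e[2])
                if ru != rv:                # still crosses a cut: safe to add
                    parent[ru] = rv
                    mst.append(e)
                    components -= 1
                    merged = True
            if not merged:                  # graph is disconnected
                break
        return mst

    print(boruvka_mst(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))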
- Lecture 8 (4/25): Optimization for Machine Learning, HOGWILD!
Lecture 8,
HOGWILD!,
Omnivore.
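A toy single-machine sketch of the HOGWILD! idea on a hypothetical least-squares problem (the names X, y, w are illustrative, not from the paper): several threads run SGD on a shared parameter vector with no locks, and when updates are sparse or independent enough, the races rarely hurt convergence.

    import numpy as np
    import threading

    rng = np.random.default_rng(0)
    n, d = 1000, 20
    X = rng.standard_normal((n, d))
    y = X @ rng.standard_normal(d)
    w = np.zeros(d)                     # shared parameters, deliberately unlocked

    def worker(seed, steps=20000, lr=0.01):
        local = np.random.default_rng(seed)
        for _ in range(steps):
            i = local.integers(n)                   # one random example
            g = (X[i] @ w - y[i]) * X[i]            # grad of 0.5*(x_i.w - y_i)^2
            w[:] = w - lr * g                       # racy write: the whole point

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print("residual norm:", np.linalg.norm(X @ w - y))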
- Lecture 9 (4/30): Skipped for Midterm Review
- Lecture 10 (5/2): Introduction to Distributed Algorithms and Spark
Lecture 10,
Intro to Spark,
Spark Cheat Sheet
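For orientation, the canonical first Spark program: a word count over RDDs (assumes a local pyspark installation; "input.txt" is a placeholder path).

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "wordcount")
    counts = (sc.textFile("input.txt")
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))   # the shuffle step
    print(counts.take(10))
    sc.stop()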
- Lecture 11 (5/7): Midterm Exam
- Lecture 12 (5/9): Distributed Sort, Relational Operators, Distributed Optimization, Introduction to Spark
Lecture 13 (Distributed Sort),
Lecture 12 (Distributed Optimization, Intro to Spark),
Communication Patterns
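A single-machine toy simulation of distributed sort by range partitioning (the pattern discussed in lecture): sample splitters, route each key to a bucket standing in for a machine, then sort locally. With a large enough sample, bucket sizes are balanced with high probability.

    import random
    from bisect import bisect_left

    data = [random.randrange(10**6) for _ in range(10000)]
    p = 4                                              # "machines"
    sample = sorted(random.sample(data, 100))
    splitters = [sample[(i + 1) * len(sample) // p - 1] for i in range(p - 1)]
    buckets = [[] for _ in range(p)]
    for x in data:
        buckets[bisect_left(splitters, x)].append(x)   # the one shuffle round
    out = [x for b in buckets for x in sorted(b)]      # local sorts, concatenate
    assert out == sorted(data)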
- Lecture 13 (5/14): Introduction to Spark, MapReduce
Intro to Spark
- Lecture 14 (5/16): Converting SQL to MapReduce, Matrix representations on a cluster, Matrix Computations in SQL and Spark
Communication Patterns
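A small sketch of translating a SQL aggregate into map/reduce operations over a hypothetical (dept, salary) table: GROUP BY becomes reduceByKey over (sum, count) pairs.

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "groupby")
    # SELECT dept, AVG(salary) FROM emp GROUP BY dept
    emp = sc.parallelize([("eng", 100.0), ("eng", 120.0), ("sales", 90.0)])
    avg = (emp.mapValues(lambda s: (s, 1))
              .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
              .mapValues(lambda t: t[0] / t[1]))
    print(avg.collect())                # e.g. [('eng', 110.0), ('sales', 90.0)]
    sc.stop()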
- Lecture 15 (5/21): Complexity Measures for MapReduce, Triangle Counting in a Graph
Lecture 15,
Counting Triangles and the Curse of the Last Reducer
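The node-iterator scheme behind the triangle-counting reference, in plain Python: each vertex emits its neighbor pairs and checks which close into triangles. On a cluster, a high-degree vertex emits O(deg^2) wedges, which is exactly the "last reducer" skew the paper names.

    from itertools import combinations

    edges = {(1, 2), (2, 3), (1, 3), (3, 4)}        # toy undirected graph
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    count = 0
    for u, nbrs in adj.items():
        for v, w in combinations(sorted(nbrs), 2):  # wedges centered at u
            if (v, w) in edges or (w, v) in edges:  # does the wedge close?
                count += 1
    print(count // 3)                               # each triangle seen at 3 apexes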
- Lecture 16 (5/23): Representations, Matrix Multiplication, Singular Value Decomposition
Lecture 16,
Singular Value Decomposition.
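For the tall-skinny case from lecture (many rows distributed across machines, few columns), the SVD can be recovered from the small d x d Gram matrix. A numpy sketch of that reduction; on a cluster, A^T A would be accumulated as a sum of per-row outer products.

    import numpy as np

    A = np.random.default_rng(0).standard_normal((10000, 5))   # n >> d
    G = A.T @ A                        # d x d Gram matrix: cheap to collect
    evals, V = np.linalg.eigh(G)       # eigenvalues in ascending order
    sigma = np.sqrt(evals[::-1])       # singular values, descending
    V = V[:, ::-1]                     # right singular vectors
    U = (A @ V) / sigma                # left singular vectors
    assert np.allclose(A, U * sigma @ V.T)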
- Lecture 17 (5/28): Singular Value Decomposition, Partitioning for PageRank
Lecture 17,
Partitioning for Pagerank.
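A tiny power-iteration PageRank for orientation, on a hypothetical 4-node graph; the partitioning question from lecture is how to split this per-edge work across machines so each iteration's communication stays low.

    import numpy as np

    links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}     # node -> out-links
    n, beta = 4, 0.85
    r = np.full(n, 1.0 / n)
    for _ in range(50):                             # power iteration
        nxt = np.full(n, (1 - beta) / n)            # teleport mass
        for u, outs in links.items():
            for v in outs:
                nxt[v] += beta * r[u] / len(outs)
        r = nxt
    print(r, r.sum())                               # ranks sum to 1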
- Lecture 18 (5/30): Covariance Matrices and All-pairs similarity
Lecture 18,
Covariance Matrices and All-pairs similarity,
DIMSUM
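The exact computation DIMSUM approximates, as a numpy baseline: cosine similarities between all column pairs of A are the entries of B^T B, where B is A with columns normalized. DIMSUM's contribution is sampling the row-wise outer products so the shuffle cost stays bounded regardless of row count.

    import numpy as np

    A = np.random.default_rng(0).standard_normal((100, 6))
    B = A / np.linalg.norm(A, axis=0)   # normalize columns
    S = B.T @ B                         # S[i, j] = cos(col_i, col_j)
    assert np.allclose(np.diag(S), 1.0)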
Previous Years
Spring 2015: [class webpage]
Spring 2016: [class webpage]
Spring 2017: [class webpage]
Spring 2018: [class webpage]
Spring 2020: [class webpage]
Spring 2022: [class webpage]
Contact
Reza: rezab at stanford
Office hours: by appointment
TA
Alex Yang: yangzj at stanford
Office hours: Mon, Wed 10 AM - 12 PM, Huang B019
Zoom link: Here