Hosted by Stanford ICME
August 13-15, 2014
Clark Center Auditorium, Stanford University

Reza Zadeh | Matei Zaharia | Ion Stoica

A three-day class on distributed computing using Spark, a high-speed cluster programming framework. The class includes hands-on exercises throughout, with computing resources provided by the organizers.

The class will include introductions to Spark's many features, case studies from current users, best practices for deployment and tuning, and future development plans.

Wednesday, August 13 to Friday, August 15, 2014

Please register here: Spark class registration

Hands-on Exercises

Please download the course materials and slides here.

Course Prerequisites:

  • Laptop with WiFi capabilities
  • Java 6 or 7


Day 1 (10am-4pm, lunch break 12:30-1:30pm)

An introduction to Distributed Computing and Spark (Reza Zadeh) [slides]

Hands-on exercises (Paco Nathan): [slides]

  • Installing Spark, then running a first app
  • Theory of operation, major abstractions
  • Historical background
  • Writing/running several example apps
  • Review of the API in Scala, Python, Java

Language Clustering Demo

Databricks Cloud Demo

Day 2 (10am-4pm, lunch break 12:30-1:30pm)

Hands-on exercises (Paco Nathan): [slides]

  • Review: coding assignment
  • Extended Spark examples
  • Unified engine across batch, iterative, SQL, ML, etc.
  • Software development lifecycle: build, deploy, monitor
  • Tooling: Maven, SBT, IPython notebook, etc.
  • Production case studies
  • Other resources for learning

Installing the Cassandra / Spark OSS Stack

Additional materials and exercises

Day 3 (10am-2:30pm, lunch break 11:30-1pm)

  • MLlib and Distributing the Singular Value Decomposition (Reza Zadeh) [slides]
  • Towards an Optimizer for MLbase (Ameet Talwalkar) [slides]
  • Graph Processing with the GraphX library (Ankur Dave) [slides]
  • Spark Streaming (Tathagata Das) [slides]


