Distributed Machine Learning and Matrix Computations
A NIPS 2014 Workshop
Level 5; room 510 a
Friday, December 12th, 2014
Reza Zadeh | Ameet Talwalkar | Ion Stoica
The emergence of large distributed matrices in many applications has brought with it a slew of new algorithms and tools. Over the past few years, machine learning and numerical linear algebra on distributed matrices have become a thriving field. Manipulating such large matrices makes it necessary to think about distributed systems issues such as communication cost.
This workshop aims to bring together researchers in distributed systems and large-scale numerical linear algebra to foster cross-talk between the two fields. The goal is to encourage distributed systems researchers to work on machine learning and numerical linear algebra problems, to inform machine learning researchers about new developments in large-scale matrix analysis, and to identify unique challenges and opportunities.
The workshop will conclude with a session of contributed posters.
08:15-08:30 Introduction, Reza Zadeh
08:30-09:00 Ameet Talwalkar, MLbase: Simplified Distributed Machine Learning [slides]
09:00-09:30 David Woodruff, Principal Component Analysis and Higher Correlations for Distributed Data [slides]
09:30-10:00 Virginia Smith, Communication-Efficient Distributed Dual Coordinate Ascent [slides]
10:00-10:30 Coffee Break
10:30-11:30 Jeff Dean (Keynote), Techniques for Training Neural Networks Quickly [slides]
11:30-12:00 Reza Zadeh, Distributing the Singular Value Decomposition with Spark [slides]
12:00-12:30 Jure Leskovec, In-memory graph analytics [slides]
12:30-14:30 Lunch Break
14:30-15:00 Carlos Guestrin, SFrame and SGraph: Scalable, Out-of-Core, Unified Tabular and Graph Processing
15:00-15:30 Inderjit Dhillon, NOMAD: A Distributed Framework for Latent Variable Models [slides]
15:30-16:00 Ankur Dave, GraphX: Graph Processing in a Distributed Dataflow Framework [slides]
16:00-16:30 Jeremy Freeman, Large-scale decompositions of brain activity [slides]
16:30-17:00 Coffee Break
17:00-18:30 Posters for accepted papers
Minerva: A Scalable and Highly Efficient Training Platform for Deep Learning
Maxios: Large Scale Nonnegative Matrix Factorization for Collaborative Filtering
Factorbird - a Parameter Server Approach to Distributed Matrix Factorization
Improved Algorithms for Distributed Boosting
Parallel and Distributed Inference in Coupled Tensor Factorization Models [supplementary]
Dogwild! — Distributed Hogwild for CPU and GPU
Generalized Low Rank Models
Elastic Distributed Bayesian Collaborative Filtering
LOCO: Distributing Ridge Regression with Random Projections
Logistic Matrix Factorization for Implicit Feedback Data
Tighter Low-rank Approximation via Sampling the Leveraged Element
A Comparison of Lasso-type Algorithms on Distributed Parallel Machine Learning Platforms
A Randomized Algorithm for CCA
FROGWILD! – Fast PageRank Approximations on Graph Engines
CometCloudCare (C3): Distributed Machine Learning Platform-as-a-Service with Privacy Preservation
Global Convergence of Stochastic Gradient Descent for Some Nonconvex Matrix Problems (to appear)
This workshop will consist of invited talks and paper submissions for a poster session. The target audience of this workshop includes industry and academic researchers interested in machine learning, large distributed systems, numerical linear algebra, and related fields.
As this is a workshop, there will be no printed proceedings.
Jeff Dean: Over the past few years, we have built a software infrastructure for training neural networks that applies to a wide variety of deep learning models. This system has been used for training and deploying models for a wide variety of applications at Google. One of the properties we focus on in the system is that we want to be able to train large models on large datasets quickly, so that we can turn around experiments rapidly and quickly figure out the next set of experiments to perform, given the results of the previous round of experiments. As such, we have developed a number of different techniques that aid in rapid training of large models. I will discuss many of these techniques in this talk.