Distributed Machine Learning and Matrix Computations
A NIPS 2014 Workshop
Level 5, Room 510a
Friday, December 12th, 2014, Montreal, Canada
Organizers: Reza Zadeh | Ameet Talwalkar | Ion Stoica
The emergence of large distributed matrices in many applications has brought with it a slew of new algorithms and tools. Over the past few years, machine learning and numerical linear algebra on distributed matrices have become a thriving area of research. Manipulating such large matrices makes it necessary to think about distributed systems issues such as communication cost. This workshop aims to bring together researchers in distributed systems and large-scale numerical linear algebra to foster cross-talk between the two fields. The goal is to encourage distributed systems researchers to work on machine learning and numerical linear algebra problems, to inform machine learning researchers about new developments in large-scale matrix analysis, and to identify unique challenges and opportunities. The workshop will conclude with a session of contributed posters.
Schedule

Session 1
========
08:15-08:30  Introduction, Reza Zadeh
08:30-09:00  Ameet Talwalkar, MLbase: Simplified Distributed Machine Learning [slides]
09:00-09:30  David Woodruff, Principal Component Analysis and Higher Correlations for Distributed Data [slides]
09:30-10:00  Virginia Smith, Communication-Efficient Distributed Dual Coordinate Ascent [slides]
10:00-10:30  Coffee Break

Session 2
========
10:30-11:30  Jeff Dean (Keynote), Techniques for Training Neural Networks Quickly [slides]
11:30-12:00  Reza Zadeh, Distributing the Singular Value Decomposition with Spark [slides]
12:00-12:30  Jure Leskovec, In-memory Graph Analytics [slides]
12:30-14:30  Lunch Break

Session 3
========
14:30-15:00  Carlos Guestrin, SFrame and SGraph: Scalable, Out-of-Core, Unified Tabular and Graph Processing
15:00-15:30  Inderjit Dhillon, NOMAD: A Distributed Framework for Latent Variable Models [slides]
15:30-16:00  Ankur Dave, GraphX: Graph Processing in a Distributed Dataflow Framework [slides]
16:00-16:30  Jeremy Freeman, Large-Scale Decompositions of Brain Activity [slides]
16:30-17:00  Coffee Break

Poster Session
========
17:00-18:30  Posters for accepted papers

Accepted Papers

Minerva: A Scalable and Highly Efficient Training Platform for Deep Learning
Maxios: Large Scale Nonnegative Matrix Factorization for Collaborative Filtering
Factorbird - a Parameter Server Approach to Distributed Matrix Factorization
Improved Algorithms for Distributed Boosting
Parallel and Distributed Inference in Coupled Tensor Factorization Models [supplementary]
Dogwild! — Distributed Hogwild for CPU and GPU
Generalized Low Rank Models
Elastic Distributed Bayesian Collaborative Filtering
LOCO: Distributing Ridge Regression with Random Projections
Logistic Matrix Factorization for Implicit Feedback Data
Tighter Low-rank Approximation via Sampling the Leveraged Element
A Comparison of Lasso-type Algorithms on Distributed Parallel Machine Learning Platforms
A Randomized Algorithm for CCA
FROGWILD! – Fast PageRank Approximations on Graph Engines
CometCloudCare (C3): Distributed Machine Learning Platform-as-a-Service with Privacy Preservation
Global Convergence of Stochastic Gradient Descent for Some Nonconvex Matrix Problems (to appear)

Format

This workshop will consist of invited talks and a poster session for submitted papers. The target audience includes industry and academic researchers interested in machine learning, large distributed systems, numerical linear algebra, and related fields. As this is a workshop, there will be no printed proceedings.

Keynote Abstract

Jeff Dean: Over the past few years, we have built a software infrastructure for training neural networks that applies to a wide variety of deep learning models. This system has been used to train and deploy models for many applications at Google. One property we emphasize is the ability to train large models on large datasets quickly, so that we can turn experiments around rapidly and use the results of one round to decide what to try next. To that end, we have developed a number of techniques for the rapid training of large models, many of which I will discuss in this talk.
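For readers unfamiliar with the kind of computation the program centers on, for example the talk on Distributing the Singular Value Decomposition with Spark, the following is a minimal sketch of a distributed truncated SVD using Spark MLlib's RowMatrix.computeSVD. It is an illustration only, not code from any of the talks; the input path, the one-row-per-line text format, and the rank k = 20 are assumptions made for the example.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    object DistributedSVDSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("DistributedSVDSketch"))

        // Hypothetical input: one whitespace-separated matrix row per line.
        val rows = sc.textFile("hdfs:///data/matrix.txt")
          .map(line => Vectors.dense(line.trim.split("\\s+").map(_.toDouble)))

        // RowMatrix keeps the rows distributed across the cluster.
        val mat = new RowMatrix(rows)

        // Truncated SVD: top k = 20 singular values/vectors; U remains a distributed RowMatrix.
        val svd = mat.computeSVD(20, computeU = true)

        println(s"Top singular values: ${svd.s}")
        sc.stop()
      }
    }

In this sketch the communication pattern is left entirely to Spark; how such decompositions and their communication costs are managed at scale is exactly the kind of question the workshop talks address.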