Low-rank approximation

Low rank matrix approximations are a fundamental primitive for a wide variety of algorithms, and they fit many datasets well. Intuitively, this means that measurements of a complex object, such as a patient in a hospital, a respondent to a survey, or even a machine learning dataset, can often be described well as simple functions of an underlying low-dimensional latent vector. Our work develops methods to extract low-dimensional structure from big, messy datasets and to use this structure to denoise data and impute missing entries, reduce the dimensionality of feature vectors, recommend better machine learning algorithms, and speed up optimization and model validation. We've developed scalable, parallel, distributed, and low-memory methods with provable guarantees that can find low rank approximations of matrices with billions of entries on a laptop, including streaming methods that approximate a matrix or tensor too large to fit in memory from a single pass over the data (a minimal sketch of this idea appears after the paper list below). These methods identify interesting structure in a broad range of applications, including medicine, social science survey data, revenue management, environmental monitoring, and electronics. Variants of low rank models work even with very weak information, such as assortment choices and sparse data.

Talks
Software
Papers

Low-Rank Tensor Recovery with Euclidean-Norm-Induced Schatten-p Quasi-Norm Regularization
Sparse Data Reconstruction, Missing Value and Multiple Imputation through Matrix Factorization
TenIPS: Inverse Propensity Sampling for Tensor Completion
Robust Non-Linear Matrix Factorization for Dictionary Learning, Denoising, and Clustering
Approximate Cross-Validation with Low-Rank Data in High Dimensions
TenIPS: Inverse Propensity Sampling for Tensor Completion (Workshop)
Matrix Completion with Quantified Uncertainty through Low Rank Gaussian Copula
An Information-Theoretic Approach to Persistent Environment Monitoring Through Low Rank Model Based Planning and Prediction
Low-Rank Tucker Approximation of a Tensor From Streaming Data
Factor Group-Sparse Regularization for Efficient Low-Rank Matrix Recovery
Streaming Low-Rank Matrix Approximation with an Application to Scientific Simulation
Big Data is Low Rank
Why are Big Data Matrices Approximately Low Rank?
Dynamic Assortment Personalization in High Dimensions
Tensor Random Projection for Low Memory Dimension Reduction
More Practical Sketching Algorithms for Low-Rank Matrix Approximation
Fixed-Rank Approximation of a Positive-Semidefinite Matrix from Streaming Data
Graph-Regularized Generalized Low Rank Models
Practical Sketching Algorithms for Low-Rank Matrix Approximation
Discovering Patient Phenotypes Using Generalized Low Rank Models
Revealed Preference at Scale: Learning Personalized Preferences from Assortment Choices
Learning Preferences from Assortment Choices in a Heterogeneous Population
Generalized Low Rank Models
Generalized Low Rank Models: PCA on a Data Frame
Factorization for Analog-to-Digital Matrix Multiplication
Beyond Principal Component Analysis (PCA): Generalized Low Rank Models
Linear Bandits, Matrix Completion, and Recommendation Systems
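To make the streaming idea in the overview concrete, here is a minimal numpy sketch in the spirit of the one-pass sketching papers listed above (for example, Streaming Low-Rank Matrix Approximation with an Application to Scientific Simulation). It is an illustrative sketch under assumed settings, not the group's released software: the matrix sizes, sketch dimensions, and the simulated stream of rank-2 additive updates are all choices made only for this demo.

    import numpy as np

    rng = np.random.default_rng(0)

    m, n = 2000, 500              # full matrix is m x n (sizes chosen for the demo)
    r = 10                        # true rank of the simulated data (assumption)
    k, l = 2 * r + 1, 4 * r + 3   # sketch sizes, with rule-of-thumb oversampling

    # Fixed random test matrices: together with the two sketches below,
    # these are all we store; the full matrix is never kept in memory.
    Omega = rng.standard_normal((n, k))
    Psi = rng.standard_normal((l, m))

    Y = np.zeros((m, k))          # range sketch,    Y = A @ Omega
    W = np.zeros((l, n))          # co-range sketch, W = Psi @ A

    def process_update(Y, W, H):
        """Fold one additive update H of the streamed matrix into both sketches."""
        Y += H @ Omega            # in-place updates: both sketches are linear in A
        W += Psi @ H

    # --- simulate a single pass over a stream of additive updates ---
    A = np.zeros((m, n))          # kept here only to measure the error afterwards
    for _ in range(5):
        H = rng.standard_normal((m, 2)) @ rng.standard_normal((2, n))  # rank-2 piece
        A += H
        process_update(Y, W, H)

    # --- reconstruct a low rank approximation from the sketches alone ---
    Q, _ = np.linalg.qr(Y)                          # orthonormal basis for the range
    X = np.linalg.lstsq(Psi @ Q, W, rcond=None)[0]  # solve (Psi Q) X ~= W
    A_hat = Q @ X                                   # rank-k approximation of A

    err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
    print(f"relative error of the one-pass rank-{k} approximation: {err:.2e}")

Because both sketches are linear in the data, updates can arrive in any order, or be accumulated on separate machines and summed afterwards, which is what makes this style of method suitable for distributed and out-of-core settings.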