Low-rank approximation

Low rank matrix approximations are a fundamental primitive for a wide variety of algorithms, and they fit many datasets well. Intuitively, this means that measurements of a complex object, such as a patient in a hospital, a respondent to a survey, or even a machine learning dataset, can often be described well as simple functions of an underlying low-dimensional latent vector. Our work develops methods to extract low-dimensional structure from big, messy datasets and to use this structure to denoise data and impute missing entries, reduce the dimensionality of feature vectors, recommend better machine learning algorithms, and speed up optimization and model validation. We've developed scalable, parallel, distributed, and low-memory methods with provable guarantees that can find low rank approximations of matrices with billions of entries on a laptop, including streaming methods that approximate a matrix or tensor too large to fit in memory from a single pass over the data (a minimal sketch of this idea appears after the paper list below). These methods identify interesting structure in a broad range of applications, including medicine, social science survey data, revenue management, environmental monitoring, and electronics. Variants of low rank models work even with very weak information, such as assortment choices and sparse data.

Talks
Software
Papers

Low-Rank Tensor Recovery with Euclidean-Norm-Induced Schatten-p Quasi-Norm Regularization
Sparse Data Reconstruction, Missing Value and Multiple Imputation through Matrix Factorization
TenIPS: Inverse Propensity Sampling for Tensor Completion
Robust Non-Linear Matrix Factorization for Dictionary Learning, Denoising, and Clustering
Approximate Cross-Validation with Low-Rank Data in High Dimensions
TenIPS: Inverse Propensity Sampling for Tensor Completion (Workshop)
Matrix Completion with Quantified Uncertainty through Low Rank Gaussian Copula
An Information-Theoretic Approach to Persistent Environment Monitoring Through Low Rank Model Based Planning and Prediction
Low-Rank Tucker Approximation of a Tensor From Streaming Data
Factor Group-Sparse Regularization for Efficient Low-Rank Matrix Recovery
Streaming Low-Rank Matrix Approximation with an Application to Scientific Simulation
Big Data is Low Rank
Why are Big Data Matrices Approximately Low Rank?
Dynamic Assortment Personalization in High Dimensions
Tensor Random Projection for Low Memory Dimension Reduction
More Practical Sketching Algorithms for Low-Rank Matrix Approximation
Fixed-Rank Approximation of a Positive-Semidefinite Matrix from Streaming Data
Graph-Regularized Generalized Low Rank Models
Practical Sketching Algorithms for Low-Rank Matrix Approximation
Discovering Patient Phenotypes Using Generalized Low Rank Models
Revealed Preference at Scale: Learning Personalized Preferences from Assortment Choices
Learning Preferences from Assortment Choices in a Heterogeneous Population
Generalized Low Rank Models
Generalized Low Rank Models: PCA on a Data Frame
Factorization for Analog-to-Digital Matrix Multiplication
Beyond Principal Component Analysis (PCA): Generalized Low Rank Models
Linear Bandits, Matrix Completion, and Recommendation Systems
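To make the streaming idea in the overview concrete, here is a minimal numpy sketch in the spirit of the one-pass sketching papers listed above (for example, Streaming Low-Rank Matrix Approximation with an Application to Scientific Simulation). It is an illustrative sketch under assumed settings, not the group's released software: the matrix sizes, sketch dimensions, and the simulated stream of rank-2 additive updates are all choices made only for this demo.

    import numpy as np

    rng = np.random.default_rng(0)

    m, n = 2000, 500              # full matrix is m x n (sizes chosen for the demo)
    r = 10                        # true rank of the simulated data (assumption)
    k, l = 2 * r + 1, 4 * r + 3   # sketch sizes, with rule-of-thumb oversampling

    # Fixed random test matrices: together with the two sketches below,
    # these are all we store; the full matrix is never kept in memory.
    Omega = rng.standard_normal((n, k))
    Psi = rng.standard_normal((l, m))

    Y = np.zeros((m, k))          # range sketch,    Y = A @ Omega
    W = np.zeros((l, n))          # co-range sketch, W = Psi @ A

    def process_update(Y, W, H):
        """Fold one additive update H of the streamed matrix into both sketches."""
        Y += H @ Omega            # in-place updates: both sketches are linear in A
        W += Psi @ H

    # --- simulate a single pass over a stream of additive updates ---
    A = np.zeros((m, n))          # kept here only to measure the error afterwards
    for _ in range(5):
        H = rng.standard_normal((m, 2)) @ rng.standard_normal((2, n))  # rank-2 piece
        A += H
        process_update(Y, W, H)

    # --- reconstruct a low rank approximation from the sketches alone ---
    Q, _ = np.linalg.qr(Y)                          # orthonormal basis for the range
    X = np.linalg.lstsq(Psi @ Q, W, rcond=None)[0]  # solve (Psi Q) X ~= W
    A_hat = Q @ X                                   # rank-k approximation of A

    err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
    print(f"relative error of the one-pass rank-{k} approximation: {err:.2e}")

Because both sketches are linear in the data, updates can arrive in any order, or be accumulated on separate machines and summed afterwards, which is what makes this style of method suitable for distributed and out-of-core settings.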