Multiple Optimality Guarantees in Statistical Learning

John C. Duchi

Ph.D. Thesis, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, 2014.

Manuscript

Classically, the performance of estimators in statistical learning problems is measured in terms of their predictive ability or estimation error as the sample size n grows. In modern statistical and machine learning applications, however, computer scientists, statisticians, and analysts have a variety of additional criteria they must balance: estimators must be efficiently computable, data providers may wish to maintain anonymity, large datasets must be stored and accessed. In this thesis, we consider the fundamental questions that arise when trading between multiple such criteria--computation, communication, privacy--while maintaining statistical performance. Can we develop lower bounds that show there must be tradeoffs? Can we develop new procedures that are both theoretically optimal and practically useful? To answer these questions, we explore examples from optimization, confidentiality preserving statistical inference, and distributed estimation under communication constraints. Viewing our examples through a general lens of constrained minimax theory, we prove fundamental lower bounds on the statistical performance of any algorithm subject to the constraints--computational, confidentiality, or communication--specified. These lower bounds allow us to guarantee the optimality of the new algorithms we develop addressing the additional criteria we consider, and additionally, we show some of the practical benefits that a focus on multiple optimality criteria brings. In somewhat more detail, the central contributions of this thesis include the following: we

develop several new stochastic optimization algorithms, applicable to general classes of stochastic convex optimization problems, including methods that are automatically adaptive to the structure of the underlying problem, parallelize naturally to attain linear speedup in the number of processors available, and may be used asynchronously,
prove lower bounds demonstrating the optimality of these methods,
provide a variety of information-theoretic tools--strong data processing inequalities--useful for proving lower bounds in privacy-preserving statistical inference, communication-constrained estimation, and optimization,
develop new algorithms for private learning and estimation, guaranteeing their optimality, and
give simple distributed estimation algorithms and prove fundamental limits showing that they (nearly) optimally trade off between communication (in terms of the number of bits distributed processors may send) and statistical risk.