Developing Bug-Free Machine Learning Systems With Formal Mathematics
In In ICML: Proceedings of the 34th International Conference on Machine Learning, August 2017.
Noisy data, non-convex objectives, model misspecification, and
numerical instability can all cause undesired behaviors in machine
learning systems. As a result, detecting actual implementation errors
can be extremely difficult. We demonstrate a methodology in which
developers use an interactive proof assistant to both implement their
system and to state a formal theorem defining what it means for their
system to be correct. The process of proving this theorem
interactively in the proof assistant exposes all implementation errors
since any error in the program would cause the proof to fail. As a
case study, we implement a new system, Certigrad, for optimizing over
stochastic computation graphs, and we generate a formal (i.e.
machine-checkable) proof that the gradients sampled by the system are
unbiased estimates of the true mathematical gradients. We train a
variational autoencoder using Certigrad and find the performance
comparable to training the same model in TensorFlow.
Data Programming: Creating Large Training Sets, Quickly
In NIPS: Proceedings of the 29th Neural Information Processing Systems Conference, December 2016.
Large labeled training sets are the critical building blocks of supervised learning methods and are key enablers of deep learning techniques. For some applications, creating labeled training sets is the most time-consuming and expensive part of applying machine learning. We therefore propose a paradigm for the programmatic creation of training sets called data programming in which users provide a set of labeling functions, which are programs that heuristically label large subsets of data points, albeit noisily. By viewing these labeling functions as implicitly describing a generative model for this noise, we show that we can recover the parameters of this model to "denoise" the training set. Then, we show how to modify a discriminative loss function to make it noise-aware. We demonstrate our method over a range of discriminative models including logistic regression and LSTMs. We establish theoretically that we can recover the parameters of these generative models in a handful of settings. Experimentally, on the 2014 TAC-KBP relation extraction challenge, we show that data programming would have obtained a winning score, and also show that applying data programming to an LSTM model leads to a TAC-KBP score almost 6 F1 points over a supervised LSTM baseline (and into second place in the competition). Additionally, in initial user studies we observed that data programming may be an easier way to create machine learning models for non-experts.
Congruence Closure in Intensional Type Theory
In IJCAR: Proceedings of the 8th International Joint Conference on Automated Reasoning, June 2016.
Congruence closure procedures are used extensively in automated reasoning and are a core component of most satisfiability modulo theories solvers. However, no known congruence closure algorithms can support any of the expressive logics based on intensional type theory (ITT), which form the basis of many interactive theorem provers. The main source of expressiveness in these logics is dependent types, and yet existing congruence closure procedures found in interactive theorem provers based on ITT do not handle dependent types at all and only work on the simply-typed subsets of the logics. Here we present an efficient and proof-producing congruence closure procedure that applies to every function in ITT no matter how many dependencies exist among its arguments, and that only relies on the commonly assumed uniqueness of identity proofs axiom. We demonstrate its usefulness by solving interesting verification problems involving functions with dependent types.
Venture: A Higher-order Probabilistic Programming Platform with Programmable Inference
We describe Venture, an interactive virtual machine for probabilistic programming that aims to be sufficiently expressive, extensible, and efficient for general-purpose use. Like Church, probabilistic models and inference problems in Venture are specified via a Turing-complete, higher-order probabilistic language descended from Lisp. Unlike Church, Venture also provides a compositional language for custom inference strategies built out of scalable exact and approximate techniques. We also describe four key aspects of Venture's implementation that build on ideas from probabilistic graphical models. First, we describe the stochastic procedure interface (SPI) that specifies and encapsulates primitive random variables. The SPI supports custom control flow, higher-order probabilistic procedures, partially exchangeable sequences and "likelihood-free" stochastic simulators. It also supports external models that do inference over latent variables hidden from Venture. Second, we describe probabilistic execution traces (PETs), which represent execution histories of Venture programs. PETs capture conditional dependencies, existential dependencies and exchangeable coupling. Third, we describe partitions of execution histories called scaffolds that factor global inference problems into coherent sub-problems. Finally, we describe a family of stochastic regeneration algorithms for efficiently modifying PET fragments contained within scaffolds. Stochastic regeneration linear runtime scaling in cases where many previous approaches scaled quadratically. We show how to use stochastic regeneration and the SPI to implement general-purpose inference strategies such as Metropolis-Hastings, Gibbs sampling, and blocked proposals based on particle Markov chain Monte Carlo and mean-field variational inference techniques.