About Me

I am seeking an academic position for Fall 2020. My materials are here: CV, research statement, teaching statement, and diversity and inclusion statement. Thank you for your interest!

I am a postdoctoral researcher in the Stanford CS department, working with Chris Ré.

Previously, I was a short-term postdoc in the UCLA CS and EE departments, working with Guy Van den Broeck and Lara Dolecek. I received my Ph.D. in electrical engineering from UCLA in December 2016. Before joining UCLA, I studied electrical engineering at the University of Michigan, Ann Arbor.

My CV and my Google Scholar profile.

My dissertation, Algorithms and Coding Techniques for Reliable Data Management and Storage (Outstanding Ph.D. Dissertation in Signals & Systems Award, UCLA EE Dept.) and my thesis, Novel Coding Strategies for Multi-Level Non-Volatile Memories (Edward K. Rice Outstanding Masters Student Award, UCLA HSSEAS, Outstanding M.S. Thesis Award, UCLA EE Dept.).


What's New

  • 1-6: Our paper on instrumental variable synthesis for causal inference was accepted to AISTATS 2020!

  • 9-3: Honored to receive a top reviewer award (along with free registration!) for NeurIPS 2019.

  • 5-26: I gave a talk on non-Euclidean embeddings and ML at the 2019 Physics in ML as part of ml4science, hosted by Berkeley.

Research Interests

My research focuses on providing a fundamental understanding of data-driven systems. I derive theoretical results (tradeoffs and limits) and use these insights to extend and improve practical systems. I work on problems in machine learning, statistics, data science, and information and coding theory. Projects I have worked on include:

  • Geometry and structure of data. Modern ML methods require first embedding data into a continuous space, traditionally Euclidean space. However, the structure of many types of data (like hierarchies!) makes Euclidean space a poor fit. We show that non-Euclidean spaces like hyperbolic space (and other manifolds!) are more suitable for embeddings, and we study the limits and tradeoffs of these techniques in our ICML ’18 and ICLR ’19 papers. Check out our blog posts (intro to non-Euclidean ML and on hyperbolics) for a gentle introduction.

  • Weak supervision for machine learning models. Obtaining large amounts of labeled data is such a bottleneck that practitioners have increasingly turned to weaker forms of supervision. We studied efficient algorithms for synthesizing labels from weak supervision sources (AAAI ’19) with theoretical guarantees, an efficient way to learn the structure of a model of such sources (ICML ’19), and a new way to tackle labeling data for large-scale video and time-series applications (NeurIPS ’19).

  • Efficient data synchronization and reconstruction. What is the least amount of information we must exchange to synchronize two versions of a file, or to reconstruct a core piece of data from noisy samples? My work studies bounds and algorithms for these problems in IT ’17 and TCOM ’16.

  • Reliable data storage in next-gen memories. New memories have revolutionized the world of storage with their speed and power efficiency. However, modern memories suffer from specific physical limitations that lead to errors and corruption. Novel reliability and error-correcting techniques are critical to the future of these devices. My work develops new data representations (TCOM ’13), new coding techniques (Comm. Letters ’14), and ways to make algorithms more robust (TCOM ’17). I am also interested in theoretical frameworks (SELSE ’16 best of) to evaluate broad ranges of error-correction techniques.

I am also very interested in scientific writing and communication. I strongly believe in the importance of clearly and effectively communicating research ideas to a broad and popular audience. I have written a book on channel coding for non-volatile memories, a book chapter on advanced error-correction techniques for 3D flash memories, and an expository article on dealing with flash deficiencies.