Sophia Lu

 

Research Interests

Efficient sampling algorithms for black-box models, generative modeling and synthetic data, distributional regression, probabilistic modeling and Bayesian methods, approximate Bayesian inference, and applications in bioengineering, genetics and genomics, astrophysics, econometrics, and related fields.

About

I am a fifth-year Ph.D. candidate in the Department of Statistics at Stanford University, where I am fortunate to be advised by Wing H. Wong.

Broadly, my research interests lie at the intersection of efficient Bayesian inference and the development of robust, interpretable, and theoretically grounded statistical machine learning methods. My work aims to bridge the gap between advanced statistical methodologies and real-world applications by delivering reliable guarantees, with the goal of advancing principled statistical foundations for scientific discovery.

Before pursuing my doctoral studies, I graduated with honors in Mathematical and Computational Science and a minor in Mathematics from Stanford University. My research is supported by the Stanford Data Science Graduate Fellowship, the Two Sigma Graduate Fellowship Fund, a Google Cloud Academic Research Grant (Co-PI), and a Stanford HAI-Google Cloud Credits Grant (Co-PI).

Research

Publications and Preprints

  1. Univariate-Guided Sparse Regression for Biobank-Scale High-Dimensional Omics Data
    Joshua Richland, Tuomo Kiiskinen, William Wang, Wenhui Sophia Lu, Balasubramanian Narasimhan, Trevor Hastie, Manuel Rivas, Robert Tibshirani

  2. Steering Protein Generative Models at Test-Time for Guided AAV2 Capsid Design
    Ben Viggiano*, Wenhui Sophia Lu*, Xiaowei Zhang*, Luis Santiago Mille-Fragoso, Xiaojing J Gao, Euan Ashley, Wing Hung Wong
    Accepted for proceedings and oral presentation at the Pacific Symposium on Biocomputing (PSB), 2026

  3. ProVADA: Generating Subcellular Protein Variants via Ensemble-Guided Test-Time Steering
    Wenhui Sophia Lu*, Xiaowei Zhang*, Luis Santiago Mille-Fragoso, Haoyu Dai, Xiaojing J Gao, Wing Hung Wong
    Spotlight & Oral at Generative AI for Biology Workshop, ICML

  4. Likelihood-Free Adaptive Bayesian Inference via Nonparametric Distribution Matching
    Wenhui Sophia Lu, Wing Hung Wong

  5. Generative Modeling for Tabular Data via Penalized Optimal Transport Network
    Wenhui Sophia Lu*, Chenyang Zhong*, Wing Hung Wong
    Package available here
    Submitted

  6. Comparison of REML methods for the study of phenome-wide genetic variation
    Damian Pavlyshyn, Wenhui Sophia Lu, Iain M. Johnstone, and Jacqueline L. Sztepanacz
    Under revision at Genetics

  7. Sc-compReg enables the comparison of gene regulatory networks between conditions using single-cell data
    Zhana Duren*, Wenhui Sophia Lu*, Joseph G. Arthur, Preyas Shah, Jingxue Xin, Francesca Meschi, Miranda Lin Li, Corey M. Nemec, Yifeng Yin, and Wing Hung Wong
    Nature Communications, 2021

* indicates equal contribution

Invited and Contributed Talks

  • Likelihood-Free Adaptive Bayesian Inference via Nonparametric Distribution Matching

  • Efficient Likelihood-Free Adaptive Bayesian Inference

  • Efficient Generative Modeling via Penalized Optimal Transport Network

  • ProVADA: Generation of Subcellular Protein Variants via Ensemble-Guided Test-Time Steering

  • Likelihood-Free Adaptive Bayesian Inference

  • An Introduction to Approximate Bayesian Computation and Likelihood-Free Inference

    • Guest lectures (3 sessions), Applied Bayesian Statistics | STATS 371, Stanford University, May 2025.

  • Towards faithful synthetic data generation via penalized optimal transport network

  • Modern Bayesian Modeling and Adaptive Bayesian Inference

    • Guest lecture, Topics in Computing for Data Science | STATS/BIODS 352, Stanford University, Apr 2025.

  • Towards faithful synthetic data generation via penalized optimal transport network

  • Towards faithful synthetic data generation via penalized optimal transport network

    • Lightning talk, Women in Data Science Workshop, Stanford University, Mar 2025.

  • Modern Bayesian modeling and adaptive Bayesian inference

    • Citadel GQS PhD Colloquium, New York, May 2024.

Selected Presentations

  • Efficient Likelihood-Free Adaptive Bayesian Inference

    • Stanford Berkeley Joint Colloquium, Stanford University, October 2025.

  • ProVADA: Generation of Subcellular Protein Variants via Ensemble-Guided Test-Time Steering

    • Stanford Bio-X Interdisciplinary Initiatives Seed Grants Poster Session, Aug 2025.

    • Received one of 11 Rank 1 poster awards (among 336 poster submissions).

  • Efficient Generative Modeling via Penalized Optimal Transport Network

    • Berkeley Stanford Joint Colloquium, UC Berkeley, May 2025.

  • Efficient Generative Modeling via Penalized Optimal Transport Network

    • The Past, Present & Future of Statistics in the Era of AI, The George Washington University, May 2025.

    • Gratefully supported by a travel award from NSF.

  • Efficient Generative Modeling via Penalized Optimal Transport Network

    • Optimization and Statistical Learning Workshop, Columbia University, Apr 2025.

    • Gratefully supported by a travel award from NSF.

  • Efficient Generative Modeling via Penalized Optimal Transport Network

    • Statistics and Optimal Transport Workshop, Columbia University, Mar 2025.

    • Gratefully supported by a travel award.

  • Efficient Generative Modeling via Penalized Optimal Transport Network

    • Stanford Data Science Conference, Stanford University, Apr 2024.

Contact

sophialu (at) stanford (dot) edu