Education

Stanford
2014 - (2019)
Ph.D.
Ph.D. in Computer Science, applying deep learning to study regulatory genomics
Ph.D. Advisor: Anshul Kundaje.
MIT
2009 - 2013
B.S.
B.S. Computer Science with Molecular Biology, Minor in Mathematics
Undergraduate GPA: 5.0/5.0

Experience

Kundaje Lab
Sep 2014 - Present
Palantir Technologies
June 2013 - Sep 2014
Forward Deployed Engineer for the Healthcare Team.
  • Primary developer for a pilot project that became the Healthcare Team's first enterprise deployment. Continued as primary developer until I left the company.
  • Created a way to integrate Palantir's two major platforms at the time into a seamless experience; success was highlighted company-wide.
  • Went beyond my assigned duties to build multiple handy software utilities for use team-wide and company-wide.
  • Consistently rated high on bi-annual reviews. Was told that I had at times done the work of 2-3 coders and had set a new standard for productivity among my teammates.

Selected Talks

Biological Data Science
Nov 2018
Suggested best practices for interpreting deep learning models of regulatory DNA
Slides here.
NVIDIA GTC
March 2018
Not Just a Black Box: Interpretable Deep Learning for Genomics and Beyond
Video here.
NIPS MLCB
Dec 2017
TF-MoDISco: Deep learning non-redundant, predictive sequence motifs of transcription factors
Video here.
ICML 2017
Aug 2017
Learning Important Features Through Propagating Activation Differences
Video and slides here and here, paper here.
Deep Learning In Healthcare Summit, Boston
May 2017
Not Just a Black Box: Interpretable Deep Learning for Genomics
Interview here.
CEHG Symposium
March 2016
Not Just a Black Box: Interpretable Deep Learning for Genomics
Video here.

Selected Publications

Proceedings of the ISMB
July 2019
Gkmexplain: Fast and Accurate Interpretation of Nonlinear Gapped k-mer SVMs.
Avanti Shrikumar*†, Eva Prakash*, Anshul Kundaje†.
*co-first authors
† co-corresponding authors

Computationally efficient algorithm for explaining individual predictions made by nonlinear gapped-kmer SVMs trained on genomic sequences.

Nature Biotechnology
May 2019
Kipoi: accelerating the community exchange and reuse of predictive models for genomics.
Žiga Avsec*, Roman Kreuzhuber*, Johnny Israeli, Nancy Xu, Jun Cheng, Avanti Shrikumar, Abhimanyu Banerjee, Daniel S. Kim, Lara Urban, Anshul Kundaje, Oliver Stegle, Julien Gagneur
*co-first authors
† co-corresponding authors

Model zoo for genomics. I was involved in designing the API during the early stages and converted the DeepBind models to Keras using the code here.

arXiv
Jan 2019
Adapting to Label Shift with Bias-Corrected Calibration.
Amr Alexandari*, Anshul Kundaje, Avanti Shrikumar*†.
*co-first authors
† co-corresponding authors

Studied the impact of calibration on domain adaptation to label shift in the context of modern neural networks. Identified a principled strategy for computing source-domain priors in EM-based domain adaptation that is particularly important when the calibrated probabilities have systematic bias. When source-domain priors are computed using our proposed approach, label shift estimates from EM can perform surprisingly well compared to the recently proposed Black Box Shift Estimation (ICML 2018) even when predictions are not calibrated. In experiments with image classification and diabetic retinopathy detection, found that the best results were obtained by using EM-based domain adaptation with a calibration approach that contains class-specific bias parameters capable of reducing systematic bias.

arXiv
July 2018
Computationally Efficient Measures of Internal Neuron Importance.
Avanti Shrikumar*†, Jocelin Su*, Anshul Kundaje†.
*co-first authors
† co-corresponding authors

Showed an equivalence between Total Conductance, a recently proposed method for computing internal neuron importance, and Path Integrated Gradients, thereby providing a computationally efficient way to compute the former. The reformulation of Total Conductance was referred to as Neuron Integrated Gradients. Benchmarked Neuron Integrated Gradients against DeepLIFT. Colab notebook reproducing results here.

Journal of the Royal Society Interface
April 2018
Opportunities And Obstacles For Deep Learning In Biology And Medicine

Collaboratively written review on deep learning for biology and medicine. I wrote the section on interpretation. The final published version was shortened to meet word limits, so I have linked to the original submission.

arXiv
Feb 2018
A Flexible and Adaptive Framework for Abstention Under Class Imbalance.
Avanti Shrikumar*, Amr Alexandari*, Anshul Kundaje†.
*co-first authors
† co-corresponding authors

Proposes a framework for identifying which examples to abstain on in order to optimize for a specific metric of interest. Leverages the insight that because the calibrated probabilities can be used as a proxy for the true label, optimization is possible even when the ground-truth labels are not known. Derives computationally efficient algorithms for optimizing auROC, sensitivity at a target specificity, and weighted Cohen's kappa. Shows that by leveraging strategies for domain adaptation to label shift, the abstention algorithms can apply even in situations where the test-set distribution has a different class imbalance compared to the training-set distribution.

ICML Comp Bio Workshop (Spotlight Talk)
June 2017
Separable Fully-Connected Layers Improve Deep Learning Models For Genomics.
Amr Alexandari*, Avanti Shrikumar*, Anshul Kundaje.
*co-first authors

Adapts deep learning models for genomics by leveraging known patterns in transcription factor binding data.

ICML
April 2017
Learning Important Features Through Propagating Activation Differences.
Avanti Shrikumar, Peyton Greenside, Anshul Kundaje.

Details a computationally efficient algorithm to explain individual predictions of a deep learning model by assigning contribution scores to individual parts of the input. Code here.

BioRxiv
January 2017
Reverse-Complement Parameter Sharing Improves Deep Learning Models For Genomics.
Avanti Shrikumar*, Peyton Greenside*, Anshul Kundaje.
*co-first authors

Adapts deep learning models for genomics by leveraging the reverse-complement property of DNA sequence.

Circulation Research
Dec 2014
Transcriptional Reversion of Cardiac Myocyte Fate During Mammalian Cardiac Regeneration.
O'Meara CC, Wamstad JA, Gladstone RA, Fomovsky GM, Butty VL, Shrikumar A, Gannon JB, Boyer LA, Lee RT

Collaboration between Boyer lab at MIT and Lee lab at Harvard. I analysed RNA-seq data to study transcriptional reversion.

Cell
Sep 2012
Dynamic and coordinated epigenetic regulation of developmental transitions in the cardiac lineage.
Wamstad JA, Alexander JM, Truty RM, Shrikumar A, Li F, Eilertson KE, Ding H, Wylie JN, Pico AR, Capra JA, Erwin G, Kattman SJ, Keller GM, Srivastava D, Levine SS, Pollard KS, Holloway AK, Boyer LA, Bruneau BG.

Collaboration between Boyer lab at MIT and Gladstone Institutes. I performed the bulk of bioinformatics analysis at the Boyer lab. 355 citations as of Jan 2018.

Recognition

HHMI International Student Research Fellowship
2016
Awarded to 20 international students. Announcement here.
Stanford Bio-X Fellowship
2016
Bio-X fellowships are awarded to about 25 students annually for interdisciplinary research. Announcement here.
Microsoft Women's Fellowship
2016
Awarded to one woman per participating university who is pursuing or interested in pursuing a PhD. Announcement here.
Outstanding Research Award
Spring 2013
Awarded to 3 projects completed as part of MIT's SuperUROP program. My project was done in the Kellis lab. Announcement here.
Sophomore Academic Excellence Award
Fall 2011
AIChE Sophomore Academic Excellence Award for the student with the highest GPA among chemical engineers after sophomore year at MIT. Announcement here.
IGCSE Examinations
June 2006 & 2007
The IGCSE was administered in roughly 300 schools in India. I had the highest score in India in Extended Mathematics Without Coursework (June 2006; press release), Physics (June 2007) and Geography (June 2007).

Selected Coursework

Stanford Probabilistic Graphical Models CS 228 Winter 2015 A+
Stanford Machine Learning CS 229 Fall 2015 A
MIT Statistics for Applications 18.443 Spring 2013 A
MIT Advanced Computational Biology 6.878 Fall 2012 A+
MIT Design and Analysis of Algorithms 6.046 Spring 2012 A
MIT Software Construction 6.005 Spring 2012 A+
MIT Evolutionary Biology 7.33 Spring 2012 A+