Ph.D. in Computer Science, applying deep learning to study regulatory genomics
Ph.D. Advisor: Anshul Kundaje.
2009 - 2013
B.S. Computer Science with Molecular Biology, Minor in Mathematics
Undergraduate GPA: 5.0/5.0


Kundaje Lab
Sep 2014 - Sep 2020
Palantir Technologies
June 2013 - Sep 2014
Forward Deployed Engineer for the Healthcare Team.
  • Primary developer for a pilot project that became the Healthcare Team's first enterprise deployment. Continued as primary developer until I left the company.
  • Created a way to integrate Palantir's two major platforms at the time into a seamless experience; success was highlighted company-wide.
  • Went beyond my assigned duties to build several useful software utilities for team-wide and company-wide use.
  • Consistently rated highly on biannual reviews. Was told that I had at times done the work of 2-3 coders and had set a new standard for productivity among my teammates.

Selected Talks

ICML 2020
July 2020
Maximum Likelihood with Bias-Corrected Calibration is Hard-To-Beat at Label Shift Domain Adaptation
Video here.
ISMB 2019
July 2019
GkmExplain: Fast and Accurate Interpretation of Nonlinear Gapped k-mer SVMs
Video here.
Biological Data Science
Nov 2018
Suggested best practices for interpreting deep learning models of regulatory DNA
Slides here.
March 2018
Not Just a Black Box: Interpretable Deep Learning for Genomics and Beyond
Video here.
Dec 2017
TF-MoDISco: Deep learning non-redundant, predictive sequence motifs of transcription factors
Video here.
ICML 2017
Aug 2017
Learning Important Features Through Propagating Activation Differences
Video and slides here and here, paper here.
Deep Learning In Healthcare Summit, Boston
May 2017
Not Just a Black Box: Interpretable Deep Learning for Genomics
Interview here.
CEHG Symposium
March 2016
Not Just a Black Box: Interpretable Deep Learning for Genomics
Video here.

Selected Publications

ICML Workshop on Computational Biology
July 2020
Look at the Loss: Towards Robust Detection of False Positive Feature Interactions Learned by Neural Networks on Genomic Data.
Mara Finkelstein*, Avanti Shrikumar*, Anshul Kundaje
*co-first authors
† co-corresponding authors

Novel strategy to detect when feature interactions learned by a neural network may be false positives, by looking at the impact that the learned interaction effect has on the model's prediction loss on held-out data.

ICML 2020
July 2020
Maximum Likelihood With Bias-Corrected Calibration is Hard-To-Beat at Label Shift Adaptation.
Amr Alexandari*, Anshul Kundaje†, Avanti Shrikumar*
*co-first authors
† co-corresponding authors

Algorithm that achieves state-of-the-art results on domain adaptation to label shift, the problem of adapting a trained classifier to perform well when the class proportions differ from those seen during training (e.g. adapting a disease predictor to account for a surge in cases due to a pandemic).

Proceedings of the ISMB
July 2019
GkmExplain: Fast and Accurate Interpretation of Nonlinear Gapped k-mer SVMs.
Avanti Shrikumar*, Eva Prakash*, Anshul Kundaje
*co-first authors
† co-corresponding authors

Computationally efficient algorithm for explaining individual predictions made by nonlinear gapped k-mer SVMs trained on genomic sequences.

Nature Biotechnology
May 2019
Kipoi: accelerating the community exchange and reuse of predictive models for genomics.
Žiga Avsec*, Roman Kreuzhuber*, Johnny Israeli, Nancy Xu, Jun Cheng, Avanti Shrikumar, Abhimanyu Banerjee, Daniel S. Kim, Lara Urban, Anshul Kundaje, Oliver Stegle, Julien Gagneur
*co-first authors
† co-corresponding authors

Model zoo for genomics. I was involved in designing the API and converted the DeepBind models to Keras using the code here.

July 2018
Computationally Efficient Measures of Internal Neuron Importance.
Avanti Shrikumar*, Jocelin Su*, Anshul Kundaje†.
*co-first authors
† co-corresponding authors

Showed an equivalence between Total Conductance, a recently proposed method for computing internal neuron importance, and Path Integrated Gradients, thereby providing a computationally efficient way to compute the former. The reformulation of Total Conductance was referred to as Neuron Integrated Gradients. Benchmarked Neuron Integrated Gradients against DeepLIFT. Colab notebook reproducing results here.

Journal of the Royal Society Interface
April 2018
Opportunities And Obstacles For Deep Learning In Biology And Medicine.

Collaboratively written review on deep learning for biology and medicine. I wrote the section on interpretation. Note that the final published version was stripped down due to word limits. I have linked to the original submission from my end.

Feb 2018
A Flexible and Adaptive Framework for Abstention Under Class Imbalance.
Avanti Shrikumar*, Amr Alexandari*, Anshul Kundaje†.
*co-first authors
† co-corresponding authors

Proposes a framework for identifying which examples to abstain on in order to optimize for a specific metric of interest. Leverages the insight that because the calibrated probabilities can be used as a proxy for the true label, optimization is possible even when the ground-truth labels are not known. Derived computationally efficient algorithms for optimizing auROC, sensitivity at a target specificity and weighted Cohen's kappa. Showed that by leveraging strategies for domain adaptation to label shift, the abstention algorithms can apply even in situations where the test-set distribution has a different class imbalance compared to the training-set distribution.

ICML Workshop on Computational Biology (Spotlight Talk, Best Poster)
June 2017
Separable Fully-Connected Layers Improve Deep Learning Models For Genomics.
Amr Alexandari*, Avanti Shrikumar*, Anshul Kundaje.
*co-first authors

Adapts deep learning models for genomics by leveraging known patterns in transcription factor binding data.

April 2017
Learning Important Features Through Propagating Activation Differences.
Avanti Shrikumar, Peyton Greenside, Anshul Kundaje.

Details a computationally efficient algorithm to explain individual predictions of a deep learning model by assigning contribution scores to individual parts of the input. Code here.

January 2017
Reverse-Complement Parameter Sharing Improves Deep Learning Models For Genomics.
Avanti Shrikumar*, Peyton Greenside*, Anshul Kundaje.
*co-first authors

Adapts deep learning models for genomics by leveraging the reverse-complement property of DNA sequence.

Circulation Research
Dec 2014
Transcriptional Reversion of Cardiac Myocyte Fate During Mammalian Cardiac Regeneration.
O'Meara CC, Wamstad JA, Gladstone RA, Fomovsky GM, Butty VL, Shrikumar A, Gannon JB, Boyer LA, Lee RT

Collaboration between Boyer lab at MIT and Lee lab at Harvard. I analysed RNA-seq data to study transcriptional reversion.

Sep 2012
Dynamic and coordinated epigenetic regulation of developmental transitions in the cardiac lineage.
Wamstad JA, Alexander JM, Truty RM, Shrikumar A, Li F, Eilertson KE, Ding H, Wylie JN, Pico AR, Capra JA, Erwin G, Kattman SJ, Keller GM, Srivastava D, Levine SS, Pollard KS, Holloway AK, Boyer LA†, Bruneau BG†.
† co-corresponding authors

Collaboration between Boyer lab at MIT and Gladstone Institutes. I performed the bulk of bioinformatics analysis at the Boyer lab. 355 citations as of Jan 2018.


HHMI International Student Research Fellowship
Awarded to 20 international students. Announcement here.
Stanford Bio-X Fellowship
Bio-X fellowships are awarded to about 25 students annually for interdisciplinary research. Announcement here.
Microsoft Women's Fellowship
Awarded to one woman per participating university who is pursuing or interested in pursuing a Ph.D. Announcement here.
Outstanding Research Award
Spring 2013
Awarded to 3 projects completed as part of MIT's SuperUROP program. My project was done in the Kellis lab. Announcement here.
Sophomore Academic Excellence Award
Fall 2011
AIChE Sophomore Academic Excellence Award for the student with the highest GPA among chemical engineers after sophomore year at MIT. Announcement here.
IGCSE Examinations
June 2006 & 2007
The IGCSE was administered in roughly 300 schools in India. I had the highest score in India in Extended Mathematics Without Coursework (June 2006; press release), Physics (June 2007) and Geography (June 2007).

Selected Coursework

Stanford  Probabilistic Graphical Models     CS 228  Winter 2015  A+
Stanford  Machine Learning                   CS 229  Fall 2015    A
MIT       Statistics for Applications        18.443  Spring 2013  A
MIT       Advanced Computational Biology     6.878   Fall 2012    A+
MIT       Design and Analysis of Algorithms  6.046   Spring 2012  A
MIT       Software Construction              6.005   Spring 2012  A+
MIT       Evolutionary Biology               7.33    Spring 2012  A+