Amy X. Lu

I'm a third-year Computer Science PhD student at UC Berkeley and BAIR, advised by Pieter Abbeel and supported by an NSERC PGS-D award. I also work part-time on research at Prescient Design (Genentech).

Broadly, I'm interested in developing artificial intelligence methods for biology, and bringing task generalization capabilities in large models to drug discovery. I'm especially interested in generative approaches leveraging multimodal information and biophysical priors.

Previously, I was a Student Researcher at Google Brain and a Machine Learning Engineer at insitro. I completed my MSc at the University of Toronto with Alan Moses and Marzyeh Ghassemi, during which I was a visitor in Anshul Kundaje's group at Stanford University. I did my undergrad at the University of Waterloo, with research internships in clinical genomics at Harvard Medical School and molecular dynamics at EPFL.

CV  /  Google Scholar  /  GitHub  /  Twitter  /  amyxlu [at] berkeley [dot] edu

profile photo

*Equal contribution.

TOPH TOPH: Adapting a Contrastive Question-Answering Framework for Protein Search
Ron Boger*, Amy X. Lu*, Seyone Chithrananda*, Kevin Yang, Petr Skopintsev, Ben Adler, Eric Wallace, Peter Yoon, Pieter Abbeel, Jennifer Doudna
ICML Workshop on Computational Biology, 2023
Paper / Poster

We present a protein semantic similarity search method for RNA-guided endonuclease discovery, inspired by dense retrieval methods in open-domain question answering, and introduce a new dataset of CRISPR-Cas and evolutionarily related nucleases.

Data-Driven Promoter Design Pretraining strategies for effective promoter-driven gene expression prediction
Aniketh Janardhan Reddy, Michael H. Herschl, Sathvik Kolli, Amy X. Lu, Xinyang Geng, Aviral Kumar, Patrick D. Hsu, Sergey Levine, Nilah M. Ioannidis
bioRxiv, 2023

Pretraining and transfer learning strategies for improving model-based design of promoters for cell-type-specific expression.

MLDD Data-Driven Optimization for Protein Design: Workflows, Algorithms and Metrics
Sathvik Kolli, Amy X. Lu, Xinyang Geng, Aviral Kumar, Sergey Levine
ICLR Workshop on Machine Learning for Drug Discovery (MLDD), 2022

Strategies for data curation, model training, optimization, and evaluation heuristics for data-driven proposals of de novo proteins.

Discovering molecular features of intrinsically disordered regions by using evolution for contrastive learning
Alex X Lu, Amy X Lu, Iva Pritišanac, Taraneh Zarin, Julie D Forman-Kay, Alan M Moses
PLOS Computational Biology, 2022
Paper / Preprint

Reverse Homology is a self-supervised method that uses contrastive learning to capture evolutionary information and discover molecular features of intrinsically disordered regions.

bio-embeddings Learned embeddings from deep learning to visualize and predict protein sets
Christian Dallago, Konstantin Schütze, Michael Heinzinger, Tobias Olenyi, Maria Littmann, Amy X Lu, Kevin K Yang, Seonwoo Min, Sungroh Yoon, James T Morton, Burkhard Rost
Current Protocols, 2021
Paper / Web Server / Code

CPCProt Evolution Is All You Need: Phylogenetic Augmentation for Contrastive Learning
Amy X Lu, Alex X. Lu, Alan Moses
Machine Learning for Computational Biology (MLCB), 2020
Paper / Poster

We outline how viewing evolution as natural sequence augmentation for contrastive learning recapitulates comparative genomics, and maximizes the mutual information between sequence and function.

CPCProt Self-Supervised Contrastive Learning of Protein Representations by Mutual Information Maximization
Amy X Lu, Haoran Zhang, Marzyeh Ghassemi, Alan Moses
Machine Learning for Computational Biology (MLCB), 2020
Paper / Poster / Code

CPCProt uses contrastive learning to learn a parameter-efficient way of embedding proteins, and performs competitively with large language models.

SciBERT model trained on medical notes generates biased sentences given the ethnicity of a patient. Hurtful Words: Quantifying Biases in Clinical Contextual Word Embeddings
Haoran Zhang*, Amy X Lu*, Mohamed Abdalla, Matthew McDermott, Marzyeh Ghassemi.
ACM Conference on Health, Inference, and Learning (CHIL), 2020   (Spotlight)
Paper / arXiv / Poster / Code

We apply fairness definitions to quantify the cross-group bias in BERT embeddings pretrained on medical notes, and find statistically significant differences in classifier performance.

The Cells Out of Sample (COOS) dataset and benchmarks for measuring out-of-sample generalization of image classifiers
Alex X Lu, Amy X Lu, Wiebke Schormann, Marzyeh Ghassemi, David Andrews, Alan Moses.
Neural Information Processing Systems (NeurIPS), 2019
Paper / arXiv

Introduces the COOS-7 dataset to benchmark and evaluate the capacity of feature learning methods to generalize to natural distribution shifts in microscopy images.

Coauthorship network History and publication trends in the diffusion and early uptake of indirect comparison meta-analytic methods to study drugs: animated coauthorship networks over time
Joann K Ban, Mina Tadrous, Amy X Lu, Erin A Cicinelli, Suzanne M Cadarette
BMJ Open, 2018

Paired Cell Inpainting Schematic Paired Cell Inpainting: A Multiple-Instance Extension of Self-Supervised Learning for Bioimage Analysis
Alex X. Lu, Amy X Lu, Alan Moses
ICML Workshop on Self-Supervised Learning, 2019

Quantifying Fairness in a Multi-Group Setting and its Impact in the Clinical Setting
Mohamed Abdalla, Amy X Lu, Haoran Zhang, Irene Chen, Marzyeh Ghassemi
NeurIPS Workshop on Fair ML for Health, 2019

Transfer Learning vs. Batch Effects: what can we expect from neural networks in computational biology
Alan Moses, Alex X Lu, Amy X Lu, Marzyeh Ghassemi
Machine Learning for Computational Biology (MLCB), 2019

Misc.
I enjoy road biking through the East Bay redwoods and playing the piano, especially Chopin and hip-hop covers. I'm usually coding to EDM or Beethoven's complete piano sonatas while eating 90% dark chocolate. My car and bikes are named after F. Scott Fitzgerald characters, and administrative entities call me Xiaoping Lu (逯晓萍).

Template from here!