Annie S. Chen
Hi! I am a fourth-year computer science PhD student at Stanford University advised by Prof. Chelsea Finn and affiliated with the Stanford Artificial Intelligence Laboratory (SAIL).
I am supported by an NSF Graduate Research Fellowship and an OpenAI Superalignment Fellowship.
I recently spent six months as a full-time student researcher at Google DeepMind in London. Previously, in 2021, I received a joint B.S. in math and M.S. in computer science, both from Stanford. I was also a research intern at Google Brain, where I learned a lot working with Pete Florence.
I am originally from Boulder, Colorado, and outside of research, I enjoy spending time outdoors (hiking and backpacking), playing tennis, and learning to play the guitar.
For three years I organized the Stanford CS Undergraduate Mentoring Program to help undergraduate students get involved with computer science research.
My research focuses on developing robust and adaptable machine learning systems that can handle distribution shifts and respond efficiently to new information. I am excited by a broad range of machine learning topics, including robustness and adaptation to distribution shifts, reinforcement learning, and embodied AI.
In particular, here are some problems I've been working on recently:
- Understanding and Manipulating Data Distributions: How do the composition and quality of training data influence robustness, and how do we manipulate the data distribution to expose models to the right information?
- On-The-Fly Adaptation: Which approaches effectively leverage prior knowledge and foundation models to adapt behavior at test time? How do we develop good signals for steering behavior during deployment?
- Autonomous Improvement: What are effective methods for enabling models to learn and generalize from their own behavior? How do we effectively train and fine-tune models by leveraging failures, self-generated feedback, and prior experience, and how do we provide or generate useful data for this process?
Please feel free to reach out about research or any advice I can help with!
[Email]
[CV]
[Google Scholar]
[Twitter]
[LinkedIn]
[GitHub]
Selected Research
Please see my CV or Google Scholar for a full list of work.
Reinforcement Learning via Implicit Imitation Guidance
Perry Dong*, Alec M. Lessing*, Annie S. Chen*, Chelsea Finn
Under submission, 2025
[PDF]
We introduce Data-Guided Noise (DGN), a framework that uses expert data to shape exploration in order to improve sample efficiency for online reinforcement learning.
Curating Demonstrations with Online Experience
Annie S. Chen*, Alec M. Lessing*, Yuejiang Liu, Chelsea Finn
Robotics: Science and Systems (RSS), 2025
[PDF]
[Website]
Data curation is crucial but usually difficult and tedious. We introduce Demo-SCORE, an automatic way to curate demonstrations, informed by online experience.
Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models
Annie S. Chen*, Alec M. Lessing*, Andy Tang*, Govind Chada*, Laura Smith, Sergey Levine, Chelsea Finn
International Conference on Robotics and Automation (ICRA), 2025
[PDF]
[Website]
[Code]
We propose VLM-PC to provide adaptive high-level planning, so that robots can get unstuck by exploring multiple strategies.
Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment
Annie S. Chen*, Govind Chada*, Laura Smith, Archit Sharma, Zipeng Fu, Sergey Levine, Chelsea Finn
Conference on Lifelong Learning Agents (CoLLAs), 2025
[PDF]
[Website]
[Code]
We propose Robust Autonomous Modulation (ROAM), a framework for efficiently leveraging pre-trained behaviors to quickly adapt to changing situations at deployment time.
Calibrating Language Models with Adaptive Temperature Scaling
Johnathan Xie*, Annie S. Chen*, Yoonho Lee, Eric Mitchell, Chelsea Finn
EMNLP, 2024
[PDF]
[Code]
RLHF often degrades the calibration of pre-trained LLMs. We propose a lightweight post-hoc calibration method, Adaptive Temperature Scaling (ATS), which addresses post-RLHF calibration degradation while maintaining performance improvements.
Confidence-Based Model Selection: When to Take Shortcuts for Subpopulation Shifts
Annie S. Chen, Yoonho Lee, Amrith Setlur, Sergey Levine, Chelsea Finn
NeurIPS DistShift Workshop, 2023
[PDF]
We propose COSMOS, a method that adaptively selects models with different strengths to perform well on both majority and minority subpopulations without needing target labels or group annotations.
Project and Probe: Sample-Efficient Domain Adaptation by Interpolating Orthogonal Features
Annie S. Chen*, Yoonho Lee*, Amrith Setlur, Sergey Levine, Chelsea Finn
International Conference on Learning Representations (ICLR), 2024 (Spotlight, top 5%)
[PDF]
We propose Project and Probe (Pro^2), a lightweight and data-efficient approach for domain adaptation.
Language-Driven Representation Learning for Robotics
Siddharth Karamcheti, Suraj Nair, Annie S. Chen, Thomas Kollar, Chelsea Finn, Dorsa Sadigh, Percy Liang
Robotics: Science and Systems (RSS), 2023 (Best Paper Finalist)
[PDF]
[Website]
[Code]
We propose Voltron, which uses language to learn better visual representations for a diverse range of robotics problems by trading off conditioning and generation.
Surgical Fine-Tuning Improves Adaptation to Distribution Shifts
Yoonho Lee*, Annie S. Chen*, Fahim Tajwar, Ananya Kumar, Huaxiu Yao, Percy Liang, Chelsea Finn
International Conference on Learning Representations (ICLR), 2023
[PDF]
[Code]
We show that selectively fine-tuning a subset of layers (surgical fine-tuning) outperforms fine-tuning all layers and reveals insights into the type of distribution shift present in the data.
You Only Live Once: Single-Life Reinforcement Learning
Annie S. Chen, Archit Sharma, Sergey Levine, Chelsea Finn
Neural Information Processing Systems (NeurIPS), 2022
[PDF]
[Code]
We introduce Single-Life RL, where agents must adapt to novel tasks in a single trial without supervision, and propose QWALE, which guides agents that are out of distribution to recover to prior experience.
Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos
Annie S. Chen, Suraj Nair, Chelsea Finn
Robotics: Science and Systems (RSS), 2021
[PDF]
[Website]
[Code]
We propose DVD: reward functions learned from in-the-wild human videos that generalize to new environments and tasks.
Just Train Twice: Improving Group Robustness without Training Group Information
Evan Z. Liu*, Behzad Haghgoo*, Annie S. Chen*, Aditi Raghunathan, Pang Wei Koh, Shiori Sagawa, Percy Liang, Chelsea Finn
International Conference on Machine Learning (ICML), 2021 (Long Talk, top 3%)
[PDF]
[Code]
JTT improves worst-group performance without needing group labels by extracting and upsampling difficult, informative examples.
Batch Exploration with Examples for Scalable Robotic Reinforcement Learning
Annie S. Chen*, Hyunji Nam*, Suraj Nair*, Chelsea Finn
Robotics and Automation Letters (RA-L), 2021
[PDF]
[Website]
[Code]
BEE uses weak human supervision to guide better robotic exploration for scalable data collection, enabling better offline RL.