Annie S. Chen

Hi! I am a third-year computer science PhD student at Stanford University advised by Prof. Chelsea Finn and affiliated with the Stanford Artificial Intelligence Laboratory (SAIL). My research goal is to create robust and adaptable machine learning systems that are prepared for distribution shifts and can efficiently respond to new information. My work spans both supervised learning and reinforcement learning settings, with a focus on creating models that can generalize or adapt to changing environments. I am supported by an NSF Graduate Research Fellowship.

Previously, I received a B.S. in math and an M.S. in computer science, both from Stanford. I was also a research intern at Google Brain, where I learned a lot working with Pete Florence.

I am originally from Boulder, Colorado, and outside of research, I enjoy spending time outdoors, playing tennis, and learning to play the guitar. I care about creating an inclusive research culture and co-organize the Stanford CS Undergraduate Mentoring Program, which matches undergraduate students with graduate student mentors and aims to increase the participation of underrepresented minorities in computer science research.

Please feel free to reach out about research or any advice I can help with!

[Email] [CV] [Google Scholar] [Twitter] [LinkedIn] [GitHub]

Selected Research

Please see my CV or Google Scholar for a full list of work.

Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment
Annie S. Chen*, Govind Chada*, Laura Smith, Archit Sharma, Zipeng Fu, Sergey Levine, Chelsea Finn
NeurIPS Robot Learning Workshop, 2023
[PDF] [Website] [Code]
We propose Robust Autonomous Modulation (ROAM), a simple framework for efficiently leveraging pre-trained behaviors to adapt to changing situations at deployment time.
Confidence-Based Model Selection: When to Take Shortcuts for Subpopulation Shifts
Annie S. Chen, Yoonho Lee, Amrith Setlur, Sergey Levine, Chelsea Finn
NeurIPS DistShift Workshop, 2023
[PDF]
We propose COnfidence-baSed MOdel Selection (COSMOS), where we adaptively choose among models with different strengths to achieve high performance on both majority and minority subpopulations. COSMOS does not require any target labels or group annotations and can even be used for hyperparameter tuning.
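As a rough illustration of the idea (the paper's exact selection criterion may differ), here is one way confidence-based selection could look: among candidate models, keep the one whose softmax predictions are most confident on unlabeled target data. All names in this sketch are illustrative.

```python
# Hedged sketch of confidence-based model selection: no target labels
# or group annotations are used, only the models' own confidence.
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_model(models, target_loader):
    """Return the candidate model with the highest mean softmax
    confidence over an unlabeled target data loader."""
    scores = []
    for model in models:
        model.eval()
        confs = [F.softmax(model(x), dim=-1).max(dim=-1).values.mean()
                 for x, *_ in target_loader]
        scores.append(torch.stack(confs).mean())
    return models[int(torch.argmax(torch.stack(scores)))]
```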
Project and Probe: Sample-Efficient Domain Adaptation by Interpolating Orthogonal Features
Annie S. Chen*, Yoonho Lee*, Amrith Setlur, Sergey Levine, Chelsea Finn
International Conference on Learning Representations (ICLR), 2024 (Spotlight, top 5%)
[PDF]
We propose Project and Probe (Pro^2), a lightweight and data-efficient approach for domain adaptation. Pro^2 first learns a linear projection that maps a pre-trained embedding onto orthogonal directions while being predictive of labels in the source dataset. The goal of this step is to learn a variety of predictive features, so that at least some of them remain useful after distribution shift. Pro^2 then learns a linear classifier on top of these projected features using a small target dataset.
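A minimal sketch of the two-stage recipe, assuming frozen pre-trained embeddings and substituting a soft orthogonality penalty for the paper's exact projection-learning procedure; all function names here are illustrative:

```python
# Stage 1 (project): learn a D x k linear projection whose columns are
# near-orthogonal and jointly predictive of source labels.
import torch
import torch.nn.functional as F

def learn_projection(src_feats, src_labels, k, steps=1000, lr=1e-2):
    D = src_feats.shape[1]
    W = torch.nn.Parameter(torch.randn(D, k) * 0.01)
    head = torch.nn.Linear(k, int(src_labels.max()) + 1)
    opt = torch.optim.Adam([W, *head.parameters()], lr=lr)
    for _ in range(steps):
        z = src_feats @ W                      # project the embeddings
        loss = F.cross_entropy(head(z), src_labels)
        gram = W.T @ W                         # soft orthogonality penalty
        loss = loss + 1e-2 * (gram - torch.eye(k)).pow(2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return W.detach()

# Stage 2 (probe): fit a small linear classifier on the projected
# features using the few available target examples.
def probe(W, tgt_feats, tgt_labels):
    from sklearn.linear_model import LogisticRegression
    clf = LogisticRegression(max_iter=1000)
    clf.fit((tgt_feats @ W).numpy(), tgt_labels.numpy())
    return clf
```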
Language-Driven Representation Learning for Robotics
Siddharth Karamcheti, Suraj Nair, Annie S. Chen, Thomas Kollar, Chelsea Finn, Dorsa Sadigh, Percy Liang
Robotics: Science and Systems (RSS), 2023 (Best Paper Finalist)
[PDF] [Website] [Code]
We propose Voltron, which uses language to learn better visual representations for a diverse range of robotics problems by trading off language conditioning and language generation.
Surgical Fine-Tuning Improves Adaptation to Distribution Shifts
Yoonho Lee*, Annie S. Chen*, Fahim Tajwar, Ananya Kumar, Huaxiu Yao, Percy Liang, Chelsea Finn
International Conference on Learning Representations (ICLR), 2023
[PDF] [Code]
We show that selectively fine-tuning a subset of layers (which we term surgical fine-tuning) matches or outperforms fine-tuning all layers. Moreover, the type of distribution shift influences which subset is most effective to tune: for example, for image corruptions, fine-tuning only the first few layers works best.
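As a hedged illustration, surgical fine-tuning of the "first few layers" of a torchvision ResNet-50 might look like the following; the choice of blocks and the hyperparameters are illustrative, not the paper's exact configuration:

```python
# Freeze the whole network, then unfreeze only the earliest layers --
# the subset that tends to help for input-level shifts like corruptions.
import torch
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False
for module in (model.conv1, model.bn1, model.layer1):
    for p in module.parameters():
        p.requires_grad = True

# Optimize only the unfrozen parameters; a standard fine-tuning loop
# over the target-distribution data follows.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
```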
You Only Live Once: Single-Life Reinforcement Learning
Annie S. Chen, Archit Sharma, Sergey Levine, Chelsea Finn
Neural Information Processing Systems (NeurIPS), 2022
[PDF] [Code]
Agents operating in the real world must often contend with novel situations that differ from their prior experience, and in such situations they have only a single trial to complete the given task, adapting on the fly without human intervention. To formalize this setting, we study single-life reinforcement learning (SLRL): given prior data, an agent must complete a task in a single trial under a novel distribution shift, without any human interventions or supervision.
Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos
Annie S. Chen, Suraj Nair, Chelsea Finn
Robotics: Science and Systems (RSS), 2021
[PDF] [Website] [Code]
We propose a simple approach, Domain-agnostic Video Discriminator (DVD), that learns multitask reward functions by training a discriminator to classify whether two videos are performing the same task. These reward functions can generalize to unseen environments and tasks by learning from a small amount of robot data and a large, diverse dataset of in-the-wild human videos.
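A minimal sketch of the same-task discriminator objective, assuming an arbitrary clip encoder; the class and function names are placeholders rather than the released DVD code:

```python
# Discriminator that scores whether two video clips show the same task.
import torch
import torch.nn as nn

class SameTaskDiscriminator(nn.Module):
    def __init__(self, video_encoder, feat_dim=512):
        super().__init__()
        self.encoder = video_encoder            # any clip -> feature encoder
        self.head = nn.Linear(2 * feat_dim, 1)  # binary same-task logit

    def forward(self, clip_a, clip_b):
        z = torch.cat([self.encoder(clip_a), self.encoder(clip_b)], dim=-1)
        return self.head(z).squeeze(-1)

# Training pairs mix robot and human videos; positives share a task label:
#   loss = F.binary_cross_entropy_with_logits(model(a, b), same_task_labels)
# At deployment, the reward for an observed clip can be scored against a
# task video: reward ~ sigmoid(model(obs_clip, task_clip)).
```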
Just Train Twice: Improving Group Robustness without Training Group Information
Evan Z. Liu*, Behzad Haghgoo*, Annie S. Chen*, Aditi Raghunathan, Pang Wei Koh, Shiori Sagawa, Percy Liang, Chelsea Finn
International Conference on Machine Learning (ICML), 2021 (Long Talk, top 3%)
[PDF] [Code]
We propose Just Train Twice (JTT), a simple method that improves worst-group classification performance on datasets with spurious correlations without requiring training group annotations. JTT first detects informative training examples, which are often minority examples, by training an initial ERM classifier and extracting the misclassified examples. It then trains a final classifier by upsampling the selected examples.
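A compact sketch of the two-stage procedure, with build_model and train_fn as hypothetical helpers standing in for a standard ERM training loop:

```python
# Stage 1: a short ERM run; its mistakes flag informative (often
# minority-group) examples. Stage 2: retrain from scratch with those
# examples upsampled lambda_up times (a tunable hyperparameter).
import torch
from torch.utils.data import ConcatDataset, Subset

def jtt(train_set, build_model, train_fn, lambda_up=20):
    first = train_fn(build_model(), train_set)
    first.eval()
    with torch.no_grad():
        error_idx = [i for i, (x, y) in enumerate(train_set)
                     if first(x.unsqueeze(0)).argmax(1).item() != int(y)]
    upweighted = ConcatDataset(
        [train_set] + [Subset(train_set, error_idx)] * (lambda_up - 1)
    )
    return train_fn(build_model(), upweighted)
```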
Batch Exploration with Examples for Scalable Robotic Reinforcement Learning
Annie S. Chen*, Hyunji Nam*, Suraj Nair*, Chelsea Finn
Robotics and Automation Letters (RA-L), 2021
[PDF] [Website] [Code]
We propose a framework for leveraging weak human supervision to enable better robotic exploration for scalable data collection. Under this framework, the robot autonomously collects high-quality data with only a few minutes of human supervision, providing better data for downstream offline RL.

Website template from here.