Abstract. To succeed in the real world, robots must cope with situations that differ from those seen during training. We study the problem of adapting on-the-fly to such novel scenarios during deployment by drawing upon a diverse repertoire of previously learned behaviors. Our approach, RObust Autonomous Modulation (ROAM), introduces a mechanism that uses the perceived value of pre-trained behaviors to select and adapt them to the situation at hand. Crucially, this adaptation happens entirely within a single episode at test time, without any human supervision. We provide theoretical analysis of our selection mechanism and demonstrate that ROAM enables a robot to adapt rapidly to changes in dynamics both in simulation and on a real Go1 quadruped, even successfully moving forward with roller skates on its feet. By effectively choosing and adapting relevant behaviors on-the-fly, our approach adapts more than 2x as efficiently as existing methods when facing a variety of out-of-distribution situations during deployment.
Key Idea: Use the value functions of the behaviors to identify an appropriate behavior at every timestep during deployment. With proper regularization, value functions provide a good indication of how well different behaviors will perform in a given situation.
(1) Fine-tune each behavior's value function with an additional behavior classification loss that encourages the behaviors to be identifiable in familiar states (first sketch below).
(2) At deployment time, sample a behavior according to its classification probability at the current state, execute its action, and optionally fine-tune further (second sketch below).
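A minimal sketch of one plausible form of the classification loss in step (1): the per-behavior value estimates on a batch of states are treated as classification logits, and a cross-entropy term pushes the source behavior's values above the others on its own data. The names `value_fns` and `source_idx`, and the choice of state-value functions V(s) rather than Q(s, a), are our assumptions, not from the page; in practice this term would be added to the usual critic loss with some weight.

```python
import torch
import torch.nn.functional as F

def behavior_classification_loss(value_fns, states, source_idx):
    """Cross-entropy loss that pushes behavior `source_idx`'s value
    estimates above the other behaviors' estimates on a batch of
    states drawn from that behavior's own training data.

    value_fns: one critic per pre-trained behavior; assumed here to map
               a batch of states to per-state value estimates, shape (B,).
    """
    # Treat the per-behavior value estimates as logits, shape (B, num_behaviors).
    logits = torch.stack([v(states) for v in value_fns], dim=-1)
    # Every state in this batch came from behavior `source_idx`.
    labels = torch.full((states.shape[0],), source_idx,
                        dtype=torch.long, device=states.device)
    return F.cross_entropy(logits, labels)
```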
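And a sketch of the deployment-time selection in step (2): at each timestep, a behavior is sampled in proportion to the softmax over the behaviors' value estimates at the current state. The `temperature` knob and the `env` / `policies` / `fine_tune` names are placeholders for whatever stack the pre-trained behaviors came from, not part of the method as stated.

```python
def select_behavior(value_fns, state, temperature=1.0):
    """Sample a behavior index from the softmax over the behaviors'
    value estimates at the current state."""
    with torch.no_grad():
        values = torch.stack([v(state) for v in value_fns])  # (num_behaviors,)
        probs = torch.softmax(values / temperature, dim=0)
    return torch.multinomial(probs, num_samples=1).item()

# Single-episode deployment loop (hypothetical environment API).
state, done = env.reset(), False
while not done:
    idx = select_behavior(value_fns, state)
    action = policies[idx](state)        # act with the chosen behavior
    state, reward, done, info = env.step(action)
    # optionally: fine_tune(policies[idx], value_fns[idx], ...)
```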
Benefits of ROAM: (1) Doesn't require learning a separate high-level controller, (2) Agnostic to how the pre-trained policies and value functions are obtained, (3) Provides a simple mechanism for adapting within a single episode to a variety of situations.
Below are evaluation trials showing that ROAM adapts on-the-go to out-of-distribution (OOD) situations in the real world. ROAM enables the robot to slide forward on roller skates without ever having seen roller skates during training. It also enables the robot to pull heavy luggage and loads of changing weight, despite never having been trained to pull any object.
[Videos: each task above compares Walking (No Behavior Modulation), High-Level Classifier, and ROAM (ours).]
Simulation
In our simulated experiments, ROAM is more than 2x as efficient as prior methods designed for fast adaptation. Below we plot the behavior distribution for ROAM over the course of a single-life trial, in which the agent must adapt on-the-go to changing stiffness. Green bars indicate behaviors relevant to the current situation; red bars indicate irrelevant behaviors. ROAM quickly reacts to changing situations by choosing and adapting relevant behaviors on-the-fly.