MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics

  • Barekatain, Mohammadamin; Yonetani, Ryo*; Hamaya, Masashi
  • Accepted abstract
  • Poster session from 15:00 to 16:00 EAT and from 20:45 to 21:45 EAT


This work explores a new challenge in transfer reinforcement learning (RL), where only a set of source policies collected under diverse unknown dynamics is available for quickly learning a target task. To address this problem, we propose MULTI-source POLicy AggRegation (MULTIPOLAR), which comprises two key techniques: 1) learning to adaptively aggregate the actions provided by the source policies so as to maximize target task performance, and 2) learning an auxiliary network that predicts residuals around the aggregated actions, which preserves the target policy's expressiveness even when some of the source policies perform poorly. We confirm the effectiveness of MULTIPOLAR across six simulated environments, ranging from classic control problems to challenging robotics simulations, under both continuous and discrete action spaces. The videos and code are available on the project webpage.
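
The two techniques can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the form of the aggregation, so the per-source, per-dimension weighted average below is an assumption, and the names `aggregate_actions`, `target_action`, `weights`, and `residual_fn` are hypothetical. In MULTIPOLAR the aggregation parameters and the auxiliary network are learned with RL on the target task; here they are fixed values chosen only to show how the pieces compose.

```python
from typing import Callable, List, Sequence

State = Sequence[float]
Action = List[float]
Policy = Callable[[State], Action]


def aggregate_actions(source_actions: List[Action],
                      weights: List[List[float]]) -> Action:
    """Average K source actions after scaling each one element-wise.

    Assumption: one weight per (source policy, action dimension).
    In practice these weights would be trainable parameters.
    """
    k = len(source_actions)
    dim = len(source_actions[0])
    return [
        sum(weights[i][d] * source_actions[i][d] for i in range(k)) / k
        for d in range(dim)
    ]


def target_action(state: State,
                  source_policies: List[Policy],
                  weights: List[List[float]],
                  residual_fn: Policy) -> Action:
    """Target action = aggregated source actions + predicted residual."""
    # Query every (frozen) source policy on the current target state.
    actions = [policy(state) for policy in source_policies]
    aggregated = aggregate_actions(actions, weights)
    # The auxiliary network corrects the aggregate; here it is a plain function.
    residual = residual_fn(state)
    return [a + r for a, r in zip(aggregated, residual)]


# Toy usage: two hypothetical source policies with 2-D continuous actions.
source_policies = [lambda s: [1.0, 0.0], lambda s: [3.0, 2.0]]

# Uniform weights and a zero residual reduce to a plain average.
uniform = target_action([0.0], source_policies,
                        [[1.0, 1.0], [1.0, 1.0]],
                        lambda s: [0.0, 0.0])

# Suppressing the first source and adding a residual shifts the action.
adapted = target_action([0.0], source_policies,
                        [[0.0, 0.0], [2.0, 2.0]],
                        lambda s: [0.5, -0.5])
```

With a zero weight on a poorly performing source policy, its action no longer influences the aggregate, and the residual term can still move the target action away from anything the source policies propose.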
