Think, Learn, and Act on An Episodic Memory Graph

  • Yang, Ge*; Zhang, Amy; Morcos, Ari S; Pineau, Joelle; Abbeel, Pieter; Calandra, Roberto
  • Spotlight talk
  • Poster session from 15:00 to 16:00 EAT and from 20:45 to 21:45 EAT


Designing agents that adapt quickly in non-stationary environments remains an open challenge. Value bootstrapping limits how quickly Q-learning converges, yet search-based methods are computationally heavy at decision time. In this work, we study how to obtain the best of both approaches when data is scarce and the environment is changing. We introduce the Universal Value Prediction Network, an approach that learns a partial model by distilling a distance metric from searches on an episodic memory graph. Results show that the learned value function contains an accurate metric map of the state space, that the learned heuristic search has a lower planning cost at inference time than exhaustive graph search methods, and that the learning system adapts quickly to changes in the environment. Our method brings model-free, model-based, and episodic control together within the same agent to alleviate the deficiencies of each.
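
To make the idea of distilling a distance metric from searches on an episodic memory graph more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that stored transitions form a weighted directed graph and uses Dijkstra's algorithm as a stand-in for the search procedure; the function names (build_memory_graph, shortest_distances, distillation_targets) and the toy transitions are hypothetical. The resulting (state, goal, distance) triples are the kind of regression targets a goal-conditioned value network could be trained on.

```python
import heapq
from collections import defaultdict


def build_memory_graph(transitions):
    """Build a directed graph from (state, next_state, cost) tuples.

    Hypothetical helper: the episodic memory is assumed to be a list of
    observed transitions with a non-negative step cost.
    """
    graph = defaultdict(dict)
    for s, s_next, cost in transitions:
        # Keep the cheapest observed edge between two states.
        if s_next not in graph[s] or cost < graph[s][s_next]:
            graph[s][s_next] = cost
        # Make sure states with no outgoing edges are still nodes.
        graph.setdefault(s_next, {})
    return graph


def shortest_distances(graph, start):
    """Dijkstra's algorithm: shortest-path cost from start to reachable nodes."""
    dist = {start: 0.0}
    frontier = [(0.0, start)]
    while frontier:
        d, node = heapq.heappop(frontier)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(frontier, (nd, nxt))
    return dist


def distillation_targets(graph):
    """Collect (state, goal, distance) triples for every reachable pair."""
    targets = []
    for s in graph:
        for g, d in shortest_distances(graph, s).items():
            targets.append((s, g, d))
    return targets


# Toy episodic memory (assumed data, for illustration only).
transitions = [("A", "B", 1.0), ("B", "C", 1.0), ("A", "C", 3.0)]
graph = build_memory_graph(transitions)
targets = distillation_targets(graph)
```

In this toy memory the direct edge from A to C costs 3.0, but the search finds the cheaper route through B, so the distilled target for the pair (A, C) is 2.0. A learned value function fitted to such targets could then stand in for the search at decision time.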
