Visual Control with Variational Contrastive Dynamics

  • Luo, Calvin*; Hafner, Danijar
  • Accepted abstract
  • Poster session from 15:00 to 16:00 EAT and from 20:45 to 21:45 EAT


To achieve goals in complex environments with image inputs, intelligent agents must generalize to unseen situations. World models provide a way to summarize past experience to facilitate generalization by letting the agent imagine counterfactual scenarios. Recent advances in deep learning have enabled learning world models directly from images through pixel reconstruction. However, in visually complex environments, this approach requires high model capacity to succeed. We present Variational Contrastive Dynamics (VCD), a latent world model learned purely by contrasting the embeddings of encoded images. Learning behaviors in the latent space of VCD closes the gap between approaches based on contrastive learning and pixel reconstruction, as demonstrated on both discrete Atari games and continuous visual control tasks.
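
As a rough illustration of what contrasting embeddings can look like in practice, the sketch below computes an InfoNCE-style loss, a common contrastive objective. The abstract does not specify VCD's exact objective, so this is not the authors' method; the function names, the temperature value, and the toy data are all assumptions made for the example.

```python
import math
import random


def normalize(vector):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce_loss(predicted, targets, temperature=0.1):
    """Generic InfoNCE loss (hypothetical helper, not taken from the paper).

    Row i of `predicted` is the positive match for row i of `targets`;
    every other row in the batch serves as a negative.
    """
    predicted = [normalize(p) for p in predicted]
    targets = [normalize(t) for t in targets]
    total = 0.0
    for i, p in enumerate(predicted):
        # Similarity of prediction i to every target in the batch.
        logits = [sum(a * b for a, b in zip(p, t)) / temperature for t in targets]
        # Numerically stable log-sum-exp over the batch.
        max_logit = max(logits)
        log_norm = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        # Cross-entropy with the matching index as the correct class.
        total += log_norm - logits[i]
    return total / len(predicted)


# Toy data (assumption): 8 "image embeddings" of dimension 16.
rng = random.Random(0)
image_embeddings = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(8)]

# Latent predictions that closely track their own image embedding.
aligned = [[x + 0.01 * rng.gauss(0, 1) for x in row] for row in image_embeddings]
# The same predictions paired with the wrong images.
shuffled = aligned[1:] + aligned[:1]

loss_aligned = info_nce_loss(aligned, image_embeddings)
loss_shuffled = info_nce_loss(shuffled, image_embeddings)
```

In this toy setup the loss is lower when each prediction is paired with its own image embedding than when the pairing is shuffled, which is the signal a contrastive objective uses in place of reconstructing pixels.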
