Transfer Learning via Diverse Policies in Value-Relevant Features
- Luketina, Jelena*; Smith, Matthew; Igl, Maximilian; Whiteson, Shimon
- Accepted abstract
Poster session from 15:00 to 16:00 EAT and from 20:45 to 21:45 EAT
Obtain the Zoom password from ICLR
Directly optimizing for reward is impractical in complex, sparse-reward environments, where agents are unlikely to encounter any reward unless they follow near-optimal policies. Various information-theoretic methods (Gregor et al., 2016; Eysenbach et al., 2019; Sharma et al., 2020) remedy this problem by learning a set of diverse, identifiable behaviours (or skills) independently of reward from the environment. However, being entirely task-agnostic, these methods often discover trivially diverse rather than useful behaviours, and often rely on heuristics to define the space in which skills need to be diverse. To overcome this limitation, we propose a transfer learning approach that uses features learned for value prediction to specify the space in which skills should be diverse. Concretely, our method proceeds in three phases. First, we use standard reinforcement learning to learn a good representation by optimizing for reward on a source task. Second, that representation is used in an unsupervised, information-theoretic optimization of policies, which constructs skills that are both diverse and able to manipulate reward-relevant features. Lastly, these skills inform exploration on a target task. Our preliminary results demonstrate that this approach leads to the emergence of task-relevant skills.
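The second phase can be illustrated with a minimal sketch of a DIAYN-style diversity objective computed in the learned feature space. Everything below is a toy stand-in, not the authors' implementation: `phi` represents the frozen value-prediction encoder from phase one (here just a random linear map), and the skill discriminator `q(z | phi(s))` is an untrained softmax classifier. The intrinsic reward is the standard `log q(z | phi(s)) - log p(z)` term, with `p(z)` uniform over skills; the only difference from a fully task-agnostic setup is that the discriminator sees `phi(s)` rather than the raw state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen encoder from phase 1 (value-relevant features).
# A random linear map stands in for the learned representation.
W_phi = rng.normal(size=(4, 8))  # state dim 8 -> feature dim 4

def phi(state):
    """Map a raw state to value-relevant features (frozen after phase 1)."""
    return W_phi @ state

# Phase 2: skill discriminator q(z | phi(s)), a softmax classifier over
# n_skills. Training it to identify skills from phi(s) pushes policies to
# be diverse specifically in the value-relevant feature space.
n_skills = 3
W_q = rng.normal(size=(n_skills, 4))

def skill_log_probs(features):
    """Log-softmax over skills given features (numerically stabilized)."""
    logits = W_q @ features
    logits = logits - logits.max()
    return logits - np.log(np.exp(logits).sum())

def diversity_reward(state, skill):
    """DIAYN-style intrinsic reward: log q(z | phi(s)) - log p(z),
    with p(z) uniform over the skill set."""
    return skill_log_probs(phi(state))[skill] - np.log(1.0 / n_skills)

state = rng.normal(size=8)
r = diversity_reward(state, skill=1)
```

In a full system this reward would replace (or augment) the environment reward while training the skill-conditioned policies, with the discriminator weights `W_q` updated by classification loss on visited states; only the encoder stays fixed.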