Fast adaptation with importance weighted priors
- Galashov, Alexandre*; Sygnowski, Jakub; Desjardins, Guillaume; Humplik, Jan; Hasenclever, Leonard; Heess, Nicolas; Teh, Yee Whye
- Accepted abstract
Poster session from 15:00 to 16:00 EAT and from 20:45 to 21:45 EAT
The ability to exploit prior experience to solve novel problems rapidly is a hallmark of biological learning systems and of great practical importance for artificial ones. In the reinforcement learning literature, much recent work has focused on the problem of optimizing the learning process itself. In this paper we study a complementary approach that is conceptually simple, general, and modular, and that builds on recent improvements in off-policy learning. The framework combines robust off-policy learning with two components: a behavior prior, or default behavior, that constrains the space of solutions and serves as a bias for exploration, and a learned representation for the value function. Both are easily learned from a number of training tasks. We find that this simple combination performs surprisingly well on a number of challenging meta-learning problems, both in terms of initial training time and during adaptation to new tasks, often matching or exceeding the performance of more specialized algorithms.