ITER: Iterated Relearning for Improved Generalization in Reinforcement Learning
- Igl, Maximilian*; Boehmer, Wendelin; Whiteson, Shimon
- Accepted abstract
Poster session from 15:00 to 16:00 EAT and from 20:45 to 21:45 EAT
Non-stationarity is unavoidable in deep reinforcement learning, as the distribution of observed states and estimated state-values changes over time. We provide empirical evidence that this distributional drift negatively affects the generalization capabilities of the neural networks used to represent the policy and value function. We explain this observation with the Information Bottleneck theory: the input compression of a neural network weakens when the data distribution changes, resulting in worse generalization bounds. Consequently, if the network's flexibility to adapt to a shifting data distribution is limited, a new network trained from scratch on the current data should generalize better. Based on this insight, we propose ITER, a simple method that repeatedly rejuvenates the network, leading to more strongly compressed latent representations and improved generalization performance. We evaluate ITER against strong baselines on the Multiroom benchmark and show that it outperforms previous state-of-the-art results by a large margin.
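The core mechanism described above, replacing a trained network with a freshly initialized one and transferring its knowledge on the current data distribution, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the linear "networks", the distillation loop, and all names (`distill`, `init_net`, etc.) are hypothetical stand-ins, and the real method operates on full policy and value networks within an RL training loop.

```python
import numpy as np

# Hedged sketch of the ITER-style rejuvenation idea: periodically replace the
# trained network with a freshly initialized one and distill the old network's
# outputs into it on the *current* data, so the new network is not burdened by
# earlier distribution drift. All names here are illustrative assumptions.

rng = np.random.default_rng(0)

def init_net(in_dim, out_dim):
    """A freshly initialized linear 'network' (weights only, for illustration)."""
    return rng.normal(scale=0.1, size=(in_dim, out_dim))

def forward(w, x):
    return x @ w

def distill(teacher_w, states, lr=0.1, steps=200):
    """Train a fresh student to match the teacher's outputs on current states."""
    student_w = init_net(*teacher_w.shape)
    targets = forward(teacher_w, states)
    for _ in range(steps):
        preds = forward(student_w, states)
        # Gradient of the mean-squared distillation loss w.r.t. student weights.
        grad = states.T @ (preds - targets) / len(states)
        student_w -= lr * grad
    return student_w

# One rejuvenation cycle on synthetic "current" states.
states = rng.normal(size=(256, 8))
teacher = init_net(8, 4) + 1.0          # stand-in for a previously trained network
student = distill(teacher, states)

# The fresh student now matches the teacher on the current distribution.
gap = np.mean((forward(student, states) - forward(teacher, states)) ** 2)
```

In the actual method, a cycle like this would be repeated throughout RL training, with each fresh network then continuing to learn from environment interaction rather than stopping at distillation.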