UCB exploration via Q-ensembles
UCB exploration via Q-ensembles | OpenAI
June 5, 2017
Abstract
We show how an ensemble of Q*-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well-established algorithms from the bandit setting and adapt them to the Q-learning setting. We propose an exploration strategy based on upper-confidence bounds (UCB). Our experiments show significant gains on the Atari benchmark.
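The abstract does not spell out the action-selection rule, so the following is only a minimal sketch of what a UCB-style choice over a Q-ensemble might look like. It assumes the upper-confidence bound is the ensemble mean plus a scaled ensemble standard deviation; the function name `ucb_action`, the coefficient `lam`, and the array layout are hypothetical and chosen for illustration.

```python
import numpy as np


def ucb_action(q_values: np.ndarray, lam: float = 0.1) -> int:
    """Pick an action from an ensemble of Q-value estimates.

    q_values: array of shape (K, A), one row per ensemble member and
              one column per action, all evaluated at the same state.
    lam:      hypothetical exploration coefficient (assumption, not
              taken from the source).
    """
    # Ensemble mean acts as the value estimate for each action.
    mean = q_values.mean(axis=0)
    # Ensemble disagreement acts as the uncertainty bonus (assumption).
    std = q_values.std(axis=0)
    # Upper-confidence bound: optimistic value under uncertainty.
    return int(np.argmax(mean + lam * std))


# Two ensemble members, two actions.
q = np.array([[1.0, 0.0],
              [1.0, 2.0]])
greedy = ucb_action(q, lam=0.0)      # ties on the mean, picks action 0
optimistic = ucb_action(q, lam=1.0)  # bonus favors the uncertain action 1
```

With `lam=0.0` the rule reduces to acting greedily on the ensemble mean; larger values weight actions on which the ensemble members disagree more heavily.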
- Exploration & Games
- Learning Paradigms
Authors
Richard Chen, Szymon Sidor, Pieter Abbeel, John Schulman