OpenAI

UCB exploration via Q-ensembles


June 5, 2017


Abstract

We show how an ensemble of Q*-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well-established algorithms from the bandit setting and adapt them to the Q-learning setting. We propose an exploration strategy based on upper-confidence bounds (UCB). Our experiments show significant gains on the Atari benchmark.
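The abstract does not spell out the action-selection rule, so the following is only a minimal sketch of one common way to turn an ensemble of Q-estimates into a UCB-style choice: score each action by the ensemble mean plus a multiple of the ensemble standard deviation, then act greedily on that score. The function name `ucb_action`, the parameter `exploration_coef`, and the toy Q-values are all hypothetical and chosen for illustration; they are not taken from the source.

```python
import numpy as np


def ucb_action(q_values, exploration_coef=0.1):
    """Pick an action from an ensemble of Q-estimates (illustrative sketch).

    q_values: array-like of shape (K, num_actions), one row per ensemble member.
    exploration_coef: hypothetical weight on ensemble disagreement.
    """
    q_values = np.asarray(q_values, dtype=float)
    mean = q_values.mean(axis=0)   # ensemble mean per action
    std = q_values.std(axis=0)     # ensemble disagreement per action
    # Upper-confidence score: optimistic about actions the ensemble disagrees on.
    return int(np.argmax(mean + exploration_coef * std))


# Toy ensemble of K=3 members over 2 actions (made-up numbers).
# Action 0: every member agrees on 1.0. Action 1: lower mean, high disagreement.
q = np.array([[1.0, 0.2],
              [1.0, 1.6],
              [1.0, 0.9]])

greedy_choice = ucb_action(q, exploration_coef=0.0)      # ignores disagreement
optimistic_choice = ucb_action(q, exploration_coef=1.0)  # rewards disagreement
```

With the coefficient set to zero the rule reduces to acting greedily on the ensemble mean; a larger coefficient shifts the choice toward actions whose value the ensemble members disagree about, which is the sense in which the bonus encourages exploration.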

  • Exploration & Games
  • Learning Paradigms

Authors

Richard Chen, Szymon Sidor, Pieter Abbeel, John Schulman
