OpenAI Writing: UCB exploration via Q-ensembles

openai.com/openai.com/index/ucb-exploration-via-q-ensembles

June 5, 2017

Abstract

We show how an ensemble of Q*-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well-established algorithms from the bandit setting and adapt them to the Q-learning setting. We propose an exploration strategy based on upper-confidence bounds (UCB). Our experiments show significant gains on the Atari benchmark.
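
To make the idea concrete, here is a minimal sketch of UCB-style action selection over an ensemble of Q-value estimates. The abstract does not spell out the exact bound, so this sketch assumes a common form: the ensemble mean plus a scaled ensemble standard deviation. The function name `ucb_action`, the coefficient `lam`, and the nested-list layout of `q_values` are illustrative assumptions, not details taken from the paper.

```python
import statistics
from typing import Sequence


def ucb_action(q_values: Sequence[Sequence[float]], lam: float = 0.1) -> int:
    """Pick an action by an upper-confidence bound over an ensemble.

    q_values[k][a] is ensemble member k's Q-value estimate for action a
    in the current state (hypothetical layout for this sketch).
    Assumed bound: mean over members + lam * population std over members.
    """
    num_actions = len(q_values[0])
    scores = []
    for a in range(num_actions):
        per_member = [member[a] for member in q_values]
        mean = statistics.fmean(per_member)
        std = statistics.pstdev(per_member)  # disagreement between members
        scores.append(mean + lam * std)
    # Return the index of the highest-scoring action (first one on ties).
    return max(range(num_actions), key=scores.__getitem__)


# Two ensemble members, two actions. The members agree on action 0
# and disagree strongly on action 1.
q = [[1.0, 0.0],
     [1.0, 1.6]]
greedy_choice = ucb_action(q, lam=0.0)   # ignores disagreement
curious_choice = ucb_action(q, lam=1.0)  # rewards disagreement
```

With `lam=0.0` the rule reduces to acting greedily on the ensemble mean. A larger `lam` favors actions on which the ensemble members disagree, which is the exploration bonus in this sketch.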

Exploration & Games
Learning Paradigms

Authors

Richard Chen, Szymon Sidor, Pieter Abbeel, John Schulman
