OpenAI Writing: Dota 2 with large scale deep reinforcement learning

openai.com/index/dota-2-with-large-scale-deep-reinforcement-learning

December 13, 2019

Publication

Dota 2 with large scale deep reinforcement learning

Abstract

On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game. The game of Dota 2 presents novel challenges for AI systems such as long time horizons, imperfect information, and complex, continuous state-action spaces, all challenges which will become increasingly central to more capable AI systems. OpenAI Five leveraged existing reinforcement learning techniques, scaled to learn from batches of approximately 2 million frames every 2 seconds. We developed a distributed training system and tools for continual training which allowed us to train OpenAI Five for 10 months. By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.
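
As a rough sanity check on the scale quoted in the abstract, the short Python sketch below converts the stated batch figures into an average per-second throughput. The constant and function names are illustrative, and the only inputs are the approximate numbers given in the abstract; this is arithmetic on those figures, not a description of the training system itself.

```python
# Approximate figures quoted in the abstract.
FRAMES_PER_BATCH = 2_000_000   # "approximately 2 million frames"
SECONDS_PER_BATCH = 2          # "every 2 seconds"


def frames_per_second(frames_per_batch: int, seconds_per_batch: float) -> float:
    """Average number of frames consumed per second of wall-clock time."""
    return frames_per_batch / seconds_per_batch


throughput = frames_per_second(FRAMES_PER_BATCH, SECONDS_PER_BATCH)
print(f"~{throughput:,.0f} frames per second")  # ~1,000,000 frames per second
```

In other words, the quoted batch rate corresponds to roughly one million frames of experience processed per second on average.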

OpenAI Five
Exploration & Games
Learning Paradigms
Software & Engineering

Authors

Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Christopher Hesse, Rafał Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique Pondé, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang, Filip Wolski, Susan Zhang
