Universe
Captured source
source ↗Universe | OpenAI
December 5, 2016
Universe
We’re releasing Universe, a software platform for measuring and training an AI’s general intelligence across the world’s supply of games, websites and other applications.
Loading…
Share
Universe allows an AI agent to use a computer like a human does: by looking at screen pixels and operating a virtual keyboard and mouse. We must train AI systems on the full range of tasks we expect them to solve, and Universe lets us train a single agent on any task a human can complete with a computer.
Loading...
In April, we launched Gym, a toolkit for developing and comparing reinforcement learning(RL) algorithms. With Universe, any program can be turned into a Gym environment. Universe works by automatically launching the program behind a VNC remote desktop—it doesn’t need special access to program internals, source code, or bot APIs.
Today’s release consists of a thousand environments including Flash games, browser tasks, and games like slither.io and GTA V. Hundreds of these are ready for reinforcement learning, and almost all can be freely run with the universe Python library as follows:
Python
1import gym2import universe # register Universe environments into Gym34env = gym.make('flashgames.DuskDrive-v0') # any Universe environment ID here5observation_n = env.reset()67while True:8 # agent which presses the Up arrow 60 times per second9 action_n = [[('KeyEvent', 'ArrowUp', True)] for _ in observation_n]10 observation_n, reward_n, done_n, info = env.step(action_n)11 env.render()The sample code above will start your AI playing the Dusk Drive Flash game. Your AI will be given frames like the above 60 times per second. You’ll need to have Docker and universe installed.
Our goal is to develop a single AI agent that can flexibly apply its past experience on Universe environments to quickly master unfamiliar, difficult environments, which would be a major step towards general intelligence. There are many ways to help: giving us permission on your games, training agents across Universe tasks, (soon) integrating new games, or (soon) playing the games.
With support from EA, Microsoft Studios, Valve, Wolfram, and many others, we’ve already secured permission for Universe AI agents to freely access games and applications such as Portal, Fable Anniversary, World of Goo, RimWorld, Slime Rancher, Shovel Knight, SpaceChem, Wing Commander III, Command & Conquer: Red Alert 2, Syndicate, Magic Carpet, Mirror’s Edge, Sid Meier’s Alpha Centauri, and Wolfram Mathematica. We look forward to integrating these and many more.
Background
The area of artificial intelligence has seen rapid progress over the last few years. Computers can now see, hear, and translate languages with unprecedented accuracies. They are also learning to generate images, sound, and text. A reinforcement learning system, AlphaGo, defeated the world champion at Go. However, despite all of these advances, the systems we’re building still fall into the category of “Narrow AI”—they can achieve super-human performance in a specific domain, but lack the ability to do anything sensible outside of it. For instance, AlphaGo can easily defeat you at Go, but you can’t explain the rules of a different board game to it and expect it to play with you.
Systems with general problem solving ability—something akin to human common sense, allowing an agent to rapidly solve a new hard task—remain out of reach. One apparent challenge is that our agents don’t carry their experience along with them to new tasks. In a standard training regime, we initialize agents from scratch and let them twitch randomly through tens of millions of trials as they learn to repeat actions that happen to lead to rewarding outcomes. If we are to make progress towards generally intelligent agents, we must allow them to experience a wide repertoire of tasks so they can develop world knowledge and problem solving strategies that can be efficiently reused in a new task.
Universe Infrastructure
Universe exposes a wide range of environments through a common interface: the agent operates a remote desktop by observing pixels of a screen and producing keyboard and mouse commands. The environment exposes a VNC server and the universe library turns the agent into a VNC client.
Our design goal for universe was to support a single Python process driving 20 environments in parallel at 60 frames per second. Each screen buffer is 1024x768, so naively reading each frame from an external process would take 3GB/s of memory bandwidth. We wrote a batch-oriented VNC client in Go, which is loaded as a shared library in Python and incrementally updates a pair of buffers for each environment. After experimenting with many combinations of VNC servers, encodings, and undocumented protocol options, we now routinely drive dozens of environments at 60 frames per second with 100ms latency—almost all due to server-side encoding.
Here are some important properties of our current implementation:
General. An agent can use this interface (which was originally designed for humans) to interact with any existing computer program without requiring an emulator or access to the program’s internals. For instance, it can play any computer game, interact with a terminal, browse the web, design buildings in CAD software, operate a photo editing program, or edit a spreadsheet.
Familiar to humans. Since people are already well versed with the interface of pixels/keyboard/mouse, humans can easily operate any of our environments. We can use human performance as a meaningful baseline, and record human demonstrations by simply saving VNC traffic. We’ve found demonstrations to be extremely useful in initializing agents with sensible policies with behavioral cloning (i.e. use supervised learning to mimic what the human does), before switching to RL to optimize for the given reward function.
VNC as a standard. Many implementations of VNC are available online and some are packaged by default into the most common operating systems, including OSX. There are even VNC implementations in JavaScript, which allow humans to provide demonstrations without installing any new software—important for services like Amazon Mechanical Turk.
Easy to debug. We can observe our agent while it is training or being evaluated—we just attach a VNC client to the environment’s (shared) VNC desktop. We can also save the VNC traffic for…
Excerpt shown — open the source for the full document.