google-deepmind/dqn_zoo

Description: DQN Zoo is a collection of reference implementations of reinforcement learning agents developed at DeepMind based on the Deep Q-Network (DQN) agent.

Language: Python

License: Apache-2.0

Stars: 504

Forks: 83

Open issues: 2

Created: 2020-09-22T11:57:54Z

Pushed: 2026-05-04T20:34:05Z

Default branch: master

Fork: no

Archived: no

README:

DQN Zoo

*DQN Zoo* is a collection of reference implementations of reinforcement learning agents developed at DeepMind based on the Deep Q-Network (DQN) agent.

It aims to be research-friendly, self-contained and readable. Each agent is implemented using JAX, Haiku and RLax, and is a best-effort replication of the corresponding paper implementation. Each agent reproduces results on the standard set of 57 Atari games, on average.

| Directory | Paper |
| --- | --- |
| dqn | [Human Level Control Through Deep Reinforcement Learning](http://www.nature.com/articles/nature14236) |
| double_q | [Deep Reinforcement Learning with Double Q-learning](http://arxiv.org/abs/1509.06461) |
| prioritized | [Prioritized Experience Replay](http://arxiv.org/abs/1511.05952) |
| c51 | [A Distributional Perspective on Reinforcement Learning](http://arxiv.org/abs/1707.06887) |
| qrdqn | [Distributional Reinforcement Learning with Quantile Regression](http://arxiv.org/abs/1710.10044) |
| rainbow | [Rainbow: Combining Improvements in Deep Reinforcement Learning](http://arxiv.org/abs/1710.02298) |
| iqn | [Implicit Quantile Networks for Distributional Reinforcement Learning](http://arxiv.org/abs/1806.06923) |

Plot of median human-normalized score over all 57 Atari games for each agent:

![Plot summary](plot_atari_summary.svg)
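For readers unfamiliar with the metric, here is a minimal sketch of how a median human-normalized score is conventionally computed: each game's score is rescaled so that random play maps to 0 and the human baseline maps to 1, and the median is then taken across games. The function name, game names and all numbers below are hypothetical placeholders, not values from DQN Zoo or any paper.

```python
import statistics


def human_normalized(agent: float, random: float, human: float) -> float:
    """Rescales a raw score so random play is 0.0 and the human baseline is 1.0."""
    return (agent - random) / (human - random)


# Hypothetical placeholder numbers, NOT real Atari baselines or agent results.
scores = {
    "game_a": {"agent": 150.0, "random": 0.0, "human": 100.0},
    "game_b": {"agent": 30.0, "random": 10.0, "human": 50.0},
    "game_c": {"agent": 5.0, "random": -20.0, "human": 30.0},
}

normalized = {game: human_normalized(**s) for game, s in scores.items()}

# The median is less sensitive than the mean to a few games with huge scores.
median_score = statistics.median(normalized.values())
print(normalized, median_score)
```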

Quick start

NOTE: Only Python 3.9 or later on Linux is supported.

Follow these steps to quickly clone the DQN Zoo repository, install all required dependencies and start running DQN. These steps require an NVIDIA GPU with recent CUDA drivers.

1. Install Docker version 19.03 or later (for the --gpus flag).
1. Install NVIDIA Container Toolkit.
1. Enable sudoless docker.
1. Verify the previous steps were successful, e.g. by running: `docker run --gpus all --rm nvidia/cuda:11.1.1-base nvidia-smi`

1. Download the script [run.sh](run.sh). This automatically downloads the Atari ROMs from http://www.atarimania.com. The ROMs are available here for free but make sure the respective license covers your particular use case.

Running this script will:

1. Clone the DQN Zoo repository.
1. Build a Docker image with all necessary dependencies and run unit tests.
1. Start a short run of DQN on Pong in a GPU-accelerated container.

NOTE: run.sh, Dockerfile and docker_requirements.txt together provide a self-contained example of the dependencies and commands needed to run an agent in DQN Zoo. Using Docker is not a requirement; if Dockerfile is not used, the list of dependencies to install may have to be adapted to your environment. A GPU is not a hard requirement either: agents can be run on the CPU by specifying the flag --jax_platform_name=cpu.

Goals

  • Serve as a collection of reference implementations of DQN-based agents developed at DeepMind.

  • Reproduce results reported in papers, on average.
  • Implement agents purely in Python, using JAX, Haiku and RLax.
  • Have minimal dependencies.
  • Be easy to read.
  • Be easy to modify and customize after forking.

Non-goals

  • Be a library or framework (these agents are intended to be forked for research).
  • Be flexible, general and support multiple use cases (at odds with understandability).

  • Support many environments (users can easily add new ones).
  • Include every DQN variant that exists.
  • Incorporate many cool libraries (harder to read, easy for the user to do this after forking, different users prefer different libraries, less self-contained).
  • Optimize speed and efficiency at the cost of readability or matching algorithmic details in the papers (no C++, keep to a single stream of experience).

Code structure

  • Each directory contains a published DQN variant configured to run on Atari.
  • agent.py in each agent directory contains an agent class that includes reset(), step(), get_state(), set_state() methods.
  • parts.py contains functions and classes used by many of the agents including classes for accumulating statistics and the main training and evaluation loop run_loop().

  • replay.py contains functions and classes relating to experience replay.
  • networks.py contains Haiku networks used by the agents.
  • processors.py contains components for standard Atari preprocessing.
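To make the structure above more concrete, here is a minimal sketch of an agent exposing the four methods named in the list, driven by a generator-style loop. Only the method names reset(), step(), get_state(), set_state() and the name run_loop() come from this README; the signatures, the RandomAgent and CountdownEnv classes, and the yielded tuple are assumptions made for illustration and will differ from the actual code in agent.py and parts.py.

```python
import random
from typing import Any, Dict, Iterator, Tuple


class RandomAgent:
    """Toy agent with the four methods named above (signatures are assumed)."""

    def __init__(self, num_actions: int, seed: int = 0):
        self._num_actions = num_actions
        self._rng = random.Random(seed)
        self._steps = 0

    def reset(self) -> None:
        """Called at the start of each episode; this toy agent has nothing to clear."""

    def step(self, observation: Any) -> int:
        """Consumes one observation and returns the next action."""
        self._steps += 1
        return self._rng.randrange(self._num_actions)

    def get_state(self) -> Dict[str, Any]:
        """Returns everything needed to resume later, e.g. for checkpointing."""
        return {"steps": self._steps, "rng": self._rng.getstate()}

    def set_state(self, state: Dict[str, Any]) -> None:
        """Restores a state previously returned by get_state()."""
        self._steps = state["steps"]
        self._rng.setstate(state["rng"])


class CountdownEnv:
    """Hypothetical stand-in environment: every episode lasts `length` steps."""

    def __init__(self, length: int):
        self._length = length
        self._t = 0

    def reset(self) -> int:
        self._t = 0
        return self._t

    def step(self, action: int) -> Tuple[int, float, bool]:
        self._t += 1
        return self._t, 1.0, self._t >= self._length


def run_loop(agent, env, max_steps: int) -> Iterator[Tuple[int, float, bool]]:
    """Yields one (action, reward, done) tuple per step; the caller handles logging."""
    observation = env.reset()
    agent.reset()
    for _ in range(max_steps):
        action = agent.step(observation)
        observation, reward, done = env.step(action)
        yield action, reward, done
        if done:
            observation = env.reset()
            agent.reset()


agent = RandomAgent(num_actions=4)
total_reward, episodes = 0.0, 0
for action, reward, done in run_loop(agent, CountdownEnv(length=5), max_steps=12):
    total_reward += reward  # Statistics are accumulated outside the loop itself.
    episodes += int(done)
print(total_reward, episodes)
```

Because the loop is a generator, concerns such as statistics or checkpointing (via get_state() and set_state()) can be layered on by whoever iterates over it, without changing the loop.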

Implementation notes

Generally we went with a flatter approach for easier code comprehension. Excessive nesting, indirection and generalization have been avoided, but not to the extreme of having a single file per agent. This has resulted in some degree of code duplication, but this is less of a maintenance issue as the code base is intended to be relatively static.

Some implementation details:

  • The main training and evaluation loop parts.run_loop() is implemented as a generator to decouple it from other concerns like logging statistics and checkpointing.
  • We adopted the pattern of returning a new JAX PRNG key from jitted functions. This allows for splitting keys inside jitted functions which is currently more efficient than splitting outside and passing a key in.
  • Agent functions to be jitted are defined inline in the agent class __init__() instead of as decorated class methods. This emphasizes such functions should be free of side-effects; class methods are generally not pure as they often alter the class instance.
  • parts.NullCheckpoint is a placeholder for users to optionally plug in a checkpointing library appropriate for the file system they are using. This would allow resuming an interrupted training run.

  • The preprocessing and action repeat logic lives inside each agent. Doing this instead of taking the common approach of environment…
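The PRNG key and inline-jit notes above can be illustrated with a small sketch. It assumes JAX is installed; the EpsilonGreedyActor class, its method names and the epsilon-greedy logic are hypothetical and chosen only to show the pattern, not taken from any DQN Zoo agent.

```python
import jax
import jax.numpy as jnp


class EpsilonGreedyActor:
    """Hypothetical actor; names and signatures are illustrative, not DQN Zoo's."""

    def __init__(self, epsilon: float, rng_key):
        self._rng_key = rng_key

        # Pure function defined inline and then jitted. It touches no instance
        # state: inputs arrive as arguments, epsilon is a closed-over constant.
        def select_action(rng_key, q_values):
            # Split inside the jitted function and hand the fresh key back.
            rng_key, explore_key, action_key = jax.random.split(rng_key, 3)
            random_action = jax.random.randint(
                action_key, (), 0, q_values.shape[-1]
            )
            greedy_action = jnp.argmax(q_values)
            explore = jax.random.uniform(explore_key) < epsilon
            return rng_key, jnp.where(explore, random_action, greedy_action)

        self._select_action = jax.jit(select_action)

    def step(self, q_values) -> int:
        # The side effect (storing the new key) stays outside the jitted function.
        self._rng_key, action = self._select_action(self._rng_key, q_values)
        return int(action)


# With epsilon=0.0 the actor never explores, so it always picks the argmax.
actor = EpsilonGreedyActor(epsilon=0.0, rng_key=jax.random.PRNGKey(0))
action = actor.step(jnp.array([0.1, 0.9, 0.3]))
print(action)
```

Returning the key from the jitted function keeps the function pure while letting the caller carry the evolving random state from one call to the next.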
