google-deepmind/dqn_zoo
Description: DQN Zoo is a collection of reference implementations of reinforcement learning agents developed at DeepMind based on the Deep Q-Network (DQN) agent.
Language: Python
License: Apache-2.0
Stars: 504
Forks: 83
Open issues: 2
Created: 2020-09-22T11:57:54Z
Pushed: 2026-05-04T20:34:05Z
Default branch: master
Fork: no
Archived: no
README:
DQN Zoo
*DQN Zoo* is a collection of reference implementations of reinforcement learning agents developed at DeepMind based on the Deep Q-Network (DQN) agent.
It aims to be research-friendly, self-contained and readable. Each agent is implemented using JAX, Haiku and RLax, and is a best-effort replication of the corresponding paper implementation. Each agent reproduces results on the standard set of 57 Atari games, on average.
| Directory | Paper |
| --- | --- |
| dqn | [Human Level Control Through Deep Reinforcement Learning](http://www.nature.com/articles/nature14236) |
| double_q | [Deep Reinforcement Learning with Double Q-learning](http://arxiv.org/abs/1509.06461) |
| prioritized | [Prioritized Experience Replay](http://arxiv.org/abs/1511.05952) |
| c51 | [A Distributional Perspective on Reinforcement Learning](http://arxiv.org/abs/1707.06887) |
| qrdqn | [Distributional Reinforcement Learning with Quantile Regression](http://arxiv.org/abs/1710.10044) |
| rainbow | [Rainbow: Combining Improvements in Deep Reinforcement Learning](http://arxiv.org/abs/1710.02298) |
| iqn | [Implicit Quantile Networks for Distributional Reinforcement Learning](http://arxiv.org/abs/1806.06923) |
Plot of median human-normalized score over all 57 Atari games for each agent:
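For readers unfamiliar with the metric, human-normalized score is commonly defined per game as `(agent - random) / (human - random)`, and the median is then taken across games. The sketch below illustrates that calculation as a percentage. The function name, the game names and all numbers are made up for illustration and are not results from DQN Zoo or any paper.

```python
from statistics import median


def human_normalized(agent: float, random_score: float, human: float) -> float:
    """Returns the agent score as a percentage of the random-to-human gap."""
    return 100.0 * (agent - random_score) / (human - random_score)


# Hypothetical (agent, random, human) raw scores. NOT real results.
raw_scores = {
    "game_a": (10.0, 0.0, 20.0),
    "game_b": (30.0, 10.0, 20.0),
    "game_c": (5.0, 5.0, 15.0),
}

# Normalize each game, then summarize with the median across games.
normalized = {
    game: human_normalized(*scores) for game, scores in raw_scores.items()
}
median_score = median(normalized.values())
```

The median is less sensitive than the mean to a few games with very large normalized scores, which is one reason it is a common summary statistic on Atari.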

Quick start
NOTE: Only Linux with Python 3.9 or above is supported.
Follow these steps to quickly clone the DQN Zoo repository, install all required dependencies and start running DQN. These steps require an NVIDIA GPU with recent CUDA drivers.
1. Install Docker version 19.03 or later (for the `--gpus` flag).
1. Install NVIDIA Container Toolkit.
1. Enable sudoless docker.
1. Verify the previous steps were successful e.g. by running: `docker run --gpus all --rm nvidia/cuda:11.1.1-base nvidia-smi`
1. Download the script [run.sh](run.sh). This automatically downloads the Atari ROMs from http://www.atarimania.com. The ROMs are available here for free but make sure the respective license covers your particular use case.
Running this script will:
1. Clone the DQN Zoo repository.
1. Build a Docker image with all necessary dependencies and run unit tests.
1. Start a short run of DQN on Pong in a GPU-accelerated container.
NOTE: run.sh, Dockerfile and docker_requirements.txt together provide a self-contained example of the dependencies and commands needed to run an agent in DQN Zoo. Using Docker is not a requirement and if Dockerfile is not used then the list of dependencies to install may have to be adapted depending on your environment. Also it is not a hard requirement to run on the GPU. Agents can be run on the CPU by specifying the flag --jax_platform_name=cpu.
Goals
- Serve as a collection of reference implementations of DQN-based agents
developed at DeepMind.
- Reproduce results reported in papers, on average.
- Implement agents purely in Python, using JAX, Haiku and RLax.
- Have minimal dependencies.
- Be easy to read.
- Be easy to modify and customize after forking.
Non-goals
- Be a library or framework (these agents are intended to be forked for
research).
- Be flexible, general and support multiple use cases (at odds with
understandability).
- Support many environments (users can easily add new ones).
- Include every DQN variant that exists.
- Incorporate many cool libraries (harder to read, easy for the user to do
this after forking, different users prefer different libraries, less self-contained).
- Optimize speed and efficiency at the cost of readability or matching
algorithmic details in the papers (no C++, keep to a single stream of experience).
Code structure
- Each directory contains a published DQN variant configured to run on Atari.
- `agent.py` in each agent directory contains an agent class that includes `reset()`, `step()`, `get_state()`, `set_state()` methods.
- `parts.py` contains functions and classes used by many of the agents including classes for accumulating statistics and the main training and evaluation loop `run_loop()`.
- `replay.py` contains functions and classes relating to experience replay.
- `networks.py` contains Haiku networks used by the agents.
- `processors.py` contains components for standard Atari preprocessing.
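To make the agent interface concrete, here is a minimal sketch of a class exposing the four methods named above. It is a toy random-action agent written with the standard library only. The class name, the constructor arguments, the `step(observation)` signature and the contents of the state dictionary are all assumptions for illustration and are not taken from the DQN Zoo code.

```python
import random
from typing import Any, Dict, Optional


class ToyRandomAgent:
    """Hypothetical agent with reset(), step(), get_state(), set_state()."""

    def __init__(self, num_actions: int, seed: int = 0):
        self._num_actions = num_actions
        self._rng = random.Random(seed)
        self._steps = 0
        self._last_action: Optional[int] = None

    def reset(self) -> None:
        """Clears per-episode state at the start of a new episode."""
        self._last_action = None

    def step(self, observation: Any) -> int:
        """Picks an action for the observation (uniformly at random here)."""
        self._steps += 1
        self._last_action = self._rng.randrange(self._num_actions)
        return self._last_action

    def get_state(self) -> Dict[str, Any]:
        """Returns everything needed to resume the agent later."""
        return {"rng": self._rng.getstate(), "steps": self._steps}

    def set_state(self, state: Dict[str, Any]) -> None:
        """Restores the agent from a dictionary made by get_state()."""
        self._rng.setstate(state["rng"])
        self._steps = state["steps"]


agent = ToyRandomAgent(num_actions=4)
agent.reset()
agent.step(observation=None)

# Snapshot, run ahead, restore, and run again: the actions should repeat.
snapshot = agent.get_state()
expected = [agent.step(None) for _ in range(5)]
agent.set_state(snapshot)
replayed = [agent.step(None) for _ in range(5)]
```

Keeping all resumable state behind `get_state()` and `set_state()` means a checkpointing layer can save and restore an agent without knowing its internals.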
Implementation notes
Generally we went with a flatter approach for easier code comprehension. Excessive nesting, indirection and generalization have been avoided, but not to the extreme of having a single file per agent. This has resulted in some degree of code duplication, but this is less of a maintenance issue as the code base is intended to be relatively static.
Some implementation details:
- The main training and evaluation loop
`parts.run_loop()` is implemented as a
generator to decouple it from other concerns like logging statistics and checkpointing.
- We adopted the pattern of returning a new JAX PRNG key from jitted
functions. This allows for splitting keys inside jitted functions which is currently more efficient than splitting outside and passing a key in.
- Agent functions to be jitted are defined inline in the agent class
__init__() instead of as decorated class methods. This emphasizes such functions should be free of side-effects; class methods are generally not pure as they often alter the class instance.
- `parts.NullCheckpoint` is a placeholder for users to optionally plug in a
checkpointing library appropriate for the file system they are using. This would allow resuming an interrupted training run.
- The preprocessing and action repeat logic lives inside each agent. Doing
this instead of taking the common approach of environment…
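The note above about implementing the run loop as a generator can be illustrated with a small standard-library sketch. The loop yields one record per step and knows nothing about logging, and the caller decides what to accumulate and when to log. `toy_run_loop`, `Transition`, and the policy and reward callables are hypothetical names for this example and do not reflect the actual signature of `parts.run_loop()`.

```python
from typing import Callable, Iterator, List, NamedTuple


class Transition(NamedTuple):
    step: int
    action: int
    reward: float


def toy_run_loop(
    policy: Callable[[int], int],
    reward_fn: Callable[[int], float],
    num_steps: int,
) -> Iterator[Transition]:
    """Yields one Transition per step; no logging or checkpointing inside."""
    for t in range(num_steps):
        action = policy(t)
        yield Transition(step=t, action=action, reward=reward_fn(action))


# The consumer owns statistics and logging, separate from the loop itself.
total_reward = 0.0
log_lines: List[str] = []
loop = toy_run_loop(
    policy=lambda t: t % 2,             # Alternate between actions 0 and 1.
    reward_fn=lambda action: float(action),
    num_steps=6,
)
for transition in loop:
    total_reward += transition.reward
    if transition.step % 3 == 0:        # Log every third step.
        log_lines.append(f"step={transition.step} return={total_reward}")
```

Because the generator only produces transitions, swapping the logging cadence or adding a checkpoint call changes the consuming `for` loop and leaves the loop body untouched.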