microsoft/debug-gym
Description: A Text-Based Environment for Interactive Debugging
Language: Python
License: MIT
Stars: 298
Forks: 40
Open issues: 15
Created: 2024-11-04T14:24:06Z
Pushed: 2026-06-15T02:47:42Z
Default branch: main
Fork: no
Archived: no
README:
debug-gym: A Text-Based Environment for Interactive Debugging
debug-gym is a text-based interactive debugging framework, designed for debugging Python programs.
[Technical Report] [Project Page]
The technical report corresponds to version 1.0.0. Please see CHANGELOG.md for recent updates.
1. Installation
It's recommended to create and activate a conda or virtual environment. debug-gym requires Python>=3.12:
conda create -n debug-gym python=3.12
conda activate debug-gym
Then, install debug-gym directly from PyPI:
pip install debug-gym
Alternatively, clone the repository and install locally:
git clone https://github.com/microsoft/debug-gym
cd debug-gym
pip install -e .
To install development dependencies, run:
pip install -e '.[dev]'
Set your API information in llm.yaml
First, create an LLM config template by running python -m debug_gym.llms.configure:
python -m debug_gym.llms.configure
> [!TIP]
> Run python -m debug_gym.llms.configure --help for more options. By default, the template is created at $HOME/.config/debug_gym/llm.yaml, but you can specify any directory.
Then, edit this file with your endpoint and credentials. You can choose one of these authentication methods:
- To authenticate with an API key, provide api_key.
- For az login or Managed Identity authentication on Azure, remove api_key and include scope instead.
> [!WARNING]
> When using open-source LLMs, e.g., via vLLM, you need to correctly set up the HF_TOKEN required by the tokenizer. You can also provide tokenizer_kwargs in your llm.yaml entry (for example trust_remote_code: true) to control how the Hugging Face tokenizer is instantiated.
By default, debug-gym looks for the LLM config file at $HOME/.config/debug_gym/llm.yaml. You can change this behavior by exporting the environment variable LLM_CONFIG_FILE_PATH or by setting llm_config_file_path in your script config file (see [Running Baselines](#3-running-baselines)).
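The lookup described above can be sketched as follows. This is a minimal illustration, not debug-gym's implementation: the helper name resolve_llm_config_path is hypothetical, and the precedence between the script config and the environment variable is an assumption, since the text only says that either one changes the default.

```python
import os
from pathlib import Path

# Default location stated in the README.
DEFAULT_LLM_CONFIG = Path.home() / ".config" / "debug_gym" / "llm.yaml"


def resolve_llm_config_path(script_config=None, env=None):
    """Hypothetical helper: choose the LLM config path from the documented sources."""
    env = os.environ if env is None else env
    script_config = script_config or {}
    # Assumed precedence: script config, then environment variable, then default.
    candidate = (
        script_config.get("llm_config_file_path")
        or env.get("LLM_CONFIG_FILE_PATH")
    )
    return Path(candidate).expanduser() if candidate else DEFAULT_LLM_CONFIG
```

Calling resolve_llm_config_path() with no arguments reads the real process environment; passing env explicitly keeps the function easy to test.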
Overriding LLM parameters in experiment configs
You can override LLM generation parameters like temperature and max_tokens directly in your experiment config file under the llm: section. These values take precedence over the defaults in llm.yaml:
llm:
  name: gpt-4o
  temperature: 0.7  # optional, overrides llm.yaml default
  max_tokens: 4096  # optional, overrides llm.yaml default
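The precedence rule can be pictured as a dictionary merge. The sketch below is only an illustration of that rule: merge_llm_params is a hypothetical name and the default values are made up, so it should not be read as how debug-gym loads its configs.

```python
def merge_llm_params(defaults, overrides):
    """Return llm.yaml-style defaults updated with experiment-level overrides."""
    merged = dict(defaults)
    # Only keys explicitly set in the experiment config replace the defaults;
    # keys left as None are treated as "not provided" (an assumption).
    merged.update({k: v for k, v in overrides.items() if v is not None})
    return merged


# Hypothetical values, for illustration only.
llm_yaml_defaults = {"temperature": 0.0, "max_tokens": 2048}
experiment_llm = {"name": "gpt-4o", "temperature": 0.7}

params = merge_llm_params(llm_yaml_defaults, experiment_llm)
```

Here temperature comes from the experiment config, while max_tokens falls back to the default because the experiment config does not set it.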
---
2. System Design
The structure of debug-gym is as below:
debug_gym
├── gym
│   ├── envs
│   ├── terminals
│   └── tools
├── agents
└── llms
debug_gym.gym is a simulation environment. Given a code repository, an agent can iteratively interact with a set of tools, such as pdb, that are designed for investigating the code. Once it has gathered enough information, the agent can propose a patch that edits certain lines of the code. The terminal then executes the new code against a set of test cases.
debug_gym.agents are LLM-based debugging agents that use debug_gym.gym to interact with code repositories, seek the necessary information, and fix potential bugs. At each interaction step, the agent takes as input a text observation describing the environment and tool states, and is expected to generate a command. The environment then responds with a new text observation describing the state change caused by that command.
debug_gym.llms are the different LLM backends that can be used to instantiate agents. Currently, we support OpenAI, Azure OpenAI, Hugging Face/vLLM deployments (via an OpenAI-compatible endpoint), and Anthropic. For Hugging Face models served through vLLM, the tokenizer's chat template is applied automatically to ensure token counting and truncation match the hosted model.
> [!WARNING]
> debug-gym has limited support on non-Linux platforms. Interactive terminal sessions using PTY (pseudo-terminal) in Docker are not fully supported on macOS or Windows. As a result, the pdb tool (see [2.1. Environment and Tools](#21-environment-and-tools)) only works on Linux.
---
2.1. Environment and Tools
Our base environment, RepoEnv, is an interactive environment that follows the Gymnasium paradigm. Once the environment env is instantiated, one can call env.reset() to start an episode and receive the initial information. One can then interact with the environment using env.step(action), where action specifies one of the available tools (see below). Each step returns subsequent information (e.g., error messages, debugger stdout).
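To make the reset/step cycle concrete, here is a minimal sketch using a toy stand-in. ToyDebugEnv is hypothetical and is not RepoEnv: its string actions, its "patch" tool, and its (observation, done) return value are simplifying assumptions, and the real environment's signatures may differ.

```python
class ToyDebugEnv:
    """Hypothetical stand-in for an environment; not debug-gym's RepoEnv."""

    def __init__(self):
        self.patched = False

    def reset(self):
        # Start a new episode and return the initial text observation.
        self.patched = False
        return "eval: 1 failed (test_add)"

    def step(self, action):
        # An action names a tool, optionally followed by an argument.
        tool, _, argument = action.partition(" ")
        if tool == "view":
            return f"viewing {argument}", False
        if tool == "patch":  # hypothetical tool name
            self.patched = True
            return "patch applied", False
        if tool == "eval":
            if self.patched:
                return "eval: all tests passed", True
            return "eval: 1 failed (test_add)", False
        return f"unknown tool: {tool}", False


def run_episode(env, actions):
    """Feed scripted actions to the environment until the episode ends."""
    transcript = [env.reset()]
    for action in actions:
        observation, done = env.step(action)
        transcript.append(observation)
        if done:
            break
    return transcript


transcript = run_episode(ToyDebugEnv(), ["view calc.py", "patch calc.py", "eval"])
```

In a real agent, the scripted action list would be replaced by an LLM call that reads the latest observation and produces the next command.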
One of the core designs of debug-gym is the notion of tools. Users can dynamically import tools, or develop customized tools and use them in the environment. Tools are modules that augment an agent's action space or observation space, or provide additional functionality to the agent. Below is the set of tools we have implemented so far.
| Tool name | Description |
| :-: | :----- |
| bash | Run commands in a bash shell. You have access to common Linux and Python packages via pip. State is persistent across command calls within the same session. |
| view | Changes an agent's focus to a particular source code file. This is particularly useful when dealing with a repository with multiple files. |
| eval | Runs the current code repository using the provided entrypoint (e.g., pytest), and returns the terminal's output (e.g., error message). |
| pdb | Interactive debugger wrapping the Python pdb tool. In addition, users can choose to maintain a set of persistent breakpoints (as in some programming IDEs), which are not reset after every eval. With this feature, a new pdb debugging session is activated automatically, with all the breakpoints restored. Note that such breakpoints can be cleared by pdb commands such as cl. |
| grep | Search for patterns in files within the repository. Supports both literal string matching and regular expressions. Can search in...
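The idea that tools extend an agent's action space can be sketched with a small registry. Everything here is hypothetical: ToolBox, add_tool, use, and the word_count tool are illustrative names and do not reflect debug-gym's actual tool interface.

```python
from typing import Callable, Dict


class ToolBox:
    """Hypothetical registry showing how tools can extend an action space."""

    def __init__(self):
        self._tools: Dict[str, Callable[[str], str]] = {}

    def add_tool(self, name, handler):
        # Registering a tool makes a new action available to the agent.
        if name in self._tools:
            raise ValueError(f"tool already registered: {name}")
        self._tools[name] = handler

    @property
    def action_space(self):
        return sorted(self._tools)

    def use(self, action):
        # Dispatch "<tool> <argument>" to the matching handler.
        name, _, argument = action.partition(" ")
        if name not in self._tools:
            return f"unknown tool: {name}"
        return self._tools[name](argument)


def word_count(text: str) -> str:
    """A made-up custom tool that returns a text observation."""
    return f"{len(text.split())} word(s)"


toolbox = ToolBox()
toolbox.add_tool("wc", word_count)
observation = toolbox.use("wc hello world")
```

Each handler returns a string, mirroring the text observations that the environment hands back to the agent.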