What does this repo signal mean?

ByteDance (Doubao/Seed) published ByteDance-Seed/Agent-R (Python). This repository signal exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo ByteDance-Seed/Agent-R · language Python · New repo with modest traction, routine release. onlylabs links this event to 1 captured evidence page and 6 related repo signals. It also maps to Infrastructure in the data-business radar.

ByteDance (Doubao/Seed) Repo: ByteDance-Seed/Agent-R

Captured source

source ↗

GitHub · github.com/ByteDance-Seed/Agent-R

ByteDance-Seed/Agent-R repository metadata

published Jan 15, 2025 · seen 5d · captured 8h · http 200 · method plain

ByteDance-Seed/Agent-R

Description: Resources for our paper: "Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training"

Language: Python

License: Apache-2.0

Stars: 172

Forks: 20

Open issues: 0

Created: 2025-01-15T10:51:25Z

Pushed: 2025-10-20T02:30:17Z

Default branch: main

Fork: no

Archived: yes

README:

Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training

Updates

+ [2025.01.21] We release Agent-R.
  + 🔥 The paper is available at Agent-R Paper.
  + 🔥 The code is available at Agent-R Code.

Introduction

We propose an iterative self-training framework, Agent-R, that enables language agents to reflect on the fly. Unlike traditional methods that reward or penalize actions solely based on correctness, our approach leverages Monte Carlo Tree Search (MCTS) to construct training samples that recover correct trajectories from erroneous ones. A key challenge of agent task reflection is that errors must be revised promptly, rather than only at the end of a rollout. To address this, we introduce a model-guided critique construction mechanism: the actor model identifies the first error step (within its current capability) in a failed trajectory. Starting from that step, we splice the failed trajectory with the adjacent correct path, which shares the same parent node in the tree. To further explore the scalability of this self-improvement paradigm, we investigate iterative refinement of both error correction capabilities and dataset construction.

More details are in the paper.
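
The splicing step described in the introduction can be illustrated with a small sketch. This is not the repository's code: the `Step` class, its `is_error` flag, and the `splice_revision` helper are hypothetical names chosen for illustration, and the flag stands in for the actor model's own judgement of where the first error occurs.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str
    is_error: bool = False  # hypothetical flag; Agent-R lets the actor model judge this


def splice_revision(bad, good, shared_prefix_len, revision_signal):
    """Build a revision trajectory from a failed and a successful rollout.

    `bad` and `good` are lists of Step that share their first
    `shared_prefix_len` steps, i.e. they branch from the same parent node.
    """
    # Locate the first flagged error on the failed branch.
    first_error = next(
        (i for i in range(shared_prefix_len, len(bad)) if bad[i].is_error),
        None,
    )
    if first_error is None:
        return None  # nothing to revise
    # Keep the failed branch up to and including the first error, add a
    # reflection message, then continue along the successful branch.
    return (
        bad[: first_error + 1]
        + [Step(revision_signal)]
        + good[shared_prefix_len:]
    )


prefix = [Step("search[red shoes]")]
bad = prefix + [Step("click[blue shoes]", is_error=True), Step("click[buy]")]
good = prefix + [Step("click[red shoes]"), Step("click[buy]")]
revised = splice_revision(bad, good, 1, "I made a mistake; let me go back.")
```

The resulting trajectory stops at the first error instead of replaying the whole failed rollout, which mirrors the "timely revision" idea in the paragraph above.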

Getting started

mcts_collection.py: Implements the MCTS-based trajectory generation framework.
path_collection.py: Generates revision trajectories based on model-guided evaluation.
eval.py: Script for evaluating the agent's performance across specified tasks.

Installation

1. Clone the Repository

git clone https://github.com/bytedance/Agent-R.git
cd Agent-R/

2. Create a Virtual Environment

It is recommended to use a virtual environment to manage dependencies:

Using `conda`

conda create --name agent-r python=3.11 -y
conda activate agent-r

3. Install Dependencies

Install the required Python packages:

pip install -r requirements.txt
cd AgentGym/agentenv
pip install -e .

Usage

Environment Setup

Ensure the TASK environment variable is set to one of the supported tasks:

webshop
sciworld
textcraft

Example:

export TASK=webshop
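
A wrapper script could validate TASK before launching anything. The sketch below is an assumption about how such a check might look; `resolve_task` and `SUPPORTED_TASKS` are hypothetical names and are not taken from the repository.

```python
import os

# The three task names listed above.
SUPPORTED_TASKS = ("webshop", "sciworld", "textcraft")


def resolve_task(env=None):
    """Return the TASK value, raising if it is unset or unsupported."""
    env = os.environ if env is None else env
    task = env.get("TASK", "")
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"TASK must be one of {SUPPORTED_TASKS}, got {task!r}")
    return task
```

Calling `resolve_task()` with no argument reads the real process environment; passing a dict makes the check easy to test.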

Install AgentGym and launch the environment server following the instructions below:

WebShop

# Install dependencies
cd AgentGym/agentenv-webshop
conda env create -n agentenv-webshop -f environment.yml
conda activate agentenv-webshop
bash ./setup.sh

# Launch server
webshop --host 0.0.0.0 --port 36001

SciWorld

# Install dependencies
cd AgentGym/agentenv-sciworld
conda create --name agentenv-sciworld python=3.8 -y
conda activate agentenv-sciworld
pip install -e .

# Launch server
sciworld --host 0.0.0.0 --port 36001

TextCraft

# Install dependencies
cd AgentGym/agentenv-textcraft
conda create --name agentenv-textcraft python=3.9 -y
conda activate agentenv-textcraft
pip install -e .

# Launch server
textcraft --host 0.0.0.0 --port 36001
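
Before starting data collection, it can help to confirm that the environment server is accepting connections. The following is a minimal sketch using only the Python standard library; the `server_is_up` helper is hypothetical, and it assumes the server runs on the same machine, so the client connects to 127.0.0.1 rather than the 0.0.0.0 bind address.

```python
import socket


def server_is_up(host="127.0.0.1", port=36001, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

This only checks that the port is open; it does not verify that the process behind it is the expected environment server.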

Running MCTS-Based Trajectory Generation

To generate trajectories using MCTS, set the necessary environment variables and run the following command:

export OPENAI_API_KEY=YOUR_OPENAI_API_KEY # Replace with your OpenAI API key if you use OpenAI models
export MAX_DEPTH=YOUR_MAX_DEPTH # Replace with the maximum depth of the MCTS tree
export ITERA=YOUR_ITERA # Replace with the number of iterations for MCTS
export N_GEN=YOUR_N_GEN # Replace with the number of actions to generate per iteration
export MODEL_NAME=YOUR_MODEL_NAME # Replace with the name of the model to use
export MODEL_DIR=YOUR_MODEL_DIR # Replace with the directory where the model is stored
export TASK=YOUR_TASK # Replace with the task name (webshop, sciworld, textcraft)
export TEMP=YOUR_TEMP # Replace with the temperature for the model
export MAX_TOKEN_LENGTH=YOUR_MAX_TOKEN_LENGTH # Replace with the maximum token length for the model
python3 mcts_collection.py \
--env_server_base "" \
--model_name "" \
--min \
--max

Replace the --model_name value with the name of the model to use (e.g., Llama-3.1-8B-Instruct).
Use --min and --max to specify the range of task indices.
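
The exported settings are plain strings, so a script that consumes them has to parse them. The sketch below shows one way to do that; the `MCTSSettings` class is hypothetical, and the chosen types (integers for depth, iteration, and count settings, a float for temperature) are assumptions based on the comments above, not on the repository's code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MCTSSettings:
    max_depth: int         # MAX_DEPTH: maximum depth of the MCTS tree
    iterations: int        # ITERA: number of MCTS iterations
    n_gen: int             # N_GEN: actions generated per iteration
    temperature: float     # TEMP: sampling temperature
    max_token_length: int  # MAX_TOKEN_LENGTH: token limit for the model

    @classmethod
    def from_env(cls, env=None):
        env = os.environ if env is None else env
        try:
            return cls(
                max_depth=int(env["MAX_DEPTH"]),
                iterations=int(env["ITERA"]),
                n_gen=int(env["N_GEN"]),
                temperature=float(env["TEMP"]),
                max_token_length=int(env["MAX_TOKEN_LENGTH"]),
            )
        except KeyError as exc:
            name = exc.args[0]
            raise RuntimeError(f"missing environment variable: {name}") from exc
```

Failing early on a missing or non-numeric value is cheaper than discovering it partway through a long collection run.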

Revision Trajectory Generation

To process paths and generate revised trajectories:

export OPENAI_API_KEY=YOUR_OPENAI_API_KEY
export MAX_DEPTH=YOUR_MAX_DEPTH
export ITERA=YOUR_ITERA
export N_GEN=YOUR_N_GEN
export MODEL_NAME=YOUR_MODEL_NAME
export MODEL_DIR=YOUR_MODEL_DIR
export TASK=YOUR_TASK
export TEMP=YOUR_TEMP
export MAX_TOKEN_LENGTH=YOUR_MAX_TOKEN_LENGTH # Replace with the maximum token length for the model
export ALPHA=YOUR_ALPHA # Replace with the lower bound for high-quality trajectories
export BETA=YOUR_BETA # Replace with the distinguishable gap

python3 path_collection.py \
--input_dir "" \
--output_dir "" \
--data_type centric \
--revise 1

Set --revise to 1 to enable on-policy revision.
Specify directories for input and output data.
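
The README does not spell out how ALPHA and BETA are applied. One plausible reading, offered here only as an assumption, is that a good trajectory must score at least ALPHA and must beat the paired bad trajectory by at least BETA. The `qualifies_for_revision` function below is hypothetical and may differ from the rule in path_collection.py.

```python
def qualifies_for_revision(good_reward, bad_reward, alpha, beta):
    """Assumed pairing rule; not taken from the repository.

    alpha: lower bound on the reward of a high-quality trajectory.
    beta: minimum reward gap between the good and the bad trajectory.
    """
    return good_reward >= alpha and (good_reward - bad_reward) >= beta


# Hypothetical (good, bad) reward pairs, filtered with alpha=0.8 and beta=0.5.
pairs = [(1.0, 0.2), (0.7, 0.1), (0.9, 0.6)]
kept = [p for p in pairs if qualifies_for_revision(*p, alpha=0.8, beta=0.5)]
```

Under this reading, only the first pair survives: the second fails the quality bound and the third fails the gap requirement.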

Training with Revision Trajectory Using Xtuner

To train Llama-3.1-8B-Instruct on revision trajectories, we use Xtuner, an efficient tool for distributed training. Follow the steps below to set up the environment and train the model.

1. Set Up the Environment

First, create and activate the virtual environment for Xtuner:

conda create --name xtuner-env python=3.10 -y
conda activate xtuner-env
pip install -U 'xtuner[deepspeed]'

2. Configure the Training Script

Before starting the training process, modify the llama3_8b_instruct_full_alpaca_e3_copy.py script with the necessary settings, as outlined in the Xtuner tutorial.

3. Start Training

Once the script is configured, run the training using Xtuner by executing the following command:

conda activate xtuner-env
cd xtuner_config/
NPROC_PER_NODE=${ARNOLD_WORKER_GPU} NNODES=${ARNOLD_WORKER_NUM} PORT=${ARNOLD_WORKER_0_PORT}…

Excerpt shown — open the source for the full document.

Notability

notability 4.0/10

New repo with modest traction, routine release