What does this repo signal mean?

Amazon (Nova) published amazon-science/SAGE (Python). This repository signal exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo amazon-science/SAGE · language Python · Low stars, routine research repo. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Amazon (Nova) Repo: amazon-science/SAGE

Captured source

source ↗

GitHub/github.com/amazon-science/SAGE

amazon-science/SAGE repository metadata

Source ↗

published Dec 23, 2025seen Jun 5captured Jun 11http 200method plain

amazon-science/SAGE

Language: Python

License: NOASSERTION

Stars: 18

Forks: 5

Open issues: 23

Created: 2025-12-23T21:43:53Z

Pushed: 2026-06-05T23:32:05Z

Default branch: main

Fork: no

Archived: no

README:

Skill Augmented GRPO for self-Evolution

This repository contains the official implementation of SAGE (Skill Augmented GRPO for self-Evolution), introduced in our paper Reinforcement Learning for Self-Improving Agent with Skill Library.

📖 Algorithm Preview

The figure below illustrates the Skill Library Agent and Sequential Rollout with Skill-integrated Reward, a core component of SAGE on which GRPO is applied. ![Algorithm Preview Image](algorithm_preview.png)

⚙️ Building Modified Dependencies

SAGE depends on two modified open-source repositories: appworld and LLaMA-Factory. The modifications are provided in the patches/appworld and patches/LLaMA-Factory directories. Please follow the instructions below to build these modified dependencies.

1. appworld Setup

# Install git lfs (appworld required)
apt-get update
apt-get install -y --no-install-recommends git-lfs

# Clone repositories
git clone https://github.com/StonyBrookNLP/appworld.git
git -C appworld checkout 57253350edf00922f370a8c7dbe94f1a4d3ee456

# Apply modifications
cp -r patches/appworld/* appworld/

2. LLaMA-Factory Setup

# Clone repositories
git clone https://github.com/hiyouga/LLaMA-Factory.git
git -C LLaMA-Factory checkout 2c6aded5d4f4ff23aa1887d16972afb3c2543ac3

# Apply modifications
cp -r patches/appworld/* appworld/

📂 Repository Structure

After building the modified dependencies, the final repository has the following strucrue:

`sage/` - Main SAGE algorithm implementation (requires AppWorld installation).

`appworld/` - AppWorld environment setup and Skill Library Agent evaluation. → Evaluation code located at: appworld/experiments/code/skill_library_agent/

`LLaMA-Factory/` - Code for Supervised Fine-Tuning (SFT).

🚀 Running SAGE

1. SAGE Environment Setup

Install the python environment for SAGE:

# Create the virtual python environment
cd sage
uv venv appworld_sage --python 3.11
source appworld_sage/bin/activate

# Install dependencies for sage
uv pip install --no-deps -r requirements.txt
uv pip install -e .

# Install flash attention 2
cd ..
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
uv pip install --no-build-isolation .

# Install appworld
cd ../appworld
uv pip install .
# Also install appworld_experiment
cd experiments
uv pip install .
# Fully install appworld
cd ..
appworld install
# Download appworld data
cd ../sage
appworld download data

2. Ray Launch for SAGE

Before running the sage script, launch ray on each compute node:

# Launch ray on the head node
ray start --head

# Launch ray on other nodes
ray start --address='HEAD_NODE_IP'

2. Launch SAGE Training

Start reinforcement learning by running the script after specifying the Ray head node IP and the SFT model path in run_sage_2nodes.sh. The training requires at least 2xNVIDIA H100 8-GPU nodes.

sh run_sage_2nodes.sh

📊 AppWorld Evaluation

1. AppWorld Setup

For AppWorld evaluation, you need to first deploy the model for evaluation with vllm. Since the vllm within appworld_sage environment is the old version, a new environment with updated version of vllm is required for optimal performance:

uv venv appworld_eval --python 3.12
source appworld_eval/bin/activate
uv pip install vllm --torch-backend=auto
cd appworld
uv pip install -e .
cd experiments
uv pip install -e .
cd ..
appworld install
appworld download data

2. Deploy vllm Server

Deploy the model for evaluation with vllm, which requires 1xNVIDIA A100 8-GPU node. Remember to replace the vllm server url in appworld/experiments/code/skill_library_agent/lite_llm_generator.py.

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -m vllm.entrypoints.openai.api_server --model MODEL_PATH --served-model-name sage --tensor-parallel-size 8 --dtype auto --port 2638 &

3. Launch AppWorld Evaluation

Run the following commands to perform the appworld evaluation and get results:

# Start evaluation
appworld run sage_test_normal --config-name sage_test_normal > sage_test_normal.log
# Obtain results
appworld evaluate sage_test_normal

🛠️ Supervised Fine-tuning

To reach the optimal performance on AppWorld, Supervised Fine-tuning (SFT) with expert experience dataset generation is still required. Thus, here we also provide the data generation pipeline and SFT code for reference.

1. Generate Expert Experience Dataset

The expert experience dataset in our paper is generated with Claude 3.5 Sonnet V2. The environment appworld_eval is still used for dataset generation. Before running the data generation script, you should first depoly the Claude model with litellm and replace the server url in appworld/experiments/code/skill_library_agent/lite_llm_generator.py.

source appworld_eval/bin/activate
cd appworld
litellm --config litellm_config.yaml # Set your aws api key in litellm_config.yaml
bash expert_dataset_generation_claude.sh

2. Launch SFT

SFT is performed under LLaMA-Factory repository. The training requires at least 4xNVIDIA H100 8-GPU nodes.

# Install the environment for LLaMA-Factory
cd LLaMA-Factory
uv venv llama-factory --python 3.11
source llama-factory/bin/activate
uv pip install -e ".[torch,metrics]" --no-build-isolation

# Lauch the SFT (with slurm script)
sbatch run_sft.sh

Security

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

License

This library is licensed under the CC BY-NC 4.0 License.

Notification

This code is being released solely for academic and scientific reproducibility purposes, in support of the methods and findings described in the associated publication. Pull requests are not being accepted in order to maintain the code exactly as it was used in the paper, but interested parties are encouraged to open an issue requesting open source community development.

Citation

Please cite our paper if you find this repository helpful:

@misc{wang2025reinforcementlearningselfimprovingagent,
title={Reinforcement Learning for Self-Improving Agent with Skill...

Excerpt shown — open the source for the full document.

Notability

notability 3.0/10

Low stars, routine research repo