What does this fork signal mean?

Baseten created basetenlabs/prime-rl as a fork of PrimeIntellect-ai/prime-rl. This fork signal points to upstream code the lab may be inspecting, patching, or building on. High-signal details: repo basetenlabs/prime-rl · parent PrimeIntellect-ai/prime-rl · Fork with 1 star, trivial. onlylabs links this event to 1 captured evidence page and 6 related fork signals.

Baseten Fork: basetenlabs/prime-rl

Captured source

GitHub · github.com/basetenlabs/prime-rl

basetenlabs/prime-rl repository metadata

published Feb 15, 2026 · seen Jun 5 · captured Jun 11 · http 200 · method plain

basetenlabs/prime-rl

Description: Async RL Training at Scale

Language: Python

License: Apache-2.0

Stars: 1

Forks: 0

Open issues: 18

Created: 2026-02-15T01:09:10Z

Pushed: 2026-06-04T21:10:19Z

Default branch: main

Fork: yes

Parent repository: PrimeIntellect-ai/prime-rl

Archived: no

README:

---

PRIME-RL: Async RL Training at Scale

---

Overview

PRIME-RL is a framework for large-scale asynchronous reinforcement learning. It is designed to be easy-to-use and hackable, yet capable of scaling to 1000+ GPUs. Beyond that, here is why we think you might like it:

1. Integrates natively with `verifiers` environments via the Environments Hub
2. Supports end-to-end post-training, including SFT and RL training and evals
3. Multi-node deployment with FSDP2 training and vLLM inference backend
4. Designed for asynchronous agentic RL training at scale
5. Hackable, modular and extensible by nature

Setup

> *We develop and test on NVIDIA RTX 3090/4090/5090, A100, H100, H200, and B200. If your setup fails, please create an issue.*

Prerequisites

Currently, you need at least one NVIDIA GPU to use PRIME-RL. If you don't already have access to one, we recommend our compute platform for everything from renting on-demand single GPUs for developing, debugging and small ablations, to reserving 1000+ GPU clusters for production-scale training.

Quick Setup

Set up PRIME-RL in a single command.

curl -sSL https://raw.githubusercontent.com/PrimeIntellect-ai/prime-rl/main/scripts/install.sh | bash

Manual Setup

1. Clone the repository

git clone https://github.com/PrimeIntellect-ai/prime-rl.git
cd prime-rl

2. Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

3. Install dependencies from the lock file

uv sync --all-extras

3.1. Optional: Install Flash Attention 3 (on Hopper GPUs only, for flash_attention_3 attention backend)

> *NOTE*: This step takes a while because Flash Attention 3 has no prebuilt wheels, so the extension is built from source.

> *NOTE*: After this step, do not run uv sync --all-extras or a plain uv run, because either will uninstall the package. Use uv sync --inexact or uv run --no-sync instead.

uv pip install "flash-attn-3 @ git+https://github.com/Dao-AILab/flash-attention.git@main#subdirectory=hopper" --no-build-isolation

Validate your environment setup

1. Check that the environment uses Python 3.12

uv run python -V

2. Check that flash-attn is installed

uv run python -c "import flash_attn"

3. Check that you can run the SFT trainer (*this requires 1 GPU*)

uv run sft @ configs/debug/sft/train.toml

4. Check that you can run the RL trainer (*this requires 1 GPU*)

uv run trainer @ configs/debug/rl/train.toml

5. Check that you can run the inference server (*this requires 1 GPU*)

uv run inference @ configs/debug/infer.toml

*Keep the inference server running in the background for the next steps.*

5.1. Check that you can run the orchestrator against the inference server

uv run orchestrator @ configs/debug/orch.toml

5.2. Check that you can run evals against the inference server

uv run eval @ configs/debug/eval.toml
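
The first two validation checks above (Python version and the flash_attn import) can also be scripted. The following is a minimal sketch and not part of PRIME-RL: the function name `check_environment` and its parameters are hypothetical. It only tests whether the interpreter version matches and whether a module can be located; it does not exercise the GPU-dependent steps.

```python
import importlib.util
import sys


def check_environment(required_python=(3, 12), modules=("flash_attn",)):
    """Return a dict mapping each check name to True (pass) or False (fail).

    Hypothetical helper: the defaults mirror the README's checks, but the
    function itself is an illustration, not a PRIME-RL API.
    """
    results = {
        # Compare only (major, minor), e.g. (3, 12)
        "python_version": sys.version_info[:2] == tuple(required_python),
    }
    for name in modules:
        # find_spec locates a top-level module without executing it
        results[f"module:{name}"] = importlib.util.find_spec(name) is not None
    return results


if __name__ == "__main__":
    for check, ok in check_environment().items():
        print(f"{'OK  ' if ok else 'FAIL'} {check}")
```

Saved to a file, it could be run inside the project environment with uv run python followed by the file name; any FAIL line points back to the matching step above.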

Additional Setup

1. If you want to log your runs to W&B, log in

uv run wandb login
# Or set `export WANDB_API_KEY=...`

2. If you require gated or private models or datasets from Hugging Face, log in

uv run hf auth login
# Or set `export HF_TOKEN=...`
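
When relying on the environment variables rather than the interactive logins, a launch script can check for them up front. This is a small sketch under that assumption; `missing_credentials` is a hypothetical helper, and an unset variable does not necessarily mean you are logged out, since the login commands store credentials elsewhere.

```python
import os


def missing_credentials(env=None, names=("WANDB_API_KEY", "HF_TOKEN")):
    """Return the variable names from `names` that are unset or empty.

    `env` defaults to the process environment; pass a dict to test.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All credential variables are set.")
```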

Training Examples

We provide end-to-end training examples in the [examples](examples) directory to highlight features of the framework and guide you through the process of training your own models.

1. [Reverse Text](examples/reverse_text/README.md): Train Qwen3-0.6B to reverse a small chunk of text. Demonstrates tiny-scale single-turn SFT and RL training. Can be trained on a single consumer GPU in a few minutes, and is ideal for getting started.
2. [Wordle](examples/wordle/README.md): Train Qwen3-1.7B to play Wordle. A fun example of multi-turn SFT and RL training. Can be trained on 2-4 H100 GPUs in a few hours. Ideal for exploring the multi-turn training capabilities of the framework.
3. [Alphabet Sort](examples/alphabet_sort/README.md): Train Qwen3-4B-Instruct-2507 to sort names alphabetically. Demonstrates multi-turn RL training via LoRA without SFT warmup. Can be trained on a single H100 GPU in just over an hour. Ideal for exploring LoRA-based training.
4. [Wiki Search](examples/wiki_search/README.md): Train Qwen3-4B-Instruct-2507 to answer trivia questions by searching through Wikipedia. Demonstrates multi-turn training with web search tool use.
5. *More to come...*

Docs

Check out the [docs](docs) directory for in-depth guides on how to use PRIME-RL.

[Entrypoints](docs/entrypoints.md) - Overview of the main components (orchestrator, trainer, inference) and how to run SFT, RL, and evals
[Configs](docs/configs.md) - Configuration system using TOML files, CLI arguments, and environment variables
[Environments](docs/environments.md) - Installing and using verifiers environments from the Environments Hub
[Async Training](docs/async.md) - Understanding asynchronous off-policy training and step semantics
[Logging](docs/logging.md) - Logging with loguru, torchrun, and Weights & Biases
[Checkpointing](docs/checkpointing.md) - Saving and resuming training from checkpoints
[Benchmarking](docs/benchmarking.md) - Performance benchmarking and throughput measurement
[Deployment](docs/deployment.md) - Training deployment on single-GPU, multi-GPU, and multi-node clusters
[On-Policy Distillation](docs/on_policy_distillation.md) - Self-distillation with EMA teacher and top-K tail KL divergence
[Bring Your Own Algorithms](docs/bring-your-own-algorithms.md) - Custom loss functions, advantage functions, and reward shaping
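
The On-Policy Distillation entry above mentions an EMA teacher. As a generic illustration of what an exponential moving average update does, here is a minimal sketch in plain Python. It is not taken from PRIME-RL: the function name `ema_update`, the `decay` parameter, and the dict-of-lists parameter layout are all assumptions, and the actual implementation is described in the linked doc.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher parameters as an exponential moving average.

    Each value becomes decay * teacher + (1 - decay) * student.
    Parameters are modeled as {name: [floats]} purely for illustration.
    """
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must share parameter names")
    return {
        name: [
            decay * t + (1.0 - decay) * s
            for t, s in zip(teacher[name], student[name])
        ]
        for name in teacher
    }


if __name__ == "__main__":
    teacher = {"w": [0.0, 1.0]}
    student = {"w": [1.0, 1.0]}
    # With decay=0.5 the teacher moves halfway toward the student
    print(ema_update(teacher, student, decay=0.5))
```

A decay close to 1 makes the teacher change slowly, so it lags behind the student and provides a smoother target.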

-...

Excerpt shown — open the source for the full document.

Notability

notability 1.0/10

Fork with 1 star, trivial