NousResearch/RL
forked from NVIDIA-NeMo/RL
Description: Scalable toolkit for efficient model reinforcement
License: Apache-2.0
Stars: 11
Forks: 2
Open issues: 1
Created: 2026-04-03T14:35:01Z
Pushed: 2026-06-04T18:36:40Z
Default branch: main
Fork: yes
Parent repository: NVIDIA-NeMo/RL
Archived: no
README:
📣 News
- [03/12/2026] GDPO Support
- Group reward-Decoupled Normalization Policy Optimization (GDPO) is now supported for multi-reward RL training.
- Example: [gdpo_math_1B.yaml](/examples/configs/gdpo_math_1B.yaml)
- Support Async RL training
- WIP: NeMo-Gym compatibility
- [03/11/2026] Nemotron-3-Super was post-trained with NeMo-RL! Follow this guide to reproduce the full RL training recipe.
- [02/04/2026] LoRA Support
- LoRA SFT is supported on both DTensor and Megatron Core backends.
- LoRA GRPO is supported on both DTensor and Megatron Core backends.
- LoRA DPO is supported on both DTensor and Megatron Core backends.
- Nano v3 LoRA recipes:
- [sft-nanov3-30BA3B-2n8g-fsdp2-lora.yaml](examples/configs/recipes/llm/sft-nanov3-30BA3B-2n8g-fsdp2-lora.yaml)
- [grpo-nanov3-30BA3B-2n8g-fsdp2-lora.yaml](examples/configs/recipes/llm/grpo-nanov3-30BA3B-2n8g-fsdp2-lora.yaml)
- [grpo-nanov3-30BA3B-2n8g-megatron-lora.yaml](examples/configs/recipes/llm/grpo-nanov3-30BA3B-2n8g-megatron-lora.yaml)
- [01/30/2026] Release v0.5.0!
- Both linux/amd64 and linux/arm64 Docker containers are available on NGC nvcr.io/nvidia/nemo-rl:v0.5.0.
- NeMo-Gym + NeMo-RL support
- 📊 View the release run metrics on Google Colab to get a head start on your experimentation.
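The GDPO entry above names "reward-decoupled normalization" without spelling it out. The following is a minimal sketch, assuming it means that each reward component is standardized within its rollout group before the components are summed. The function names, the `eps` constant, and the sample scores are hypothetical and are not taken from NeMo RL; the library's implementation may differ.

```python
from statistics import mean, pstdev


def group_normalize(values, eps=1e-6):
    """Standardize one reward component across the rollouts of a single prompt."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def decoupled_advantages(group_rewards):
    """Combine several reward components into one advantage per rollout.

    group_rewards maps a reward name to a list of per-rollout scores.
    Each component is normalized on its own, then the normalized
    values are summed per rollout (assumed behavior, see lead-in).
    """
    normalized = [group_normalize(scores) for scores in group_rewards.values()]
    return [sum(column) for column in zip(*normalized)]


# Hypothetical group of four rollouts scored by two rewards on different scales.
rewards = {
    "correctness": [1.0, 0.0, 0.0, 1.0],
    "format": [0.2, 0.2, 0.1, 0.1],
}
advantages = decoupled_advantages(rewards)
```

Normalizing per component keeps a small-scale reward such as `format` from being drowned out by a large-scale one, which is what would happen if the raw scores were summed before a single normalization.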
Previous News
- [12/15/2025] NeMo-RL is the framework that trained NVIDIA-NeMotron-3-Nano-30B-A3B-FP8! [This guide](docs/guides/nemotron-3-nano.md) provides reproducible instructions for the post-training process.
- [10/10/2025] DAPO Algorithm Support
- NeMo RL now supports the Decoupled Clip and Dynamic Sampling Policy Optimization (DAPO) algorithm, which extends GRPO with Clip-Higher, Dynamic Sampling, Token-Level Policy Gradient Loss, and Overlong Reward Shaping for more stable and efficient RL training. See the [DAPO guide](docs/guides/dapo.md) for more details.
- [9/27/2025] FP8 Quantization in NeMo RL
- [9/25/2025] On-policy Distillation
- The student generates on-policy sequences and aligns its logits to a larger teacher via KL, approaching the larger model's quality at lower cost than RL. See [On-policy Distillation](#on-policy-distillation).
- [12/1/2025] Release v0.4.0!
- First release with official NGC Container nvcr.io/nvidia/nemo-rl:v0.4.0.
- 📊 View the release run metrics on Google Colab to get a head start on your experimentation.
- [9/30/2025] Accelerated RL on GCP with NeMo RL!
- [8/15/2025] NeMo-RL: Journey of Optimizing Weight Transfer in Large MoE Models by 10x
- [7/31/2025] NeMo-RL V0.3: Scalable and Performant Post-training with Nemo-RL via Megatron-Core
- [7/25/2025] Release v0.3.0!
- 📝 v0.3.0 Announcement
- 📊 View the release run metrics on Google Colab to get a head start on your experimentation.
- [5/14/2025] [Reproduce DeepscaleR with NeMo RL!](docs/guides/grpo-deepscaler.md)
- [5/14/2025] Release v0.2.1!
- 📊 View the release run metrics on Google Colab to get a head start on your experimentation.
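Two of the DAPO ingredients listed in the 10/10/2025 entry above, Clip-Higher and the token-level policy gradient loss, can be illustrated in a few lines. This is a sketch in plain Python rather than NeMo RL code: the function name, argument layout, and the `eps_low`/`eps_high` values are illustrative assumptions, and Dynamic Sampling and Overlong Reward Shaping are not shown.

```python
import math


def dapo_token_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.28):
    """Clipped surrogate loss averaged over tokens rather than over sequences.

    logp_new, logp_old: lists of sequences, each a list of per-token log-probs
    under the current and the sampling policy.
    advantages: one advantage value per sequence.
    """
    total, n_tokens = 0.0, 0
    for new_seq, old_seq, adv in zip(logp_new, logp_old, advantages):
        for lp_new, lp_old in zip(new_seq, old_seq):
            ratio = math.exp(lp_new - lp_old)
            # Clip-Higher: the upper bound (1 + eps_high) is looser than the
            # lower bound (1 - eps_low), unlike a symmetric PPO-style clip.
            clipped = max(1.0 - eps_low, min(1.0 + eps_high, ratio))
            total += min(ratio * adv, clipped * adv)
            n_tokens += 1
    # Token-level mean: every token carries equal weight, so long sequences
    # contribute more than short ones.
    return -total / n_tokens


# Hypothetical batch: a 3-token sequence with advantage +1 and a 1-token
# sequence with advantage -1, with identical old and new log-probs.
loss = dapo_token_loss(
    logp_new=[[-1.0, -2.0, -0.5], [-0.3]],
    logp_old=[[-1.0, -2.0, -0.5], [-0.3]],
    advantages=[1.0, -1.0],
)
```

With a per-sequence average the two sequences in this batch would cancel out; the token-level average instead weights the longer sequence three times as heavily.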
Overview
NeMo RL is an open-source post-training library under the NVIDIA NeMo Framework, designed to streamline and scale reinforcement learning methods for multimodal models (LLMs, VLMs, etc.). Built for flexibility, reproducibility, and scale, it supports both small-scale experiments and massive multi-GPU, multi-node deployments for fast experimentation in research and production environments.
What you can expect:
- Flexibility with a modular design that allows easy integration and customization.
- Efficient resource management using Ray, enabling scalable and flexible deployment across different hardware configurations.
- Hackable with native PyTorch-only paths for quick research prototypes.
- High performance with Megatron Core, supporting various parallelism techniques for large models and large context lengths.
- Seamless integration with Hugging Face for ease of use, allowing users to leverage a wide range of pre-trained models and tools.
- Comprehensive documentation that is both detailed and user-friendly, with practical examples.
Please refer to our design documents for more details on the architecture and design philosophy.
Training Backends
NeMo RL supports multiple training backends to accommodate different model sizes and hardware configurations:
- DTensor - PyTorch's next-generation distributed training with improved memory…
Notability
2.0/10 (low stars, routine fork)