Scaling Reinforcement Learning with torchforge on CoreWeave Cloud
Scaling RL on CoreWeave with torchforge

CoreWeave continues to make Reinforcement Learning (RL) easy to use and scalable for researchers and developers. After launching the first publicly available Serverless Reinforcement Learning capability to build reliable AI agents, CoreWeave is excited to announce support for torchforge, a new PyTorch-native, scalable RL framework. torchforge simplifies RL by separating algorithm design from distributed infrastructure, enabling researchers to scale complex RL workloads to thousands of GPUs with minimal code and maximum efficiency. Researchers can use torchforge with CoreWeave’s industry-leading Slurm-on-Kubernetes (SUNK) for training and post-training workflows on CoreWeave Cloud.

In a three-way partnership, Meta, Stanford's Scaling Intelligence Lab, and CoreWeave conducted a large-scale post-training run of a state-of-the-art dense coding model on a cluster with 512 NVIDIA H100 GPUs on the CoreWeave Cloud Platform. The collaboration confirmed torchforge’s stability, performance, and production-grade functionality at scale on CoreWeave, helping move RL from research into robust and reproducible production pipelines.

Reinforcement learning: Better model performance at lower cost

Reinforcement Learning (RL) is the leading post-training technique for improving model performance while reducing serving costs. Unlike supervised fine-tuning (SFT), which teaches a model to imitate patterns in labeled data, RL trains from outcomes using feedback and rewards. This approach has powered breakthroughs such as DeepSeek R1 and other state-of-the-art models, where RL enabled meaningful performance gains. In practice, it allows smaller models to match the performance of larger ones on specific tasks, while being more cost-effective and faster to run.
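To make "training from outcomes" more concrete, here is a minimal sketch of the group-relative scoring step used by GRPO, the algorithm this post says torchforge implements. It is not torchforge code: the function names, the exact-match reward, and the use of a population standard deviation with a small epsilon are illustrative assumptions. The idea is that several responses to the same prompt are scored, and each response's advantage is its reward normalized against the rest of its group.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group (GRPO-style).

    `eps` avoids division by zero when every response in the group
    receives the same reward. Population std is an assumption here.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def exact_match_reward(response, expected):
    """Hypothetical outcome-based reward: 1.0 if the answer matches."""
    return 1.0 if response.strip() == expected else 0.0


# Four sampled responses to one prompt whose correct answer is "42".
responses = ["42", "41", " 42 ", "forty-two"]
rewards = [exact_match_reward(r, "42") for r in responses]
advantages = group_relative_advantages(rewards)

for response, reward, adv in zip(responses, rewards, advantages):
    print(f"{response!r:>12}  reward={reward:.1f}  advantage={adv:+.3f}")
```

Responses that beat their group average get a positive advantage and are reinforced; those below it get a negative one. When every response in a group scores the same, all advantages are zero and the group contributes no learning signal.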
torchforge reduces infrastructure complexity for researchers

RL workflows combine continuous inference and training in a tightly coupled loop: generating responses to prompts, scoring them with a reward model, and updating the base model using those scores. At scale, this process is difficult to orchestrate. Researchers must manage separate inference and training stacks, shard models correctly across GPUs and nodes, handle failures gracefully, and transfer weights efficiently between the two phases. These infrastructure concerns are often embedded directly into the RL loop, consuming researcher time and compute that would otherwise go toward improving model quality.

torchforge solves this by separating algorithmic logic from infrastructure management, allowing researchers to focus entirely on the RL algorithm itself (data, rewards, losses, and environments) without the burden of managing infrastructure at scale. torchforge implements GRPO, a popular RL algorithm aimed at improving a model’s reasoning quality, and lets researchers write RL code almost as simply as pseudocode while it manages scaling, routing, load balancing, and fault tolerance. Built on Monarch, a PyTorch-native distributed programming framework, torchforge can scale to thousands of GPUs with fast, fault-tolerant data movement through RDMA. This means researchers can run larger, more complex RL experiments faster and more reliably, turning what was once weeks of engineering overhead into repeatable, production-ready training workflows.

SUNK enables torchforge to scale

CoreWeave offers the industry-leading solution for running torchforge because of its purpose-built AI cloud, infrastructure reliability, and delightful researcher experience with its Slurm on Kubernetes (SUNK) offering. CoreWeave ensures researchers and platform engineers can run torchforge with higher performance, efficiency, scale, and reliability.
The motivation for running torchforge on SUNK is the need for a scheduler that is reliable and efficient at scale. In a shared research cluster, improving utilization is hard for several reasons: multi-node jobs must launch together, inference and training stress the network fabric in different ways, long generations and verifiers can slow down tasks, and frequent weight syncs can perform poorly when the network fabric topology is not optimized. Slurm addresses these scheduling challenges through features such as priorities, preemption, quotas, gang scheduling, and topology-aware scheduling.

When the number of compute nodes grows into the thousands, managing Slurm itself becomes increasingly complex, and availability starts to limit throughput. To address this, CoreWeave offers SUNK, combining Slurm’s advanced scheduling with the orchestration power of our managed Kubernetes service, CKS, to support jobs spanning more than 32,000 GPUs. SUNK ensures high availability for Slurm components, scales compute nodes on demand, and replaces the Slurm controller API to handle hundreds of thousands of jobs.

For RL workloads running with torchforge, this translates directly into faster job startup, more consistent cluster utilization, and higher end-to-end throughput across large asynchronous rollouts and training loops. Researchers can launch large-scale torchforge runs with reduced queuing delays and infrastructure interruptions, keeping GPUs busy and experiments progressing continuously.

SUNK also offers a unique researcher-centric experience through secure isolated environments with Individual Login Nodes, file systems mounted in-cluster, tooling to easily use and build container images, and IdP-federated cluster access management through Automated User Provisioning. Together, these features simplify day-to-day operations for researchers and platform teams alike, making large-scale RL experimentation on torchforge both seamless to manage and effortless to scale.
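The scheduling constraint described above, that multi-node jobs must launch together, can be illustrated with a toy all-or-nothing placement loop. This is a simplified sketch and not Slurm's or SUNK's actual algorithm: the `Job` class, the `schedule` function, and the job names and sizes are hypothetical, and real schedulers also account for preemption, quotas, and topology.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int      # nodes that must all start at the same time
    priority: int   # higher value is considered first


def schedule(jobs, free_nodes):
    """Start jobs in priority order, all-or-nothing per job.

    A job starts only if every node it needs is free right now;
    it never starts on a partial allocation. Smaller, lower-priority
    jobs may still fill leftover nodes.
    """
    started = []
    for job in sorted(jobs, key=lambda j: -j.priority):
        if job.nodes <= free_nodes:
            free_nodes -= job.nodes
            started.append(job.name)
    return started, free_nodes


queue = [
    Job("trainer", nodes=8, priority=10),
    Job("rollouts", nodes=4, priority=5),
    Job("eval", nodes=2, priority=1),
]
started, remaining = schedule(queue, free_nodes=10)
print(started, remaining)
```

With 10 free nodes, the 8-node trainer starts first, the 4-node rollout job has to wait because only 2 nodes remain, and the 2-node eval job fills the gap. The waiting rollout job is the kind of utilization problem that priorities, preemption, and on-demand node scaling are meant to reduce.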
Running torchforge on SUNK on CoreWeave

Running torchforge on SUNK feels familiar to anyone already using Slurm. Researchers can start by defining their torchforge job in a standard Slurm batch script and launch it using sbatch or srun. Once submitted, SUNK handles GPU allocation and node scaling automatically, while…
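As a rough illustration of the batch-script workflow this section describes, the sketch below assembles a standard Slurm batch script as text. The `#SBATCH` directives are regular Slurm options, but everything else is an assumption: the excerpt does not show how torchforge is actually launched, so the training command, the config file, and the resource sizes are hypothetical placeholders.

```python
import shlex


def build_sbatch_script(job_name, nodes, gpus_per_node, time_limit, command):
    """Return the text of a Slurm batch script for a multi-node job."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --gpus-per-node={gpus_per_node}",
        f"#SBATCH --time={time_limit}",
        "",
        "set -euo pipefail",
        # srun starts the command on the nodes allocated to this job.
        "srun " + shlex.join(command),
        "",
    ]
    return "\n".join(lines)


script = build_sbatch_script(
    job_name="rl-post-training",
    nodes=4,
    gpus_per_node=8,
    time_limit="04:00:00",
    # Hypothetical placeholder; substitute the real torchforge entry point.
    command=["python", "train_rl.py", "--config", "config.yaml"],
)
print(script)
```

Saving the output to a file such as `job.sbatch` and running `sbatch job.sbatch` from a login node would queue the job; the scheduler then decides when and where it starts.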
Excerpt shown — open the source for the full document.