Repo · InclusionAI (Ant Group) · published Jun 9, 2026

inclusionAI/AReno

Python


Description: An easy-to-use, fast toolkit to scale up RL post-training on a single node.

Language: Python

License: Apache-2.0

Stars: 5

Forks: 0

Open issues: 51

Created: 2026-06-09T02:28:36Z

Pushed: 2026-06-22T07:00:31Z

Default branch: main

Fork: no

Archived: no

README: 👋 Hi, everyone! AReno is a fast, effortless, and self-contained toolkit that scales RL post-training up locally, initiated by the inclusionAI ASystem team and maintained by the AReno community.

AReno: ASystem Reinforcement Learning Nano

AReno is a local LLM post-training toolkit for RL, SFT/DPO-style training, serving, and agentic RL. It was originally developed by engineers from the ASystem Team at Ant Group.

Built on a self-contained, full-stack design, AReno is optimized to extract maximum performance from a single node, making it well-suited for fast, local post-training with no external training or inference backend in the loop.

AReno's mission is to make LLM RL accessible to a broad community of researchers and developers, so you can go from a base checkpoint to a trained, served model on a single node without standing up a cluster or wiring together a training framework, an inference server, and a kernel library.

> Small but complete, like its name: nano in footprint, full-stack in capability. We hope AReno makes scaling up your ideas locally both fast and delightful. Enjoy!

Highlights

  • Plug-and-play: various post-training methods are available through the --algo flag or, from Python, through the Trainer class, with no cluster or launcher to set up.
  • 🪶 Lightweight: single self-contained package, no external training/inference backend, just PyTorch, FlashAttention, and a handful of other libraries.
  • 🧰 Agentic RL ready: run an agent function against AReno's local OpenAI-compatible proxy, return explicit trajectories, and train from tokens, logprobs, rewards, and loss masks derived by the trainer.
  • 🧩 Extensible: easily register new algorithms, model adapters, reward functions, and hardware backends without changing the core.
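The agentic RL bullet says the trainer derives tokens, logprobs, rewards, and loss masks from returned trajectories. The minimal plain-Python sketch below shows what a loss mask does. It is not AReno's API: the Trajectory dataclass and the masked_pg_loss function are hypothetical. The objective is also an assumption: a plain REINFORCE-style loss with a scalar advantage (reward minus a baseline), not whichever algorithm --algo selects. Tokens with mask 0, such as prompt text or tool output returned through the proxy, contribute nothing, so only model-generated tokens are trained.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """Hypothetical container for one agent rollout (not AReno's real type)."""
    tokens: List[int]      # token ids: prompt, tool outputs, and model outputs
    logprobs: List[float]  # log-probability of each token under the current policy
    loss_mask: List[int]   # 1 = model-generated token to train on, 0 = ignore
    reward: float          # scalar reward for the whole episode


def masked_pg_loss(trajs: List[Trajectory], baseline: float = 0.0) -> float:
    """REINFORCE-style loss, averaged over masked-in tokens only."""
    total, count = 0.0, 0
    for t in trajs:
        if not (len(t.tokens) == len(t.logprobs) == len(t.loss_mask)):
            raise ValueError("tokens, logprobs, and loss_mask must have equal length")
        advantage = t.reward - baseline
        for logprob, keep in zip(t.logprobs, t.loss_mask):
            if keep:
                # Minimizing -A * log p raises the probability of tokens with A > 0.
                total += -advantage * logprob
                count += 1
    return total / count if count else 0.0


# Example: the first token is prompt text, so it is masked out of the loss.
traj = Trajectory(tokens=[101, 7, 42], logprobs=[-1.0, -2.0, -3.0],
                  loss_mask=[0, 1, 1], reward=1.0)
print(masked_pg_loss([traj]))  # 2.5
```

A real trainer would compute the same quantity on tensors so that gradients flow back into the policy. Plain floats are used here only so the sketch runs without PyTorch.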

Installation

Requirements:

  • Linux with an NVIDIA GPU (CUDA compute capability 8.0+)
  • CUDA toolkit, with CUDA_HOME set (so nvcc is on the build path)
  • PyTorch >= 2.6, matching your installed CUDA version

> Other platforms: Apple Silicon (M-series) and AMD GPUs are not supported; the engine requires NVIDIA CUDA. On Windows, install under WSL2 and follow the Linux instructions. DGX Spark and other Grace/Blackwell systems work, but install an aarch64 PyTorch build first.
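You can check the requirements listed above in a few lines of Python before attempting a build. The sketch below only restates the stated thresholds: PyTorch >= 2.6, compute capability 8.0+, and CUDA_HOME pointing at a toolkit that contains bin/nvcc. The function names are hypothetical and not part of AReno; areno check remains the supported diagnostic. Versions are compared as integer tuples, because a string comparison would wrongly rank "2.10" below "2.6".

```python
import os
import re
from typing import List, Optional, Tuple


def parse_torch_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '2.6.0+cu124'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_requirements(torch_version: str,
                       capability: Optional[Tuple[int, int]],
                       cuda_home: Optional[str],
                       nvcc_found: bool) -> List[str]:
    """Return a list of problems; an empty list means the stated minimums are met."""
    problems = []
    if parse_torch_version(torch_version) < (2, 6):
        problems.append(f"PyTorch {torch_version} is older than 2.6")
    if capability is None:
        problems.append("no CUDA GPU is visible to PyTorch")
    elif capability < (8, 0):
        problems.append(f"compute capability {capability[0]}.{capability[1]} is below 8.0")
    if not cuda_home:
        problems.append("CUDA_HOME is not set")
    elif not nvcc_found:
        problems.append(f"nvcc not found under {cuda_home}/bin")
    return problems


def check_this_machine() -> List[str]:
    """Gather the inputs from the running environment (torch is imported lazily)."""
    try:
        import torch
    except ImportError:
        return ["PyTorch is not installed"]
    capability = torch.cuda.get_device_capability(0) if torch.cuda.is_available() else None
    cuda_home = os.environ.get("CUDA_HOME")
    nvcc_found = bool(cuda_home) and os.path.isfile(os.path.join(cuda_home, "bin", "nvcc"))
    return check_requirements(torch.__version__, capability, cuda_home, nvcc_found)


print(check_this_machine() or "requirements look satisfied")
```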

Compatibility matrix:

| Environment | Status | Notes |
| --- | --- | --- |
| Linux x86_64 + NVIDIA GPU | Supported | Primary training/serving target. Use CUDA-enabled PyTorch >= 2.6 and build areno_accel. |
| Linux aarch64 / Grace-Blackwell | Supported | Install a matching aarch64 CUDA PyTorch build first; build from source with --no-build-isolation. |
| Windows WSL2 + NVIDIA GPU | Supported | Follow the Linux install path inside WSL2. Native Windows is not supported. |
| macOS Apple Silicon | Metadata/docs only | Use ARENO_BUILD_EXT=0 for docs or packaging checks. Training/serving is not supported. |
| CPU-only environments | Metadata/docs/tests only | CPU-only PyTorch can run lightweight docs/tests, but cannot train or serve AReno models. |
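The compatibility matrix can be read as a small decision procedure. The sketch below is a hypothetical helper, not something AReno ships. It rests on two assumptions. First, it detects WSL with the common heuristic that WSL kernels report "microsoft" in their release string. Second, it treats a ROCm build of PyTorch (where torch.version.hip is set) as having no usable GPU, since AMD GPUs are not supported.

```python
import platform


def is_wsl() -> bool:
    """Heuristic: WSL kernels include 'microsoft' in their release string."""
    return "microsoft" in platform.release().lower()


def support_status(system: str, machine: str, has_cuda_gpu: bool, wsl: bool) -> str:
    """Map an environment onto a row of the compatibility matrix (simplified)."""
    if system == "Darwin":
        return "metadata/docs only"
    if system == "Windows":
        return "unsupported (use WSL2)"
    if system == "Linux":
        if not has_cuda_gpu:
            return "metadata/docs/tests only"
        if wsl:
            return "supported (WSL2)"
        if machine == "x86_64":
            return "supported (x86_64)"
        if machine in ("aarch64", "arm64"):
            return "supported (aarch64; install an aarch64 CUDA PyTorch build first)"
    return "unknown"


def current_status() -> str:
    """Fill in the inputs from the running machine; torch is optional here."""
    try:
        import torch
        # ROCm builds also report cuda.is_available(); AMD GPUs are unsupported.
        has_gpu = torch.cuda.is_available() and getattr(torch.version, "hip", None) is None
    except ImportError:
        has_gpu = False
    return support_status(platform.system(), platform.machine(), has_gpu, is_wsl())


print(current_status())
```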

To install:

pip install psutil
pip install flash-linear-attention
pip install areno --no-build-isolation

--no-build-isolation is required so that pip uses your existing CUDA-enabled PyTorch instead of installing a CPU-only torch in an isolated build environment. Because build isolation is disabled, build-time helpers are not installed automatically: psutil must already be present, because PyTorch's CUDA extension builder imports it while sizing parallel compile jobs. Install flash-attn only if you use the default high-throughput --attn-backend flash path. If you run with --attn-backend native, or if AReno automatically falls back to native attention (as it does on Turing GPUs such as the T4), flash-attn is optional and need not be installed.
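The rules above reduce to two questions: are the build-time prerequisites importable, and will the flash backend actually be used? Below is a hypothetical sketch of that reasoning, not AReno's code. It makes two assumptions. First, the flash path needs compute capability 8.0 or higher; this matches the stated requirement and the fallback on Turing GPUs such as the T4, which report 7.5. Second, requesting flash without flash-attn installed is treated as an error; how AReno itself handles that case is not stated.

```python
import importlib.util
from typing import List, Tuple


def missing_build_prereqs() -> List[str]:
    """Modules that must already exist before pip install --no-build-isolation."""
    return [name for name in ("torch", "psutil")
            if importlib.util.find_spec(name) is None]


def flash_attn_installed() -> bool:
    # The flash-attn package is imported as the module "flash_attn".
    return importlib.util.find_spec("flash_attn") is not None


def choose_attn_backend(requested: str,
                        capability: Tuple[int, int],
                        flash_available: bool) -> str:
    """Hypothetical mirror of the fallback rule; not AReno's implementation."""
    if requested not in ("flash", "native"):
        raise ValueError(f"unknown attention backend: {requested!r}")
    if requested == "native" or capability < (8, 0):
        # Explicit native request, or a pre-Ampere GPU such as a Turing T4 (7.5).
        return "native"
    if not flash_available:
        raise RuntimeError("--attn-backend flash needs flash-attn; install it or use native")
    return "flash"


print("missing before build:", missing_build_prereqs())
print(choose_attn_backend("flash", (7, 5), flash_attn_installed()))  # native
```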

Post-install readiness check:

areno check
areno env --json # attach this to setup/support reports

areno check fails fast with next steps for common setup problems such as missing or CPU-only PyTorch, unsupported PyTorch versions, missing CUDA_HOME/nvcc, missing build-time dependencies, unsupported platforms, or a skipped areno_accel build. Use areno env --json when opening an issue so maintainers can see the Python, CUDA, PyTorch, GPU, and extension state without guessing from low-level build errors.
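If you script your setup, you can capture the same report to a file and attach it to an issue. This is a minimal sketch that assumes areno env --json prints a single JSON document to standard output; the README excerpt does not spell out the format. The save_env_report helper and the areno-env.json filename are hypothetical.

```python
import json
import subprocess
from pathlib import Path
from typing import Any, Optional


def save_env_report(out_path: str, executable: str = "areno") -> Optional[Any]:
    """Run '<executable> env --json', write its output to out_path, return parsed JSON.

    Returns None if the CLI is missing, exits non-zero, times out,
    or prints something that is not valid JSON.
    """
    try:
        proc = subprocess.run([executable, "env", "--json"],
                              capture_output=True, text=True, timeout=120)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return None
    if proc.returncode != 0:
        return None
    # Keep the raw output even if parsing fails, so it can still be attached.
    Path(out_path).write_text(proc.stdout)
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError:
        return None


if __name__ == "__main__":
    report = save_env_report("areno-env.json")
    print("saved areno-env.json" if report is not None else "no valid JSON report collected")
```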

From source (recommended if you want the examples or plan to contribute):

git clone https://github.com/inclusionAI/AReno.git
cd AReno
pip install psutil
pip install flash-linear-attention
pip install -e . --no-build-isolation

Docker setup escape hatch (recommended when you want to verify AReno before debugging local build state):

docker build -t areno .
docker run --gpus all --rm -it areno areno check

If you need local project files, model files, or a Hugging Face cache inside the container:

docker run --gpus all --rm -it \
  -v "$PWD":/workspace \
  -v "$HOME/.cache/huggingface":/root/.cache/huggingface \
  areno \
  areno check

Host checklist before blaming AReno setup:

nvidia-smi
docker run --gpus all --rm nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
docker run --gpus all --rm areno areno check

Docker gives you a known-good Python/PyTorch/CUDA user-space install path and reuses the same areno check diagnostic flow. It does not replace host requirements: the host still needs a working NVIDIA driver, NVIDIA Container Toolkit support for --gpus all, and a driver new enough for the container CUDA runtime. Docker also does not solve model downloads, Hugging Face tokens, cache paths, network access, disk space, or multi-node networking; those remain user environment concerns.
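The host checklist works from the bottom layer up, so the first command that fails tells you where to look: the host driver, the NVIDIA Container Toolkit, or the AReno image itself. The sketch below runs the same three commands in order and stops at the first failure. The first_failing_layer helper and the layer labels are hypothetical; the commands are the ones listed in the checklist.

```python
import subprocess
from typing import List, Optional, Sequence, Tuple

# The three host checks from the checklist, lowest layer first.
HOST_CHECKS: List[Tuple[str, List[str]]] = [
    ("host NVIDIA driver", ["nvidia-smi"]),
    ("NVIDIA Container Toolkit (--gpus all)",
     ["docker", "run", "--gpus", "all", "--rm",
      "nvidia/cuda:12.4.1-base-ubuntu22.04", "nvidia-smi"]),
    ("AReno image (areno check)",
     ["docker", "run", "--gpus", "all", "--rm", "areno", "areno", "check"]),
]


def first_failing_layer(checks: Sequence[Tuple[str, List[str]]]) -> Optional[str]:
    """Run checks in order; return the label of the first failure, or None if all pass."""
    for label, cmd in checks:
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            return label  # the command itself is missing (e.g. no docker binary)
        if result.returncode != 0:
            return label
    return None


if __name__ == "__main__":
    failed = first_failing_layer(HOST_CHECKS)
    print("all host checks passed" if failed is None else f"first failure: {failed}")
```

Stopping at the first failure is deliberate. If nvidia-smi fails on the host, the two container checks cannot succeed either, and their errors would only obscure the real cause.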

Tips:

  • Install ninja (pip install ninja) before building so CUDA kernels compile in parallel.
  • If installation fails with No module named 'psutil', install it first (pip install psutil) and retry. This is required specifically for --no-build-isolation builds.
  • Install flash-attn before AReno only if you plan to use --attn-backend flash, the default high-throughput backend:

