Repo: NVIDIA, published Mar 11, 2025

NVIDIA/Isaac-GR00T

Python


Description: NVIDIA Isaac GR00T N1.7 - A Foundation Model for Generalist Robots.

Language: Python

License: Apache-2.0

Stars: 7309

Forks: 1250

Open issues: 288

Created: 2025-03-11T18:34:24Z

Pushed: 2026-06-10T21:36:32Z

Default branch: main

Fork: no

Archived: no

README:

Table of Contents

  • [NVIDIA Isaac GR00T](#nvidia-isaac-gr00t)
  • [What's New in GR00T N1.7](#whats-new-in-gr00t-n17)
  • [Installation](#installation)
  • [Model Checkpoints & Embodiment Tags](#model-checkpoints--embodiment-tags)
  • [Data Format](#data-format)
  • [Inference](#inference)
  • [Fine-tuning](#fine-tuning)
  • [Evaluation](#evaluation)
  • [Contributions](#contributions)
  • [License](#license)
  • [Citation](#citation)

---

NVIDIA Isaac GR00T

> We just released GR00T N1.7 Early Access, the latest version of GR00T N1 with a new VLM backbone (Cosmos-Reason2-2B / Qwen3-VL) and improved performance.

> This is an Early Access (EA) release. You are welcome to download the model, explore the codebase, and begin building on the stack, with the understanding that support and stability guarantees are limited until the GA release.
>
> What's available:
> - Pre-trained GR00T N1.7 model weights and reference code
> - Fine-tuning and inference with custom robot data or demonstrations
> - Experimentation, prototyping, and research use cases
>
> Available at GA:
> - Production deployment with commercial support
> - Complete benchmarks and a fully validated, stable feature set
> - Pull request contributions
>
> We welcome feedback - please feel free to raise issues in this repository.

> To use older versions: N1.6 | N1.5

NVIDIA Isaac GR00T N1.7 is an open vision-language-action (VLA) model for generalized humanoid robot skills. This cross-embodiment model takes multimodal input, including language and images, to perform manipulation tasks in diverse environments.

GR00T N1.7 is trained on a diverse mixture of robot data, including bimanual and semi-humanoid data and an expansive humanoid dataset. It is adaptable through post-training for specific embodiments, tasks, and environments.

GR00T N1.7 is fully commercially licensable under Apache 2.0. It delivers comparable performance to N1.6, with improved generalization and language-following capabilities driven by the inclusion of 20K hours of EgoScale human video data in pretraining.

The neural network architecture of GR00T N1.7 combines a vision-language foundation model with a diffusion transformer head that denoises continuous actions. Here is a schematic diagram of the architecture:

Workflow Overview

1. Prepare data: Collect robot demonstrations (video, state, action) and convert them to the [GR00T LeRobot format](#data-format). Demo datasets are included for quick testing.
2. Run inference: Try zero-shot inference with the base model on [pretrain embodiments](#embodiment-tags), or use a [finetuned checkpoint](#checkpoints) for benchmark tasks.
3. Fine-tune: Adapt the model to your robot using [launch_finetune.py](#fine-tuning) with your own data and modality config.
4. Evaluate: Validate with [open-loop evaluation](#open-loop-evaluation), then test in [simulation benchmarks](#benchmark-examples) or on real hardware via the [Policy API](getting_started/policy.md).
5. Deploy: Connect Gr00tPolicy to your robot controller, optionally accelerated with [TensorRT](scripts/deployment/README.md).

What's New in GR00T N1.7

GR00T N1.7 builds on N1.6 with a new VLM backbone and code-level improvements.

1. Relative EEF Action Space — N1.7 adopts a relative end-effector action space shared across robot and human embodiments. Representing actions as deltas from the current pose (rather than absolute targets) improves generalization and is a key factor in the model's cross-embodiment performance. See [getting_started/finetune_new_embodiment.md](getting_started/finetune_new_embodiment.md) for guidance on configuring relative EEF for your own robot.

2. Human Video Pretraining — N1.7 is pretrained on 20K hours of EgoScale human video data alongside diverse robot demonstrations. Because the relative EEF action representation is consistent across both human and robot data, the model can transfer manipulation priors learned from human video directly to robot control.
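To make the relative EEF idea in item 1 concrete, here is a minimal NumPy sketch that converts a chunk of absolute end-effector poses into deltas from the current pose. The function name, the (position, rotation matrix) layout, and the choice to express deltas in the current end-effector frame are illustrative assumptions, not the repository's implementation; the actual configuration is covered in getting_started/finetune_new_embodiment.md.

```python
import numpy as np


def to_relative_eef(cur_pos, cur_rot, target_pos, target_rot):
    """Express absolute target poses relative to the current EEF pose.

    Hypothetical helper for illustration only.
    cur_pos:    (3,)      current position in the world frame
    cur_rot:    (3, 3)    current rotation matrix (EEF axes as columns)
    target_pos: (T, 3)    absolute target positions
    target_rot: (T, 3, 3) absolute target rotations
    Returns (delta_pos, delta_rot), both expressed in the current EEF frame.
    """
    # Translation delta rotated into the current frame: R_cur^T (p_t - p_cur).
    # With row vectors this is (p_t - p_cur) @ R_cur.
    delta_pos = (target_pos - cur_pos) @ cur_rot
    # Rotation delta R_cur^T @ R_t, broadcast over the T steps of the chunk.
    delta_rot = cur_rot.T @ target_rot
    return delta_pos, delta_rot
```

Because the deltas depend only on motion relative to the current pose, the same motion produces the same values wherever it starts in the workspace. This is one way to read the claim that the representation is consistent across human and robot data.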

Key Changes from N1.6

  • New VLM backbone: Cosmos-Reason2-2B (Qwen3-VL architecture), replacing the Eagle backbone used in N1.6. Supports flexible resolution and encodes images in their native aspect ratio without padding.
  • Simplified data processing pipeline (processing_gr00t_n1d7.py).
  • Added full-pipeline export to ONNX and TensorRT with improved inference frequency.

---

Installation

Hardware Requirements

Inference: 1 GPU with 16 GB+ VRAM (e.g., RTX 4090, L40, H100, Jetson AGX Thor/Orin, DGX Spark).

Fine-tuning: 1 or more GPUs with 40 GB+ VRAM recommended. We recommend H100 or L40 nodes for optimal performance. Other hardware (e.g., A6000) works but may require longer training time. See the [Hardware Recommendation Guide](getting_started/hardware_recommendation.md) for detailed specs.
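As a quick way to compare a discrete-GPU machine against these figures, the sketch below parses the output of `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`, which reports total memory per GPU in MiB, and applies the 16 GB and 40 GB thresholds quoted above. The helper names are hypothetical and not part of the GR00T codebase. Rounding to the nearest whole GB is an assumption, made so that a nominal 16 GB card reporting slightly under 16384 MiB still passes. Jetson devices with unified memory may not report a numeric value here, so this targets dGPU systems only.

```python
import subprocess

INFERENCE_MIN_GB = 16  # inference figure from the hardware requirements
FINETUNE_MIN_GB = 40   # recommended fine-tuning figure from the same section


def parse_total_memory_gb(nvidia_smi_output):
    """Parse per-GPU total memory (MiB, one value per line) into whole GB."""
    return [
        round(int(line) / 1024)
        for line in nvidia_smi_output.splitlines()
        if line.strip()
    ]


def classify_gpus(memory_gb):
    """Label each GPU with the most demanding workload whose figure it meets."""
    labels = []
    for gb in memory_gb:
        if gb >= FINETUNE_MIN_GB:
            labels.append("fine-tuning")
        elif gb >= INFERENCE_MIN_GB:
            labels.append("inference")
        else:
            labels.append("below minimum")
    return labels


def query_nvidia_smi():
    """Run nvidia-smi; requires an NVIDIA driver on the machine."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On a machine with the driver installed, `classify_gpus(parse_total_memory_gb(query_nvidia_smi()))` returns one label per GPU.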

CUDA / Python per platform: dGPU on CUDA 12.8 with Python 3.10; Jetson Orin on CUDA 12.6 with Python 3.10; Jetson Thor and DGX Spark on CUDA 13.0 with Python 3.12. The per-platform install scripts and Dockerfiles live under scripts/deployment/; see the [Deployment & Inference Guide](scripts/deployment/README.md) for the full matrix.
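The platform matrix above can also be written as a small lookup table, for example to fail early when the active interpreter does not match the target platform. The dictionary keys and the function name below are hypothetical labels chosen for this sketch; the version numbers are copied from the paragraph above, and the scripts under scripts/deployment/ remain the authoritative source.

```python
import sys

# (CUDA version, Python version) per platform, as listed in the text above.
# The keys are illustrative labels, not identifiers used by the repository.
PLATFORM_MATRIX = {
    "dgpu": ("12.8", "3.10"),
    "jetson-orin": ("12.6", "3.10"),
    "jetson-thor": ("13.0", "3.12"),
    "dgx-spark": ("13.0", "3.12"),
}


def python_matches(platform_key, version_info=sys.version_info):
    """Return True if the interpreter's major.minor matches the platform's Python."""
    expected = tuple(int(part) for part in PLATFORM_MATRIX[platform_key][1].split("."))
    return tuple(version_info[:2]) == expected
```

For example, `python_matches("dgpu")` is true only when the script runs under Python 3.10.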

Clone the Repository

Note: git-lfs is required to download parquet data files in /demo_data. Install it before cloning: sudo apt install git-lfs && git lfs install

GR00T relies on submodules for certain dependencies. Include them when cloning:

git clone --recurse-submodules https://github.com/NVIDIA/Isaac-GR00T
cd Isaac-GR00T

If you've already cloned without submodules, initialize them separately:

git submodule update --init --recursive
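To confirm that no submodule was left uninitialized, one option is to inspect `git submodule status --recursive`, which prefixes uninitialized submodules with `-`. The sketch below is a hypothetical helper, not part of the repository, and the paths in the usage comment are placeholders.

```python
import subprocess


def uninitialized_submodules(status_output):
    """Return paths of submodules that `git submodule status` marks with '-'."""
    paths = []
    for line in status_output.splitlines():
        if line.startswith("-"):
            # Uninitialized entries look like "-<sha1> <path>".
            paths.append(line[1:].split(maxsplit=1)[1])
    return paths


def submodule_status(repo_dir="."):
    """Run `git submodule status --recursive` inside repo_dir and return its output."""
    result = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Usage (placeholder paths):
#   uninitialized_submodules("-a1b2c3d external/dep_a\n 4e5f6a7 external/dep_b (heads/main)\n")
#   returns the path of the entry marked with "-".
```

If the returned list is non-empty, running the `git submodule update --init --recursive` command above should populate the missing directories.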

Set Up the Environment

GR00T uses uv for fast, reproducible dependency management. Install uv first:

curl -LsSf https://astral.sh/uv/install.sh | sh

dGPU (x86_64) — Default

Install FFmpeg (required by torchcodec, the default video backend):

sudo apt-get update && sudo apt-get install -y ffmpeg
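Before creating the environment, it can help to confirm that the command-line tools mentioned so far are on the PATH. The following is a small preflight sketch using only the standard library; the tool list is taken from the steps above, and the function name is a hypothetical helper rather than anything shipped with GR00T.

```python
import shutil

# Executables referenced by the installation steps above.
REQUIRED_TOOLS = ("git", "git-lfs", "uv", "ffmpeg")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the PATH.

    `which` is injectable so the check can be exercised without the tools installed.
    """
    return [tool for tool in tools if which(tool) is None]
```

Calling `missing_tools()` with no arguments returns an empty list when all four tools are installed.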

Create the environment and…
