RepoNebiusNebiuspublished Apr 7, 2026seen 5d

nebius/nebius-physical-ai

Python

Open original ↗

Captured source

source ↗
published Apr 7, 2026seen 5dcaptured 9hhttp 200method plain

nebius/nebius-physical-ai

Language: Python

License: Apache-2.0

Stars: 7

Forks: 4

Open issues: 9

Created: 2026-04-07T16:04:02Z

Pushed: 2026-06-11T02:35:51Z

Default branch: main

Fork: no

Archived: no

README:

Nebius Physical AI

Partners integrate independently. Teams assemble from open blueprints. Nebius owns the infrastructure layer and compute substrate.

![Nebius Physical AI Workbench](docs/assets/workbench-architecture.png)

npa is the CLI and SDK for physical-AI workloads on Nebius. Workbench is the primary solution: it gives developers one command surface for data curation, simulation, synthetic data, policy training, evaluation, export, observability, and SkyPilot workflows running on the Nebius substrate of object storage, orchestration, vLLM serving, managed Kubernetes, and GPU clusters.

Quick Start

Install the npa package into a fresh virtual environment. The venv can live anywhere; activating it puts npa on your PATH (Python 3.10+ required):

git clone https://github.com/nebius/nebius-physical-ai.git
cd nebius-physical-ai

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e npa

npa --version

Run your first real result with no cloud, GPU, or credentials — score a shipped sample rollout set with the offline stub backend:

npa workbench vlm-eval benchmark \
--dataset npa/src/npa/workbench/vlm_eval/fixtures/sample_benchmark/benchmark.json \
--output /tmp/vlm-eval-benchmark.json \
--backend stub \
--thresholds 0.5,0.8,0.9 \
--rubrics default,strict \
--models Qwen/Qwen2-VL-7B-Instruct \
--format json

You should see a ranked report with accuracy: 1.0 over four labeled rollouts. That is the full local loop; the same command swaps --backend stub for a real self-hosted or api VLM backend once you add credentials.

Next, authenticate with the Nebius CLI and print the credential schema (see [docs/quickstart.md](docs/quickstart.md) for the full walkthrough):

nebius profile create
nebius iam get-access-token >/dev/null
npa configure

The flagship GPU workload is NVIDIA Cosmos (world-foundation model for synthetic data and world generation). It runs across multiple NVIDIA GPU platforms via a single --gpu-type flag (gpu-h100-sxm, gpu-h200-sxm, gpu-b300-sxm, gpu-l40s) with no RT-core lock-in:

npa workbench cosmos -p -n cosmos deploy \
--runtime serverless --gpu-type --wait
npa workbench cosmos -p -n cosmos infer \
--prompt "A robot arm stacks colored cubes" \
--output-path s3:///cosmos/out/

Cosmos needs Nebius credentials, an HF_TOKEN, and GPU capacity; see the flagship walkthrough in [docs/quickstart.md](docs/quickstart.md#7-flagship-gpu-workload-nvidia-cosmos).

To work on npa itself (tests, lint), install the dev extra and run the fast suite — see [CONTRIBUTING.md](CONTRIBUTING.md):

pip install -e "npa[dev]"
make test

For full cloud setup, continue with [docs/quickstart.md](docs/quickstart.md) and [docs/workbench/getting-started.md](docs/workbench/getting-started.md).

Workbench

Workbench is the main product surface in this repository. Current Workbench tools are mounted directly under npa workbench; there is no solutions CLI namespace.

| Category | Workbench commands | | --- | --- | | Data curation | npa workbench data sync, npa workbench data status, npa workbench data list; npa workbench fiftyone curate, eval, load-dataset, datasets list; npa workbench lancedb deploy, create-table, import-lerobot, import-bdd100k, backfill, create-mv, refresh-mv, query-table, query; npa workbench detection-training train, eval, status, list | | Synthetic data | npa workbench cosmos infer, train, serve, status; npa workbench genesis generate-demos; SkyPilot templates such as npa/workflows/workbench/skypilot/bdd100k-pipeline.yaml and npa/workflows/workbench/templates/curate-augment-train.yaml | | Simulation | npa workbench isaac-lab train, eval, export-lerobot; npa workbench genesis train-teacher, generate-demos, eval-teacher, eval-student, diagnose, tune; npa workbench retargeting run | | Eval | npa workbench vlm-eval run, benchmark, workflow, status, list; npa workbench mjlab eval; npa workbench sonic eval; npa workbench fiftyone eval; npa workbench isaac-lab eval; npa workbench genesis eval-student | | Observability | Tool-level status, list, and system-info commands; npa workbench workflow status, logs; npa rerun host, share, list-shares, revoke; npa cluster status, list | | Robot policy | npa workbench lerobot train, eval, serve, infer, list-checkpoints, benchmark, profile-train, train-student; npa workbench groot download, finetune, eval, serve, infer, convert; npa workbench sonic train, serve, export, eval, status, list | | World models | npa workbench cosmos deploy, serve, infer, train, status, system-info | | Blueprints | npa workbench workflow submit, run, status, logs, teardown, distill; checked-in YAML under npa/workflows/workbench/skypilot/ for Isaac Lab, VLM eval, SONIC export, SONIC eval, SONIC locomotion fine-tuning, retargeting, MJLab eval, sim-to-real, and BDD100K pipelines |

Eval: VLM Backend

vlm-eval is a first-class Eval capability. It scores rollout artifacts with self-hosted, API, or stub backends and has a checked-in SkyPilot template at npa/workflows/workbench/skypilot/vlm-eval.yaml. The benchmark command sweeps a labeled rollout set across thresholds, rubrics, and models, then writes a ranked accuracy report with the best config.

npa workbench vlm-eval list
npa workbench vlm-eval status
npa workbench vlm-eval workflow
npa workbench vlm-eval run \
--input-path ./rollout.json \
--output-path ./eval.json \
--backend stub \
--score 0.9 \
--dry-run
npa workbench vlm-eval benchmark \
--dataset npa/src/npa/workbench/vlm_eval/fixtures/sample_benchmark/benchmark.json \
--output /tmp/vlm-eval-benchmark.json \
--backend stub \
--thresholds 0.5,0.8,0.9 \
--rubrics default,strict \
--models Qwen/Qwen2-VL-7B-Instruct

The self-hosted workflow starts an OpenAI-compatible vLLM server and then calls npa workbench vlm-eval run; the benchmark workflow does the same for npa workbench vlm-eval benchmark.

Robot Policy: GR00T, LeRobot, and SONIC

Robot policy work is split across policy training/serving, humanoid…

Excerpt shown — open the source for the full document.

Notability

notability 2.0/10

Low traction new repo