NVIDIA/xr-ai
Description: XR AI
Language: Python
License: Apache-2.0
Stars: 18
Forks: 1
Open issues: 11
Created: 2026-04-20T18:11:25Z
Pushed: 2026-06-16T15:12:10Z
Default branch: main
Fork: no
Archived: no
README:
XR AI
Agentic AI for XR — an open-source foundation for multi-modal, real-time conversational AI within the CloudXR ecosystem.
Public Beta Notice
This project is publicly available in beta and is under active development. Features, APIs, documentation, and behavior may change as the project evolves. Expect bugs, incomplete functionality, and breaking changes. Use at your own discretion, and please report issues or feedback to help improve the project.
What is XR AI?
XR AI is a developer stack for building powerful XR and AI systems across devices, platforms, and deployment environments. It connects web, iOS/visionOS, AR glasses, and XR headset clients to GPU-accelerated AI services, tool-using agents, and the CloudXR stack for remote rendering.
With XR AI, developers can build agents that see and hear what the user experiences, reason over live physical context, call external tools through MCP, and respond with audio or data in the same XR session. The stack provides an end-to-end foundation for multimodal spatial computing applications: real-time media routing, participant-aware response handling, agent interfaces, AI service integration, remote rendering, and sample applications that show the pieces working together.
The value is speed without lock-in. XR AI is designed to work quickly with NVIDIA open models for vision, language, and speech, plus swappable speech synthesis services, while still giving developers the flexibility to bring their own models, services, tools, and application logic. Because it is built around NVIDIA GPU infrastructure, the same architecture can be deployed where the workload needs to run: cloud, data center, workstation, or edge.
XR AI also gives developers a practical path across product categories. Teams can start with AI glasses-style experiences that use live camera, audio, and agent responses, then extend the same framework to richer AR glasses or XR headset experiences that use CloudXR remote rendering. This lets developers build for today's lightweight AI devices while keeping a clear path to immersive, GPU-rendered spatial applications.
XR AI is especially useful when you need to:
- Build multimodal XR agents that can see, hear, reason, use tools, and respond in real time.
- Target multiple client platforms including web, iOS/visionOS, AR glasses, and XR headsets.
- Use NVIDIA open models out of the box while preserving the flexibility to bring your own models and services.
- Deploy wherever NVIDIA GPUs are available, from cloud and data center to workstation and edge.
- Start with AI glasses-style experiences and scale to CloudXR remote rendering for richer AR and XR applications.
- Keep transport, rendering, model services, tools, and agent logic separated so teams can evolve each layer independently.
Requirements
Hardware
XR AI samples are designed for a single NVIDIA RTX PRO 6000 Blackwell workstation GPU or an NVIDIA DGX Spark. Both provide enough VRAM to run the full model stack locally. If you prefer not to run models on local hardware, model endpoints are plain URLs — point the worker config at a cloud NIM or model endpoint and no local GPU is required for the agent or hub.
| Sample | Local VRAM needed |
|---|---|
| model-servers (all 4 models) | ~70 GB |
| simple-vlm-example (standalone) | ~23 GB |
| xr-render-demo (requires model-servers) | ~70 GB (models) + ~2 GB (hub/TTS) |
| Hub only | none |
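As a rough planning aid, the VRAM figures in the table can be turned into a small lookup. This is only an illustrative sketch, not part of the repository: the dictionary keys (including `hub-only`) and the function name `samples_that_fit` are hypothetical, and the numbers are the approximate values from the table.

```python
# Approximate local VRAM needs in GB, taken from the table above.
# Key names are illustrative labels, not identifiers used by the repo.
VRAM_NEEDED_GB = {
    "model-servers": 70,       # all 4 models
    "simple-vlm-example": 23,  # standalone
    "xr-render-demo": 70 + 2,  # models + hub/TTS
    "hub-only": 0,             # no local GPU memory needed
}


def samples_that_fit(available_gb: float) -> list[str]:
    """Return the samples whose approximate VRAM need fits in available_gb."""
    return sorted(
        name for name, needed in VRAM_NEEDED_GB.items() if needed <= available_gb
    )


if __name__ == "__main__":
    # 96 is a placeholder; substitute the free VRAM that nvidia-smi reports.
    print(samples_that_fit(96))
```

Because the table values are approximate, treat a borderline result as a hint to check actual usage rather than as a guarantee.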
Software
| Requirement | Version | Notes |
|---|---|---|
| OS | Linux | Ubuntu 22.04 / 24.04 recommended |
| Python | 3.11 or 3.12 | 3.10 and 3.13 are not supported |
| uv | latest | dependency manager used by all samples |
| NVIDIA driver | 570+ | required for local model inference |
| Docker | 24+ | required: all vLLM-backed services (LLM, VLM) run in nvcr.io/nvidia/vllm containers |
| NVIDIA Container Toolkit | latest | required: gives Docker access to the GPU. Without it, model_servers fails with failed to discover GPU vendor from CDI: no known GPU vendor found |
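A quick preflight check for part of this table might look like the sketch below. It is an assumption-laden illustration rather than a script shipped with the project: the helper names are hypothetical, it only checks the Python minor version and whether `uv`, `docker`, and `nvidia-smi` are on `PATH`, and it does not verify the driver, Docker, or Container Toolkit versions.

```python
import shutil
import sys

# Supported interpreter versions per the requirements table.
SUPPORTED_PYTHON = {(3, 11), (3, 12)}
# Executables we expect on PATH; nvidia-smi is installed with the NVIDIA driver.
REQUIRED_TOOLS = ("uv", "docker", "nvidia-smi")


def python_supported(version: tuple[int, ...]) -> bool:
    """True if the (major, minor) pair is one of the supported versions."""
    return tuple(version[:2]) in SUPPORTED_PYTHON


def missing_tools(tools: tuple[str, ...] = REQUIRED_TOOLS) -> list[str]:
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    if not python_supported(tuple(sys.version_info)):
        print(f"Unsupported Python {sys.version_info.major}.{sys.version_info.minor}")
    for tool in missing_tools():
        print(f"Missing on PATH: {tool}")
```

Finding a tool on `PATH` says nothing about its version, so the Docker smoke test further down is still the more reliable end-to-end check of GPU access from containers.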
uv handles all Python dependencies per-sample — no global pip install or virtual-environment setup needed. If you do not have it:
curl -LsSf https://astral.sh/uv/install.sh | sh
The NVIDIA Container Toolkit install is one-time per host. Follow the official install guide and run the CDI / runtime-configure steps from there:
> https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
Quick smoke-test once installed:
docker run --rm --gpus all nvidia/cuda:13.0.3-base-ubuntu24.04 nvidia-smi
GPU-profile prerequisites — install before uv sync for these targets:
- DGX Spark (xr-render-demo/yaml/spark/): sudo apt install python3-dev
(All GPU profiles default to vllm_backend: docker, so the vLLM container ships nvcc + FlashInfer. If you switch a profile to vllm_backend: pip, see [docs/troubleshooting.md](docs/troubleshooting.md) for the host CUDA toolchain prereq.)
If uv sync or the VLM fails on first run, see [docs/troubleshooting.md](docs/troubleshooting.md).
Network — open the firewall ports listed in [docs/networking.md](docs/networking.md) before connecting from another machine. UDP 7882 is a silent-failure path: signaling succeeds but media frames are dropped if it is closed.
Architecture
| Layer | Directory | Description |
|---|---|---|
| Clients | client-samples/ | Android, iOS/visionOS, Web, and native StreamKit clients |
| Server runtime | server-runtime/ | XR-Media-Hub + LiveKit internal transport |
| Launcher | utils/xr-ai-launcher/ | stdlib-only process manager used by samples |
| Logging | utils/xr-ai-logging/ | shared loguru sink + stdlib bridge for every process |
| Agent interfaces | agent-mcp-servers/ | MCP adapters for XR data & rendering |
| Agent demos | agent-samples/ | End-to-end agent pipelines |
| Tests | tests/ | Multi-client / multi-agent integration tests |
Lightweight samples (simple-vlm-example) are self-contained — one command starts everything. Heavier demos (xr-render-demo) split model loading from the demo itself: start model-servers once, then run the demo as many times as you like without reloading...
Notability
Notability: 3.0/10. New repo with low traction (18 stars).