Repo: NVIDIA, published Dec 3, 2025

NVIDIA/aerial-framework

C++



Description: A toolchain for generating high-performance, GPU-accelerated 5G/6G pipelines from Python and a modular, real-time runtime for executing the pipelines on NVIDIA Aerial™ RAN Computer platforms.

Language: C++

License: NOASSERTION

Stars: 122

Forks: 24

Open issues: 0

Created: 2025-12-03T00:22:32Z

Pushed: 2025-12-10T05:55:44Z

Default branch: main

Fork: no

Archived: no

README:

NVIDIA Aerial™ Framework

A real-time signal processing framework

The Aerial Framework has been designed from the ground up to meet the needs of 3GPP Radio Access Networks — signal processing workloads with microsecond latency requirements. It is a single platform that unites research, testbeds, and production deployments to solve development challenges for real-time applications.

Use cases: Signal processing applications with strict latency requirements

Audience: RAN system engineers, signal processing specialists, AI researchers

Built with: DOCA, DPDK, TensorRT, Python, JAX, PyTorch, C++, CUDA, and more

Features

  • Python → Real-time – Prototype in Python and lower to high-performance GPU code.
  • 🍱 Clean separation – Decouple signal-processing algorithm development from runtime execution.
  • 🧩 Modular pipelines – Compose end-to-end pipelines from compiled, executable modules.
  • 🔭 Observability built-in – Hooks for profiling and monitoring throughout development.
  • 🔁 One codebase – Reuse components for prototyping, simulation, testing, and deployment.
  • 🚀 Modern toolchain – Python 3.12+, C++20, CUDA 12.9, CMake, JAX, PyTorch, uv, ruff.
  • 💻 Developer-friendly – Prototype on local machines and scale to live, production deployments.
  • 📚 Guided tutorials – Jupyter notebooks ready to run in a Docker container.
  • 🤖 Targets 5G-Advanced & 6G – Ships with an example AI-native PUSCH pipeline, with more to come.

How It Works

The Aerial Framework combines two components:

  • Developer tools: Tools to convert Python/JAX/PyTorch and C++/CUDA into pipelines of GPU-native code
  • Runtime engine: Coordinates the execution of GPU pipelines with network interfaces
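
The split between the two components can be pictured with a toy example. The sketch below is plain Python and does not use the Aerial Framework API; `Pipeline`, `scale`, and `clip` are hypothetical names chosen for illustration. The "developer" side produces self-contained modules, and the "runtime" side only knows how to run them in order.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Hypothetical stand-in for a compiled module (for example a TensorRT engine
# or a CUDA kernel). Here it is just a function from samples to samples.
Module = Callable[[Sequence[float]], List[float]]


@dataclass
class Pipeline:
    """Runtime side of the split: runs a fixed, ordered list of modules."""

    stages: List[Module] = field(default_factory=list)

    def add(self, stage: Module) -> "Pipeline":
        self.stages.append(stage)
        return self  # returning self allows chained .add() calls

    def run(self, samples: Sequence[float]) -> List[float]:
        out = list(samples)
        for stage in self.stages:
            out = stage(out)
        return out


def scale(gain: float) -> Module:
    """Developer-side algorithm: multiply every sample by a gain."""
    return lambda xs: [gain * x for x in xs]


def clip(limit: float) -> Module:
    """Developer-side algorithm: clamp samples to [-limit, limit]."""
    return lambda xs: [max(-limit, min(limit, x)) for x in xs]


# The runtime never inspects what a module does; it only sequences them.
pipeline = Pipeline().add(scale(2.0)).add(clip(3.0))
result = pipeline.run([0.5, 1.0, 2.0, -4.0])
print(result)
```

Because the pipeline depends only on the module signature, an algorithm can be replaced without touching the code that executes it, which is the separation the two components aim for.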

Aerial Framework Developer Tools

  • JAX → TensorRT – Export JAX programs to StableHLO and lower to TensorRT engines using MLIR-TensorRT
  • Multi-language – Author algorithms in JAX, PyTorch, or C++/CUDA and deploy to a common runtime engine
  • Modern profiling – Use NVIDIA Nsight Systems to optimize pipelines and individual kernels down to the μs level
  • AI-native – Integrate with AI frameworks, allowing end-to-end differentiability

Aerial Framework Runtime

  • CUDA graphs – GPU operations run as CUDA graphs with TensorRT integration for deterministic execution
  • Task scheduler – Pinned, high-priority threads on isolated CPU cores enforce microsecond slot timing
  • Inline GPU networking – DOCA GPUNetIO and GPUDirect RDMA enable zero-copy packet transfer NIC ↔ GPU
  • Production driver – Orchestrates pipelines, memory pools & multi-cell coordination
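
To make "microsecond slot timing" concrete, the sketch below works out the slot budget and counts missed deadlines. It relies on the 5G NR numerology, where subcarrier spacing is 15 kHz × 2^μ and a slot lasts 1 ms / 2^μ, so 30 kHz spacing gives a 500 μs slot. The function names `slot_duration_us` and `run_slots` are hypothetical, and this is not the framework's scheduler: it runs on an ordinary Python thread with no core pinning or priority, so it only illustrates the deadline accounting.

```python
import time
from typing import Callable


def slot_duration_us(scs_khz: int) -> float:
    """Slot duration in microseconds for an NR subcarrier spacing.

    SCS = 15 kHz * 2**mu and slot duration = 1 ms / 2**mu.
    """
    ratio, remainder = divmod(scs_khz, 15)
    # ratio must be a power of two (1, 2, 4, ...) for a valid numerology.
    if scs_khz <= 0 or remainder != 0 or (ratio & (ratio - 1)) != 0:
        raise ValueError(f"unsupported subcarrier spacing: {scs_khz} kHz")
    return 1000.0 / ratio


def run_slots(work: Callable[[int], None], num_slots: int, slot_us: float) -> int:
    """Call work(slot) once per slot and return the number of missed deadlines."""
    slot_ns = int(slot_us * 1000)
    start = time.monotonic_ns()
    missed = 0
    for slot in range(num_slots):
        deadline = start + (slot + 1) * slot_ns
        work(slot)
        if time.monotonic_ns() > deadline:
            missed += 1  # work overran the end of its slot
        # Busy-wait until the slot boundary so the next slot starts on time.
        while time.monotonic_ns() < deadline:
            pass
    return missed


if __name__ == "__main__":
    budget = slot_duration_us(30)
    print(f"30 kHz SCS slot budget: {budget} us")
    print("missed deadlines:", run_slots(lambda slot: None, 10, budget))
```

Busy-waiting on a monotonic clock avoids the wake-up jitter of sleeping, at the cost of occupying a core for the whole run.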

Development → Deployment Workflow

Aerial Framework supports two different environments depending on your use case.

Development - Developers prototype and optimize their algorithms in Python, then compile them to GPU-native code using MLIR-TensorRT. This is accessible to any developer with a recent GPU (compute capability ≥ 8.0).

Runtime - Deployments run compiled TensorRT engines with deterministic scheduling and high-performance networking. Testing requires a GPU, NIC, and real-time kernel to validate that pipelines meet latency constraints using Medium Access Control (MAC) and Radio Unit (RU) emulation.
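
A quick way to check the development requirement (compute capability ≥ 8) is to parse the value reported by `nvidia-smi`. The sketch below assumes a driver recent enough to support the `compute_cap` query field; the helper names are hypothetical and are not part of the framework.

```python
import subprocess
from typing import List, Tuple


def parse_compute_cap(text: str) -> Tuple[int, int]:
    """Turn a string such as '8.6' into (8, 6); a bare '8' becomes (8, 0)."""
    major, _, minor = text.strip().partition(".")
    return int(major), int(minor or 0)


def meets_minimum(cap_text: str, minimum: Tuple[int, int] = (8, 0)) -> bool:
    """True if the reported compute capability is at least `minimum`."""
    return parse_compute_cap(cap_text) >= minimum


def query_gpu_compute_caps() -> List[str]:
    """Ask nvidia-smi for the compute capability of each visible GPU.

    Assumption: the installed driver supports the compute_cap query field.
    """
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    for index, cap in enumerate(query_gpu_compute_caps()):
        status = "ok" if meets_minimum(cap) else "too old"
        print(f"GPU {index}: compute capability {cap} ({status})")
```

Comparing `(major, minor)` tuples rather than floats keeps the ordering correct for any two-part version string.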

| Stage | Description | Environment |
|---|---|---|
| Prototype | Write and validate algorithms (NumPy/JAX/PyTorch) | Development |
| Lower | Compile Python code into GPU executables using NVIDIA MLIR-TensorRT | Development |
| Profile | Optimize performance using modern profiling tools like NVIDIA Nsight Systems | Development |
| Compose | Assemble TensorRT engines and CUDA kernels into modular pipelines | Runtime |
| Execute | Run with real-time task scheduling and networking | Runtime |
| Validate | Test PHY applications using standards-compliant MAC and RU emulators | Runtime |
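
As an illustration of the Prototype stage, the sketch below writes a small reference algorithm and validates it against known answers before any lowering takes place. It uses a FIR filter because the tutorials mention one, but it is not the tutorial code: it is plain Python rather than NumPy or JAX so that it runs anywhere, and `fir_filter` is a hypothetical name.

```python
from typing import List, Sequence


def fir_filter(x: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filter: y[n] = sum_k taps[k] * x[n - k], with x[n] = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:  # samples before the start of the input are zero
                acc += h * x[n - k]
        y.append(acc)
    return y


taps = [0.25, 0.5, 0.25]

# Validation 1: a unit impulse must reproduce the taps themselves.
impulse_response = fir_filter([1.0, 0.0, 0.0, 0.0], taps)

# Validation 2: a constant input must settle to sum(taps) once the filter fills.
step_response = fir_filter([1.0] * 5, taps)

print(impulse_response)
print(step_response)
```

A reference like this gives the later stages something to compare against: the lowered version of the same algorithm should match it within numerical tolerance.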

This approach bridges:

  • Development Productivity - Write in high-level languages with rich ecosystems
  • Runtime Performance - Execute with the speed and determinism of optimized C++
  • Low Latency Requirements - Meet strict timing and latency constraints

Quickstart

**Install** the Docker container, then explore and build from source:

# 1) Configure (release preset)
cmake --preset clang-release

# 2) Build
cmake --build out/build/clang-release

# 3) Install Example Python Package - 5G RAN
cd ran/py && uv sync

Documentation & Tutorials

Documentation is available at: **docs.nvidia.com/aerial/framework**

Get started with step-by-step **Tutorials**.

| Tutorial | Summary |
|---|---|
| Getting Started | Set up Docker, verify GPU access, build the project, and run tests. |
| PUSCH Receiver | Build a reference PUSCH receiver in NumPy with inner/outer receiver blocks. |
| MLIR-TensorRT | Compile JAX functions (FIR filter example) to TensorRT engine(s). |
| Lowering PUSCH | Compile complete PUSCH inner receiver to TensorRT and benchmark with Nsight. |
| AI Channel Filter | Train a neural network to dynamically estimate channel filter parameters. |
| Channel Filter Design | Design custom JAX channel estimators, lower to TensorRT, and profile with Nsight. |
| Full PUSCH Pipeline | Run complete… |

