Release · NVIDIA · published Jun 9, 2026

NVIDIA/TensorRT-RTX-EP-ABI v0.3.0

v0.3.0

Repository: NVIDIA/TensorRT-RTX-EP-ABI

Tag: v0.3.0

Published: 2026-06-09T16:36:14Z

Prerelease: no

Release notes: Wheel packaging, ORT API negotiation, weight streaming, and memory/quantization improvements

  • Add Python wheel packaging: meta-package plus per-CUDA variants (cu12, cu13), Linux SONAME/symlink handling, and PyPI READMEs

  • Negotiate ORT API version with the host so one DLL serves ONNX Runtime 1.24-1.26+
  • Add TensorRT-RTX weight streaming budget support; auto-disable CUDA Graphs when weight streaming is enabled

  • Improve memory handling: auto-fallback from CUDA async mempool to sync arena and add arena Shrink() to release unused regions

  • Add policy-driven Q/DQ lowering for asymmetric quantization
  • Port the EP ABI to C++20 and add Windows-on-Arm cross-compile via vcpkg
  • Fix UTF-8 handling for non-ASCII cache paths, capability-discovery and EP-context crashes, EPContext external engine resolution, and CUDA 13.x build breaks
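The API negotiation item above can be pictured with a small sketch. This is not the EP's actual code: the `Host` class, the `negotiate` function, and the version bounds 24 and 26 are hypothetical, chosen only to mirror the "1.24-1.26+" range in the notes. It assumes the host offers a versioned getter that returns nothing for versions it does not support, similar in spirit to `OrtApiBase::GetApi` in the ONNX Runtime C API, and that the plugin probes downward from the version it was built against.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical bounds, chosen to mirror the "1.24-1.26+" range in the notes.
MIN_SUPPORTED_API = 24   # oldest host API the plugin can work with (assumed)
BUILT_AGAINST_API = 26   # API version the plugin was compiled against (assumed)


@dataclass
class Host:
    """Stand-in for a host runtime that serves API tables up to max_api."""
    max_api: int

    def get_api(self, version: int) -> Optional[dict]:
        # Return an API table for the requested version,
        # or None if the host is too old to provide it.
        if version > self.max_api:
            return None
        return {"version": version}


def negotiate(host: Host) -> int:
    """Return the highest API version that both plugin and host support."""
    # Probe from the newest version the plugin knows down to the oldest
    # it can tolerate, and stop at the first one the host can serve.
    for version in range(BUILT_AGAINST_API, MIN_SUPPORTED_API - 1, -1):
        if host.get_api(version) is not None:
            return version
    raise RuntimeError(
        f"host API {host.max_api} is older than the minimum {MIN_SUPPORTED_API}"
    )


if __name__ == "__main__":
    for host_version in (27, 25, 24):
        print(host_version, "->", negotiate(Host(max_api=host_version)))
```

Under these assumptions, a host newer than the plugin settles on the plugin's build version, an older but still supported host settles on its own maximum, and a host below the minimum is rejected with an error instead of being loaded with a mismatched API table.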

Contributors to this release of TensorRT RTX EP ABI: @keshavv27, @gedoensmax, @anujj, @ishwar-raut1, @umangb-09, @nitthilan, @yen-shi, @praneshgo, @wenbingl

Notability

notability 3.0/10

Routine minor version release of a specialized NVIDIA tool.