NVIDIA/TensorRT-RTX-EP-ABI v0.3.0
Repository: NVIDIA/TensorRT-RTX-EP-ABI
Tag: v0.3.0
Published: 2026-06-09T16:36:14Z
Prerelease: no
Release notes: Wheel packaging, ORT API negotiation, weight streaming, and memory/quantization improvements
- Add Python wheel packaging: meta-package plus per-CUDA variants (cu12, cu13),
  Linux SONAME/symlink handling, and PyPI READMEs
- Negotiate the ORT API version with the host so one DLL serves ONNX Runtime 1.24 through 1.26 and later
- Add TensorRT-RTX weight streaming budget support; auto-disable CUDA Graphs when
  weight streaming is enabled
- Improve memory handling: auto-fallback from CUDA async mempool to sync arena and
  add arena Shrink() to release unused regions
- Add policy-driven Q/DQ lowering for asymmetric quantization
- Port the EP ABI to C++20 and add Windows-on-Arm cross-compile via vcpkg
- Fix UTF-8 handling for non-ASCII cache paths, crashes in capability discovery and
  EPContext, EPContext external engine resolution, and CUDA 13.x build breaks
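
For readers unfamiliar with the term, the "asymmetric quantization" mentioned in the Q/DQ item can be illustrated with a small sketch. It follows the standard ONNX QuantizeLinear/DequantizeLinear arithmetic for uint8 and does not show this EP's lowering policy, which the notes do not describe. The function names and the helper `asymmetric_params` are hypothetical, chosen only for this illustration.

```python
def asymmetric_params(lo, hi, qmin=0, qmax=255):
    """Pick scale and zero point so [lo, hi] maps onto [qmin, qmax].

    The range is widened to include 0.0 so that real zero is exactly
    representable; the zero point is generally nonzero (asymmetric).
    """
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    if hi == lo:  # degenerate all-zero range
        return 1.0, qmin
    scale = (hi - lo) / (qmax - qmin)
    zero_point = qmin - round(lo / scale)
    return scale, zero_point


def quantize_linear(x, scale, zero_point, qmin=0, qmax=255):
    """Q: round(x / scale) + zero_point, saturated to [qmin, qmax]."""
    q = round(x / scale) + zero_point  # round() is half-to-even, as in ONNX
    return max(qmin, min(qmax, q))


def dequantize_linear(q, scale, zero_point):
    """DQ: (q - zero_point) * scale."""
    return (q - zero_point) * scale


if __name__ == "__main__":
    scale, zp = asymmetric_params(-1.0, 3.0)
    for x in (-1.0, 0.0, 1.234, 3.0, 10.0):
        q = quantize_linear(x, scale, zp)
        print(x, q, dequantize_linear(q, scale, zp))
```

With a symmetric scheme the zero point would be fixed (0 for int8), so a skewed range such as [-1, 3] would waste part of the integer range; the asymmetric form shifts the zero point instead.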
Contributors to this release of TensorRT RTX EP ABI: @keshavv27, @gedoensmax, @anujj, @ishwar-raut1, @umangb-09, @nitthilan, @yen-shi, @praneshgo, @wenbingl