Release · Microsoft · published Apr 20, 2026

microsoft/onnxruntime v1.25.0


ONNX Runtime v1.25.0

Repository: microsoft/onnxruntime

Tag: v1.25.0

Published: 2026-04-20T18:25:24Z

Prerelease: no

Release notes:

📢 Announcements & Breaking Changes

Build & Platform

  • C++20 is now required to build ONNX Runtime from source. Minimum toolchains: MSVC 19.29+, GCC 10+, Clang 10+. Users of prebuilt packages are unaffected. (#27178)
  • CUDA minimum version raised to 12.0 — CUDA 11.x is no longer supported. Users pinned to CUDA 11.x should stay on ORT 1.24.x or upgrade their CUDA toolkit/driver. (#27570)
  • ONNX upgraded to 1.21.0 (#27601)
  • sympy is now an optional dependency for Python builds. (#27200)
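
As a rough illustration of the new source-build floor, the sketch below checks a compiler and an optional CUDA toolkit version against the minimums listed above. The helper names (`parse_version`, `meets_minimum`, `can_build_ort_1_25`) are hypothetical and not part of ONNX Runtime or its build scripts; only the version numbers come from these notes.

```python
# Minimum versions copied from the release notes; everything else is illustrative.
MIN_TOOLCHAIN = {
    "msvc": (19, 29),
    "gcc": (10,),
    "clang": (10,),
}
MIN_CUDA = (12, 0)


def parse_version(text):
    """Turn '19.29.30133' into (19, 29, 30133); stops at the first non-numeric part."""
    parts = []
    for piece in text.strip().split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(version, minimum):
    """Compare two version tuples after padding the shorter one with zeros."""
    width = max(len(version), len(minimum))
    padded_version = version + (0,) * (width - len(version))
    padded_minimum = minimum + (0,) * (width - len(minimum))
    return padded_version >= padded_minimum


def can_build_ort_1_25(compiler, compiler_version, cuda_version=None):
    """Return True if the toolchain satisfies the minimums stated for 1.25.0."""
    minimum = MIN_TOOLCHAIN.get(compiler.lower())
    if minimum is None:
        return False  # compiler family is not one of the three listed in the notes
    if not meets_minimum(parse_version(compiler_version), minimum):
        return False
    if cuda_version is not None:
        return meets_minimum(parse_version(cuda_version), MIN_CUDA)
    return True
```

A CI script could call `can_build_ort_1_25("gcc", "9.4.0")` to fail early with a clear message instead of hitting a C++20 compile error partway through the build.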

Execution Provider Changes

  • ArmNN EP has been removed. Users should remove any --use_armnn build flags and migrate to the MLAS/KleidiAI-backed CPU EP or QNN EP for Qualcomm hardware. (#27447)
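
Applications that pass an explicit provider list may also need a small change. The sketch below filters a preferred list so that a removed provider is dropped and the CPU EP remains as a fallback. `pick_providers` is a hypothetical helper, and `"ArmNNExecutionProvider"` is assumed to be the name the removed EP was registered under.

```python
# Provider names that no longer exist in 1.25.0 (name assumed, see lead-in).
REMOVED_PROVIDERS = {"ArmNNExecutionProvider"}


def pick_providers(preferred, available):
    """Keep preferred providers that still exist and are available, in order.

    The CPU EP is appended as a last resort when the build offers it.
    """
    kept = [
        name
        for name in preferred
        if name not in REMOVED_PROVIDERS and name in available
    ]
    if "CPUExecutionProvider" not in kept and "CPUExecutionProvider" in available:
        kept.append("CPUExecutionProvider")
    return kept
```

With the Python package installed, `available` can come from `onnxruntime.get_available_providers()`, and the result can be passed as the `providers` argument of `onnxruntime.InferenceSession`.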

API Version

  • ORT_API_VERSION updated to 25. (#27280)

---

🔒 Security Fixes

  • Fixed potential integer truncation leading to heap out-of-bounds read/write (#27544)
  • Addressed Pad Reflect vulnerability (#27652)
  • Security fix for transpose optimizer (#27555)
  • Upgraded minimatch 3.1.2 → 3.1.4 for CVE-2026-27904 (#27667)
  • Hardened shell command handling for constant strings (#27840)
  • Added validation of onnx::TensorProto data size before allocation (#27547)
  • Cleaned up external data path validation (#27539)
  • Fixed misaligned address reads for tensor attributes from raw data buffers (#27312)
  • Fixed CPU Attention overflow issue (#27822)
  • Fixed CPU LRN integer overflow issues (#27886)
  • Additional input validation hardening:
      • Tile kernel dim overflow (#27566)
      • Out-of-bounds read in cross entropy (#27568)
      • TreeEnsembleClassifier attributes (#27571)
      • AffineGrid (#27572)
      • EmbedLayerNorm position_ids (#27573)
      • RotaryEmbedding position_ids (#27597)
      • RoiAlign batch_indices (#27603)
      • MaxUnpool indices (#27432)
      • QMoECPU swiglu OOB (#27748)
      • SVMClassifier initializer (#27699)
      • Col2Im SafeInt (#27625)
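
Several of these fixes share one pattern: validate a size computed from untrusted shape data before allocating or indexing. The sketch below illustrates that general pattern only and is not ONNX Runtime's implementation. Python integers never overflow, so a fixed `SIZE_MAX` stands in for the native limit, and both function names are hypothetical.

```python
SIZE_MAX = 2**63 - 1  # stand-in for the largest size a native allocator would accept


def checked_element_count(dims, limit=SIZE_MAX):
    """Multiply dims, raising before the running product can exceed limit."""
    count = 1
    for d in dims:
        if d < 0:
            raise ValueError(f"negative dimension: {d}")
        # Divide instead of multiplying so the check itself cannot overflow.
        if d != 0 and count > limit // d:
            raise OverflowError(f"element count overflows for dims {list(dims)}")
        count *= d
    return count


def validate_raw_data(dims, element_size, raw_byte_length):
    """Check that the declared shape matches the payload size before allocating."""
    if element_size <= 0:
        raise ValueError(f"invalid element size: {element_size}")
    count = checked_element_count(dims)
    if count > SIZE_MAX // element_size:
        raise OverflowError("byte size overflows")
    expected = count * element_size
    if expected != raw_byte_length:
        raise ValueError(
            f"shape implies {expected} bytes but payload has {raw_byte_length}"
        )
    return expected
```

Checking with division before each multiplication is the same idea a checked-integer wrapper applies in native code: the product is rejected before it can wrap around to a small, plausible-looking value.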

---

✨ New Features

🔌 Execution Provider Plugin API & CUDA Plugin EP

ORT 1.25.0 introduces the CUDA Plugin EP — the first core implementation that enables third-party CUDA-backed EPs to be delivered as dynamically loaded plugins without rebuilding ORT.

  • CUDA Plugin EP: Core implementation (#27816)
  • CUDA Plugin EP: BFC-style arena and CUDA mempool allocators for stream-aware memory management (#27931)
  • Plugin EP Sync API for synchronous execution (#27538)
  • Plugin EP event profiling APIs (#27649)
  • Plugin EP APIs to retrieve ONNX operator schemas (#27713)
  • Annotation-based graph partitioning with resource accounting (#27595, #27972)
  • EP API adapter improvements: header-only adapter, OpKernelInfo::GetConfigOptions, LoggingManager::HasDefaultLogger() (#26879, #26919, #27540, #27541, #27587)
  • WebGPU EP made compatible with EP API (#26907)
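
For orientation, the sketch below shows how a plugin EP library might be loaded from Python. It uses the registration functions found in recent ONNX Runtime Python packages (`register_execution_provider_library`, `get_ep_devices`, `SessionOptions.add_provider_for_devices`). The library path, registration name and EP name are placeholders supplied by the caller; these notes do not state the file name or EP name the CUDA Plugin EP ships under. `devices_for_ep` and `create_session_with_plugin` are hypothetical helpers.

```python
def devices_for_ep(ep_devices, ep_name):
    """Return the devices whose ep_name attribute matches the requested EP."""
    return [device for device in ep_devices if device.ep_name == ep_name]


def create_session_with_plugin(model_path, library_path, registration_name, ep_name):
    """Register a plugin EP library and build a session on the devices it exposes.

    All four arguments are caller-supplied placeholders, not values from the notes.
    """
    import onnxruntime as ort  # imported lazily so the pure helper above has no dependency

    ort.register_execution_provider_library(registration_name, library_path)
    devices = devices_for_ep(ort.get_ep_devices(), ep_name)
    if not devices:
        # Nothing usable was exposed, so undo the registration before failing.
        ort.unregister_execution_provider_library(registration_name)
        raise RuntimeError(f"no devices reported for EP {ep_name!r}")

    options = ort.SessionOptions()
    options.add_provider_for_devices(devices, {})  # empty dict: no provider options
    return ort.InferenceSession(model_path, sess_options=options)
```

Because the library is loaded at run time, the same ORT build can be paired with different plugin libraries without recompiling, which is the delivery model described above.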

🔧 Core APIs

  • Per-session thread pool work callbacks API (#27253)
  • `enable_profiling` in RunOptions (#26846)
  • KernelInfo string-array attribute APIs for C and C++ (#27599)
  • OrtModel input support for Compile API (#27332)
  • Session config to create weightless EPContext models during compilation (#27197)
  • Compiled model compatibility APIs in example plugin EP (#27088)
  • Model Package support (preview): Initial infrastructure for automatically selecting compiled EPContext model variants from a packaged collection based on EP, device, and hardware constraints. The directory structure is not yet finalized. (#27786)
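
The Model Package selection idea can be pictured as matching variants against the current EP, device and hardware. The sketch below is purely illustrative: the `Variant` fields, the constraint keys and the first-match rule are all assumptions, since the package layout is not finalized and these notes do not describe the actual selection logic.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    """One compiled model in a hypothetical package manifest."""
    path: str
    ep: str
    device_type: str
    constraints: dict = field(default_factory=dict)  # e.g. {"arch": "v75"} (made up)


def select_variant(variants, ep, device_type, hardware):
    """Return the first variant whose EP, device type and constraints all match.

    `hardware` maps constraint keys to the values reported by the current machine.
    Returns None when nothing matches, so the caller can fall back to another model.
    """
    for variant in variants:
        if variant.ep != ep or variant.device_type != device_type:
            continue
        if all(hardware.get(key) == want for key, want in variant.constraints.items()):
            return variant
    return None
```

Listing the most specific variants first makes the first-match rule prefer them over generic ones.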

📊 New ONNX Ops & Opset Coverage

  • Attention opset 23 on CUDA with GQA, boolean masks, softcap, and softmax precision (#26466, #27030, #27082,…

