What does this release signal mean?

Microsoft published v1.26.0 of microsoft/onnxruntime, a cross-platform inference engine for ONNX machine learning models. This release signal is evidence of what shipped, changed, or was packaged for users. High-signal details: tag v1.26.0, published 2026-05-08T19:24:39Z, not a prerelease. The release notes state that they were generated via LLM from Git history. onlylabs links this event to 1 captured evidence page and 6 related release signals.

Microsoft Release: microsoft/onnxruntime v1.26.0

Captured source

GitHub/github.com/microsoft/onnxruntime

microsoft/onnxruntime v1.26.0

published May 8, 2026 · seen Jun 6 · captured Jun 11 · http 200 · method plain

1.26.0

Repository: microsoft/onnxruntime

Tag: v1.26.0

Published: 2026-05-08T19:24:39Z

Prerelease: no

Release notes: n.b. The following was generated via LLM from Git history. Only the contributor list has been verified.

ONNX Runtime Release 1.26.0

Announcement - Breaking Changes

Support for CUDA 12 will be removed in 1.27.0.
CUDA 13 will continue to be published as onnxruntime---gpu_cuda13-.
CUDA runtime will be moving soon to a dedicated Execution Provider (EP) instead of a published package from ORT core.
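
For callers that currently assume a CUDA-enabled package, one defensive pattern is to select execution providers from whatever the installed build reports instead of hard-coding them. The sketch below is illustrative only: `pick_providers` is a hypothetical helper, the preference order is an assumption, and these notes do not say what name the future dedicated CUDA EP will register under.

```python
from typing import Iterable, List

# Preference order is an assumption for illustration; adjust per deployment.
PREFERRED = ("CUDAExecutionProvider", "CPUExecutionProvider")


def pick_providers(available: Iterable[str],
                   preferred: Iterable[str] = PREFERRED) -> List[str]:
    """Return the preferred providers that are actually available, in order."""
    available_set = set(available)
    return [name for name in preferred if name in available_set]
```

With ONNX Runtime installed, the list returned by `onnxruntime.get_available_providers()` can be passed in directly, and the result handed to `InferenceSession(..., providers=...)`.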

Highlights

Added optional memory mapping for .ort model loads (#28164).
Added RISC-V Vector (RVV) support for CPU EP (#28261).
OpenVINO EP upgraded for 1.26.0 development release (#28297).
WebGPU gained GridSample support (#28264) and Split-K improvements (#28151).
CUDA plugin EP gained graph support (#28002) and a profiling API (#28216).
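
The notes do not name the option that enables memory-mapped .ort loading, so no ONNX Runtime call is shown here. As a general illustration of what memory mapping provides, the sketch below uses Python's standard `mmap` module to read part of a file without first copying the whole file into a buffer. The file contents and helper names are placeholders, not anything from ONNX Runtime.

```python
import mmap
import os
import tempfile


def read_header_mmap(path: str, n: int) -> bytes:
    """Return the first n bytes of a file through a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS loads pages lazily on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[:n]


def demo() -> bytes:
    """Write a placeholder file, read its header via mmap, then clean up."""
    with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
        tmp.write(b"HEADER" + b"\x00" * 4096)  # stand-in for a model file
        path = tmp.name
    try:
        return read_header_mmap(path, 6)
    finally:
        os.remove(path)
```

Only the pages that are actually touched need to be brought into memory, which is why mapping can help with large model files.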

Security and Reliability Hardening

Replaced unrestricted Python setattr configuration with an allowlist (#28083).
Hardened multiple OOB and overflow scenarios across ML and core ops:
Attention mask index OOB write (#27789).
MaxPoolGrad indices bounds validation (#27903).
SVM and TreeEnsemble bounds/security fixes (#27950, #27951, #27952, #27989).
RNN sequence_lens OOB read and integer overflow handling (#28052, #28003).
GroupQueryAttention seqlens_k bounds validation and compatibility follow-up (#28031, #28259).
MatMulBnb4 and ML coefficient SafeInt checks (#27995, #28001).
CUDA Gather int32 overflow fix (#28108).
GridSample float->int64 cast hardening for NaN/Inf/out-of-range coords (#28302).
Fixed session logger use-after-free during EP teardown under verbose logging (#28274).
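
Two of the hardening patterns listed above can be sketched in a few lines: an allowlist in front of `setattr`, and a float-to-int64 conversion that handles NaN, Inf, and out-of-range values explicitly. This is a minimal sketch under stated assumptions, not the code from #28083 or #28302. The option names, the `Options` class, and the clamp-or-default policy are all illustrative choices.

```python
import math

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1

# Illustrative names only; the actual allowlist contents are not in the notes.
ALLOWED_OPTIONS = frozenset({"log_severity_level", "intra_op_num_threads"})


class Options:
    """Toy options holder used only for this sketch."""


def set_option(obj: object, name: str, value: object) -> None:
    """Set an attribute only if its name is on the allowlist."""
    if name not in ALLOWED_OPTIONS:
        raise AttributeError(f"option {name!r} is not allowed")
    setattr(obj, name, value)


def safe_float_to_int64(x: float, default: int = 0) -> int:
    """Convert a float to an integer within the int64 range.

    NaN maps to `default`; Inf and out-of-range values clamp to the bounds.
    This policy is an assumption; the actual fix may behave differently.
    """
    if math.isnan(x):
        return default
    if x >= INT64_MAX:  # also catches +Inf
        return INT64_MAX
    if x <= INT64_MIN:  # also catches -Inf
        return INT64_MIN
    return int(x)  # truncates toward zero, like a C cast
```

In C or C++ the unchecked cast is undefined behavior for these inputs, which is why the range check has to come before the conversion rather than after it.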

CUDA, Attention, and MLAS

Filled CUDA opset/operator gaps and extended support:
Transpose opset 23 -> 25 (#27740).
QuantizeLinear/DequantizeLinear opset 25 (#28046).
CUDA TopK INT8/INT16/UINT8 support (#27862).
LabelEncoder CUDA support for numeric types (#28045).
Attention/GQA improvements:
Fixed ONNX Attention min-bias alignment crash on SM 1 (#28151).
MatMulNBits refactor and batching improvements (#28109, #28197).
MHA correctness fix when present outputs are not requested (#28027).
Buffer upload overflow fix (#27948).
Position ID bounds validation in WebGPU/JS RotaryEmbedding (#28214).
WebNN change:
Renamed pool2d property roundingType -> outputShapeRounding (#28172).
JavaScript ecosystem maintenance:
Multiple dependency bumps.

Plugin EP and EP Ecosystem

CUDA plugin EP:
Graph capture/replay support ported and expanded (#27958, #28002).
Sync support for IOBinding (#27919).
Profiling API implementation (#28216).
Resource accounting integration (#28028).
WebGPU plugin EP:
Pipeline updates and API init error handling fixes (#28121, #28211).
Other EP updates:
CoreML: HardSigmoid and QuickGelu support; Pad reflect support/fixes (#28182, #28184, #28073, #28062).
NvTensorRTRTX compatibility and diagnostics updates (#28263, #27577).
QNN file-mapping guard improvements (#27871).

Contributors

@tianleiwu, @yuslepukhin, @edgchen1, @vraspar, @hariharans29, @skottmckay, @eserscor, @xadupre, @sanaa-hamel-microsoft, @elwhyjay, @Rishi-Dave, @titaiwangms, @adrianlizarraga, @jatinwadhwa921, @jchen10, @Jiawei-Shao,...

Excerpt shown — open the source for the full document.

Notability

notability 7.0/10

New minor release of a popular ML runtime