What does this release signal mean?

Microsoft published ONNX Runtime v1.27.0 (repository microsoft/onnxruntime, tag v1.27.0) on 2026-06-19T21:11:07Z as a full release, not a prerelease. ONNX Runtime is Microsoft's cross-platform inference engine for ONNX models. This release signal is evidence of what shipped, changed, or was packaged for users. The release notes state that this release targets ONNX 1.21. onlylabs links this event to 1 captured evidence page and 6 related release signals.

Microsoft Release: microsoft/onnxruntime v1.27.0

Captured source

GitHub: github.com/microsoft/onnxruntime

microsoft/onnxruntime v1.27.0

published Jun 19, 2026 · seen 6d · captured 6d · http 200 · method plain

ONNX Runtime v1.27.0

Repository: microsoft/onnxruntime

Tag: v1.27.0

Published: 2026-06-19T21:11:07Z

Prerelease: no

Release notes:

n.b. This release is targeting ONNX 1.21. ONNX 1.22 will be supported in ORT 1.28.

n.b. This changelog was generated via LLM. Only the contributor list has been verified. As always, only trust the commit history.

Announcements & Breaking Changes

CUDA 12 package files are now explicitly named as such.
CUDA 12 packages are deprecated; please move to CUDA 13 as soon as possible.

---

Security Fixes

Fixed out-of-bounds read in SoftmaxCrossEntropyLoss via label bounds validation (#28004)
Hardened OneHot input validation and output-size computation (#28014)
Added SafeInt overflow protection in Expand and capped constant-folding output sizes (#28055)
Bounded total output allocation size in Tile kernel (#28070)
Added mask/input shape consistency checks in MaxpoolWithMask::Compute (#28223)
Fixed BitShift UB for shift amounts greater than or equal to bit width (#28272)
Validated sequence bounds in GQA (seqlens_k vs cos_cache) (#28277)
Validated conv bias shape in WordConvEmbedding to prevent OOB reads (#28279)
Fixed int32 overflow in CUDA Cast and UnaryElementWise kernels for very large tensors (#28386)
Fixed out-of-bounds read in CropBase scale handling (#28399)
Fixed rank-underflow bug in Inverse kernel trailing-dimension indexing (#28400)
Added sparse tensor external file path validation and additional external-path hardening (#28408, #28709, #28725)
Switched remaining torch.load() calls to weights_only=True (#28421)
Added CPU cache-indirection beam-index validation (#28486)
Added additional overflow/bounds checks and test coverage in runtime buffers (#28713, #28747)
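
Several of the fixes above share one pattern: validate sizes before allocating or indexing. The following is a minimal Python sketch of that pattern for a Tile-style output shape. It is not ONNX Runtime's code; the function name `tiled_output_shape` and the `max_elements` cap are hypothetical and chosen only for illustration.

```python
import math


def tiled_output_shape(input_shape, repeats, max_elements):
    """Return the Tile output shape, rejecting outputs above a size cap.

    Hypothetical helper: the cap and error messages are illustrative
    assumptions, not values taken from ONNX Runtime.
    """
    if len(input_shape) != len(repeats):
        raise ValueError("repeats must have one entry per input dimension")
    if any(d < 0 for d in input_shape) or any(r < 0 for r in repeats):
        raise ValueError("dimensions and repeats must be non-negative")

    out_shape = [dim * rep for dim, rep in zip(input_shape, repeats)]

    # Python integers do not overflow, so the product is exact here.
    # In C++ the same product needs checked arithmetic (for example SafeInt).
    total = math.prod(out_shape)
    if total > max_elements:
        raise ValueError(f"output of {total} elements exceeds cap {max_elements}")
    return out_shape


print(tiled_output_shape([2, 3], [2, 1], max_elements=100))
```

The point of checking before allocating is that a small model file can otherwise request an enormous buffer through a pair of large `repeats` values.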

---

New Features

Execution Provider Plugin API

Added zero-copy I/O for plugin EPs with HOST_ACCESSIBLE memory (#28037)
Added OrtEp::OnSessionInitializationEnd() callback (#28319)
Added plugin EP session-options getters (#28377)
Added CUDA Plugin EP provider options for streams and external allocators (#28603)

Core APIs & Runtime

Added support for ONNX overloaded functions (IR v10+) (#28275)
Added FLOAT8E8M0 datatype support in ONNX Runtime (#28381)
Added CPU Cast support for FLOAT8E8M0 (#28435)
Added kOrtEpDevice_EpMetadataKey_OSDriverVersion example and docs (#28282)
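
For readers unfamiliar with FLOAT8E8M0, the sketch below decodes one byte of that format in plain Python. It assumes the layout from the OCP Microscaling (MX) specification: 8 exponent bits, no sign bit, no mantissa bits, an exponent bias of 127, and 0xFF reserved for NaN. The release notes do not describe the layout, and `decode_e8m0` is a hypothetical helper, not an ONNX Runtime API.

```python
import math

E8M0_BIAS = 127   # assumed exponent bias (OCP MX specification)
E8M0_NAN = 0xFF   # assumed NaN encoding (OCP MX specification)


def decode_e8m0(byte):
    """Decode one E8M0 byte to a Python float: a pure power of two."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("E8M0 values are single bytes in the range 0..255")
    if byte == E8M0_NAN:
        return math.nan
    # No sign and no mantissa: the value is 2 ** (byte - bias).
    return math.ldexp(1.0, byte - E8M0_BIAS)


for raw in (0, 126, 127, 128, 254):
    print(raw, decode_e8m0(raw))
```

Under these assumptions the format cannot represent zero, negative numbers, or infinity, which is why it is typically used as a shared block scale rather than as an element type.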

Quantization & Training Tooling

Added calibration cache support to quantize_static (#28221)
Added ActivationRestrictedAsymmetric quantization option (#28237)
Added opset-21 block_size attribute support to QDQ quantization (#28522)
Added CPU fallback for FusedAdam optimizer in ORT Training (#28233)
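
The `block_size` attribute mentioned above belongs to blocked quantization in ONNX opset 21, where each run of `block_size` elements along the quantization axis shares one scale and zero point. A minimal one-dimensional sketch of blocked dequantization follows. The function name `dequantize_blocked` is hypothetical, and this is not the ONNX Runtime quantization tool's code.

```python
def dequantize_blocked(x, scales, zero_points, block_size):
    """Dequantize a 1-D list where every block_size elements share parameters.

    Applies y = (x - zero_point) * scale with per-block scale and zero point.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    n_blocks = -(-len(x) // block_size)  # ceiling division
    if len(scales) != n_blocks or len(zero_points) != n_blocks:
        raise ValueError("need one scale and one zero point per block")
    return [
        (q - zero_points[i // block_size]) * scales[i // block_size]
        for i, q in enumerate(x)
    ]


# Two blocks of two elements each.
print(dequantize_blocked([0, 2, 4, 6], [0.5, 2.0], [0, 4], block_size=2))
```

Smaller blocks track local value ranges more closely at the cost of storing more scales.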

---

Execution Provider Updates

NVIDIA CUDA EP

Added ConvTranspose-22 support (#27710)
Filled CUDA opset gaps for LSTM, RNN, Reshape, Cast, Round/Equal, ReduceMax/ReduceMin, Sin/Cos, and Random* ops (#27737, #27743, #27742, #27744, #27754, #27755, #27756, #27759)
Added LpNormalization support for CUDA EP (#28724)
Added chunked dequant+GEMM for MatMulNBits to reduce peak GPU memory (#28712)
Added QMoE tests for standard swiglu and improved decode-path routing/softmax kernels (#28741, #29026)
Fixed CUDA Attention dispatch mismatch for GQA head-size cases (#28358)
Fixed CUTLASS FMHA bias-loader alignment on unaligned kernel path (#28369)
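
The chunked dequant+GEMM item above suggests a general idea: dequantize only a slice of the weights at a time, so the full floating-point weight matrix never exists in memory at once. The pure-Python sketch below illustrates that idea under loud assumptions. The release notes do not say which axis is chunked, so this version chunks over output features, uses one scale per weight row instead of per-block scales, and fixes the 4-bit zero point at 8. All names are hypothetical and none of this is the CUDA kernel.

```python
def dequantize(q_rows, scales, zero_point=8):
    """Dequantize rows of 4-bit codes (ints 0..15), one scale per row (assumed)."""
    return [[(q - zero_point) * s for q in row] for row, s in zip(q_rows, scales)]


def chunked_dequant_matmul(x, q_weight, scales, chunk_rows):
    """Compute y = x @ W.T with W = dequantize(q_weight), one chunk at a time.

    Only chunk_rows rows of W are held in floating point at any moment.
    """
    if chunk_rows <= 0:
        raise ValueError("chunk_rows must be positive")
    y = [[] for _ in x]
    for start in range(0, len(q_weight), chunk_rows):
        stop = start + chunk_rows
        w_chunk = dequantize(q_weight[start:stop], scales[start:stop])
        # Each weight row yields one output column; append this chunk's columns.
        for out_row, x_row in zip(y, x):
            out_row.extend(
                sum(a * b for a, b in zip(x_row, w_row)) for w_row in w_chunk
            )
    return y


x = [[1.0, 2.0]]
q_weight = [[9, 10], [8, 8], [7, 12]]
scales = [0.5, 1.0, 2.0]
print(chunked_dequant_matmul(x, q_weight, scales, chunk_rows=2))
```

The result does not depend on `chunk_rows`; only the peak size of the temporary dequantized buffer changes.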

WebGPU EP

Added LSTM support on WebGPU (#27881)
Added per-graph buffer manager for multi-graph capture (#28260)
Added QKV and MLP layer fusions for Qwen3-style models (#28280)
Added QKV bias support in FlashAttention for MultiHeadAttention (#28380)
Added shader dump-to-file environment variable and nightly validation checks (#28674)
Added opset-24 +...

Excerpt shown; open the source for the full document.

Notability

notability 5.0/10

Routine release of an inference engine, not a major breakthrough.