microsoft/onnxruntime v1.27.0
ONNX Runtime v1.27.0
Repository: microsoft/onnxruntime
Tag: v1.27.0
Published: 2026-06-19T21:11:07Z
Prerelease: no
Release notes:
N.B. This release targets ONNX 1.21. ONNX 1.22 will be supported in ORT 1.28.
N.B. This changelog was generated via LLM. Only the contributor list has been verified. As always, only trust the commit history.
Announcements & Breaking Changes
- CUDA 12 package files are now explicitly named as such.
- CUDA 12 packages are deprecated; please move to CUDA 13 ASAP.
---
Security Fixes
- Fixed out-of-bounds read in `SoftmaxCrossEntropyLoss` via label bounds validation (#28004)
- Hardened `OneHot` input validation and output-size computation (#28014)
- Added SafeInt overflow protection in `Expand` and capped constant-folding output sizes (#28055)
- Bounded total output allocation size in `Tile` kernel (#28070)
- Added mask/input shape consistency checks in `MaxpoolWithMask::Compute` (#28223)
- Fixed `BitShift` UB for shift amounts greater than or equal to bit width (#28272)
- Validated sequence bounds in GQA (`seqlens_k` vs `cos_cache`) (#28277)
- Validated conv bias shape in `WordConvEmbedding` to prevent OOB reads (#28279)
- Fixed int32 overflow in CUDA Cast and UnaryElementWise kernels for very large tensors (#28386)
- Fixed out-of-bounds read in `CropBase` scale handling (#28399)
- Fixed rank-underflow bug in Inverse kernel trailing-dimension indexing (#28400)
- Added sparse tensor external file path validation and additional external-path hardening (#28408, #28709, #28725)
- Switched remaining `torch.load()` calls to `weights_only=True` (#28421)
- Added CPU cache-indirection beam-index validation (#28486)
- Added additional overflow/bounds checks and test coverage in runtime buffers (#28713, #28747)
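
Several of the fixes above (`Expand`, `Tile`, runtime buffers) share one pattern: compute an output size with overflow checks and refuse to allocate past a cap. The following is a minimal Python sketch of that pattern, not the ONNX Runtime implementation. The function name `checked_tile_output_size` and the `max_elements` cap are hypothetical and chosen only for illustration.

```python
from typing import Sequence

INT64_MAX = 2**63 - 1  # largest value a signed 64-bit size can hold


def checked_tile_output_size(
    input_shape: Sequence[int],
    repeats: Sequence[int],
    max_elements: int = 2**31,  # hypothetical cap, not an ONNX Runtime constant
) -> int:
    """Return the element count of a tiled output, or raise if it is unsafe."""
    if len(input_shape) != len(repeats):
        raise ValueError("repeats must have one entry per input dimension")

    total = 1
    for dim, rep in zip(input_shape, repeats):
        if dim < 0 or rep < 0:
            raise ValueError("dimensions and repeats must be non-negative")
        out_dim = dim * rep
        # Python ints never wrap, so emulate the check that a fixed-width
        # implementation has to make before each multiplication.
        if out_dim > INT64_MAX or (out_dim != 0 and total > INT64_MAX // out_dim):
            raise OverflowError("output size overflows int64")
        total *= out_dim
        # Reject the request as soon as it exceeds the allocation cap.
        if total > max_elements:
            raise MemoryError("output size exceeds the allocation cap")
    return total
```

For example, `checked_tile_output_size([2, 3], [2, 2])` returns 24, while a shape whose product exceeds the cap raises before any buffer would be allocated.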
---
New Features
Execution Provider Plugin API
- Added zero-copy I/O for plugin EPs with HOST_ACCESSIBLE memory (#28037)
- Added `OrtEp::OnSessionInitializationEnd()` callback (#28319)
- Added plugin EP session-options getters (#28377)
- Added CUDA Plugin EP provider options for streams and external allocators (#28603)
Core APIs & Runtime
- Added support for ONNX overloaded functions (IR v10+) (#28275)
- Added FLOAT8E8M0 datatype support in ONNX Runtime (#28381)
- Added CPU Cast support for FLOAT8E8M0 (#28435)
- Added `kOrtEpDevice_EpMetadataKey_OSDriverVersion` example and docs (#28282)
Quantization & Training Tooling
- Added calibration cache support to `quantize_static` (#28221)
- Added `ActivationRestrictedAsymmetric` quantization option (#28237)
- Added opset-21 `block_size` attribute support to QDQ quantization (#28522)
- Added CPU fallback for `FusedAdam` optimizer in ORT Training (#28233)
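
To illustrate what a `block_size` attribute means for quantization, here is a minimal pure-Python sketch of blocked quantization: each run of `block_size` consecutive values gets its own scale instead of one scale for the whole tensor. It is a conceptual illustration only. The helper names, the 1-D layout, and the symmetric int8 scheme are assumptions, and it does not use or reproduce the ONNX Runtime quantization tool's API.

```python
from typing import List, Tuple


def quantize_blockwise(values: List[float], block_size: int) -> Tuple[List[int], List[float]]:
    """Quantize a 1-D list to int8 with one symmetric scale per block."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    quantized: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        # Map the largest magnitude in the block to 127; use 1.0 for all-zero blocks.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for v in block:
            q = round(v / scale)
            quantized.append(max(-128, min(127, q)))  # saturate to the int8 range
    return quantized, scales


def dequantize_blockwise(quantized: List[int], scales: List[float], block_size: int) -> List[float]:
    """Invert quantize_blockwise by looking up the scale of each element's block."""
    return [q * scales[i // block_size] for i, q in enumerate(quantized)]
```

With per-block scales, a block of small values is not forced to share the coarse scale needed by a block of large values, which is the motivation for blocked quantization.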
---
Execution Provider Updates
NVIDIA CUDA EP
- Added `ConvTranspose-22` support (#27710)
- Filled CUDA opset gaps for LSTM, RNN, Reshape, Cast, Round/Equal, ReduceMax/ReduceMin, Sin/Cos, and Random* ops (#27737, #27743, #27742, #27744, #27754, #27755, #27756, #27759)
- Added LpNormalization support for CUDA EP (#28724)
- Added chunked dequant+GEMM for MatMulNBits to reduce peak GPU memory (#28712)
- Added QMoE tests for standard swiglu and improved decode-path routing/softmax kernels (#28741, #29026)
- Fixed CUDA Attention dispatch mismatch for GQA head-size cases (#28358)
- Fixed CUTLASS FMHA bias-loader alignment on unaligned kernel path (#28369)
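
The idea behind chunked dequant+GEMM can be sketched in a few lines: instead of dequantizing the whole weight matrix before the multiply, dequantize a slice of columns, multiply, and discard the slice. The sketch below is plain Python and is not the CUDA kernel. The per-column scales are a simplification, and `chunk_cols` is a hypothetical tuning parameter.

```python
from typing import List

Matrix = List[List[float]]


def chunked_dequant_matmul(
    a: Matrix,                   # activations, shape (M, K)
    q_weight: List[List[int]],   # quantized weights, shape (K, N)
    scales: List[float],         # one scale per output column (simplification)
    chunk_cols: int,             # hypothetical knob: columns dequantized at once
) -> Matrix:
    """Compute a @ dequant(q_weight) while holding only K x chunk_cols floats."""
    if chunk_cols <= 0:
        raise ValueError("chunk_cols must be positive")
    m, k, n = len(a), len(q_weight), len(scales)
    out = [[0.0] * n for _ in range(m)]
    for col_start in range(0, n, chunk_cols):
        col_end = min(col_start + chunk_cols, n)
        # Dequantize only this slice of columns; it is dropped after use.
        w_chunk = [
            [q_weight[r][c] * scales[c] for c in range(col_start, col_end)]
            for r in range(k)
        ]
        for i in range(m):
            for j in range(col_end - col_start):
                out[i][col_start + j] = sum(a[i][r] * w_chunk[r][j] for r in range(k))
    return out
```

The result is the same for any `chunk_cols`; only the size of the temporary dequantized buffer changes, which is where the peak-memory saving comes from.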
WebGPU EP
- Added LSTM support on WebGPU (#27881)
- Added per-graph buffer manager for multi-graph capture (#28260)
- Added QKV and MLP layer fusions for Qwen3-style models (#28280)
- Added QKV bias support in FlashAttention for MultiHeadAttention (#28380)
- Added shader dump-to-file environment variable and nightly validation checks (#28674)
- Added opset-24 +...