What does this release signal mean?

Microsoft published v0.14.0 of microsoft/onnxruntime-genai, its library for running generative AI models with ONNX Runtime. The release was tagged v0.14.0 and published on 2026-05-29T18:06:43Z as a full release, not a prerelease. A release signal like this is evidence of what shipped, changed, or was packaged for users. The release notes open with a WhisperProcessor fix. onlylabs links this event to 1 captured evidence page and 6 related release signals.

Microsoft Release: microsoft/onnxruntime-genai v0.14.0

Captured source

GitHub · github.com/microsoft/onnxruntime-genai

microsoft/onnxruntime-genai v0.14.0

published May 29, 2026 · seen Jun 9 · captured Jun 11 · http 200 · method plain

v0.14.0

Repository: microsoft/onnxruntime-genai

Tag: v0.14.0

Published: 2026-05-29T18:06:43Z

Prerelease: no

Release notes:

What's Changed

Fix WhisperProcessor divide-by-zero when single prompt is provided by @Copilot in https://github.com/microsoft/onnxruntime-genai/pull/2068
Fix lm_head tensor loading order dependency in quantized model builder by @thpereir in https://github.com/microsoft/onnxruntime-genai/pull/2061
Fail to build Whisper model by @xiaofeihan1 in https://github.com/microsoft/onnxruntime-genai/pull/2075
Rename NemotronCacheConfig to NemotronConfig and add blank penalty to the decoder by @nenad1002 in https://github.com/microsoft/onnxruntime-genai/pull/2042
Fix YaRN RoPE bugs in model builder and add parity tests by @titaiwangms in https://github.com/microsoft/onnxruntime-genai/pull/2076
Add Transformers v5 Support by @sayanshaw24 in https://github.com/microsoft/onnxruntime-genai/pull/2089
macOS ARM64 ADO pipeline by @Copilot in https://github.com/microsoft/onnxruntime-genai/pull/2091
Reduce CPU-side per-token overhead in GenerateNextToken and SampleTopP by @hanbitmyths in https://github.com/microsoft/onnxruntime-genai/pull/2085
Add onStageComplete by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2074
[WebGPU] Support continuous decoding (RewindTo) with graph capture by @qjia7 in https://github.com/microsoft/onnxruntime-genai/pull/2083
[Mistral3] Add VLM support with multi-image inference by @titaiwangms in https://github.com/microsoft/onnxruntime-genai/pull/2077
Add k_quant_linear mixed-precision quantization for hybrid attention … by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2100
Removes QNN packaging from onnxruntime-genai pipelines by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/2109
Add Gemma4 multimodal support (vision + audio) by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2103
Update GUIDs during az login by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/2122
Add CODEOWNERS file for repository ownership by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/2119
Qwen3.5: drop fp32 cast around RMSNorm in builder by @xiaofeihan1 in https://github.com/microsoft/onnxruntime-genai/pull/2101
Add support for LFM2 in ORT GenAI by @xenova in https://github.com/microsoft/onnxruntime-genai/pull/1979
Enable CUDA graph capture for CUDA EP to improve decode throughput by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2070
[Qwen3.5] dedup position ids by @daijh in https://github.com/microsoft/onnxruntime-genai/pull/2102
Address win-cuda pipeline errors by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/2154
Update Extensions Commit to Fix Id2Token Bugs by @sayanshaw24 in https://github.com/microsoft/onnxruntime-genai/pull/2159
Limit the CUDA cmake architectures to 86 for CI builds by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/2161
Gate leaked-object error reporting in Shutdown() to debug builds or when logging is enabled by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/2162
Update Copilot instructions for reviewing model builder by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/2164
Fix DecoderState input_ids check regression introduced in #2103 by @titaiwangms in https://github.com/microsoft/onnxruntime-genai/pull/2148
Fix memory leaks by @skottmckay in https://github.com/microsoft/onnxruntime-genai/pull/2153
[Qwen3.5] Use LpNormalization for L2-norm in linear-attention Q/K by @xiaofeihan1 in https://github.com/microsoft/onnxruntime-genai/pull/2127
Fix: Win32 build failure when paths contain spaces by @nsubaru in https://github.com/microsoft/onnxruntime-genai/pull/2053
Fix CUDA build with MSVC by enabling /Zc:preprocessor for nvcc host compilation on VS 16.5 or greater by @nsubaru in https://github.com/microsoft/onnxruntime-genai/pull/2054
Apply linear rope_scaling in model builder for Neutts/nano by @VishalX in https://github.com/microsoft/onnxruntime-genai/pull/2142
Fix Quark/AWQ weight loading for Qwen3-VL-4B text model by @anilmartha in https://github.com/microsoft/onnxruntime-genai/pull/2143
Fix WebGPU inference crash in embedding and multi-modal feature allocation by @feich-ms in https://github.com/microsoft/onnxruntime-genai/pull/2163
Support Visual Studio 18 2026 build by @Copilot in https://github.com/microsoft/onnxruntime-genai/pull/2017
Add QNN EP documentation to OGA including Genie note by @qti-kromero in https://github.com/microsoft/onnxruntime-genai/pull/2158
Use windowsml package and make winml usage simpler by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/2155
Cleanup TensorObject created by OrtxTensorResultGetAt by @skottmckay in https://github.com/microsoft/onnxruntime-genai/pull/2168
Fix nemotron leaks by @skottmckay in https://github.com/microsoft/onnxruntime-genai/pull/2169
[RyzenAI] make speech sub-model optional in PhiMultiModalProcessor by @manasablrm in https://github.com/microsoft/onnxruntime-genai/pull/2167
Enable graph capture for WebGPU models and DML continuous decoding tests by @qjia7 in https://github.com/microsoft/onnxruntime-genai/pull/2099
[Qwen3] Allow packed QKV MatMul under QK-Norm via post-MatMul Split by @xiaofeihan1 in https://github.com/microsoft/onnxruntime-genai/pull/2137
Enable Linux ARM64 builds and packaging by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/2107
Add gemma4 unit tests by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2151
Auto-detect fixed kv-cache shape in DefaultKeyValueCache by @akholodnamdcom in https://github.com/microsoft/onnxruntime-genai/pull/2166
Add text-only mode support for Qwen 3.5 model builder by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2157
Fix heap overflow issue by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2110
[Benchmark] Add --use_random_tokens flag to C benchmark by @VishalX in https://github.com/microsoft/onnxruntime-genai/pull/2170
Add HunYuan Dense V1 (hunyuan_v1_dense) model support by @anilmartha in https://github.com/microsoft/onnxruntime-genai/pull/2144
Nvidia Parakeet Tdt ASR support by @nenad1002 in https://github.com/microsoft/onnxruntime-genai/pull/2150
Multilingual Streaming Nemotron ASR + CUDA support...

Excerpt shown — open the source for the full document.
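Most complete entries in the list above share one shape: a title, then "by @author", then "in" followed by a pull request URL. The following is a minimal Python sketch of splitting such a line into its parts, for example to count changes per contributor. The function name parse_entry and the returned field names are illustrative choices, not part of any GitHub or onnxruntime-genai tooling, and the pattern only assumes the line shape visible in this excerpt.

```python
import re

# Pattern for one release-note line of the form:
#   <title> by @<author> in https://github.com/<owner>/<repo>/pull/<number>
# The shape is inferred from the excerpt above; it is an assumption,
# not a documented GitHub format guarantee.
ENTRY = re.compile(
    r"^(?P<title>.+) by @(?P<author>[\w-]+) in "
    r"https://github\.com/(?P<repo>[\w.-]+/[\w.-]+)/pull/(?P<pr>\d+)$"
)


def parse_entry(line):
    """Return the parts of one entry as a dict, or None if the line does not match."""
    match = ENTRY.match(line.strip())
    if match is None:
        # Truncated or free-form lines (such as the final elided entry) end up here.
        return None
    return {
        "title": match["title"],
        "author": match["author"],
        "repo": match["repo"],
        "pr": int(match["pr"]),
    }


if __name__ == "__main__":
    sample = (
        "Fix memory leaks by @skottmckay in "
        "https://github.com/microsoft/onnxruntime-genai/pull/2153"
    )
    print(parse_entry(sample))
```

Because the title group is greedy, a title that itself contains "in #2103" or similar text is kept whole, and only the final "by @author in URL" suffix is split off. Lines cut short by the excerpt return None rather than a guessed result.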

Notability

notability 3.0/10

Routine minor library release