microsoft/onnxruntime-genai v0.14.0
Repository: microsoft/onnxruntime-genai
Tag: v0.14.0
Published: 2026-05-29T18:06:43Z
Prerelease: no
Release notes:
What's Changed
- Fix WhisperProcessor divide-by-zero when single prompt is provided by @Copilot in https://github.com/microsoft/onnxruntime-genai/pull/2068
- Fix lm_head tensor loading order dependency in quantized model builder by @thpereir in https://github.com/microsoft/onnxruntime-genai/pull/2061
- Fail to build Whisper model by @xiaofeihan1 in https://github.com/microsoft/onnxruntime-genai/pull/2075
- Rename NemotronCacheConfig to NemotronConfig and add blank penalty to the decoder by @nenad1002 in https://github.com/microsoft/onnxruntime-genai/pull/2042
- Fix YaRN RoPE bugs in model builder and add parity tests by @titaiwangms in https://github.com/microsoft/onnxruntime-genai/pull/2076
- Add Transformers v5 Support by @sayanshaw24 in https://github.com/microsoft/onnxruntime-genai/pull/2089
- macOS ARM64 ADO pipeline by @Copilot in https://github.com/microsoft/onnxruntime-genai/pull/2091
- Reduce CPU-side per-token overhead in GenerateNextToken and SampleTopP by @hanbitmyths in https://github.com/microsoft/onnxruntime-genai/pull/2085
- Add onStageComplete by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2074
- [WebGPU] Support continuous decoding (RewindTo) with graph capture by @qjia7 in https://github.com/microsoft/onnxruntime-genai/pull/2083
- [Mistral3] Add VLM support with multi-image inference by @titaiwangms in https://github.com/microsoft/onnxruntime-genai/pull/2077
- Add k_quant_linear mixed-precision quantization for hybrid attention … by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2100
- Removes QNN packaging from onnxruntime-genai pipelines by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/2109
- Add Gemma4 multimodal support (vision + audio) by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2103
- Update GUIDs during az login by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/2122
- Add CODEOWNERS file for repository ownership by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/2119
- Qwen3.5: drop fp32 cast around RMSNorm in builder by @xiaofeihan1 in https://github.com/microsoft/onnxruntime-genai/pull/2101
- Add support for LFM2 in ORT GenAI by @xenova in https://github.com/microsoft/onnxruntime-genai/pull/1979
- Enable CUDA graph capture for CUDA EP to improve decode throughput by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2070
- [Qwen3.5] dedup position ids by @daijh in https://github.com/microsoft/onnxruntime-genai/pull/2102
- Address win-cuda pipeline errors by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/2154
- Update Extensions Commit to Fix Id2Token Bugs by @sayanshaw24 in https://github.com/microsoft/onnxruntime-genai/pull/2159
- Limit the CUDA cmake architectures to 86 for CI builds by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/2161
- Gate leaked-object error reporting in Shutdown() to debug builds or when logging is enabled by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/2162
- Update Copilot instructions for reviewing model builder by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/2164
- Fix DecoderState input_ids check regression introduced in #2103 by @titaiwangms in https://github.com/microsoft/onnxruntime-genai/pull/2148
- Fix memory leaks by @skottmckay in https://github.com/microsoft/onnxruntime-genai/pull/2153
- [Qwen3.5] Use LpNormalization for L2-norm in linear-attention Q/K by @xiaofeihan1 in https://github.com/microsoft/onnxruntime-genai/pull/2127
- Fix: Win32 build failure when paths contain spaces by @nsubaru in https://github.com/microsoft/onnxruntime-genai/pull/2053
- Fix CUDA build with MSVC by enabling /Zc:preprocessor for nvcc host compilation on VS 16.5 or greater by @nsubaru in https://github.com/microsoft/onnxruntime-genai/pull/2054
- Apply linear rope_scaling in model builder for Neutts/nano by @VishalX in https://github.com/microsoft/onnxruntime-genai/pull/2142
- Fix Quark/AWQ weight loading for Qwen3-VL-4B text model by @anilmartha in https://github.com/microsoft/onnxruntime-genai/pull/2143
- Fix WebGPU inference crash in embedding and multi-modal feature allocation by @feich-ms in https://github.com/microsoft/onnxruntime-genai/pull/2163
- Support Visual Studio 18 2026 build by @Copilot in https://github.com/microsoft/onnxruntime-genai/pull/2017
- Add QNN EP documentation to OGA including Genie note by @qti-kromero in https://github.com/microsoft/onnxruntime-genai/pull/2158
- Use windowsml package and make winml usage simpler by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/2155
- Cleanup TensorObject created by OrtxTensorResultGetAt by @skottmckay in https://github.com/microsoft/onnxruntime-genai/pull/2168
- Fix nemotron leaks by @skottmckay in https://github.com/microsoft/onnxruntime-genai/pull/2169
- [RyzenAI] make speech sub-model optional in PhiMultiModalProcessor by @manasablrm in https://github.com/microsoft/onnxruntime-genai/pull/2167
- Enable graph capture for WebGPU models and DML continuous decoding tests by @qjia7 in https://github.com/microsoft/onnxruntime-genai/pull/2099
- [Qwen3] Allow packed QKV MatMul under QK-Norm via post-MatMul Split by @xiaofeihan1 in https://github.com/microsoft/onnxruntime-genai/pull/2137
- Enable Linux ARM64 builds and packaging by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/2107
- Add gemma4 unit tests by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2151
- Auto-detect fixed kv-cache shape in DefaultKeyValueCache by @akholodnamdcom in https://github.com/microsoft/onnxruntime-genai/pull/2166
- Add text-only mode support for Qwen 3.5 model builder by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2157
- Fix heap overflow issue by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2110
- [Benchmark] Add --use_random_tokens flag to C benchmark by @VishalX in https://github.com/microsoft/onnxruntime-genai/pull/2170
- Add HunYuan Dense V1 (hunyuan_v1_dense) model support by @anilmartha in https://github.com/microsoft/onnxruntime-genai/pull/2144
- Nvidia Parakeet Tdt ASR support by @nenad1002 in https://github.com/microsoft/onnxruntime-genai/pull/2150
- Multilingual Streaming Nemotron ASR + CUDA support…