microsoft/onnxruntime-genai v0.13.0
v0.13.0
Repository: microsoft/onnxruntime-genai
Tag: v0.13.0
Published: 2026-04-15T19:56:04Z
Prerelease: no
Release notes:
What's Changed
- update WebGPU buffer memory info name by @fs-eire in https://github.com/microsoft/onnxruntime-genai/pull/1957
- Add enable_profiling in Runtime Options by @xiaofeihan1 in https://github.com/microsoft/onnxruntime-genai/pull/1949
- Fix uninitialized tools variable and improve exception debug messages by @sheller-ms in https://github.com/microsoft/onnxruntime-genai/pull/1971
- Add common download to Phi-3 tutorial by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/1973
- Add support for InternLM2 model architecture by @amdrajeevp1 in https://github.com/microsoft/onnxruntime-genai/pull/1958
- Update cmake cuda architecture and use win-arm64 pool workaround by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/1976
- Update examples after 0.12.0 release by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/1980
- Add CI pipeline for WebGPU EP model testing by @qjia7 in https://github.com/microsoft/onnxruntime-genai/pull/1956
- Fix Python nightly build by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/1981
- Add missing Quark 0.11 weight patterns for ChatGLM3 output layer by @poganesh in https://github.com/microsoft/onnxruntime-genai/pull/1983
- Support Qwen2.5-VL pre-quantized models in qwen.py by @poganesh in https://github.com/microsoft/onnxruntime-genai/pull/1985
- [VitisAI] external_ep_libray support fix for WinML by @akholodnamdcom in https://github.com/microsoft/onnxruntime-genai/pull/1984
- Fix guidance bug by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/1988
- Fix incorrect batch responses when using multiple prompts by @lnigam in https://github.com/microsoft/onnxruntime-genai/pull/1986
- Enable webgpu graph capture in base.py by @qjia7 in https://github.com/microsoft/onnxruntime-genai/pull/1991
- Harden CUDA error checking across the codebase by @Copilot in https://github.com/microsoft/onnxruntime-genai/pull/1994
- allow pruned models for prefill by @fs-eire in https://github.com/microsoft/onnxruntime-genai/pull/1995
- Fix WinML Packaging Pipeline by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/1998
- Add small changes after pruning prefill by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/2000
- webgpu: Optimize Copyfrom by @qjia7 in https://github.com/microsoft/onnxruntime-genai/pull/1992
- Add support for CUDA 13 by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/2001
- add webgpu to qmoe path by @guschmue in https://github.com/microsoft/onnxruntime-genai/pull/2005
- Fix ERNIE 4.5 model builder: rope_attrs and config architecture name by @xiaoyao9184 in https://github.com/microsoft/onnxruntime-genai/pull/2007
- Bug fix in Continuous Decoding by @chilukam-qti in https://github.com/microsoft/onnxruntime-genai/pull/2008
- Update Phi-4 mm README links by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/2014
- Add Qwen3-VL model support + multi-image input support in Qwen VL family by @hanbitmyths in https://github.com/microsoft/onnxruntime-genai/pull/2003
- Add Qwen3.5 model support and optimize multi-image handling by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2019
- Reuse a single generator via RewindTo(0) in benchmark instead of creating multiple generators by @qjia7 in https://github.com/microsoft/onnxruntime-genai/pull/2002
- [RyzenAI] WinML compatibility fix by @akholodnamdcom in https://github.com/microsoft/onnxruntime-genai/pull/2026
- Nemotron ASR Support for Streaming by @nenad1002 in https://github.com/microsoft/onnxruntime-genai/pull/1997
- [WebGPU] Fix the prefill regression when graph capture is ON by @qjia7 in https://github.com/microsoft/onnxruntime-genai/pull/2021
- Support 4 inputs for nemotron model by @jiafatom in https://github.com/microsoft/onnxruntime-genai/pull/2036
- Updated java packaging based on python packaging logic by @EPNW-Eric in https://github.com/microsoft/onnxruntime-genai/pull/2029
- Fix android packaging pipeline by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/2039
- Add OpenAI's Whisper to model builder by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/2018
- [Java] Add a dependency on onnxruntime (#2030) by @EPNW-Eric in https://github.com/microsoft/onnxruntime-genai/pull/2040
- Fix mutually exclusive inputs for language models by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/2046
- Decouple plugin execution providers (EPs) from the USE_WINML pre-processor macro by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/2038
- Route pipeline model RunOptions through SetRunOption for proper special key handling by @Copilot in https://github.com/microsoft/onnxruntime-genai/pull/2044
- Add ort_build_version and ort_build_source parameters to nuget and python packaging pipelines, remove ROCm support by @Copilot in https://github.com/microsoft/onnxruntime-genai/pull/2049
- Add batched multi-image vision path and window_size config for Qwen VL by @hanbitmyths in https://github.com/microsoft/onnxruntime-genai/pull/2050
- docs: fix formatting and syntax highlighting in documentation by @riddles-the-one in https://github.com/microsoft/onnxruntime-genai/pull/2051
- Add Silero VAD Support to Nemotron Streaming ASR by @sayanshaw24 in https://github.com/microsoft/onnxruntime-genai/pull/2035
- Add Qwen3.5 hybrid decoder export support (GatedDeltaNet + Attention) by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2043
- Add support for QNN stateful models by @qti-ashimaj in https://github.com/microsoft/onnxruntime-genai/pull/2012
- Allocate recurrent state via device allocator to enable CUDA graph capture by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2057
- Speed up CI pipelines by @Copilot in https://github.com/microsoft/onnxruntime-genai/pull/2052
- Fix tool calling for TRT-RTX models by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/2048
- Fix vision pipeline EP hardcoding and pixel_values rank mismatch for Qwen VL models by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2060
New Contributors
- @sheller-ms made their first contribution in https://github.com/microsoft/onnxruntime-genai/pull/1971
*…