Release · Microsoft · published Apr 15, 2026

microsoft/onnxruntime-genai v0.13.0

v0.13.0

Repository: microsoft/onnxruntime-genai

Tag: v0.13.0

Published: 2026-04-15T19:56:04Z

Prerelease: no

Release notes:

What's Changed

  • Update WebGPU buffer memory info name by @fs-eire in https://github.com/microsoft/onnxruntime-genai/pull/1957
  • Add enable_profiling to Runtime Options by @xiaofeihan1 in https://github.com/microsoft/onnxruntime-genai/pull/1949
  • Fix uninitialized tools variable and improve exception debug messages by @sheller-ms in https://github.com/microsoft/onnxruntime-genai/pull/1971
  • Add common download to Phi-3 tutorial by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/1973
  • Add support for InternLM2 model architecture by @amdrajeevp1 in https://github.com/microsoft/onnxruntime-genai/pull/1958
  • Update CMake CUDA architecture and use win-arm64 pool workaround by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/1976
  • Update examples after 0.12.0 release by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/1980
  • Add CI pipeline for WebGPU EP model testing by @qjia7 in https://github.com/microsoft/onnxruntime-genai/pull/1956
  • Fix Python nightly build by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/1981
  • Add missing Quark 0.11 weight patterns for ChatGLM3 output layer by @poganesh in https://github.com/microsoft/onnxruntime-genai/pull/1983
  • Support Qwen2.5-VL pre-quantized models in qwen.py by @poganesh in https://github.com/microsoft/onnxruntime-genai/pull/1985
  • [VitisAI] Fix external_ep_library support for WinML by @akholodnamdcom in https://github.com/microsoft/onnxruntime-genai/pull/1984
  • Fix guidance bug by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/1988
  • Fix incorrect batch responses when using multiple prompts by @lnigam in https://github.com/microsoft/onnxruntime-genai/pull/1986
  • Enable WebGPU graph capture in base.py by @qjia7 in https://github.com/microsoft/onnxruntime-genai/pull/1991
  • Harden CUDA error checking across the codebase by @Copilot in https://github.com/microsoft/onnxruntime-genai/pull/1994
  • Allow pruned models for prefill by @fs-eire in https://github.com/microsoft/onnxruntime-genai/pull/1995
  • Fix WinML Packaging Pipeline by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/1998
  • Add small changes after pruning prefill by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/2000
  • webgpu: Optimize CopyFrom by @qjia7 in https://github.com/microsoft/onnxruntime-genai/pull/1992
  • Add support for CUDA 13 by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/2001
  • Add WebGPU to QMoE path by @guschmue in https://github.com/microsoft/onnxruntime-genai/pull/2005
  • Fix ERNIE 4.5 model builder: rope_attrs and config architecture name by @xiaoyao9184 in https://github.com/microsoft/onnxruntime-genai/pull/2007
  • Fix bug in continuous decoding by @chilukam-qti in https://github.com/microsoft/onnxruntime-genai/pull/2008
  • Update Phi-4 mm README links by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/2014
  • Add Qwen3-VL model support + multi-image input support in Qwen VL family by @hanbitmyths in https://github.com/microsoft/onnxruntime-genai/pull/2003
  • Add Qwen3.5 model support and optimize multi-image handling by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2019
  • Reuse a single generator via RewindTo(0) in benchmark instead of creating multiple generators by @qjia7 in https://github.com/microsoft/onnxruntime-genai/pull/2002
  • [RyzenAI] WinML compatibility fix by @akholodnamdcom in https://github.com/microsoft/onnxruntime-genai/pull/2026
  • Add streaming support for Nemotron ASR by @nenad1002 in https://github.com/microsoft/onnxruntime-genai/pull/1997
  • [WebGPU] Fix the prefill regression when graph capture is ON by @qjia7 in https://github.com/microsoft/onnxruntime-genai/pull/2021
  • Support 4 inputs for Nemotron model by @jiafatom in https://github.com/microsoft/onnxruntime-genai/pull/2036
  • Update Java packaging based on Python packaging logic by @EPNW-Eric in https://github.com/microsoft/onnxruntime-genai/pull/2029
  • Fix Android packaging pipeline by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/2039
  • Add OpenAI's Whisper to model builder by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/2018
  • [Java] Add a dependency on onnxruntime (#2030) by @EPNW-Eric in https://github.com/microsoft/onnxruntime-genai/pull/2040
  • Fix mutually exclusive inputs for language models by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/2046
  • Decouple plugin execution providers (EPs) from the USE_WINML pre-processor macro by @baijumeswani in https://github.com/microsoft/onnxruntime-genai/pull/2038
  • Route pipeline model RunOptions through SetRunOption for proper special key handling by @Copilot in https://github.com/microsoft/onnxruntime-genai/pull/2044
  • Add ort_build_version and ort_build_source parameters to NuGet and Python packaging pipelines; remove ROCm support by @Copilot in https://github.com/microsoft/onnxruntime-genai/pull/2049
  • Add batched multi-image vision path and window_size config for Qwen VL by @hanbitmyths in https://github.com/microsoft/onnxruntime-genai/pull/2050
  • docs: fix formatting and syntax highlighting in documentation by @riddles-the-one in https://github.com/microsoft/onnxruntime-genai/pull/2051
  • Add Silero VAD support to Nemotron streaming ASR by @sayanshaw24 in https://github.com/microsoft/onnxruntime-genai/pull/2035
  • Add Qwen3.5 hybrid decoder export support (GatedDeltaNet + Attention) by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2043
  • Add support for QNN stateful models by @qti-ashimaj in https://github.com/microsoft/onnxruntime-genai/pull/2012
  • Allocate recurrent state via device allocator to enable CUDA graph capture by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2057
  • Speed up CI pipelines by @Copilot in https://github.com/microsoft/onnxruntime-genai/pull/2052
  • Fix tool calling for TRT-RTX models by @kunal-vaishnavi in https://github.com/microsoft/onnxruntime-genai/pull/2048
  • Fix vision pipeline EP hardcoding and pixel_values rank mismatch for Qwen VL models by @apsonawane in https://github.com/microsoft/onnxruntime-genai/pull/2060

New Contributors

  • @sheller-ms made their first contribution in https://github.com/microsoft/onnxruntime-genai/pull/1971

*…