microsoft/onnxruntime-genai v0.12.2
v0.12.2
Repository: microsoft/onnxruntime-genai
Tag: v0.12.2
Published: 2026-03-27T17:49:15Z
Prerelease: no
Release notes:
- Update examples after 0.12.0 release
- Add missing Quark 0.11 weight patterns for ChatGLM3 output layer
- [Support Qwen2.5-VL pre-quantized models in qwen.py](https://github.com/microsoft/onnxruntime-genai/pull/1985)
- [Fix incorrect batch responses when using multiple prompts](https://github.com/microsoft/onnxruntime-genai/pull/1986)
- Harden CUDA error checking across the codebase
- Allow pruned models for prefill
- [Add small changes after pruning prefill](https://github.com/microsoft/onnxruntime-genai/pull/2000)