Release · Microsoft · published Jun 9, 2026

microsoft/Olive v0.13.0

Olive-ai 0.13.0

Repository: microsoft/Olive

Tag: v0.13.0

Published: 2026-06-09T21:32:24Z

Prerelease: no

Release notes:

Olive 0.13.0

New Features

  • MobiusBuilder pass for Mobius-backed ONNX export (#2406, #2447, #2472, #2471, by @justinchuby and @xiaoyu-work): Added a new pass (originally MobiusModelBuilder, renamed to MobiusBuilder) that exports ONNX via Mobius and produces loadable ORT GenAI composite packages with caching, plus a CLI option to capture the ONNX graph.
  • QairtPipeline pass for QCOM devices (#2465, by @qti-kromero): Added a single-pass QAIRT LLM pipeline driven by a YAML recipe that runs model loading, quantization, and compilation end-to-end, replacing the multi-step QairtPreparation→QairtGenAIBuilder workflow.
  • PyTorch-native K-quant pass (#2479, by @jambayk): Added a KQuant pass implementing ggml-style weight-only K-quant quantization (asymmetric and symmetric, 2/4/8-bit), with Rtn and KQuant now advertising uint2/int2 precisions.
  • ONNX K-quant quantization pass (#2428, by @jiafatom): Added an OnnxKquantQuantization pass for K-quant quantization of ONNX models.
  • INT8 embedding quantization surgeries (#2464, by @apsonawane): Added QuantizeEmbeddingInt8 and ShareEmbeddingLmHead graph surgeries for INT8 embedding quantization and shared embedding/LM-head weights.
  • SimplifiedLayerNormToRMSNorm surgery (#2348, by @unnim-qti): Added a graph surgery to convert SimplifiedLayerNorm nodes to RMSNorm.
  • LFM2 hybrid model support (#2410, by @ykhrustalev): Added support for LFM2 hybrid models.
  • ONNX discrepancy check pass (#2478, by @xadupre): Added a pass to measure numerical discrepancies on a test model to help validate conversions and optimizations.
  • AMD VitisAI SD1.5 support (#2359, by @liujij): Added Stable Diffusion 1.5 support for the VitisAI execution path.
  • QNN ABI execution provider support (#2434, by @rM-planet): Added Olive changes to support the QNN ABI execution provider.
  • Whisper recipe integration (#2450, by @kunal-vaishnavi): Added changes to integrate Olive with Whisper recipes.
  • Speech evaluation metrics (#2444, by @jiafatom): Added WER and RTFx speech evaluation metrics to the Olive evaluator.
  • Vision evaluation metrics and inference path (#2476, #2488, by @jiafatom): Added vision evaluation metrics (exact_match, relaxed_accuracy, word_sort_ratio) and a vision GenAI inference path for multi-file VLM evaluation.
  • HY-MT evaluation workflows (#2482, by @hanbitmyths): Added support for HY-MT evaluation workflows.
  • ORTGenAI backend option for benchmark CLI (#2420, by @GopalakrishnanN): Added a --backend option (auto/ort/ortgenai) to the olive benchmark command for ONNX models while preserving existing defaults.
  • Chat-template hooks for ORT GenAI LM evaluation (#2462, by @ykhrustalev): Added chat-template hooks to LMEvalORTGenAIEvaluator.
  • Test CLI path for small random models (#2459, by @Copilot): Added a --test HF CLI path for 2-layer random model configs with olive run and ModelBuilder support.
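
Several of the passes above (KQuant, OnnxKquantQuantization, Rtn) are weight-only quantizers with asymmetric and symmetric modes. As a rough illustration of that distinction, the sketch below applies plain block-wise round-to-nearest quantization in pure Python. It is not Olive's implementation and not the ggml K-quant scheme itself, which layers additional structure on top of this idea; the function names, the block size, and the clamping choices are all hypothetical.

```python
from typing import List, Tuple

Block = Tuple[List[int], float, int]  # (codes, scale, zero_point)


def quantize_block(weights: List[float], bits: int = 4, symmetric: bool = False) -> Block:
    """Round-to-nearest quantization of one block: w is approximated by scale * (code - zero_point)."""
    if symmetric:
        # Signed codes centred on zero; the zero point is fixed at 0.
        qmax = 2 ** (bits - 1) - 1
        qmin = -qmax - 1
        max_abs = max(abs(w) for w in weights)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        zero_point = 0
    else:
        # Unsigned codes; the zero point shifts the range to fit [lo, hi].
        qmin, qmax = 0, 2 ** bits - 1
        lo = min(min(weights), 0.0)  # keep 0.0 inside the representable range
        hi = max(max(weights), 0.0)
        scale = (hi - lo) / qmax if hi > lo else 1.0
        zero_point = round(-lo / scale)
    codes = [min(qmax, max(qmin, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def quantize_blockwise(weights: List[float], block_size: int = 4,
                       bits: int = 4, symmetric: bool = False) -> List[Block]:
    """Split a flat weight list into blocks and quantize each one independently."""
    return [
        quantize_block(weights[i:i + block_size], bits, symmetric)
        for i in range(0, len(weights), block_size)
    ]


def dequantize_blockwise(blocks: List[Block]) -> List[float]:
    """Reconstruct approximate weights from the per-block codes, scales, and zero points."""
    return [scale * (c - zp) for codes, scale, zp in blocks for c in codes]


if __name__ == "__main__":
    w = [-1.0, -0.5, 0.0, 0.5, 1.0, 0.25, -0.75, 0.9]
    for sym in (False, True):
        restored = dequantize_blockwise(quantize_blockwise(w, symmetric=sym))
        worst = max(abs(a - b) for a, b in zip(w, restored))
        print(f"symmetric={sym}: max abs error {worst:.4f}")
```

Each block carries its own scale (and, in the asymmetric case, its own zero point), so an outlier only degrades the precision of the block it sits in.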
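
The speech metrics named above, WER and RTFx, have simple standard definitions: word error rate is the word-level edit distance between reference and hypothesis divided by the number of reference words, and RTFx is the inverse real-time factor (seconds of audio processed per second of compute). The following is a minimal sketch of both under those definitions. The function names are hypothetical, and the text normalization Olive's evaluator applies before scoring is not stated in these notes, so the sketch only splits on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Dynamic programming over one row at a time:
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: values above 1 mean faster than real time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    print(rtfx(audio_seconds=60.0, processing_seconds=2.0))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.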

Improvements

  • Selective mixed-precision enhancements (#2475, by @jambayk): Added QKV-aware overrides, an AUTO memory mode, and MULTI_GPU dispatch to the selective mixed-precision pass.
  • Model package CLI alignment (#2495, #2445, by @xiaoyu-work): Aligned the generate-model-package CLI with onnxruntime-genai and updated it to match the latest schema.
  • ORT GenAI generation comparison in discrepancy check (#2487, by @xadupre): Added an ONNX Runtime GenAI generation comparison in the OnnxDiscrepancyCheck pass.
  • Vision VQA evaluation alignment (#2499, by @jiafatom): Improved vision VQA evaluation with dynamic choice detection, configurable max_length, and more robust error handling.
  • Faster ORT GenAI evaluation (#2452, by @justinchuby): Used get_logits() to avoid a massive GPU→CPU logits copy in the ORT GenAI evaluator.
  • Tie-word embedding surgery update (#2430, by @apsonawane): Updated the tie-word embedding graph surgery.
  • Deprecate auto-opt command (#2442, by @shaahji): Marked the auto-opt command as deprecated.

Security

  • Disable trusting remote code by default (#2413, by @shaahji): Stopped implicitly trusting remote code so it is no longer executed unless explicitly enabled.

Bug Fixes

  • Fix optimize CLI EP and device (#2418, by @jambayk): Fixed the optimize CLI to correctly set the system execution provider and device.
  • Fix MTEBEvaluator embedding evaluation (#2415, by @natke): Fixed device mapping, padding-free GenAI inference, last-token pooling, and L2 normalization, closing the score gap between HF and GenAI evaluation.
  • Fix node output issues (#2497, by @apsonawane): Fixed node output handling issues.
  • Fix input validation and multiple-choice handling (#2501, by @apsonawane): Fixed input validation issues and updated multiple-choice options handling.
  • Handle…
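
The MTEBEvaluator fix above names last-token pooling and L2 normalization as two of the steps that had to agree between the HF and GenAI paths. As a hedged sketch of what those two steps mean for a single sequence (not Olive's code; the function names and the list-based tensors are illustrative assumptions):

```python
import math
from typing import List


def last_token_pool(hidden_states: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Return the hidden state of the last non-padded token.

    hidden_states has one vector per token; attention_mask holds 1 for real
    tokens and 0 for padding. Taking the highest index with mask == 1 works
    for both left- and right-padded sequences.
    """
    real_positions = [i for i, m in enumerate(attention_mask) if m == 1]
    if not real_positions:
        raise ValueError("attention_mask has no real tokens")
    return list(hidden_states[real_positions[-1]])


def l2_normalize(vector: List[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit Euclidean length (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


if __name__ == "__main__":
    hidden = [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]  # third token is padding
    mask = [1, 1, 0]
    embedding = l2_normalize(last_token_pool(hidden, mask))
    print(embedding)
```

If one path pools a padding position or skips normalization, cosine-similarity scores drift between otherwise identical models, which is the kind of gap the fix describes closing.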

Excerpt shown — open the source for the full document.

Notability

notability 3.0/10

Routine tool update, no major launch or traction evidence.