microsoft/Olive v0.13.0
Olive-ai 0.13.0
Repository: microsoft/Olive
Tag: v0.13.0
Published: 2026-06-09T21:32:24Z
Prerelease: no
Release notes:
Olive 0.13.0
New Features
- MobiusBuilder pass for Mobius-backed ONNX export (#2406, #2447, #2472, #2471, by @justinchuby and @xiaoyu-work): Added a new pass (originally `MobiusModelBuilder`, renamed to `MobiusBuilder`) that exports ONNX via Mobius and produces loadable ORT GenAI composite packages with caching; also added a CLI option to capture the ONNX graph.
- QairtPipeline pass for QCOM devices (#2465, by @qti-kromero): Added a single-pass QAIRT LLM pipeline driven by a YAML recipe that runs model loading, quantization, and compilation end-to-end, replacing the multi-step QairtPreparation→QairtGenAIBuilder workflow.
- PyTorch-native K-quant pass (#2479, by @jambayk): Added a `KQuant` pass implementing ggml-style weight-only K-quant quantization (asymmetric and symmetric, 2/4/8-bit), with `Rtn` and `KQuant` now advertising `uint2`/`int2` precisions.
- ONNX K-quant quantization pass (#2428, by @jiafatom): Added an `OnnxKquantQuantization` pass for K-quant quantization of ONNX models.
- INT8 embedding quantization surgeries (#2464, by @apsonawane): Added `QuantizeEmbeddingInt8` and `ShareEmbeddingLmHead` graph surgeries for INT8 embedding quantization and shared embedding/LM-head weights.
- SimplifiedLayerNormToRMSNorm surgery (#2348, by @unnim-qti): Added a graph surgery to convert SimplifiedLayerNorm nodes to RMSNorm.
- LFM2 hybrid model support (#2410, by @ykhrustalev): Added support for LFM2 hybrid models.
- ONNX discrepancy check pass (#2478, by @xadupre): Added a pass to measure numerical discrepancies on a test model to help validate conversions and optimizations.
- AMD VitisAI SD1.5 support (#2359, by @liujij): Added Stable Diffusion 1.5 support for the VitisAI execution path.
- QNN ABI execution provider support (#2434, by @rM-planet): Added Olive changes to support the QNN ABI execution provider.
- Whisper recipe integration (#2450, by @kunal-vaishnavi): Added changes to integrate Olive with Whisper recipes.
- Speech evaluation metrics (#2444, by @jiafatom): Added WER and RTFx speech evaluation metrics to the Olive evaluator.
- Vision evaluation metrics and inference path (#2476, #2488, by @jiafatom): Added vision evaluation metrics (exact_match, relaxed_accuracy, word_sort_ratio) and a vision GenAI inference path for multi-file VLM evaluation.
- HY-MT evaluation workflows (#2482, by @hanbitmyths): Added support for HY-MT evaluation workflows.
- ORTGenAI backend option for benchmark CLI (#2420, by @GopalakrishnanN): Added a `--backend` option (auto/ort/ortgenai) to the `olive benchmark` command for ONNX models while preserving existing defaults.
- Chat-template hooks for ORT GenAI LM evaluation (#2462, by @ykhrustalev): Added chat-template hooks to `LMEvalORTGenAIEvaluator`.
- Test CLI path for small random models (#2459, by @Copilot): Added a `--test` HF CLI path for 2-layer random model configs with `olive run` and ModelBuilder support.
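
The speech metrics named above can be illustrated with a minimal sketch. It assumes the common definitions: WER is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and RTFx is audio duration divided by processing time. The function names `word_error_rate` and `rtfx` are hypothetical and are not taken from Olive's evaluator, which may normalize text or aggregate across samples differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Row-by-row Levenshtein distance computed over words instead of characters.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word present in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio handled per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds
```

Under these definitions a lower WER is better, while a higher RTFx means faster-than-real-time transcription.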
Improvements
- Selective mixed-precision enhancements (#2475, by @jambayk): Added QKV-aware overrides, an AUTO memory mode, and MULTI_GPU dispatch to the selective mixed-precision pass.
- Model package CLI alignment (#2495, #2445, by @xiaoyu-work): Aligned the `generate-model-package` CLI with onnxruntime-genai and updated it to match the latest schema.
- ORT GenAI generation comparison in discrepancy check (#2487, by @xadupre): Added an ONNX Runtime GenAI generation comparison in the `OnnxDiscrepancyCheck` pass.
- Vision VQA evaluation alignment (#2499, by @jiafatom): Improved vision VQA evaluation with dynamic choice detection, configurable max_length, and more robust error handling.
- Faster ORT GenAI evaluation (#2452, by @justinchuby): Used `get_logits()` to avoid a massive GPU→CPU logits copy in the ORT GenAI evaluator.
- Tie-word embedding surgery update (#2430, by @apsonawane): Updated the tie-word embedding graph surgery.
- Deprecate auto-opt command (#2442, by @shaahji): Marked the `auto-opt` command as deprecated.
Security
- Disable trusting remote code by default (#2413, by @shaahji): Stopped implicitly trusting remote code so it is no longer executed unless explicitly enabled.
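
A default-deny policy of this kind can be pictured with a small sketch. The helper `resolve_trust_remote_code` and the `load_options` mapping are hypothetical and are not Olive APIs. `trust_remote_code` is the keyword that Hugging Face `from_pretrained` loaders accept; whether Olive exposes the setting under that exact key is an assumption.

```python
from typing import Any, Mapping


def resolve_trust_remote_code(load_options: Mapping[str, Any]) -> bool:
    """Return True only when the caller explicitly opted in.

    Hypothetical helper: a missing key, None, or any value other than
    the boolean True is treated as "do not trust remote code".
    """
    return load_options.get("trust_remote_code") is True


# Illustrative use: the resolved flag would be forwarded to the model loader.
options = {"trust_remote_code": True}
loader_kwargs = {"trust_remote_code": resolve_trust_remote_code(options)}
```

Requiring the literal boolean `True` means a truthy string from a loosely typed config file does not silently enable remote code.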
Bug Fixes
- Fix optimize CLI EP and device (#2418, by @jambayk): Fixed the `optimize` CLI to correctly set the system execution provider and device.
- Fix MTEBEvaluator embedding evaluation (#2415, by @natke): Fixed device mapping, padding-free GenAI inference, last-token pooling, and L2 normalization, closing the score gap between HF and GenAI evaluation.
- Fix node output issues (#2497, by @apsonawane): Fixed node output handling issues.
- Fix input validation and multiple-choice handling (#2501, by @apsonawane): Fixed input validation issues and updated multiple-choice options handling.
- Handle…