microsoft/Olive v0.10.0
Olive-ai 0.10.0
Repository: microsoft/Olive
Tag: v0.10.0
Published: 2025-11-05T19:24:15Z
Prerelease: no
Release notes:
New Features
- Quark Quantization for ONNX Models (#2236) — New `QuarkQuantization` pass via `olive run` with support for int8/uint8/int16/uint16/int32/uint32/bf16/bfp16 and CLE/SmoothQuant/AdaRound/AdaQuant.
- Embedding Quantization & RTN Improvements (#2238) — Added `QuantEmbedding`, a composable `Rtn` pass, and a unified checkpoint format aligned with `MatMulNBits`/`GatherBlockQuantized` (block/shape constraints enforced; AutoGPTQ/AutoAWQ export updated to 2D params).
- Word Embedding Tying Surgery (#2240) — `TieWordEmbeddings` ties input embeddings and `lm_head` for both unquantized (`Gemm`) and quantized (`MatMulNBits` + `GatherBlockQuantized`) graphs.
- Custom ONNX Model Naming (#2235) — Allows specifying a custom ONNX model name in the output directory.
- Intel OpenVINO Weight Compression Pass (#2180) — Adds NNCF-based weight compression for HF/ONNX models to OpenVINO or compressed ONNX.
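
For readers unfamiliar with the round-to-nearest (RTN) scheme referenced in #2238, the following is a minimal pure-Python sketch of block-wise RTN quantization. It is not Olive's implementation: the function names (`quantize_block`, `rtn_quantize`, `dequantize`) are hypothetical, the symmetric signed 4-bit scheme is an assumption chosen for brevity, and the actual `MatMulNBits`/`GatherBlockQuantized` storage layout is not reproduced here. The divisibility check only illustrates the kind of block/shape constraint the release note mentions.

```python
from typing import List, Tuple

Block = Tuple[List[int], float]  # (quantized integers, per-block scale)


def quantize_block(block: List[float], bits: int = 4) -> Block:
    """Symmetric round-to-nearest quantization of one block (assumed scheme)."""
    qmax = (1 << (bits - 1)) - 1  # 7 for 4-bit signed values
    amax = max(abs(x) for x in block)
    scale = amax / qmax if amax > 0 else 1.0
    # Round each value to the nearest integer step and clamp to the valid range.
    q = [max(-qmax, min(qmax, round(x / scale))) for x in block]
    return q, scale


def rtn_quantize(weights: List[float], block_size: int = 4, bits: int = 4) -> List[Block]:
    """Quantize a flat weight list block by block, one scale per block."""
    if len(weights) % block_size != 0:
        # Illustrative constraint: the length must be a multiple of the block size.
        raise ValueError("weight length must be divisible by block_size")
    return [
        quantize_block(weights[start:start + block_size], bits)
        for start in range(0, len(weights), block_size)
    ]


def dequantize(blocks: List[Block]) -> List[float]:
    """Reconstruct approximate weights from integers and per-block scales."""
    return [q * scale for qs, scale in blocks for q in qs]
```

Because each block carries its own scale, a block with small weights keeps finer resolution than it would under a single tensor-wide scale. The reconstruction error per element is at most half of that block's scale.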
Improvements
- AIMET Enhancements (#2158, #2187, #2215) — Adds Sequential MSE, enables AIMET in the `quantize` CLI, and supports manual precision overrides.
- GPTQ Updates (#2202, #2203) — Supports user-provided module overrides and `transformers >= 4.53`.
- Quantization Export Compatibility (#2218) — Updates checks for `ort-genai > 0.9.0` and fixes minor `OnnxDAG` name clashes.
- Torch Dynamo Export Alignment (#2185) — `extract_adapter` recovers folded LoRA and decomposes DORA-fused `Gemm` to `MatMul` for quantization.
- Post-Surgery Deduplication (#2228) — Runs `DeduplicateHashedInitializersPass` after surgeries to remove duplicate initializers.
- QNN Execution Provider: GPU Enablement (#2220) — Enables QNN-EP GPU, updates `StaticLLM` and `ContextBinaryGeneration`, keeps NPU default.
- Run API Ergonomics (#2199) — `olive.run()` now accepts a dict `run_config`.
- OpenVINO Config Overrides (#2191) — Allows overriding `genai_config.json` properties in OV encapsulation.
- ReplaceAttentionMaskValue Robustness (#2213) — Adds `Shape` to `ALLOWED_CONSUMER_OPS` for text-encoder graphs.
- Implicit Olive Version Tagging (#2183) — Automatically embeds the Olive version in saved ONNX model protos.
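
The Run API change (#2199) means a workflow can be assembled in Python rather than loaded from a JSON file. The sketch below shows one way this might look. The helper names `build_run_config` and `run_workflow` are hypothetical, the dict keys are assumed to mirror Olive's JSON workflow layout (`input_model`, `passes`, `output_dir`), and passing the dict as the first positional argument of `olive.run()` is an assumption based on the release note, not a confirmed signature. The model id and output path in the usage note are placeholders.

```python
from typing import Any, Dict


def build_run_config(model_path: str, output_dir: str) -> Dict[str, Any]:
    """Assemble a workflow config as a plain dict instead of a JSON file.

    Key names are assumed to follow Olive's JSON workflow layout.
    """
    return {
        "input_model": {"type": "HfModel", "model_path": model_path},
        "passes": {
            # A single ONNX conversion pass, keyed by an arbitrary pass name.
            "conversion": {"type": "OnnxConversion"},
        },
        "output_dir": output_dir,
    }


def run_workflow(run_config: Dict[str, Any]) -> Any:
    """Hand the dict to Olive; requires olive-ai 0.10.0 or later."""
    import olive  # imported lazily so the builder is usable without Olive

    # Assumption: the dict is accepted as the first positional argument.
    return olive.run(run_config)
```

A call such as `run_workflow(build_run_config("microsoft/Phi-3-mini-4k-instruct", "out/phi3"))` would then start the workflow. Keeping the config as a dict makes it easy to vary passes or output paths programmatically before the run.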