Release by Microsoft, published Nov 5, 2025

microsoft/Olive v0.10.0

Olive-ai 0.10.0

Repository: microsoft/Olive

Tag: v0.10.0

Published: 2025-11-05T19:24:15Z

Prerelease: no

Release notes:

New Features

  • Quark Quantization for ONNX Models (#2236) — New QuarkQuantization pass, available through olive run, supporting the int8/uint8/int16/uint16/int32/uint32/bf16/bfp16 data types and the CLE (cross-layer equalization), SmoothQuant, AdaRound, and AdaQuant algorithms.
  • Embedding Quantization & RTN Improvements (#2238) — Adds QuantEmbedding, a composable Rtn (round-to-nearest) pass, and a unified checkpoint format aligned with MatMulNBits/GatherBlockQuantized. Block-size and shape constraints are enforced, and AutoGPTQ/AutoAWQ export is updated to use 2D parameters.
  • Word Embedding Tying Surgery (#2240) — The TieWordEmbeddings surgery ties the input embeddings to lm_head in both unquantized (Gemm) and quantized (MatMulNBits + GatherBlockQuantized) graphs.
  • Custom ONNX Model Naming (#2235) — Allows specifying a custom ONNX model name in the output directory.
  • Intel OpenVINO Weight Compression Pass (#2180) — Adds NNCF-based weight compression for Hugging Face (HF) or ONNX models, producing either an OpenVINO model or a compressed ONNX model.
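
The Rtn and QuantEmbedding items above build on block-wise round-to-nearest (RTN) quantization. As a rough illustration only, the pure-Python sketch below quantizes a weight row to 4-bit codes with one scale and zero point per block. The function names, the asymmetric uint4 scheme, and the default block size of 32 are assumptions for illustration; this does not reproduce Olive's Rtn pass or the exact MatMulNBits/GatherBlockQuantized packing.

```python
# Minimal sketch of block-wise round-to-nearest (RTN) 4-bit quantization.
# Assumptions (illustrative only, not Olive's Rtn pass or exact MatMulNBits
# packing): asymmetric uint4 codes, one scale and zero point per block.
from typing import List, Tuple

BITS = 4
QMAX = (1 << BITS) - 1  # largest uint4 code: 15

Block = Tuple[List[int], float, int]  # (codes, scale, zero_point)


def quantize_block(block: List[float]) -> Block:
    """Round each weight in one block to the nearest uint4 code."""
    lo = min(min(block), 0.0)  # widen the range to include 0.0 ...
    hi = max(max(block), 0.0)  # ... so zero is exactly representable
    scale = (hi - lo) / QMAX if hi > lo else 1.0
    zero_point = max(0, min(QMAX, round(-lo / scale)))
    codes = [max(0, min(QMAX, round(w / scale) + zero_point)) for w in block]
    return codes, scale, zero_point


def rtn_quantize(row: List[float], block_size: int = 32) -> List[Block]:
    """Quantize a weight row block by block (block_size is an assumed default)."""
    if len(row) % block_size != 0:
        # This sketch only accepts whole blocks instead of padding the tail.
        raise ValueError("row length must be a multiple of block_size")
    return [quantize_block(row[i:i + block_size]) for i in range(0, len(row), block_size)]


def dequantize(blocks: List[Block]) -> List[float]:
    """Approximate the original weights as (code - zero_point) * scale."""
    out: List[float] = []
    for codes, scale, zero_point in blocks:
        out.extend((q - zero_point) * scale for q in codes)
    return out
```

In this sketch each reconstructed weight lands within one scale step of the original. Smaller blocks give tighter scales and lower error, at the cost of storing more scales and zero points.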

Improvements

  • AIMET Enhancements (#2158, #2187, #2215) — Adds Sequential MSE, enables AIMET in the quantize CLI, and supports manual precision overrides.
  • GPTQ Updates (#2202, #2203) — Supports user-provided module overrides and transformers >= 4.53.
  • Quantization Export Compatibility (#2218) — Updates version checks for onnxruntime-genai (ort-genai) > 0.9.0 and fixes minor name clashes in OnnxDAG.
  • Torch Dynamo Export Alignment (#2185) — extract_adapter now recovers folded LoRA adapters and decomposes DoRA-fused Gemm nodes into MatMul so they can be quantized.
  • Post-Surgery Deduplication (#2228) — Runs DeduplicateHashedInitializersPass after surgeries to remove duplicate initializers.
  • QNN Execution Provider: GPU Enablement (#2220) — Enables GPU support in QNN-EP and updates StaticLLM and ContextBinaryGeneration; NPU remains the default.
  • Run API Ergonomics (#2199) — olive.run() now accepts run_config as a dict.
  • OpenVINO Config Overrides (#2191) — Allows overriding genai_config.json properties during OpenVINO (OV) encapsulation.
  • ReplaceAttentionMaskValue Robustness (#2213) — Adds Shape to ALLOWED_CONSUMER_OPS for text-encoder graphs.
  • Implicit Olive Version Tagging (#2183) — Automatically embeds the Olive version in saved ONNX model protos.
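
The post-surgery deduplication item above depends on spotting initializers with identical contents. The sketch below shows the general idea on a toy graph made of hypothetical Initializer and Node dataclasses (not the onnx protobuf API and not Olive's DeduplicateHashedInitializersPass): hash each initializer's dtype, shape, and raw bytes, keep one copy per hash, and point every consumer at the survivor.

```python
# Minimal sketch of content-hash deduplication of graph initializers.
# Assumption: a toy graph built from hypothetical Initializer/Node dataclasses,
# not the onnx protobuf API and not Olive's DeduplicateHashedInitializersPass.
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Initializer:
    dtype: str
    dims: Tuple[int, ...]
    raw: bytes  # serialized tensor contents


@dataclass
class Node:
    op_type: str
    inputs: List[str]


def dedup_initializers(inits: Dict[str, Initializer], nodes: List[Node]) -> Dict[str, str]:
    """Remove duplicate initializers in place and return {removed_name: kept_name}."""
    seen: Dict[Tuple[str, Tuple[int, ...], str], str] = {}
    replaced: Dict[str, str] = {}
    for name in sorted(inits):  # sorted so the surviving name is deterministic
        init = inits[name]
        # Key on dtype and shape too, so equal bytes with a different type or
        # shape are never merged.
        key = (init.dtype, init.dims, hashlib.sha256(init.raw).hexdigest())
        kept = seen.get(key)
        # Compare bytes on a hash hit to guard against (unlikely) collisions.
        if kept is not None and inits[kept].raw == init.raw:
            replaced[name] = kept
        elif kept is None:
            seen[key] = name
    for name in replaced:
        del inits[name]
    # Point every consumer of a removed initializer at the surviving copy.
    for node in nodes:
        node.inputs = [replaced.get(i, i) for i in node.inputs]
    return replaced
```

Hashing first keeps the comparison cheap for large weight tensors: each tensor is read once, and full byte comparisons happen only for candidate pairs.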
