microsoft/winml-cli
Python
Description: Accelerate Model Deployment on WinML
Language: Python
License: MIT
Stars: 24
Forks: 4
Open issues: 189
Created: 2026-02-27T07:34:16Z
Pushed: 2026-06-11T02:50:53Z
Default branch: main
Fork: no
Archived: no
README:
WinML CLI

Windows ML CLI is a command-line tool for building portable, performant, and high-quality AI models for Windows ML. It takes you from a source model, whether from Hugging Face or your own pipeline, to a hardware-optimized artifact in a reproducible workflow.
Purpose-built for Windows hardware diversity, the CLI handles conversion, graph optimization, and compilation across AMD, Intel, NVIDIA, and Qualcomm targets. The CLI fits naturally into CI/CD pipelines so teams can validate and ship models easily.
---
:dart: Features
✅ You want to build models that run with Windows ML on any device — seamlessly across CPU, GPU, and NPU
✅ You want to benchmark models with one command — get latency, throughput, and live hardware utilization
✅ You want to optimize models out of the box — with built-in graph optimizations, quantization, and EP-aware tuning
✅ You want deep insights into your model — including unsupported operators, shape mismatches, and execution provider gaps
✅ You want a repeatable and traceable workflow — with config-driven pipelines that are inspectable at every stage
✅ You want AI agents to build and profile models for you — with agent-ready skills for automation via coding assistants
:compass: Scope
WinML CLI supports classic deep learning models for now — LLM support is on the way.
Supported execution providers: QNN · OpenVINO · VitisAI · NvTensorRTRTX · Dml · CPU — covering NPU, GPU, and CPU across Windows ML. See the [Supported Hardware](#supported-hardware) reference table for the full EP-to-device mapping.
The [built-in model catalog](#built-in-models) includes verified models that run across all EPs supported by Windows ML and serve as a reliable starting point. WinML CLI is not limited to these — you can bring any model you have:
- Hugging Face model ID (e.g., `microsoft/resnet-50`): weights are downloaded on first run
- Local ONNX file (e.g., `model.onnx`): from `winml export`, `winml build`, or any ONNX you already have
See the [Supported Tasks](#supported-tasks) and [Supported Model Types](#supported-model-types) reference tables for the full list.
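The two input forms above can be told apart with a simple file-extension check. The sketch below is illustrative only: `classify_model_ref` is a hypothetical helper, not part of WinML CLI, and the CLI's own resolution of the `-m` argument may follow different rules.

```python
from pathlib import Path


def classify_model_ref(ref: str) -> str:
    """Return "onnx-file" for a path ending in .onnx, otherwise "hub-id".

    Hypothetical helper for illustration; WinML CLI resolves -m itself.
    """
    if Path(ref).suffix.lower() == ".onnx":
        return "onnx-file"
    return "hub-id"


# The two examples from the list above.
for ref in ("microsoft/resnet-50", "model.onnx"):
    print(f"{ref} -> {classify_model_ref(ref)}")
```

A check like this can be handy in wrapper scripts that need to decide whether a weights download is expected before invoking the CLI.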
Known constraints:
- Some models may export successfully but fail during optimization or quantization due to unsupported operator patterns. The analyzer will flag these issues.
- Performance numbers vary by device, driver version, and EP version. Always benchmark on your target hardware.
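Because numbers vary between machines, it helps to compare runs on summary statistics rather than a single measurement. The sketch below assumes you have collected per-iteration latencies in milliseconds yourself; it does not parse `winml perf` output, whose format is not described here. `summarize_latencies` is a hypothetical helper, and the throughput figure assumes inferences run one after another.

```python
import math
import statistics


def summarize_latencies(samples_ms):
    """Return median, nearest-rank p95, and sequential throughput."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Nearest-rank percentile: smallest sample with at least 95% of
    # all samples at or below it.
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[rank - 1],
        # Assumes one inference at a time: count / total seconds.
        "throughput_per_s": 1000.0 * len(ordered) / sum(ordered),
    }


# Made-up sample values, for illustration only.
print(summarize_latencies([10, 12, 11, 13, 50]))
```

The median is less sensitive than the mean to a slow first iteration, which is why it is reported alongside the tail value.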
---
:rocket: Getting Started
Prerequisites
| Component | Details |
|---|---|
| Windows | Windows 11 24H2 or later (required for NPU support; earlier versions work for CPU/GPU) |
| Python | 3.11 |
| Package manager | uv |
| WinML CLI | PyPI |
Installation
WinML CLI requires Python 3.11 and is distributed as a Python wheel. We recommend uv for fast, reproducible environment setup.
1. Create an environment
uv venv --python 3.11
Activate it:
# Windows (PowerShell)
.venv\Scripts\activate
# Windows (Git Bash / WSL)
source .venv/Scripts/activate
2. Install winml-cli
uv pip install winml-cli
3. Verify your environment
uv run winml sys --list-device --list-ep
--list-device and --list-ep print only the hardware and EP inventory, skipping SDK versions and Python environment details that plain winml sys would include. If the command exits without error, your winml-cli install is ready.
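In a CI job, the same check can be scripted by testing the exit status. This is a minimal sketch: the command is copied from the step above, while `environment_ready` and its injectable `run` parameter are hypothetical names introduced here so the logic can be exercised without winml-cli installed.

```python
import subprocess

# Command copied verbatim from the verification step.
SYS_COMMAND = ["uv", "run", "winml", "sys", "--list-device", "--list-ep"]


def environment_ready(run=subprocess.run) -> bool:
    """Return True when the inventory command exits with status 0."""
    try:
        result = run(SYS_COMMAND, capture_output=True, text=True)
    except FileNotFoundError:
        # The uv executable was not found on PATH.
        return False
    return result.returncode == 0
```

Calling `environment_ready()` with no arguments runs the real command; passing a stub for `run` lets you unit-test the surrounding script.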
Quick Start
WinML CLI supports two ways to build a model — choose the one that fits your workflow:
- [Config-Build Driven Pipeline](#config-build-pipeline) — generate a config file first, then run a single build command. Best for reproducible, CI/CD-friendly workflows.
- [Primitive Commands](#step-by-step-through-primitive-commands) — run each pipeline stage individually. Best for exploring, debugging, or custom workflows.
This walkthrough uses facebook/convnext-tiny-224 as an example model.
Config-Build Pipeline
##### Step 0: Check model readiness
Before running any pipeline command, verify the model is supported:
uv run winml inspect -m facebook/convnext-tiny-224
This prints the model's task, model class, input/output tensor names and shapes, and execution provider compatibility — without downloading weights. If inspect succeeds, the model is supported and you can proceed.
##### Step 1: Generate the build config
uv run winml config -m facebook/convnext-tiny-224 --device auto -o convnext_config.json
winml config queries Hugging Face, auto-detects the task and model type, and produces a WinMLBuildConfig JSON. Passing --device auto tells the config generator to resolve the target device at generation time — it inspects your hardware and writes the winning device (NPU, GPU, or CPU) together with matching precision and compile settings into convnext_config.json. You can open the file to see exactly what was picked before committing to a full build.
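To review what was picked without reading raw JSON, a short script can list the top-level entries of the generated file. This sketch assumes only that the file is valid JSON; it does not assume any field names, since the WinMLBuildConfig schema is not described here. `summarize_config` is a hypothetical helper.

```python
import json
from pathlib import Path


def summarize_config(path: str) -> list[str]:
    """Return one "key: type" line per top-level entry of a JSON file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        # Unexpected shape; report it instead of guessing at structure.
        return [f"(top level is {type(data).__name__})"]
    return [f"{key}: {type(value).__name__}" for key, value in data.items()]


# Usage, once convnext_config.json exists:
#   print("\n".join(summarize_config("convnext_config.json")))
```

Keeping the summary schema-agnostic means it should keep working if the config layout changes between releases.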
##### Step 2: Run the build
uv run winml build -c convnext_config.json -m facebook/convnext-tiny-224 -o convnext_out/
This single command runs all four pipeline stages in sequence — export, optimize, quantize, and compile — reading the device and precision settings recorded in convnext_config.json. The compile stage targets whichever device the config captured: it calls the QNN backend and embeds a pre-compiled Hexagon binary on NPU, or it compiles a DirectML graph on GPU, or it produces a standard optimized ONNX for CPU. All intermediate artifacts land in convnext_out/, so you can inspect or reuse any stage independently.
You can also pass --no-quant or --no-compile to stop the pipeline early, or --rebuild to force re-running even when cached artifacts exist.
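For scripted builds, the command line can be assembled from the options described above. This is a sketch, not an official wrapper: `build_command` is a hypothetical helper, it uses only the flags named in this section, and whether `--no-quant` and `--no-compile` may be combined is not stated here.

```python
def build_command(config, model, out_dir, *,
                  no_quant=False, no_compile=False, rebuild=False):
    """Assemble the argv list for `winml build` from the flags above."""
    cmd = ["uv", "run", "winml", "build",
           "-c", config, "-m", model, "-o", out_dir]
    # Append each optional switch only when it was requested.
    switches = (
        (no_quant, "--no-quant"),
        (no_compile, "--no-compile"),
        (rebuild, "--rebuild"),
    )
    for enabled, flag in switches:
        if enabled:
            cmd.append(flag)
    return cmd


cmd = build_command("convnext_config.json",
                    "facebook/convnext-tiny-224",
                    "convnext_out/",
                    rebuild=True)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` so that a failed build raises an exception and stops the script.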
##### Step 3: Benchmark on your device
uv run winml perf -m convnext_out/.onnx --device auto --iterations 50 --monitor
Replace `` with the…