Repo · Microsoft · published Feb 27, 2026

microsoft/winml-cli


Description: Accelerate Model Deployment on WinML

Language: Python

License: MIT

Stars: 24

Forks: 4

Open issues: 189

Created: 2026-02-27T07:34:16Z

Pushed: 2026-06-11T02:50:53Z

Default branch: main

Fork: no

Archived: no

README:

WinML CLI

![WinML CLI CI](https://github.com/microsoft/winml-cli/actions/workflows/modelkit-ci.yml)

Windows ML CLI is a command line tool for building portable, performant, and high-quality AI models for Windows ML. It takes you from a source model — whether from Hugging Face or your own pipeline — to a hardware-optimized artifact in a reproducible workflow.

Purpose-built for Windows hardware diversity, the CLI handles conversion, graph optimization, and compilation across AMD, Intel, NVIDIA, and Qualcomm targets. The CLI fits naturally into CI/CD pipelines so teams can validate and ship models easily.

---

:dart: Features

You want to build models that run with Windows ML on any device — seamlessly across CPU, GPU, and NPU

You want to benchmark models with one command — get latency, throughput, and live hardware utilization

You want to optimize models out of the box — with built-in graph optimizations, quantization, and EP-aware tuning

You want deep insights into your model — including unsupported operators, shape mismatches, and execution provider gaps

You want a repeatable and traceable workflow — with config-driven pipelines that are inspectable at every stage

You want AI agents to build and profile models for you — with agent-ready skills for automation via coding assistants

:compass: Scope

WinML CLI supports classic deep learning models for now — LLM support is on the way.

Supported execution providers: QNN · OpenVINO · VitisAI · NvTensorRTRTX · Dml · CPU — covering NPU, GPU, and CPU across Windows ML. See the [Supported Hardware](#supported-hardware) reference table for the full EP-to-device mapping.

The [built-in model catalog](#built-in-models) includes verified models that run across all EPs supported by Windows ML and serve as a reliable starting point. WinML CLI is not limited to these — you can bring any model you have:

  • Hugging Face model ID (e.g., microsoft/resnet-50): weights are downloaded on first run
  • Local ONNX file (e.g., model.onnx): output from winml export or winml build, or any ONNX model you already have
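Scripts that wrap WinML CLI sometimes need to tell these two input forms apart. The sketch below is a minimal version of that check. It assumes Hugging Face IDs always take the `org/name` form used in the examples above, so legacy single-segment IDs are not handled. `classify_model_source` is a hypothetical helper and is not part of winml-cli.

```python
import re

# Hypothetical helper (not part of winml-cli). It assumes Hugging Face IDs
# look like "org/name", as in the examples above.
_HF_ID = re.compile(r"^[A-Za-z0-9][\w.-]*/[\w.-]+$")


def classify_model_source(model: str) -> str:
    """Return 'onnx' for a local .onnx path or 'huggingface' for an org/name ID."""
    if model.lower().endswith(".onnx"):
        return "onnx"
    if _HF_ID.match(model):
        return "huggingface"
    raise ValueError(f"Unrecognized model source: {model!r}")


for m in ["microsoft/resnet-50", "facebook/convnext-tiny-224", "model.onnx"]:
    print(m, "->", classify_model_source(m))
```

The `.onnx` suffix is checked first, so a relative path such as `convnext_out/model.onnx` is treated as a file even though it contains a slash.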

See the [Supported Tasks](#supported-tasks) and [Supported Model Types](#supported-model-types) reference tables for the full list.

Known constraints:

  • Some models may export successfully but fail during optimization or quantization due to unsupported operator patterns. The analyzer will flag these issues.
  • Performance numbers vary by device, driver version, and EP version. Always benchmark on your target hardware.

---

:rocket: Getting Started

Prerequisites

| Component | Details |
|---|---|
| Windows | Windows 11 24H2 or later (required for NPU support; earlier versions work for CPU/GPU) |
| Python | 3.11 |
| Package manager | uv |
| WinML CLI | PyPI |
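A setup script can check these requirements before installing anything. The sketch below rests on two assumptions: Windows 11 24H2 corresponds to OS build 26100, and `platform.version()` on Windows returns a dotted version that ends in the build number. `check_prerequisites` is a hypothetical helper. The NPU check only warns, because CPU and GPU still work on earlier builds.

```python
import platform
import sys

# Assumption: Windows 11 24H2 corresponds to OS build 26100.
WIN11_24H2_BUILD = 26100


def check_prerequisites(python_version, system, windows_build):
    """Hypothetical helper: return a list of problems (empty means the table is satisfied)."""
    problems = []
    if tuple(python_version[:2]) != (3, 11):
        problems.append(f"Python 3.11 required, found {python_version[0]}.{python_version[1]}")
    if system != "Windows":
        problems.append(f"Windows required, found {system}")
    elif windows_build < WIN11_24H2_BUILD:
        # CPU/GPU still work on earlier builds; only NPU support needs 24H2.
        problems.append("Windows 11 24H2 or later required for NPU support (CPU/GPU still work)")
    return problems


if __name__ == "__main__":
    build = 0
    if platform.system() == "Windows":
        # Assumption: platform.version() looks like "10.0.26100"; the last field is the build.
        build = int(platform.version().split(".")[-1])
    for problem in check_prerequisites(sys.version_info, platform.system(), build):
        print("WARNING:", problem)
```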

Installation

WinML CLI requires Python 3.11 and is distributed as a Python wheel. We recommend uv for fast, reproducible environment setup.

1. Create an environment

uv venv --python 3.11

Activate it:

# Windows (PowerShell)
.venv\Scripts\activate

# Windows (Git Bash / WSL)
source .venv/Scripts/activate

2. Install winml-cli

uv pip install winml-cli

3. Verify your environment

uv run winml sys --list-device --list-ep

The --list-device and --list-ep flags restrict the output to the hardware and EP inventory. They skip the SDK versions and Python environment details that a plain winml sys run includes. If the command exits without error, your winml-cli install is ready.
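In a CI job, the same exit-code check can gate the steps that follow. The sketch below runs only the command shown in step 3. `verify_install` and `main` are hypothetical helpers, not part of winml-cli.

```python
import subprocess
import sys

# The verification command from step 3 above.
VERIFY_CMD = ["uv", "run", "winml", "sys", "--list-device", "--list-ep"]


def verify_install(cmd=VERIFY_CMD) -> bool:
    """Hypothetical helper: run the check and report success by exit code."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        # The executable (e.g. uv) is not on PATH.
        return False
    # Echo the device/EP inventory so it shows up in CI logs.
    print(result.stdout)
    return result.returncode == 0


def main() -> None:
    """Exit non-zero when verification fails, so the CI step fails too."""
    sys.exit(0 if verify_install() else 1)
```

Call `main()` from the CI entry point. A missing or broken install then fails the job before any build step runs.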

Quick Start

WinML CLI supports two ways to build a model — choose the one that fits your workflow:

  • [Config-Build Pipeline](#config-build-pipeline): generate a config file first, then run a single build command. Best for reproducible, CI/CD-friendly workflows.
  • [Primitive Commands](#step-by-step-through-primitive-commands): run each pipeline stage individually. Best for exploring, debugging, or custom workflows.

This walkthrough uses facebook/convnext-tiny-224 as an example model.

Config-Build Pipeline

##### Step 0: Check model readiness

Before running any pipeline command, verify the model is supported:

uv run winml inspect -m facebook/convnext-tiny-224

This prints the model's task, model class, input/output tensor names and shapes, and execution provider compatibility — without downloading weights. If inspect succeeds, the model is supported and you can proceed.

##### Step 1: Generate the build config

uv run winml config -m facebook/convnext-tiny-224 --device auto -o convnext_config.json

winml config queries Hugging Face, auto-detects the task and model type, and produces a WinMLBuildConfig JSON. Passing --device auto tells the config generator to resolve the target device at generation time. It inspects your hardware and writes the selected device (NPU, GPU, or CPU) into convnext_config.json, together with matching precision and compile settings. Open the file to review exactly what was chosen before committing to a full build.
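You can also review the generated file from a script instead of by eye. The WinMLBuildConfig schema is not described here, so the sketch below does not assume specific field names. It loads the JSON and prints any setting whose key name contains "device", "precision", "provider", or "compile". `find_settings` and `summarize_config` are hypothetical helpers.

```python
import json
from pathlib import Path


def find_settings(obj, keywords, path=""):
    """Hypothetical helper: yield (dotted_path, value) for keys containing any keyword."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}.{key}" if path else key
            if any(k in key.lower() for k in keywords):
                yield child, value
            # Keep descending so nested settings are found too.
            yield from find_settings(value, keywords, child)
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            yield from find_settings(item, keywords, f"{path}[{i}]")


def summarize_config(config_path, keywords=("device", "precision", "provider", "compile")):
    """Hypothetical helper: print matching settings from a generated config file."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    for setting_path, value in find_settings(config, keywords):
        print(f"{setting_path} = {value!r}")
```

For example, `summarize_config("convnext_config.json")` lists the matched settings before you run `winml build`.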

##### Step 2: Run the build

uv run winml build -c convnext_config.json -m facebook/convnext-tiny-224 -o convnext_out/

This single command runs all four pipeline stages in sequence (export, optimize, quantize, and compile), using the device and precision settings recorded in convnext_config.json. The compile stage targets whichever device the config captured. For example, on a Qualcomm NPU it calls the QNN backend and embeds a precompiled Hexagon binary. On GPU it compiles a DirectML graph, and on CPU it produces a standard optimized ONNX model. All intermediate artifacts land in convnext_out/, so you can inspect or reuse any stage independently.

You can also pass --no-quant or --no-compile to stop the pipeline early, or --rebuild to force re-running even when cached artifacts exist.
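Because each step is a plain command, Steps 0 to 2 map directly onto a CI script. The sketch below assembles those commands using only the flags documented above and runs them fail-fast. `build_commands` and `run_pipeline` are hypothetical helpers, not part of winml-cli.

```python
import subprocess


def build_commands(model, config_path, out_dir, device="auto",
                   no_quant=False, no_compile=False, rebuild=False):
    """Hypothetical helper: return the Step 0-2 commands as argument lists."""
    build = ["uv", "run", "winml", "build", "-c", config_path, "-m", model, "-o", out_dir]
    # Optional flags documented above: stop early, or ignore cached artifacts.
    if no_quant:
        build.append("--no-quant")
    if no_compile:
        build.append("--no-compile")
    if rebuild:
        build.append("--rebuild")
    return [
        ["uv", "run", "winml", "inspect", "-m", model],  # Step 0: readiness check
        ["uv", "run", "winml", "config", "-m", model,
         "--device", device, "-o", config_path],  # Step 1: generate config
        build,  # Step 2: export, optimize, quantize, compile
    ]


def run_pipeline(commands):
    """Run each step in order; check=True raises on the first failing step."""
    for cmd in commands:
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Usage: `run_pipeline(build_commands("facebook/convnext-tiny-224", "convnext_config.json", "convnext_out/"))`. With `check=True`, an unsupported model fails at `inspect`, and no config or build step runs after it.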

##### Step 3: Benchmark on your device

uv run winml perf -m convnext_out/.onnx --device auto --iterations 50 --monitor

Replace `` with the…
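winml perf reports latency and throughput itself, and this excerpt does not say how it computes them. The sketch below is only a generic illustration of how the two metrics relate. It summarizes per-iteration latencies, such as the 50 samples from `--iterations 50`, using nearest-rank percentiles. It also assumes a sequential loop with batch size 1, in which throughput is the reciprocal of mean latency. `summarize_latencies` is a hypothetical helper.

```python
import math
import statistics


def summarize_latencies(latencies_ms):
    """Hypothetical helper: summarize per-iteration latencies (ms) from a
    sequential, batch-size-1 loop. Requires at least one sample."""
    ordered = sorted(latencies_ms)
    mean_ms = statistics.fmean(ordered)

    def percentile(p):
        # Nearest-rank: smallest sample with at least p% of samples at or below it.
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    return {
        "mean_ms": mean_ms,
        "p50_ms": percentile(50),
        "p90_ms": percentile(90),
        # One inference at a time, so throughput is 1000 ms divided by mean latency.
        "throughput_per_s": 1000.0 / mean_ms,
    }
```

Tail percentiles such as p90 expose occasional slow iterations that the mean hides.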
