What does this repo signal mean?

Microsoft published microsoft/winml-cli, a Python repository. Repository signals like this one can expose tooling, eval, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo microsoft/winml-cli · language Python · low-star routine repo from Microsoft. onlylabs links this event to 1 captured evidence page and 6 related repo signals. It also maps to Product and customer in the data-business radar.

Microsoft Repo: microsoft/winml-cli

Captured source

GitHub: github.com/microsoft/winml-cli

microsoft/winml-cli repository metadata

published Feb 27, 2026 · seen Jun 8 · captured Jun 11 · http 200 · method plain

microsoft/winml-cli

Description: Accelerate Model Deployment on WinML

Language: Python

License: MIT

Stars: 24

Forks: 4

Open issues: 189

Created: 2026-02-27T07:34:16Z

Pushed: 2026-06-11T02:50:53Z

Default branch: main

Fork: no

Archived: no

README:

WinML CLI

[WinML CLI CI](https://github.com/microsoft/winml-cli/actions/workflows/modelkit-ci.yml)

Windows ML CLI is a command line tool for building portable, performant, and high-quality AI models for Windows ML. It takes you from a source model — whether from Hugging Face or your own pipeline — to a hardware-optimized artifact in a reproducible workflow.

Purpose-built for Windows hardware diversity, the CLI handles conversion, graph optimization, and compilation across AMD, Intel, NVIDIA, and Qualcomm targets. The CLI fits naturally into CI/CD pipelines so teams can validate and ship models easily.

---

:dart: Features

✅ You want to build models that run with Windows ML on any device — seamlessly across CPU, GPU, and NPU

✅ You want to benchmark models with one command — get latency, throughput, and live hardware utilization

✅ You want to optimize models out of the box — with built-in graph optimizations, quantization, and EP-aware tuning

✅ You want deep insights into your model — including unsupported operators, shape mismatches, and execution provider gaps

✅ You want a repeatable and traceable workflow — with config-driven pipelines that are inspectable at every stage

✅ You want AI agents to build and profile models for you — with agent-ready skills for automation via coding assistants

:compass: Scope

WinML CLI supports classic deep learning models for now — LLM support is on the way.

Supported execution providers: QNN · OpenVINO · VitisAI · NvTensorRTRTX · Dml · CPU — covering NPU, GPU, and CPU across Windows ML. See the [Supported Hardware](#supported-hardware) reference table for the full EP-to-device mapping.

The [built-in model catalog](#built-in-models) includes verified models that run across all EPs supported by Windows ML and serve as a reliable starting point. WinML CLI is not limited to these — you can bring any model you have:

- Hugging Face model ID (e.g., microsoft/resnet-50): weights are downloaded on first run
- Local ONNX file (e.g., model.onnx): from winml export, winml build, or any ONNX you already have
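
The two input forms above can be told apart mechanically, which is handy when a script accepts either. The sketch below is a hypothetical helper: `classify_model_ref` is not part of winml-cli, and it assumes that a reference ending in `.onnx` is a local file and that a Hugging Face ID has the form `owner/name`. The CLI's own resolution logic is not shown in this excerpt and may differ.

```python
from pathlib import PurePath


def classify_model_ref(ref: str) -> str:
    """Guess whether a model reference is a local ONNX file or a Hugging Face ID.

    Hypothetical helper for illustration only; it does not reproduce
    whatever logic winml-cli uses internally.
    """
    # Assumption: local models are recognized by the .onnx suffix.
    if PurePath(ref).suffix.lower() == ".onnx":
        return "onnx-file"
    # Assumption: Hugging Face IDs look like "owner/name" with one slash.
    parts = ref.split("/")
    if len(parts) == 2 and all(parts):
        return "huggingface-id"
    return "unknown"


print(classify_model_ref("microsoft/resnet-50"))  # huggingface-id
print(classify_model_ref("model.onnx"))           # onnx-file
```

Anything that matches neither pattern is reported as `unknown` rather than guessed at, so a calling script can fail early with a clear message.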

See the [Supported Tasks](#supported-tasks) and [Supported Model Types](#supported-model-types) reference tables for the full list.

Known constraints:

- Some models may export successfully but fail during optimization or quantization due to unsupported operator patterns. The analyzer will flag these issues.
- Performance numbers vary by device, driver version, and EP version. Always benchmark on your target hardware.
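
Because results vary by device, driver, and EP version, it helps to compare runs using the same summary statistics each time. The following is a minimal sketch, assuming you have collected per-iteration latencies in milliseconds yourself. It does not parse `winml perf` output, whose format is not shown in this excerpt. The function name, the sample values, and the throughput formula (sequential, one inference per iteration) are illustrative assumptions.

```python
import math
import statistics


def summarize_latencies(latencies_ms: list[float]) -> dict[str, float]:
    """Summarize per-iteration latencies (milliseconds) from one benchmark run."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    mean_ms = statistics.fmean(ordered)
    # Nearest-rank 95th percentile: the smallest sample with at least
    # 95% of all samples at or below it.
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[rank - 1],
        # Assumption: iterations run back to back, one inference each.
        "throughput_per_s": 1000.0 / mean_ms,
    }


# Hypothetical samples, not measurements from any real device.
summary = summarize_latencies([10.0, 12.0, 11.0, 13.0, 14.0])
print(summary)
```

Recording the median and a tail percentile alongside the mean makes it easier to spot runs where a few slow iterations, for example during warm-up, distort the average.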

---

:rocket: Getting Started

Prerequisites

| Component | Details |
|---|---|
| Windows | Windows 11 24H2 or later (required for NPU support; earlier versions work for CPU/GPU) |
| Python | 3.11 |
| Package manager | uv |
| WinML CLI | PyPI |

Installation

WinML CLI requires Python 3.11 and is distributed as a Python wheel. We recommend uv for fast, reproducible environment setup.

1. Create an environment

uv venv --python 3.11

Activate it:

# Windows (PowerShell)
.venv\Scripts\activate

# Windows (Git Bash / WSL)
source .venv/Scripts/activate

2. Install winml-cli

uv pip install winml-cli

3. Verify your environment

uv run winml sys --list-device --list-ep

--list-device and --list-ep print only the hardware and EP inventory, skipping SDK versions and Python environment details that plain winml sys would include. If the command exits without error, your winml-cli install is ready.
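
In a CI job, the same exit-code check can be scripted so later stages are skipped when the environment is not ready. This is a minimal sketch using Python's standard `subprocess` module. The command is copied from the step above, while `command_succeeds` and `VERIFY_CMD` are hypothetical names introduced here.

```python
import subprocess

# Command copied from the verification step in the README.
VERIFY_CMD = ("uv", "run", "winml", "sys", "--list-device", "--list-ep")


def command_succeeds(cmd=VERIFY_CMD) -> bool:
    """Return True only when the command runs and exits with status 0."""
    try:
        result = subprocess.run(list(cmd), capture_output=True, text=True, check=False)
    except FileNotFoundError:
        # The executable (uv, by default) is not on PATH.
        return False
    return result.returncode == 0
```

Calling `command_succeeds()` with no arguments runs the verification command. The helper only reports success or failure and does not interpret the printed device or EP inventory.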

Quick Start

WinML CLI supports two ways to build a model — choose the one that fits your workflow:

- [Config-Build Driven Pipeline](#config-build-pipeline): generate a config file first, then run a single build command. Best for reproducible, CI/CD-friendly workflows.
- [Primitive Commands](#step-by-step-through-primitive-commands): run each pipeline stage individually. Best for exploring, debugging, or custom workflows.

This walkthrough uses facebook/convnext-tiny-224 as an example model.

Config-Build Pipeline

##### Step 0: Check model readiness

Before running any pipeline command, verify the model is supported:

uv run winml inspect -m facebook/convnext-tiny-224

This prints the model's task, model class, input/output tensor names and shapes, and execution provider compatibility — without downloading weights. If inspect succeeds, the model is supported and you can proceed.

##### Step 1: Generate the build config

uv run winml config -m facebook/convnext-tiny-224 --device auto -o convnext_config.json

winml config queries Hugging Face, auto-detects the task and model type, and produces a WinMLBuildConfig JSON. Passing --device auto tells the config generator to resolve the target device at generation time — it inspects your hardware and writes the winning device (NPU, GPU, or CPU) together with matching precision and compile settings into convnext_config.json. You can open the file to see exactly what was picked before committing to a full build.
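
Since the config is plain JSON, it can also be reviewed from a script before a build is started. The sketch below makes no assumption about the key names inside the WinMLBuildConfig, because its schema is not shown in this excerpt. It assumes only that the top level is a JSON object. `load_build_config` and `describe` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_build_config(path: str) -> dict:
    """Load a generated build config and return it as a dict."""
    with Path(path).open(encoding="utf-8") as f:
        config = json.load(f)
    # Assumption: the config is a JSON object at the top level.
    if not isinstance(config, dict):
        raise ValueError(f"{path}: expected a JSON object at the top level")
    return config


def describe(config: dict) -> str:
    """Pretty-print with sorted keys so diffs between generated configs stay stable."""
    return json.dumps(config, indent=2, sort_keys=True)
```

For example, `print(describe(load_build_config("convnext_config.json")))` shows what was picked. Sorting the keys is a design choice for review: two configs generated on different machines can be compared line by line.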

##### Step 2: Run the build

uv run winml build -c convnext_config.json -m facebook/convnext-tiny-224 -o convnext_out/

This single command runs all four pipeline stages in sequence — export, optimize, quantize, and compile — reading the device and precision settings recorded in convnext_config.json. The compile stage targets whichever device the config captured: it calls the QNN backend and embeds a pre-compiled Hexagon binary on NPU, or it compiles a DirectML graph on GPU, or it produces a standard optimized ONNX for CPU. All intermediate artifacts land in convnext_out/, so you can inspect or reuse any stage independently.

You can also pass --no-quant or --no-compile to stop the pipeline early, or --rebuild to force re-running even when cached artifacts exist.
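
When the build is driven from a script, the command line and the resulting artifacts can be handled programmatically. The sketch below assembles the build command using only the flags named above and lists whatever files a build leaves in the output directory. `build_command` and `list_artifacts` are hypothetical helpers, artifact file names are not assumed, and whether `--no-quant` and `--no-compile` may be combined is not stated in this excerpt.

```python
from pathlib import Path


def build_command(config_path: str, model: str, out_dir: str, *,
                  quantize: bool = True, compile_model: bool = True,
                  rebuild: bool = False) -> list[str]:
    """Assemble the `winml build` invocation from the README as an argument list."""
    cmd = ["uv", "run", "winml", "build", "-c", config_path, "-m", model, "-o", out_dir]
    if not quantize:
        cmd.append("--no-quant")    # flag named in the README
    if not compile_model:
        cmd.append("--no-compile")  # flag named in the README
    if rebuild:
        cmd.append("--rebuild")     # force re-running despite cached artifacts
    return cmd


def list_artifacts(out_dir: str) -> list[str]:
    """Return all files under out_dir as sorted, slash-separated relative paths."""
    root = Path(out_dir)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


print(" ".join(build_command("convnext_config.json", "facebook/convnext-tiny-224", "convnext_out/")))
```

The argument list can be passed to `subprocess.run`, and `list_artifacts("convnext_out/")` afterwards gives a quick inventory of the intermediate files to inspect or reuse.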

##### Step 3: Benchmark on your device

uv run winml perf -m convnext_out/.onnx --device auto --iterations 50 --monitor

Replace `` with the...

Excerpt shown — open the source for the full document.

Notability

notability 3.0/10

Low-star routine repo from Microsoft