Release · Microsoft · published May 29, 2026

microsoft/onnxruntime plugin-ep-webgpu/v0.1.0

ONNX Runtime WebGPU Plugin EP v0.1.0

Repository: microsoft/onnxruntime

Tag: plugin-ep-webgpu/v0.1.0

Published: 2026-05-29T17:53:51Z

Prerelease: no

Release notes:

We're excited to ship the first release of the WebGPU Execution Provider as a plugin EP for ONNX Runtime. Instead of being baked into the core onnxruntime binary, the WebGPU EP is now distributed as a standalone artifact that registers with an existing ONNX Runtime installation at runtime.

Highlights

  • Broad operator coverage on WebGPU. Native WebGPU kernels for the operators needed by common transformer, vision, and generative workloads — including Conv variants, MatMul/Gemm, normalizations, attention (Attention, MultiHeadAttention, GroupQueryAttention), rotary embeddings, quantized matmul, quantized Mixture-of-Experts (QMoE), and more. See the Operator coverage section below for a summary.
  • Quantized & accelerated kernels. DP4A and subgroup-matrix MatMulNBits, a FlashAttention kernel, and vendor-optimized Intel MatMul/Gemm paths. See the Performance features section below.
  • Plugin EP packaging. WebGPU support now ships as a separate, independently versioned library (onnxruntime_providers_webgpu) that plugs into a compatible ONNX Runtime (1.24.4 or newer) at runtime. Users can adopt WebGPU acceleration without switching their core ORT package, and the EP can iterate on its own cadence.
  • Cross-platform native binaries for Windows x64/arm64 (bundled with dxil.dll / dxcompiler.dll), Linux x64, and macOS arm64.
  • Language packages:
      • Python: onnxruntime-ep-webgpu wheel, installed alongside the onnxruntime package and registered via onnxruntime.register_execution_provider_library(...). See the package page for details on installation and usage.
      • .NET: Microsoft.ML.OnnxRuntime.EP.WebGpu NuGet package, referenced alongside Microsoft.ML.OnnxRuntime and registered via OrtEnv.RegisterExecutionProviderLibrary(...). See the package page for details on installation and usage.
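
As a rough illustration of the Python registration flow described above, the following sketch checks the minimum ONNX Runtime version (1.24.4, from these notes), registers the plugin library, and builds a session on the devices it exposes. Only `onnxruntime.register_execution_provider_library` is named in the release notes. The use of `get_ep_devices` and `SessionOptions.add_provider_for_devices`, the EP name `"WebGpuExecutionProvider"`, the registration name, and the caller-supplied `library_path` are assumptions made for illustration; consult the package page for the supported way to locate the library.

```python
from typing import Any, Iterable, List, Tuple

MIN_ORT_VERSION = (1, 24, 4)  # minimum compatible ORT version per the release notes
ASSUMED_EP_NAME = "WebGpuExecutionProvider"  # assumption: name the plugin reports


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract the leading numeric major.minor.patch from a version string."""
    numbers = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        numbers.append(int(digits) if digits else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return (numbers[0], numbers[1], numbers[2])


def is_compatible(ort_version: str) -> bool:
    """True if the installed ORT version meets the documented minimum."""
    return parse_version(ort_version) >= MIN_ORT_VERSION


def pick_devices(ep_devices: Iterable[Any], ep_name: str) -> List[Any]:
    """Keep only the devices exposed by the execution provider named ep_name."""
    return [d for d in ep_devices if d.ep_name == ep_name]


def create_webgpu_session(model_path: str, library_path: str,
                          registration_name: str = "webgpu_plugin_ep"):
    """Register the plugin library and create a session that uses its devices.

    library_path and registration_name are hypothetical inputs chosen by the
    caller; this sketch does not know where the wheel installs the library.
    """
    import onnxruntime as ort  # imported lazily so the helpers work without ORT

    if not is_compatible(ort.__version__):
        raise RuntimeError(f"onnxruntime {ort.__version__} is older than 1.24.4")

    ort.register_execution_provider_library(registration_name, library_path)
    devices = pick_devices(ort.get_ep_devices(), ASSUMED_EP_NAME)
    if not devices:
        raise RuntimeError("no devices found for the assumed WebGPU EP name")

    options = ort.SessionOptions()
    options.add_provider_for_devices(devices, {})  # no provider options set
    return ort.InferenceSession(model_path, sess_options=options)
```

Keeping the version check and device filter as separate pure functions makes them easy to exercise without a GPU or an installed plugin.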

Operator coverage

The WebGPU EP registers kernels for the majority of ONNX standard-domain operators used by mainstream model architectures, plus a curated set of com.microsoft contrib operators. Highlights by category:

  • Math, normalization & reduction: MatMul, Gemm, Softmax, LayerNormalization, RMSNormalization, InstanceNormalization, BatchNormalization, LpNormalization, unary/binary elementwise ops, all standard reductions (ReduceMean, ReduceSum, ReduceMax, ...), CumSum, Einsum, TopK, ArgMax/ArgMin.
  • Neural network: Conv, ConvTranspose, MaxPool/AveragePool (and Global* variants), plus a FusedConv contrib op.
  • Tensor manipulation: Transpose, Reshape, Slice, Concat, Split, Gather/GatherElements/GatherND, ScatterElements/ScatterND, Pad, Tile, Cast, Resize, GridSample, Where, Flatten, Squeeze, Identity, Shape, and more.
  • Transformer / LLM contrib ops: Attention, MultiHeadAttention, GroupQueryAttention, RotaryEmbedding, SkipLayerNormalization, SkipSimplifiedLayerNormalization, SimplifiedLayerNormalization, BiasAdd, BiasGelu, BiasSplitGelu, FastGelu, Gelu, QuickGelu, CausalConvWithState, LinearAttention.
  • Quantization: DequantizeLinear, MatMulNBits (with DP4A and subgroup-matrix paths), GatherBlockQuantized, QMoE.

For the authoritative list, see the kernel registrations in `webgpu_execution_provider.cc` and `webgpu_contrib_kernels.cc`.
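
A quick pre-check against the summary above might look like the sketch below. The set contains only operators named explicitly in these notes, so it is deliberately incomplete: an operator flagged here may still be supported (the notes mention unnamed elementwise ops, further reductions, "and more"), and the kernel registration files remain the authority. The function name and the idea of such a pre-check are illustrative, not part of the release.

```python
from typing import Iterable, List, Set

# Operators named explicitly in the release-note summary (not exhaustive).
LISTED_OPS: Set[str] = {
    # Math, normalization & reduction
    "MatMul", "Gemm", "Softmax", "LayerNormalization", "RMSNormalization",
    "InstanceNormalization", "BatchNormalization", "LpNormalization",
    "ReduceMean", "ReduceSum", "ReduceMax", "CumSum", "Einsum", "TopK",
    "ArgMax", "ArgMin",
    # Neural network
    "Conv", "ConvTranspose", "MaxPool", "AveragePool", "GlobalMaxPool",
    "GlobalAveragePool", "FusedConv",
    # Tensor manipulation
    "Transpose", "Reshape", "Slice", "Concat", "Split", "Gather",
    "GatherElements", "GatherND", "ScatterElements", "ScatterND", "Pad",
    "Tile", "Cast", "Resize", "GridSample", "Where", "Flatten", "Squeeze",
    "Identity", "Shape",
    # Transformer / LLM contrib ops
    "Attention", "MultiHeadAttention", "GroupQueryAttention",
    "RotaryEmbedding", "SkipLayerNormalization",
    "SkipSimplifiedLayerNormalization", "SimplifiedLayerNormalization",
    "BiasAdd", "BiasGelu", "BiasSplitGelu", "FastGelu", "Gelu", "QuickGelu",
    "CausalConvWithState", "LinearAttention",
    # Quantization
    "DequantizeLinear", "MatMulNBits", "GatherBlockQuantized", "QMoE",
}


def ops_not_listed(op_types: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated op types absent from the summary."""
    return sorted(set(op_types) - LISTED_OPS)
```

With the `onnx` package installed, the op types of a model can be gathered with `{node.op_type for node in onnx.load(path).graph.node}` and passed to `ops_not_listed`; anything it returns is a prompt to check the registration files, not proof that the operator is unsupported.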

Performance features

  • DP4A and subgroup-matrix MatMulNBits paths for accelerated quantized matmul on supported hardware.
  • FlashAttention kernel for attention-heavy workloads.
  • Intel-optimized MatMul/Gemm code paths for improved performance on Intel GPUs.
  • Program caching to amortize shader compilation costs across runs.
  • Optional PIX frame capture and WebGPU profiler integration for performance investigation.

Known limitations

  • Platform support in this release is limited to the platforms listed above (no mobile, no Linux arm64, no macOS x64).
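
An application that wants to fail early on an unsupported host could encode the platform list from these notes as in the sketch below. The helper names and the alias tables (which map `platform.system()` and `platform.machine()` values to the labels used in the notes) are assumptions for illustration; the release itself ships no such check.

```python
import platform
from typing import Optional, Set, Tuple

# Platforms with native binaries in this release, per the notes above.
SUPPORTED: Set[Tuple[str, str]] = {
    ("windows", "x64"),
    ("windows", "arm64"),
    ("linux", "x64"),
    ("macos", "arm64"),
}

# Assumed mappings from Python's platform strings to the labels above.
_OS_ALIASES = {"Windows": "windows", "Linux": "linux", "Darwin": "macos"}
_ARCH_ALIASES = {"amd64": "x64", "x86_64": "x64",
                 "arm64": "arm64", "aarch64": "arm64"}


def normalize(system: str, machine: str) -> Tuple[str, str]:
    """Map raw platform strings to (os, arch) labels; unknowns pass through."""
    os_name = _OS_ALIASES.get(system, system.lower())
    arch = _ARCH_ALIASES.get(machine.lower(), machine.lower())
    return (os_name, arch)


def is_supported(system: Optional[str] = None,
                 machine: Optional[str] = None) -> bool:
    """Check the given (or current) host against the release's platform list."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return normalize(system, machine) in SUPPORTED
```

Calling `is_supported()` with no arguments inspects the current host; passing explicit values is handy for tests or for validating a deployment target ahead of time.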

Acknowledgments

This initial release is the result of contributions from engineers at Microsoft, Intel, and the broader community. Thank you to everyone who built, reviewed, and tested the WebGPU plugin EP — including (in alphabetical order):

@aciddelgado, @adrastogi, @adrianlizarraga, @chilo-ms, @daijh, @derdeljan-msft, @edgchen1, @eserscor, @feich-ms, @fs-eire, @guschmue, @HectorSVC, @ingyukoh, @jchen10, @jiangzhaoming, @Jiawei-Shao, @jing-bao, @justinchuby, @kunal-vaishnavi, @mindest, @prathikr, @qjia7, @satyajandhyala, @shaoboyan091, @sheetalarkadam, @skottmckay, @snnn, @sushraja-msft, @tianleiwu, @titaiwangms, @TomCrypto, @vraspar, @wenqinI, @xenova, @xhcao, @xiaofeihan1, @yuslepukhin.

Special thanks to the Intel team for the vendor-optimized MatMul/Gemm kernels.

Note: This list was compiled on a best-effort basis from PRs that touched…
