Release · NVIDIA · published Mar 25, 2026

NVIDIA/cudnn-frontend v1.21.0


v1.21.0-release

Repository: NVIDIA/cudnn-frontend

Tag: v1.21.0

Published: 2026-03-25T03:18:51Z

Prerelease: no

Release notes:

cuDNN Frontend v1.21.0 Release Notes (https://github.com/NVIDIA/cudnn-frontend/pull/213)

cuDNN Frontend v1.21.0 is the recommended version for cuDNN 9.20.0 and later releases.

General Improvements 🚀

  • Dropped dependency on the CUDA driver API for the frontend library, enabling builds without direct CUDA driver linkage.

Open-Source Kernels

Added new kernels for GEMM fusions:

  • [Grouped GEMM + GLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/discrete_grouped_gemm/grouped_gemm_glu): Unified grouped GEMM GLU API supporting dense and discrete MoE weight layouts with optional bias.
  • [Grouped GEMM + dGLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/discrete_grouped_gemm/grouped_gemm_dglu): Unified grouped GEMM dGLU backward API supporting dense and discrete MoE weight layouts with optional bias.
  • [Discrete Grouped GEMM + SwiGLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/discrete_grouped_gemm/discrete_grouped_gemm_swiglu): Per-expert-pointer SwiGLU grouped GEMM for MoE workloads without weight packing.
  • [Discrete Grouped GEMM + dSwiGLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/discrete_grouped_gemm/discrete_grouped_gemm_dswiglu): Per-expert-pointer dSwiGLU backward grouped GEMM for MoE workloads without weight packing. Uses dSwiGLU/dGeGLU backward epilogue.
  • [Grouped GEMM + dSwiGLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/gemm_dswiglu): dSwiGLU activation fused with Grouped GEMM.
  • [Grouped GEMM + Quant](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/grouped_gemm/grouped_gemm_quant): Grouped GEMM with output quantization for MoE FC2/dFC1 workloads.
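For readers unfamiliar with the fusion, the following is a minimal pure-Python reference of what a grouped GEMM followed by a SwiGLU epilogue computes. It is an illustrative sketch only, not the cuDNN Frontend API: the function names (`grouped_gemm_swiglu`, `view_dense`), the `[gate | up]` half-split of the GEMM output, and the row-major packing are all assumptions made for the example, and the released kernels may lay out their operands differently. It also shows how a dense (packed) weight buffer and discrete per-expert weights can feed the same computation.

```python
import math


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matmul(a, b):
    """Plain (M x K) @ (K x N) product on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def swiglu(rows):
    """Assumed layout: each row is [gate | up]; returns silu(gate) * up."""
    out = []
    for row in rows:
        half = len(row) // 2
        out.append([silu(g) * u for g, u in zip(row[:half], row[half:])])
    return out


def grouped_gemm_swiglu(tokens, group_sizes, expert_weights):
    """Hypothetical reference, not a library call.

    tokens: (T x K) rows already sorted by expert.
    group_sizes[e]: number of consecutive rows routed to expert e.
    expert_weights[e]: that expert's (K x 2N) weight matrix.
    Returns a (T x N) result.
    """
    assert sum(group_sizes) == len(tokens)
    out, start = [], 0
    for size, weight in zip(group_sizes, expert_weights):
        if size:  # experts with no tokens contribute nothing
            out.extend(swiglu(matmul(tokens[start:start + size], weight)))
        start += size
    return out


def view_dense(packed, num_experts, k, n2):
    """Interpret one flat buffer as num_experts row-major (k x n2) matrices."""
    assert len(packed) == num_experts * k * n2
    stride = k * n2
    return [
        [packed[e * stride + r * n2: e * stride + (r + 1) * n2] for r in range(k)]
        for e in range(num_experts)
    ]


# Three tokens (K = 2); the first two go to expert 0, the last to expert 1.
tokens = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
group_sizes = [2, 1]
w0 = [[1.0, 0.0, 1.0, 1.0], [0.0, 1.0, 1.0, 1.0]]  # K = 2, 2N = 4
w1 = [[0.0, 0.0, 2.0, 2.0], [1.0, 1.0, 0.0, 0.0]]

# Discrete layout: one separate weight object per expert.
discrete = grouped_gemm_swiglu(tokens, group_sizes, [w0, w1])

# Dense layout: all expert weights packed into a single flat buffer.
packed = [v for w in (w0, w1) for row in w for v in row]
dense = grouped_gemm_swiglu(tokens, group_sizes, view_dense(packed, 2, 2, 4))

assert dense == discrete
```

The two layouts differ only in how the weights are addressed. The discrete form skips the packing step, which is what the "without weight packing" descriptions above refer to.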
