NVIDIA/cudnn-frontend v1.21.0
Repository: NVIDIA/cudnn-frontend
Tag: v1.21.0
Published: 2026-03-25T03:18:51Z
Prerelease: no
Release notes:
cuDNN Frontend v1.21.0 Release Notes (https://github.com/NVIDIA/cudnn-frontend/pull/213)
cuDNN Frontend v1.21.0 is the recommended version for cuDNN 9.20.0 and later releases.
General Improvements 🚀
- Dropped dependency on the CUDA driver API for the frontend library, enabling builds without direct CUDA driver linkage.
Open-Source Kernels
Added new kernels for GEMM fusions:
- [Grouped GEMM + GLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/discrete_grouped_gemm/grouped_gemm_glu): Unified grouped GEMM GLU API supporting dense and discrete MoE weight layouts with optional bias.
- [Grouped GEMM + dGLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/discrete_grouped_gemm/grouped_gemm_dglu): Unified grouped GEMM dGLU backward API supporting dense and discrete MoE weight layouts with optional bias.
- [Discrete Grouped GEMM + SwiGLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/discrete_grouped_gemm/discrete_grouped_gemm_swiglu): Per-expert-pointer SwiGLU grouped GEMM for MoE workloads without weight packing.
- [Discrete Grouped GEMM + dSwiGLU](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/discrete_grouped_gemm/discrete_grouped_gemm_dswiglu): Per-expert-pointer dSwiGLU backward grouped GEMM for MoE workloads without weight packing. Uses dSwiGLU/dGeGLU backward epilogue.
- [Grouped GEMM + dSwiglu](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/gemm_dswiglu): dSwiglu activation fused with Grouped GEMM.
- [Grouped GEMM + Quant](https://github.com/NVIDIA/cudnn-frontend/tree/main/python/cudnn/grouped_gemm/grouped_gemm_quant): Grouped GEMM with output quantization for MoE FC2/dFC1 workloads.
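For readers unfamiliar with the fusion pattern, the following is a minimal pure-Python sketch of what a grouped GEMM followed by a SwiGLU epilogue computes for an MoE layer. It is a reference for the math only and is not the cuDNN Frontend API: the function names (`silu`, `matmul`, `grouped_gemm_swiglu`), the argument layout, and the convention that the first half of each output row is the gate and the second half is the up projection are all assumptions made for illustration. The actual kernels may order or interleave the two halves differently.

```python
import math

def silu(x):
    # SiLU (swish) activation: x * sigmoid(x).
    return x / (1.0 + math.exp(-x))

def matmul(a, b):
    # Plain row-by-column matrix product on nested lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def grouped_gemm_swiglu(tokens, group_sizes, expert_weights):
    """Hypothetical reference: token rows are sorted by expert, so each
    contiguous group of rows is multiplied by that expert's (K, 2N) weight,
    then SwiGLU reduces each (2N)-wide row to N values."""
    out, start = [], 0
    for size, weight in zip(group_sizes, expert_weights):
        hidden = matmul(tokens[start:start + size], weight)  # (size, 2N)
        for row in hidden:
            half = len(row) // 2
            gate, up = row[:half], row[half:]  # assumed split, see lead-in
            out.append([silu(g) * u for g, u in zip(gate, up)])
        start += size
    return out

# Two experts, three tokens: the first two rows go to expert 0, the last to expert 1.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w0 = [[0.0, 1.0], [0.0, 2.0]]
w1 = [[1.0, 1.0], [1.0, 2.0]]
out = grouped_gemm_swiglu(tokens, [2, 1], [w0, w1])
```

In this sketch the weights are simply a sequence of per-expert matrices, so the dense and discrete layouts look the same. On the GPU the distinction, as the descriptions above suggest, is whether the expert weights are packed into one buffer or supplied as separate per-expert pointers.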