What does this release signal mean?

NVIDIA published NVIDIA/cudnn-frontend v1.22.1, a release of the C++ frontend for NVIDIA's cuDNN deep learning library. This release signal is evidence of what shipped, changed, or was packaged for users. High-signal details: tag v1.22.1 (v1.22.1-release) · published 2026-04-10T17:29:31Z · prerelease: no · release notes begin "cuDNN Frontend v1.22.1 is the recommended....". onlylabs links this event to 1 captured evidence page and 6 related release signals.

NVIDIA Release: NVIDIA/cudnn-frontend v1.22.1

Captured source

GitHub/github.com/NVIDIA/cudnn-frontend

NVIDIA/cudnn-frontend v1.22.1

published Apr 10, 2026 · seen Jun 6 · captured Jun 11 · http 200 · method plain

v1.22.1-release

Repository: NVIDIA/cudnn-frontend

Tag: v1.22.1

Published: 2026-04-10T17:29:31Z

Prerelease: no

Release notes: cuDNN Frontend v1.22.1 is the recommended version for cuDNN 9.20.0 and later releases.

General Improvements 🚀 🚀

Introducing PyTorch custom operator wrapping cuDNN's MoE Grouped Gemm operation.

def moe_grouped_matmul(
    token: torch.Tensor,
    weight: torch.Tensor,
    first_token_offset: torch.Tensor,
    token_index: Optional[torch.Tensor] = None,
    token_ks: Optional[torch.Tensor] = None,
    mode: str = "none",
    top_k: int = 1,
) -> torch.Tensor

See [test/python/test_moe_grouped_matmul_op.py](test/python/test_moe_grouped_matmul_op.py) for usage.
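To make the idea of a grouped matmul concrete, here is a minimal pure-Python reference sketch. It is not the cuDNN operator and does not call it. It assumes, without confirmation from the release notes, that `first_token_offset[e]` is the row at which expert `e`'s contiguous block of tokens starts, and it ignores `token_index`, `token_ks`, `mode`, and `top_k` entirely. The helper names `matmul` and `grouped_matmul_reference` are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matmul: (m x k) @ (k x n) -> (m x n)."""
    if not a:
        return []
    k, n = len(b), len(b[0])
    return [[sum(row[i] * b[i][j] for i in range(k)) for j in range(n)] for row in a]


def grouped_matmul_reference(
    token: Matrix,
    weight: Sequence[Matrix],
    first_token_offset: Sequence[int],
) -> Matrix:
    """Multiply each expert's contiguous slice of tokens by that expert's weight.

    ASSUMPTION (not from the release notes): first_token_offset[e] is the row
    where expert e's tokens start, and the slice ends where expert e + 1's
    slice begins (or at the last row for the final expert).
    """
    out: Matrix = []
    num_experts = len(weight)
    for e in range(num_experts):
        start = first_token_offset[e]
        end = first_token_offset[e + 1] if e + 1 < num_experts else len(token)
        out.extend(matmul(token[start:end], weight[e]))
    return out


# Two experts: rows 0-1 go to expert 0 (identity), row 2 goes to expert 1 (doubling).
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weights = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
]
result = grouped_matmul_reference(tokens, weights, [0, 2])
print(result)  # [[1.0, 2.0], [3.0, 4.0], [10.0, 12.0]]
```

A fused kernel does the same per-expert multiplications in one launch instead of one matmul per expert. The test file referenced above remains the authoritative source for the real operator's argument semantics.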

🕒 We will be rolling out new native custom torch ops in upcoming releases – stay tuned! 😃

Open-Source Kernels 🚀 🚀

Blackwell SDPA fprop kernel supporting head dim = 256, written in cuteDSL. It is available through the torch-op above or callable as a standalone API. See the [samples](test/python/fe_api/test_sdpa_fwd.py) for API usage. Requires nvidia-cutlass-dsl[cu13]==4.4.1.
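For readers unfamiliar with the term, SDPA fprop is the forward pass of scaled dot-product attention, softmax(QKᵀ/√d)V, where d is the head dimension. The sketch below is a single-head, unmasked reference in pure Python. It only illustrates the math and says nothing about how the cuteDSL kernel is implemented or called. The function name `sdpa_reference` is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def sdpa_reference(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head attention forward: softmax(q @ k^T / sqrt(d)) @ v, no mask."""
    head_dim = len(q[0])
    scale = 1.0 / math.sqrt(head_dim)
    out: Matrix = []
    for q_row in q:
        # Scaled dot product of this query against every key.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        # Numerically stable softmax: subtract the max before exponentiating.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        denom = sum(exps)
        probs = [e / denom for e in exps]
        # Weighted sum of the value rows.
        out.append(
            [sum(p * v_row[j] for p, v_row in zip(probs, v)) for j in range(len(v[0]))]
        )
    return out


# With head dim = 256 and an all-zero query, every key scores 0, so the
# attention weights are uniform and the output is the mean of the value rows.
head_dim = 256
q = [[0.0] * head_dim]
k = [[1.0] * head_dim, [-1.0] * head_dim]
v = [[1.0, 3.0], [3.0, 5.0]]
attn_out = sdpa_reference(q, k, v)
print(attn_out)  # [[2.0, 4.0]]
```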

Updates:

GroupedGemmWgradSm100 and grouped_gemm_wgrad_wrapper_sm100 expose the grouped GEMM weight-gradient kernel. See grouped_gemm_wgrad.html for the API reference and [moe_blockscaled_grouped_gemm_wgrad.py](python/cudnn/grouped_gemm/grouped_gemm_wgrad/moe_blockscaled_grouped_gemm_wgrad.py) for samples.
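As background on what a grouped weight gradient computes: for each expert e with input rows X_e and upstream gradient rows dY_e, the weight gradient is dW_e = X_eᵀ dY_e. The pure-Python sketch below illustrates only that math. It is not the API of `GroupedGemmWgradSm100` or `grouped_gemm_wgrad_wrapper_sm100`, it ignores block scaling, and it assumes (hypothetically) that experts own contiguous row ranges described by a list of start offsets. The name `grouped_wgrad_reference` is made up for this example.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def grouped_wgrad_reference(
    token: Matrix,
    grad_out: Matrix,
    first_token_offset: Sequence[int],
) -> List[Matrix]:
    """Return one weight gradient per expert: dW_e = X_e^T @ dY_e.

    ASSUMPTION: first_token_offset[e] is the first row owned by expert e,
    and each expert's rows are contiguous.
    """
    num_experts = len(first_token_offset)
    hidden, out_dim = len(token[0]), len(grad_out[0])
    grads: List[Matrix] = []
    for e in range(num_experts):
        start = first_token_offset[e]
        end = first_token_offset[e + 1] if e + 1 < num_experts else len(token)
        dw = [[0.0] * out_dim for _ in range(hidden)]
        # Accumulate the outer product of each input row with its gradient row.
        for r in range(start, end):
            for i in range(hidden):
                for j in range(out_dim):
                    dw[i][j] += token[r][i] * grad_out[r][j]
        grads.append(dw)
    return grads


# Rows 0-1 belong to expert 0, row 2 to expert 1.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
dy = [[1.0], [1.0], [2.0]]
wgrads = grouped_wgrad_reference(x, dy, [0, 2])
print(wgrads)  # [[[4.0], [6.0]], [[10.0], [12.0]]]
```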

Acknowledgements:

The Blackwell SDPA fprop kernel supporting head dim = 256, written in cuteDSL, was jointly developed by Shengbin Di, Yuxi Chi, and Linfeng Zheng in close collaboration with Alibaba. We would like to extend special thanks to the core contributors from Alibaba: Siyu Wang, Haoyan Huang, Lanbo Li, Yun Zhong, Man Yuan, Minmin Sun, Yong Li, and Wei Lin for their significant contributions to this work.

Notability

notability 3.0/10

Routine version update of a library.