What does this release signal mean?

NVIDIA published NVIDIA/cudnn-frontend v1.22.0. This release signal is evidence of what shipped, changed, or was packaged for users. High-signal details: C++ API for NVIDIA's deep learning library cuDNN. · v1.22.0-release · Repository: NVIDIA/cudnn-frontend · Tag: v1.22.0 · Published: 2026-04-03T02:24:29Z · Prerelease: no · Release notes: cuDNN Frontend v1.22.0 Release Notes cuDNN.... onlylabs links this event to 1 captured evidence page and 6 related release signals.

NVIDIA Release: NVIDIA/cudnn-frontend v1.22.0

Captured source

GitHub/github.com/NVIDIA/cudnn-frontend

NVIDIA/cudnn-frontend v1.22.0

published Apr 3, 2026 · seen Jun 6 · captured Jun 11 · http 200 · method plain

v1.22.0-release

Repository: NVIDIA/cudnn-frontend

Tag: v1.22.0

Published: 2026-04-03T02:24:29Z

Prerelease: no

Release notes:

cuDNN Frontend v1.22.0 Release Notes

cuDNN Frontend v1.22.0 is the recommended version for cuDNN 9.20.0 and later releases.

General Improvements 🚀 🚀

Introducing a PyTorch custom operator wrapping cuDNN's Scaled Dot-Product Attention (SDPA). `scaled_dot_product_attention` is the public entry point, closely matching the signature of `torch.nn.functional.scaled_dot_product_attention`.

def scaled_dot_product_attention(
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    attn_mask: Optional[torch.Tensor] = None,
    dropout_p: float = 0.0,
    is_causal: bool = False,
    scale: Optional[float] = None,
    enable_gqa: bool = False,
    *,
    diagonal_alignment: int = 0,
    left_bound: int = -1,
    right_bound: int = -1,
    seq_len_q: Optional[torch.Tensor] = None,
    seq_len_kv: Optional[torch.Tensor] = None,
    cumulative_seq_len_q: Optional[torch.Tensor] = None,
    cumulative_seq_len_kv: Optional[torch.Tensor] = None,
) -> torch.Tensor:
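
To make the shared arguments concrete, here is a minimal pure-Python sketch of what `is_causal` and `scale` mean in `torch.nn.functional.scaled_dot_product_attention`, whose signature the new operator mirrors. It is a single-head reference over plain lists, not the cuDNN kernel; the helper name `reference_sdpa` is hypothetical, and the cuDNN-specific keyword arguments (`diagonal_alignment`, `left_bound`, `right_bound`, and the sequence-length tensors) are deliberately left out because the release notes do not describe their semantics.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def reference_sdpa(
    query: Matrix,
    key: Matrix,
    value: Matrix,
    is_causal: bool = False,
    scale: Optional[float] = None,
) -> Matrix:
    """Hypothetical single-head reference: softmax(scale * Q K^T) V."""
    head_dim = len(query[0])
    if scale is None:
        # Default used by the PyTorch function: 1 / sqrt(head_dim).
        scale = 1.0 / math.sqrt(head_dim)

    output: Matrix = []
    for i, q_row in enumerate(query):
        scores = []
        for j, k_row in enumerate(key):
            if is_causal and j > i:
                # Causal masking: position i may not attend to later positions.
                scores.append(float("-inf"))
            else:
                scores.append(scale * sum(a * b for a, b in zip(q_row, k_row)))

        # Numerically stable softmax over the key positions.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)

        # Weighted average of the value rows.
        out_row = [
            sum(w * v_row[d] for w, v_row in zip(weights, value)) / total
            for d in range(len(value[0]))
        ]
        output.append(out_row)
    return output


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    print(reference_sdpa(q, k, v, is_causal=True))
```

With `is_causal=True`, the first query position can only see the first key, so its output row equals the first value row. A reference like this can serve as a small correctness oracle when comparing against an accelerated implementation on tiny inputs.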

Introduce a preindexed execute method that reduces CPU execution overhead.

Improve the reproducer tool to report and reproduce SDPA failures for fp8 data types as well.

🕒 We will be rolling out new native custom torch ops in upcoming releases – stay tuned! 😃

Open-Source Kernels 🚀 🚀

Blackwell sdpa bprop kernel supporting head dim = 256, written in cuteDSL. It is available through the torch-op above or callable as a standalone API. See [samples](test/python/fe_api/test_sdpa_bwd.py) for the API usage. Requires `nvidia-cutlass-dsl[cu13]==4.4.1`.

Grouped Gemm + quantize kernels now support dynamic shape and layout. This is controllable via an environment toggle.

Grouped Gemm + Glu/Swiglu now support optional bias fusion in both dense and discrete modes, including partial‑N support and optional bias‑gradient generation for discrete backward paths.
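
Because the Blackwell sdpa bprop kernel above pins `nvidia-cutlass-dsl` to an exact version, it can help to check the installed distribution before running it. The following is a small sketch using only the Python standard library; the helper name `pin_status` and its return strings are hypothetical, and only the distribution name and version come from the release notes.

```python
from importlib.metadata import PackageNotFoundError, version


def pin_status(distribution: str, required: str) -> str:
    """Hypothetical helper: compare an installed distribution to an exact pin."""
    try:
        installed = version(distribution)
    except PackageNotFoundError:
        return "missing"
    if installed == required:
        return "ok"
    return f"mismatch ({installed})"


if __name__ == "__main__":
    # Distribution name and version taken from the release notes; the
    # `[cu13]` extra is an install-time selector and is not part of the name.
    print(pin_status("nvidia-cutlass-dsl", "4.4.1"))
```

The check only inspects package metadata, so it does not confirm that the `cu13` extra was installed or that a compatible GPU is present.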

Updates:

fp8 datatype with packed variable sequences (THD) is no longer supported for SM90 (Hopper) architecture.

Fix an issue where sdpa fp8 was failing when used with CUDA Toolkit 12.9.
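
For readers unfamiliar with packed variable sequences, the sketch below illustrates the general idea behind arguments such as `cumulative_seq_len_q`: sequences of different lengths are concatenated along one token axis and located by cumulative offsets. It assumes the common convention of a leading zero and `batch + 1` entries, and it reads THD as a (total tokens, heads, head dim) layout; the release notes confirm neither detail, and both helper names are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def cumulative_offsets(seq_lens: Sequence[int]) -> List[int]:
    """Turn per-sequence lengths into start/end offsets (assumed leading zero)."""
    return [0] + list(accumulate(seq_lens))


def split_packed(tokens: Sequence[T], seq_lens: Sequence[int]) -> List[List[T]]:
    """Recover the individual sequences from one packed token axis."""
    offsets = cumulative_offsets(seq_lens)
    return [list(tokens[start:end]) for start, end in zip(offsets, offsets[1:])]


if __name__ == "__main__":
    lengths = [3, 1, 2]                    # three sequences, six tokens in total
    packed = ["a", "b", "c", "d", "e", "f"]
    print(cumulative_offsets(lengths))     # [0, 3, 4, 6]
    print(split_packed(packed, lengths))   # [['a', 'b', 'c'], ['d'], ['e', 'f']]
```

Packing avoids padding every sequence to the longest length in the batch, which is why attention kernels often accept offsets of this kind alongside the packed tensors.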

Acknowledgements:

The Blackwell sdpa bprop kernel supporting head dim = 256, written in cuteDSL, was jointly developed by Shengbin Di, Yuxi Chi, and Linfeng Zheng in close collaboration with Alibaba. We would like to extend special thanks to the core contributors from Alibaba: Siyu Wang, Haoyan Huang, Lanbo Li, Yun Zhong, Man Yuan, Minmin Sun, Yong Li, and Wei Lin for their significant contributions to this work.

Notability

notability 3.0/10

Routine library release, low community traction.