NVIDIA/cutlass v4.4.2
CUTLASS 4.4.2
Repository: NVIDIA/cutlass
Tag: v4.4.2
Published: 2026-03-17T14:55:49Z
Prerelease: no
Release notes:
CuTe DSL
- New features
  - CuTe DSL now supports Python 3.14 on both x86_64 and aarch64.
  - Runtime Pointer/Tensor/FakeTensor now support __cache_key__, providing a stable, hashable representation that simplifies and improves compiled function caching.
- Bug fixes and improvements
  - Fixed a Hopper FMHA causal attention performance regression on CUDA toolkit 13.1 by optimizing mbarrier synchronization to avoid unnecessary convergence barriers.
  - Fixed a kernel loading race condition in JAX when multiple GPUs are present in the same process.
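
To illustrate why a stable, hashable key helps compiled function caching, here is a minimal pure-Python sketch. It does not use the CuTe DSL API: `ToyTensor`, `get_compiled`, and `compile_count` are hypothetical names, and treating `__cache_key__` as a zero-argument method that returns a tuple of dtype, shape, and stride is an assumption, since the release note does not specify its signature or contents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Tuple


@dataclass
class ToyTensor:
    """Hypothetical stand-in for a runtime tensor; not a CuTe DSL class."""
    dtype: str
    shape: Tuple[int, ...]
    stride: Tuple[int, ...]
    data_ptr: int  # differs per allocation, so it is left out of the key

    def __cache_key__(self) -> Hashable:
        # Assumption: the key holds only properties that affect code generation.
        return (self.dtype, self.shape, self.stride)


_compiled: Dict[Hashable, Callable] = {}
compile_count = 0  # counts simulated compilations


def get_compiled(fn_name: str, *args: ToyTensor) -> Callable:
    """Return a cached 'compiled' callable keyed on the arguments' cache keys."""
    global compile_count
    key = (fn_name,) + tuple(a.__cache_key__() for a in args)
    if key not in _compiled:
        compile_count += 1  # stands in for an expensive JIT compile
        _compiled[key] = lambda *xs: f"{fn_name} specialized for {key[1:]}"
    return _compiled[key]


a = ToyTensor("float16", (128, 64), (64, 1), data_ptr=0x1000)
b = ToyTensor("float16", (128, 64), (64, 1), data_ptr=0x2000)  # same layout, new buffer
c = ToyTensor("float16", (256, 64), (64, 1), data_ptr=0x3000)  # different shape

f1 = get_compiled("gemm", a)
f2 = get_compiled("gemm", b)  # cache hit: key matches a's
f3 = get_compiled("gemm", c)  # cache miss: shape differs
```

In this sketch, two tensors with the same layout but different buffers share one compiled entry, while a tensor with a different shape triggers a new compilation. Which properties the real key includes is up to the library.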
CUTLASS C++
- Enabled Blackwell SM120f compilation of examples and exposed NVFP4/MX Grouped GEMM in the CUTLASS Profiler.