NVIDIA/cutlass v4.5.1
CUTLASS 4.5.1
Repository: NVIDIA/cutlass
Tag: v4.5.1
Published: 2026-05-27T02:31:50Z
Prerelease: no
Release notes:
CuTe DSL
- Bug fixes and improvements
- Fixed the following issues:
https://github.com/NVIDIA/cutlass/issues/3219 https://github.com/NVIDIA/cutlass/issues/3218 https://github.com/NVIDIA/cutlass/issues/3212 https://github.com/NVIDIA/cutlass/issues/3210 https://github.com/NVIDIA/cutlass/issues/3208 https://github.com/NVIDIA/cutlass/issues/3201 https://github.com/NVIDIA/cutlass/issues/3227
- Fixed JAX int64 stride divisibility issue
- Fixed issues for SM120 blockscaled MMAs
- Added missing MXFP8MMAOP and MXF8F6F4MMAOP for SM120.
CUTLASS C++
- Fix SM100 F8F6F4 SS MMA (1SM and 2SM) traits to use typed op templates.
- Add UE8M0 (uniform exponent distribution) initialization support in tensor fill utilities.
- Add cvt.rn.bf16x2.e4m3x2 conversion instruction support to numeric_conversion.h.
- Update example 93 with paged KV cache support for Blackwell low-latency GQA.
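
For readers unfamiliar with the number formats named in the C++ items above, the following is a minimal reference sketch in Python, not CUTLASS code. It assumes the FP8 E4M3 layout from the OCP 8-bit floating point specification (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities) and the E8M0 scale layout from the OCP Microscaling specification (an unsigned 8-bit exponent with bias 127, where 0xFF encodes NaN). The function names and the low-byte-first element order in `unpack_e4m3x2` are hypothetical choices for illustration. Because bf16 has 8 exponent bits and 7 mantissa bits, every finite E4M3 value is exactly representable in bf16, so decoding to a wider float shows the same values a packed E4M3-to-bf16 conversion would produce.

```python
import math


def decode_e4m3(byte):
    """Decode one FP8 E4M3 byte (finite-only variant) to a Python float."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF   # 4 exponent bits, bias 7
    man = byte & 0x7          # 3 mantissa bits
    if exp == 0xF and man == 0x7:
        return math.nan       # the only NaN pattern; this format has no infinities
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6   # subnormal: no implicit leading 1
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


def decode_ue8m0(byte):
    """Decode one UE8M0 scale byte: a pure power of two, 2**(byte - 127)."""
    if byte == 0xFF:
        return math.nan
    return 2.0 ** (byte - 127)


def unpack_e4m3x2(packed):
    """Split a 16-bit word into two E4M3 bytes and decode both.

    Assumption for this sketch: element 0 lives in the low byte.
    """
    return decode_e4m3(packed & 0xFF), decode_e4m3((packed >> 8) & 0xFF)
```

For example, `decode_e4m3(0x38)` gives 1.0, `decode_e4m3(0x7E)` gives 448.0 (the largest finite E4M3 value), and `decode_ue8m0(127)` gives a scale of 1.0.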
Notability
4.0/10. Routine version update of a CUDA library.