Release · NVIDIA · published May 27, 2026

NVIDIA/cutlass v4.5.1

CUTLASS 4.5.1

Repository: NVIDIA/cutlass

Tag: v4.5.1

Published: 2026-05-27T02:31:50Z

Prerelease: no

Release notes:

CuTe DSL

  • Bug fixes and improvements
  • Fixed the following issues:
      • https://github.com/NVIDIA/cutlass/issues/3219
      • https://github.com/NVIDIA/cutlass/issues/3218
      • https://github.com/NVIDIA/cutlass/issues/3212
      • https://github.com/NVIDIA/cutlass/issues/3210
      • https://github.com/NVIDIA/cutlass/issues/3208
      • https://github.com/NVIDIA/cutlass/issues/3201
      • https://github.com/NVIDIA/cutlass/issues/3227
  • Fixed a JAX int64 stride divisibility issue
  • Fixed issues with SM120 block-scaled MMAs
  • Added the missing MXFP8MMAOP and MXF8F6F4MMAOP for SM120
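As a rough numerical reference for the SM120 block-scaled MMAs above, the following Python sketch assumes the OCP Microscaling (MX) convention: every 32 consecutive elements along K share one UE8M0 scale factor, stored as a biased exponent (bias 127, with 0xFF reserved for NaN). The helper names `decode_ue8m0` and `block_scaled_dot` are hypothetical and are not part of the CuTe DSL; operands are plain floats standing in for already-decoded FP8 values, and accumulation uses Python floats rather than hardware accumulator precision.

```python
import math

MX_BLOCK = 32      # elements per scale block (OCP MX convention, assumed here)
UE8M0_BIAS = 127   # exponent bias of the UE8M0 scale format


def decode_ue8m0(code: int) -> float:
    """Decode an 8-bit UE8M0 scale: no sign, no mantissa, value = 2**(code - 127)."""
    if not 0 <= code <= 0xFF:
        raise ValueError("UE8M0 code must fit in 8 bits")
    if code == 0xFF:
        return math.nan  # 0xFF is reserved for NaN
    return math.ldexp(1.0, code - UE8M0_BIAS)


def block_scaled_dot(a, b, a_scales, b_scales, block=MX_BLOCK):
    """Reference dot product where each run of `block` K-elements shares one scale per operand."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    n_blocks = math.ceil(len(a) / block)
    if len(a_scales) != n_blocks or len(b_scales) != n_blocks:
        raise ValueError("need one scale per block for each operand")
    acc = 0.0
    for k, (x, y) in enumerate(zip(a, b)):
        blk = k // block  # index of the scale block this element belongs to
        acc += (x * decode_ue8m0(a_scales[blk])) * (y * decode_ue8m0(b_scales[blk]))
    return acc
```

For example, with 64 ones in each operand, A scales `[127, 128]` (that is, 1.0 and 2.0) and B scales `[127, 127]`, the first block contributes 32 and the second 64, for a result of 96.0.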

CUTLASS C++

  • Fix SM100 F8F6F4 SS MMA (1SM and 2SM) traits to use typed op templates.
  • Add UE8M0 (uniform exponent distribution) initialization support in tensor fill utilities.
  • Add cvt.rn.bf16x2.e4m3x2 conversion instruction support to numeric_conversion.h.
  • Update example 93 with paged KV cache support for Blackwell low-latency GQA.
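The `cvt.rn.bf16x2.e4m3x2` instruction converts two FP8 E4M3 values packed in 16 bits into two BF16 values packed in 32 bits. The host-side Python sketch below is a hedged bit-level reference, not the `numeric_conversion.h` code: it assumes the E4M3FN encoding used for NVIDIA FP8 (bias 7, no infinities, only S.1111.111 is NaN, maximum 448) and assumes the low input byte maps to the low BF16 half. Because every finite E4M3 value is exactly representable in BF16, keeping the top 16 bits of the float32 bit pattern loses nothing for finite inputs. The function names are hypothetical.

```python
import math
import struct


def decode_e4m3(byte: int) -> float:
    """Decode one E4M3FN byte: 1 sign, 4 exponent (bias 7), 3 mantissa bits, no infinities."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return math.nan  # only S.1111.111 encodes NaN
    if exp == 0:
        return sign * math.ldexp(man, -9)  # subnormal: (man / 8) * 2**(1 - 7)
    return sign * math.ldexp(8 + man, exp - 10)  # normal: (1 + man / 8) * 2**(exp - 7)


def float_to_bf16_bits(x: float) -> int:
    """Keep the top 16 bits of the float32 pattern; exact for every finite E4M3 value."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 16


def cvt_bf16x2_from_e4m3x2(packed: int) -> int:
    """Convert two packed E4M3 bytes to two packed BF16 halves (low byte -> low half, assumed)."""
    lo = float_to_bf16_bits(decode_e4m3(packed & 0xFF))
    hi = float_to_bf16_bits(decode_e4m3((packed >> 8) & 0xFF))
    return (hi << 16) | lo
```

For instance, `cvt_bf16x2_from_e4m3x2(0x4038)` places 2.0 (`0x4000`) in the high half and 1.0 (`0x3F80`) in the low half, giving `0x40003F80`.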

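The notes do not describe how example 93 lays out its paged KV cache. As a generic illustration of the idea only, the Python sketch below assumes a common scheme: K/V entries live in a shared pool of fixed-size physical pages, and a per-sequence page table maps each logical page of the sequence to a physical page. The names `kv_slot`, `gather_keys`, `page_table`, and `k_pool` are hypothetical and do not come from the CUTLASS example.

```python
from typing import List, Sequence, Tuple


def kv_slot(page_table: Sequence[int], token_idx: int, page_size: int) -> Tuple[int, int]:
    """Map a logical token position to (physical_page, offset_in_page) via the page table."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    logical_page, offset = divmod(token_idx, page_size)
    if not 0 <= logical_page < len(page_table):
        raise IndexError("token position is beyond the pages allocated to this sequence")
    return page_table[logical_page], offset


def gather_keys(k_pool, page_table: Sequence[int], seq_len: int, page_size: int) -> List:
    """Collect one sequence's keys in logical order; k_pool[p][o] is slot o of physical page p."""
    keys = []
    for t in range(seq_len):
        page, off = kv_slot(page_table, t, page_size)
        keys.append(k_pool[page][off])
    return keys
```

With `page_size = 4` and `page_table = [2, 0]`, tokens 0 to 3 are read from physical page 2 and tokens 4 and 5 from page 0, so a sequence can grow without its pages being contiguous in memory. In GQA, the keys and values gathered for one KV head are then shared by a group of query heads.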
Notability

notability 4.0/10

Routine patch release of the CUTLASS CUDA library.