
NVIDIA/cccl python-0.7.0


CCCL Python Libraries (v0.7.0)

Repository: NVIDIA/cccl

Tag: python-0.7.0

Published: 2026-05-05T15:15:33Z

Prerelease: no

Release notes:

cuda-cccl Python package — version 0.7.0

Release date: May 5th, 2026. Previous release: v0.6.0.

cuda-cccl is in "experimental" status, meaning that its API and feature set can change quite rapidly.

Installation

Please refer to the install instructions here

API breaking changes

  • All `cuda.compute` functions now require keyword-only arguments (#8772)

Every top-level function and factory (make_*) in cuda.compute now enforces keyword-only call syntax (i.e., all parameters must be passed by name). Positional calls will raise a TypeError.

Before:

reduce_into(d_in, d_out, op, num_items, h_init)

After:

reduce_into(d_in=d_in, d_out=d_out, num_items=num_items, op=op, h_init=h_init)
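The release notes do not say how the check is implemented. As a minimal sketch, assuming the standard Python mechanism of a bare `*` in the signature, the behavior can be reproduced with a hypothetical pure-Python stand-in. The function below borrows the `reduce_into` name and parameter names from the example above but is not the library function and runs on plain lists rather than device arrays.

```python
# Hypothetical stand-in, NOT the real cuda.compute.reduce_into.
# The bare `*` makes every parameter after it keyword-only.
def reduce_into(*, d_in, d_out, op, num_items, h_init):
    acc = h_init
    for i in range(num_items):
        acc = op(acc, d_in[i])
    d_out[0] = acc


def add(a, b):
    return a + b


d_in = [1, 2, 3, 4]
d_out = [0]

# Keyword call: accepted, and argument order no longer matters.
reduce_into(d_in=d_in, d_out=d_out, num_items=len(d_in), op=add, h_init=0)

# Positional call: Python rejects it before the function body runs.
try:
    reduce_into(d_in, d_out, add, len(d_in), 0)
    positional_error = None
except TypeError as exc:
    positional_error = exc
```

When migrating, the argument names matter more than their order, so existing positional calls have to be mapped to names by checking each function's signature.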

Features

  • System CUDA toolkit install extras: new pip extras sysctk12 / sysctk13 (and minimal-sysctk12 / minimal-sysctk13) allow installing cuda-cccl without pulling in cuda-toolkit as a pip dependency, for users who already have CUDA installed system-wide (#8608):

pip install cuda-cccl[sysctk13] # full install, system CTK
pip install cuda-cccl[minimal-sysctk13] # no Numba, system CTK
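The four extras follow a simple naming pattern: an optional `minimal-` prefix (no Numba) and the CUDA major version. A small sketch of that pattern, using a hypothetical helper `sysctk_requirement` that is not part of cuda-cccl, may help when generating requirement strings in install scripts:

```python
def sysctk_requirement(cuda_major, with_numba=True):
    """Build a pip requirement string for the system-CTK extras.

    Hypothetical helper for illustration. Only the four extras named in
    the release notes are produced: sysctk12, sysctk13, minimal-sysctk12,
    minimal-sysctk13.
    """
    if cuda_major not in (12, 13):
        raise ValueError(f"no sysctk extra listed for CUDA {cuda_major}")
    prefix = "" if with_numba else "minimal-"
    return f"cuda-cccl[{prefix}sysctk{cuda_major}]"


full = sysctk_requirement(13)
minimal = sysctk_requirement(12, with_numba=False)
```

In shells that treat square brackets as glob characters (zsh, for example), quote the requirement string when passing it to pip.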

Performance

  • Faster binary search: lower_bound / upper_bound are now implemented via transform with a small linear search for the final steps, improving throughput on modern GPUs (#8642)

  • Adaptive warpspeed scan: the scan tuning policy now automatically selects the warpspeed (lookahead) scan path when beneficial for the data type and architecture (#8158)
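The binary search item above describes a per-element search that narrows by bisection and then finishes with a short linear scan. The sketch below illustrates that general idea in plain Python. It is not the CCCL implementation, and the function names and the `linear_threshold` cutoff are assumptions chosen for illustration. The list comprehension plays the role of the transform, applying the same search independently to each needle.

```python
from bisect import bisect_left, bisect_right


def _bound_one(haystack, needle, upper, linear_threshold):
    """Find the lower (or upper) bound of one needle in a sorted list."""

    def goes_left(x):
        # Elements for which this is true lie strictly before the answer.
        return x <= needle if upper else x < needle

    lo, hi = 0, len(haystack)
    # Phase 1: bisect until the candidate window is small.
    while hi - lo > linear_threshold:
        mid = (lo + hi) // 2
        if goes_left(haystack[mid]):
            lo = mid + 1
        else:
            hi = mid
    # Phase 2: finish with a short linear scan over the remaining window.
    while lo < hi and goes_left(haystack[lo]):
        lo += 1
    return lo


def lower_bound(haystack, needles, linear_threshold=8):
    # One independent search per needle, like an element-wise transform.
    return [_bound_one(haystack, n, False, linear_threshold) for n in needles]


def upper_bound(haystack, needles, linear_threshold=8):
    return [_bound_one(haystack, n, True, linear_threshold) for n in needles]


haystack = [1, 3, 3, 5, 7, 9, 9, 9, 11, 13, 15, 17]
needles = [0, 3, 9, 10, 18]
lower = lower_bound(haystack, needles)
upper = upper_bound(haystack, needles)

# Cross-check against the standard library.
assert lower == [bisect_left(haystack, n) for n in needles]
assert upper == [bisect_right(haystack, n) for n in needles]
```

Finishing with a linear scan trades a few extra comparisons for a simpler, more predictable access pattern in the last steps. The release notes credit this approach with the throughput gain but do not state the cutoff used.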

Bug Fixes

  • Fix incorrect minimum CUDA architecture targeted when building the cccl.c native extension (#8631)
