What does this release signal mean?

NVIDIA published nccl4py v0.2.0 in the NVIDIA/nccl repository under the tag nccl4py-v0.2.0. This release signal is evidence of what shipped, changed, or was packaged for users. High-signal details: nccl4py provides Python bindings for NCCL, NVIDIA's GPU collective communication library; the release was published on 2026-04-24T21:33:08Z and is not a prerelease. onlylabs links this event to 1 captured evidence page and 6 related release signals.

NVIDIA Release: NVIDIA/nccl nccl4py-v0.2.0

Captured source

GitHub/github.com/NVIDIA/nccl

NVIDIA/nccl nccl4py-v0.2.0

published Apr 24, 2026 · seen Jun 6 · captured Jun 11 · http 200 · method plain

nccl4py v0.2.0 Release

Repository: NVIDIA/nccl

Tag: nccl4py-v0.2.0

Published: 2026-04-24T21:33:08Z

Prerelease: no

Release notes:

Release Notes — nccl4py 0.2.0

This release adds Python bindings for the new NCCL 2.30 one-sided RMA, Device API (GIN), and elastic communicator features, along with substantially more control over communicator configuration.

Highlights

One-sided RMA (point-to-point) — New Communicator.put_signal(), Communicator.signal(), and Communicator.wait_signal() methods, plus a WaitSignalDesc helper for describing signal values and match operations.
NCCL Device API host side setup — New Communicator.create_dev_comm() that produces a DevCommResource for use with device-side NCCL kernels. Configure the device communicator through the new NCCLDevCommRequirements class, and introspect support via device_api_support, gin_type, railed_gin_type, host_rma_support, and n_lsa_teams properties.
Device pointer access for registered windows — RegisteredWindowHandle now exposes user_ptr, get_lsa_device_pointer(), get_lsa_multimem_device_pointer(), and get_peer_device_pointer() for direct access to LSA, multimem, and peer mappings.
Elastic and fault-tolerant communicators — New Communicator.grow(), revoke(), suspend(), and resume() methods to support elastic topology changes and error-handling flows. CommSuspendFlag added alongside existing CommShrinkFlag.
More flexible construction — In addition to init(), communicators can now be created with class method init_all() and instance method initialize(). Communicator.get_mem_stat() reports per-communicator memory statistics.
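Code that must run against both older and newer nccl4py builds may want to check which of the methods listed above are present before using them. The following is a minimal sketch of such a check. The method names come from the release notes, but the helper `missing_features` and the stand-in class `OldCommunicator` are hypothetical and not part of nccl4py. No call signatures are assumed.

```python
# Method names are taken from the 0.2.0 release notes. The helper and the
# stand-in class below are hypothetical and only illustrate the check.
NEW_IN_0_2_0 = (
    "put_signal", "signal", "wait_signal",     # one-sided RMA
    "create_dev_comm",                         # Device API host-side setup
    "grow", "revoke", "suspend", "resume",     # elastic / fault-tolerant
    "init_all", "initialize", "get_mem_stat",  # construction, memory stats
)


def missing_features(comm_cls, names=NEW_IN_0_2_0):
    """Return the entries of `names` that `comm_cls` does not define."""
    return [name for name in names if not hasattr(comm_cls, name)]


class OldCommunicator:
    """Stand-in for a pre-0.2.0 communicator class (illustration only)."""

    def init(self):
        pass


# Every 0.2.0 addition is reported as missing on the stand-in class.
print(missing_features(OldCommunicator))
```

With nccl4py installed, the real Communicator class could be passed in place of the stand-in, imported from wherever the installation exposes it.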

Configuration

New tuning knobs on NCCLConfig:

graph_usage_mode, num_rma_ctx, max_p2p_peers.

NCCLDevCommRequirements — passed to Communicator.create_dev_comm() to describe the resources and capabilities a device communicator needs:

LSA: lsa_multimem, barrier_count, lsa_barrier_count, rail_gin_barrier_count, world_gin_barrier_count, lsa_ll_a2a_block_count, lsa_ll_a2a_slot_count.
GIN: gin_force_enable, gin_context_count, gin_signal_count, gin_counter_count, gin_queue_depth, gin_connection_type, gin_exclusive_contexts.

Device / topology introspection

New Communicator properties: cuda_dev, nvml_dev, device_api_support, multimem_support, gin_type, railed_gin_type, n_lsa_teams, host_rma_support.

Other changes

CTAPolicy is now an IntFlag (was IntEnum) so multiple policies can be combined.
Interop submodules nccl.core.cupy and nccl.core.torch are now lazy-loaded via __getattr__ and only imported on first attribute access, so import nccl.core no longer pulls in CuPy or PyTorch.
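Both changes above rest on standard Python mechanisms, which can be sketched with the standard library alone. The first part shows why an IntFlag supports combining members where an IntEnum does not. The second part shows module-level `__getattr__` (PEP 562) deferring an import until first attribute access. Every name here (`PolicyAsEnum`, `PolicyAsFlag`, `POLICY_A`, `POLICY_B`, `make_lazy_package`, `fake_core`, `heavy`) is hypothetical, and `json` stands in for a heavy dependency such as CuPy or PyTorch. This is not nccl4py's actual implementation.

```python
import enum
import importlib
import types


# Part 1: IntEnum vs IntFlag. Member names and values are hypothetical.
class PolicyAsEnum(enum.IntEnum):
    POLICY_A = 1
    POLICY_B = 2


class PolicyAsFlag(enum.IntFlag):
    POLICY_A = 1
    POLICY_B = 2


# OR-ing IntEnum members falls back to plain int arithmetic, so the
# result is no longer a member of the enum.
combined_enum = PolicyAsEnum.POLICY_A | PolicyAsEnum.POLICY_B

# OR-ing IntFlag members keeps the flag type and supports membership tests.
combined_flag = PolicyAsFlag.POLICY_A | PolicyAsFlag.POLICY_B


# Part 2: lazy submodule loading via module-level __getattr__ (PEP 562).
def make_lazy_package(name, lazy_submodules):
    """Build a module whose listed attributes are imported on first access.

    `lazy_submodules` maps an attribute name to an importable module name.
    """
    pkg = types.ModuleType(name)

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails.
        if attr in lazy_submodules:
            module = importlib.import_module(lazy_submodules[attr])
            setattr(pkg, attr, module)  # cache for later lookups
            return module
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    pkg.__getattr__ = __getattr__
    return pkg


# `json` is a stand-in for a heavy optional dependency.
fake = make_lazy_package("fake_core", {"heavy": "json"})

print(type(combined_enum).__name__, isinstance(combined_flag, PolicyAsFlag))
print("heavy" in vars(fake))   # not imported yet
print(fake.heavy.dumps([1]))   # first access triggers the import
print("heavy" in vars(fake))   # now cached on the module
```

In a real package the `__getattr__` function would normally be defined at the top level of the package's `__init__.py` rather than attached to a module object at runtime. The runtime form is used here only to keep the sketch in one file.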

Notability

notability 4.0/10

Routine library release, minor version update