What does this release signal mean?

NVIDIA published v2.30.3-1 of NVIDIA/nccl, its optimized multi-GPU collective communication library for distributed training, on 2026-04-15 as a full (non-prerelease) release. This release signal is evidence of what shipped, changed, or was packaged for users; the release notes captured below open with Device API and GIN enhancements. onlylabs links this event to 1 captured evidence page and 6 related release signals.

NVIDIA Release: NVIDIA/nccl v2.30.3-1

Captured source

GitHub · github.com/NVIDIA/nccl

NVIDIA/nccl v2.30.3-1

published Apr 15, 2026 · seen Jun 6 · captured Jun 11 · http 200 · method plain

NCCL v2.30.3-1 Release

Repository: NVIDIA/nccl

Tag: v2.30.3-1

Published: 2026-04-15T03:03:54Z

Prerelease: no

Release notes:

Device API and GIN Enhancements

GIN contexts are no longer shared between device communicators backed by the same host communicator.
Adds per-context resource sharing modes for GIN, allowing GPU-scoped or CTA-scoped resource sharing.
Adds TrafficClass support to device communicator.
Adds versioning to ncclDevComm.
Adds timeout support to the device APIs.
Adds max_rd_atomic and max_dest_rd_atomic support in GIN.
Upgrades doca-gpunetio to v2.0.2-rc1.

Elastic Buffers (LSA support)

Supports new use cases where large tensors are split into multi-segment windows, with the active region in GPU memory and the remainder in host memory.
Enables larger effective models and reduces memory pressure during spilling.
Elastic buffers will support GIN in a future release.

gin.get with Nonblocking Flush (Experimental)

Supports GPU-initiated gets whose completion can be checked without stalling.
Currently works only with GDAKI (not with the CPU proxy) and is not supported on directNIC or Ampere.

Symmetric Memory Improvements

Adds AVG operator to ReduceScatter Symmetric kernels.
Enables dynamic memory offload with group support for single-process, multi-GPU scenarios.
Adds support for GPU-only multi-segment registration for symmetric windows.
Adds CUDA graph capture and replay support for ncclPutSignal and ncclWaitSignal APIs.
One-sided RMA can now use an external network plugin.
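To make the AVG item above concrete, here is a host-side reference sketch in plain Python of what a ReduceScatter with an averaging operator computes: rank r ends up with the element-wise mean of chunk r across all ranks' input buffers. It only illustrates the arithmetic; it is not NCCL code and says nothing about how the symmetric kernels implement it. The function name reduce_scatter_avg and the list-of-lists layout are assumptions made for illustration.

```python
from typing import List


def reduce_scatter_avg(send_buffers: List[List[float]]) -> List[List[float]]:
    """Reference semantics only: rank r receives the mean of chunk r.

    send_buffers[k] is the full input buffer contributed by simulated rank k.
    """
    nranks = len(send_buffers)
    if nranks == 0:
        raise ValueError("at least one rank is required")
    total = len(send_buffers[0])
    if any(len(buf) != total for buf in send_buffers):
        raise ValueError("all ranks must contribute buffers of equal length")
    if total % nranks != 0:
        raise ValueError("buffer length must be divisible by the number of ranks")

    chunk = total // nranks
    outputs = []
    for rank in range(nranks):
        start = rank * chunk
        # Sum the same element across every rank, then divide by the rank count.
        outputs.append([
            sum(buf[start + i] for buf in send_buffers) / nranks
            for i in range(chunk)
        ])
    return outputs


if __name__ == "__main__":
    # Two simulated ranks, four elements each, so each rank keeps two results.
    print(reduce_scatter_avg([[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]))
    # [[2.0, 3.0], [4.0, 5.0]]
```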

Tensor Memory Accelerator (TMA) Support

Adds TMA support in select built-in symmetric kernels to offload bulk peer‑to‑peer copies and reductions, improving NVLink bandwidth and latency.
Can be enabled with NCCL_SYM_TMA_ENABLE=1.

DDP Support

Enables Dynamic Direct Path (DDP) so that NCCL can take advantage of hardware multipath and out‑of‑order receive for higher network performance on supported systems.
Can be enabled with NCCL_IB_OOO_RQ=1.

Port Recovery

Adds support for IB port recovery in NCCL.
Improves NCCL’s ability to recover from transient network issues so communicators can continue operating without full re‑initialization.
Can be enabled with NCCL_IB_RESILIENCY_PORT_RECOVERY=1.

Cross Clique Support

Adds support for treating multiple cliques as the same NVLink domain.
Can be enabled with NCCL_MNNVL_CROSS_CLIQUE=1.
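The TMA, DDP, port recovery, and cross clique features in these notes are each opt-in through an environment variable. A minimal sketch of building a launch environment for them in Python follows. The variable names are taken from the release notes; the short feature labels, the helper name nccl_opt_in_env, and the idea of passing the result to a child process are assumptions for illustration. Whether a given flag has any effect depends on the hardware and build, which this sketch does not check.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# Feature label (hypothetical) -> environment variable (from the release notes).
OPT_IN_FLAGS: Dict[str, str] = {
    "tma": "NCCL_SYM_TMA_ENABLE",
    "ddp": "NCCL_IB_OOO_RQ",
    "port_recovery": "NCCL_IB_RESILIENCY_PORT_RECOVERY",
    "cross_clique": "NCCL_MNNVL_CROSS_CLIQUE",
}


def nccl_opt_in_env(
    features: Iterable[str],
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the base environment with the chosen flags set to "1"."""
    env = dict(os.environ if base is None else base)
    for feature in features:
        if feature not in OPT_IN_FLAGS:
            raise KeyError(f"unknown feature label: {feature!r}")
        env[OPT_IN_FLAGS[feature]] = "1"
    return env


if __name__ == "__main__":
    env = nccl_opt_in_env(["tma", "port_recovery"], base={})
    for name in sorted(env):
        print(f"{name}={env[name]}")
```

The resulting dictionary can be handed to subprocess.run(..., env=env) when launching a training job, so the flags are already in place when the job initializes NCCL.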

NCCL Parameter Infrastructure

Adds new C APIs to support querying NCCL parameters.
Introduces the ncclParamGetAllParameterKeys, ncclParamDumpAll, ncclParamGet, and ncclParamGetParameter APIs.

NCCL4PY v0.2.0

Adds new APIs from the NCCL 2.29 release.
Adds devcomm create/destroy APIs to prepare for the device API.
Enables free-threading support.

Other Improvements

Adds NCCL Inspector P2P event support.
ncclGinBarrierSession can now be created directly for the world team without manual resource allocation.
GIN proxy GFD size increased to 128 bytes with version field added.
GIN proxy CQ polling (ginProgress) moved to per-context to improve performance.
ncclBarrierSession no longer shares resources with ncclLsaBarrierSession or ncclGinBarrierSession.
Redundant NCCL_DEBUG=INFO log volume reduced significantly.
Tunes NVLSTree to improve performance on various Blackwell systems.
Adds p2pMaxPeers to communicator to achieve better tuning for send/recv vs. all2all.
Enables LL128 protocol in heterogeneous scenarios for Hopper and later GPUs.
Adds checks for mismatched Net and CollNet counts across communicators.
Adds a Grafana template for rendering an NCCL Inspector dashboard from Prometheus data.
Removes unused members nccl_id, comm, nccl_unique_id, and thread_ranks in the examples (GitHub PR #1989).
Adds NCCL_LIBIBVERBS_SO environment variable to specify an absolute path for libibverbs (GitHub PR #2043).
Extends suspend memory offload to channel device allocations (GitHub PR #2060).
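Since the notes describe NCCL_LIBIBVERBS_SO as taking an absolute path, a launcher may want to reject relative paths before exporting it. The sketch below shows one way to do that; the helper name libibverbs_env and the example library path are hypothetical, and the check only inspects the string (it does not confirm the file exists or is a valid libibverbs build).

```python
import posixpath
from typing import Dict


def libibverbs_env(path: str) -> Dict[str, str]:
    """Return the NCCL_LIBIBVERBS_SO setting after checking the path is absolute."""
    # posixpath is used because NCCL targets Linux-style paths.
    if not posixpath.isabs(path):
        raise ValueError(f"NCCL_LIBIBVERBS_SO expects an absolute path, got {path!r}")
    return {"NCCL_LIBIBVERBS_SO": path}


if __name__ == "__main__":
    # Hypothetical install location, used only as an example value.
    print(libibverbs_env("/opt/rdma/lib/libibverbs.so.1"))
```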

Bug Fixes

Fixes implicit CUDA synchronization in putSignal and CE collectives caused by pageable CPU stack memcpy.
Fixes a hang when using CE collectives with CUDA graphs in an edge case.
Fixes NULL access issue during finalize when RMA and GIN plugins are both initialized.
Fixes race conditions in all2all GIN/Hybrid examples with more than one CTA.
Fixes ncclGinType_t uint8_t enum compatibility issue in nccl4py.
Fixes several memory leaks in communicator create/destroy code paths.
Fixes a bug in plugin compat layer for v11 related to lazy initialization.
Fixes data corruption in symmetric LL kernels with unaligned buffer.
Fixes plugin name being cleared after communicator destroy (GitHub Issue #1978).
Fixes deadlock and use-after-free in the inspector plugin (GitHub Issue #2000).
Fixes incorrect network interface selection caused by inverted boolean logic in matchSubnet (GitHub PR #2047).
Fixes a regression from 2.29.2 where the CPU affinity mask was not restored in initTransportsRank (GitHub Issue #2033).

Known Limitations

Applications that use GIN APIs must be recompiled against 2.30.3 to work with the 2.30.3 runtime.
gin.get requires GDAKI and is not supported on Ampere or directNIC platforms.
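The recompile requirement above suggests comparing the NCCL version an application was built against with the version loaded at runtime. NCCL reports versions as a single integer code (for releases after 2.8, major * 10000 + minor * 100 + patch, so 2.30.3 is 23003; older releases use major * 1000 + minor * 100 + patch). The sketch below decodes such codes and flags a mismatch. The helper names are hypothetical, and treating any version difference as "needs rebuild" is an assumed policy that may be stricter than the note requires; obtaining the two codes (for example from the NCCL headers and from ncclGetVersion) is left to the caller.

```python
from typing import Tuple


def decode_nccl_version(code: int) -> Tuple[int, int, int]:
    """Split an NCCL integer version code into (major, minor, patch)."""
    if code < 10000:
        # Legacy encoding used up to 2.8.x: major * 1000 + minor * 100 + patch.
        return code // 1000, (code % 1000) // 100, code % 100
    # Current encoding: major * 10000 + minor * 100 + patch.
    return code // 10000, (code % 10000) // 100, code % 100


def gin_app_needs_rebuild(build_code: int, runtime_code: int) -> bool:
    """Assumed policy: any build/runtime version difference means rebuild."""
    return decode_nccl_version(build_code) != decode_nccl_version(runtime_code)


if __name__ == "__main__":
    print(decode_nccl_version(23003))           # (2, 30, 3)
    print(gin_app_needs_rebuild(22902, 23003))  # True
    print(gin_app_needs_rebuild(23003, 23003))  # False
```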

Acknowledgments

We thank the following contributors for their work on this release:

@chenhengqi, @liangxs, @phu0ngng, @SreevatsaAnantharamu, @SongXiaoXi for your PRs.
@sphish, @LyricZhao for continued contributions to improving the NCCL device API.

We also thank the community for issue reports, testing, and feedback.

Notability

notability 3.0/10

Routine maintenance release of a library.