What does this release signal mean?

NVIDIA published NVIDIA/nvshmem v3.4.5-0 (NVIDIA/nvshmem). This release signal is evidence of what shipped, changed, or was packaged for users. High-signal details: routine version release of the NVSHMEM library · repository NVIDIA/nvshmem · tag v3.4.5-0 · published 2025-10-07T18:50:29Z · prerelease: no. onlylabs links this event to 1 captured evidence page and 6 related release signals.

NVIDIA Release: NVIDIA/nvshmem v3.4.5-0

Captured source

GitHub/github.com/NVIDIA/nvshmem

NVIDIA/nvshmem v3.4.5-0

published Oct 7, 2025 · seen Jun 11 · captured Jun 11 · http 200 · method plain

NVSHMEM 3.4.5-0

Repository: NVIDIA/nvshmem

Tag: v3.4.5-0

Published: 2025-10-07T18:50:29Z

Prerelease: no

Release notes:

NVIDIA® NVSHMEM 3.4.5 Release Notes

NVSHMEM is an implementation of the OpenSHMEM specification for NVIDIA GPUs. The NVSHMEM programming interface implements a Partitioned Global Address Space (PGAS) model across a cluster of NVIDIA GPUs. NVSHMEM provides an easy-to-use interface to allocate memory that is symmetrically distributed across the GPUs. In addition to a CPU-side interface, NVSHMEM provides a NVIDIA® CUDA® kernel-side interface that allows CUDA threads to access any location in the symmetrically-distributed memory.

The release notes describe the key features, software enhancements and improvements, and known issues for NVSHMEM 3.4.5 and earlier releases.

Key Features and Enhancements

This NVSHMEM release includes the following key features and enhancements:

Added support for data direct NIC configurations in the IB transports. Added a new environment variable, NVSHMEM_DISABLE_DATA_DIRECT, to force disable data direct NIC even when present.
Added support for CPU-Assisted IBGDA without the use of GDRCopy or the x86 regkey setting. Systems not supporting the other methods will automatically fall back to this new method. It enables the use of IBGDA on a broad range of systems without the need for administrator intervention.
Added a new environment variable, NVSHMEM_HCA_PREFIX, to enable IB transports on systems which name their HCA devices in a non-standard way (for example, ipb* instead of mlx5*).
Deprecated support for the combined libnvshmem.a host and device static library.
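
The two new environment variables above can be set from a launcher script. The following is a minimal Python sketch, not part of the NVSHMEM release: the helper name nvshmem_ib_env is hypothetical, the value "1" for NVSHMEM_DISABLE_DATA_DIRECT is an assumption (the notes do not state the accepted values), and passing the bare prefix "ipb" to NVSHMEM_HCA_PREFIX is an assumption based on the ipb* example.

```python
import os


def nvshmem_ib_env(hca_prefix=None, disable_data_direct=False, base=None):
    """Return a copy of an environment with the NVSHMEM 3.4.5 IB variables applied.

    Hypothetical helper for illustration; it only builds a dict and does not
    launch anything or talk to NVSHMEM.
    """
    env = dict(os.environ if base is None else base)
    if hca_prefix is not None:
        # Assumption: the variable takes the device-name prefix, e.g. "ipb"
        # on systems whose HCAs are named ipb* instead of mlx5*.
        env["NVSHMEM_HCA_PREFIX"] = hca_prefix
    if disable_data_direct:
        # Assumption: "1" is accepted as the "force disable" value.
        env["NVSHMEM_DISABLE_DATA_DIRECT"] = "1"
    return env


if __name__ == "__main__":
    env = nvshmem_ib_env(hca_prefix="ipb", disable_data_direct=True, base={})
    for name in sorted(env):
        print(f"{name}={env[name]}")
```

The returned dictionary can be passed as the env argument of subprocess.run when starting the application, which keeps the settings scoped to that one job instead of the whole shell session.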

Compatibility

NVSHMEM 3.4.5 has been tested with the following:

CUDA Toolkit:

12.2
12.6
12.9
13.0

CPUs:

*x86* and NVIDIA Grace™ processors

GPUs:

NVIDIA Ampere A100
NVIDIA Hopper™
NVIDIA Blackwell®

Limitations

NVSHMEM is not compatible with the PMI client library on Cray systems, and *must* use the NVSHMEM internal PMI-2 client library. You can launch jobs with the PMI bootstrap by specifying --mpi=pmi2 to Slurm and NVSHMEM_BOOTSTRAP_PMI=PMI-2, or directly by using the MPI or SHMEM bootstraps. You can also set PMI-2 as the default PMI by setting NVSHMEM_DEFAULT_PMI2=1 when you build NVSHMEM.

The libfabric transport does not support VMM yet, so you must disable VMM by setting NVSHMEM_DISABLE_CUDA_VMM=1.

Systems with PCIe peer-to-peer communication require one of the following:
InfiniBand to support NVSHMEM atomics APIs
Using NVSHMEM’s UCX transport that, if IB is absent, will use sockets for atomics
nvshmem_barrier*, nvshmem_quiet, or nvshmem_wait_until only ensures ordering and visibility between the source and destination PEs and *does not* ensure global ordering and visibility.

When built with GDRCopy, and when using InfiniBand on earlier versions of the 460 driver and previous branches, NVSHMEM cannot allocate the complete device memory because of the inability to reuse the BAR1 space. This has been fixed in the CUDA release 460 driver and in release 470 and later.

IBGDA does not work with CX-4 when the link layer is Ethernet (RoCE).
NVSHMEM is not supported on Grace with Ada L40 platforms.
NVSHMEM is not supported on virtualized environments (VM).
User buffers registered with nvshmemx_buffer_register_symmetric lack support for libfabric transport to perform GPU-GPU communication over remote networks (EFA, Slingshot, etc.).

When registering Extended GPU memory (EGM) user buffers with nvshmemx_buffer_register_symmetric, the buffers on different PEs must belong to distinct CPU sockets within a node. This can be achieved by selecting GPUs on a different NUMA domain using the CUDA_VISIBLE_DEVICES environment variable.

When using the libfabric transport with NVSHMEM_LIBFABRIC_PROVIDER=EFA, you must ensure that the libfabric environment variable FI_EFA_ENABLE_SHM_TRANSFER is set to 0 before launching your application. While NVSHMEM does set this variable during initialization, the setting can be ignored by the EFA provider if the provider was already initialized by the launcher, for example when using mpirun.
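
Several of the limitations above come down to launch-time settings, so a small pre-flight script can catch them before a job starts. The sketch below is illustrative only: the function names slurm_pmi2_launch and libfabric_env_problems are hypothetical, the srun -n task-count flag and the application path are assumptions about the job, and the checker assumes the caller has already decided to use the libfabric transport. The variable names and values themselves are taken from the limitations listed above.

```python
def slurm_pmi2_launch(app, num_pes, base_env=None):
    """Build an srun command line and environment for the PMI-2 bootstrap.

    Hypothetical helper: it returns (command, environment) and launches nothing.
    """
    env = dict(base_env or {})
    # From the release notes: pair --mpi=pmi2 with NVSHMEM_BOOTSTRAP_PMI=PMI-2.
    env["NVSHMEM_BOOTSTRAP_PMI"] = "PMI-2"
    cmd = ["srun", "--mpi=pmi2", "-n", str(num_pes), app]
    return cmd, env


def libfabric_env_problems(env):
    """Return a list of problems for a job that will use the libfabric transport."""
    problems = []
    if env.get("NVSHMEM_DISABLE_CUDA_VMM") != "1":
        problems.append(
            "NVSHMEM_DISABLE_CUDA_VMM must be 1: libfabric does not support VMM"
        )
    provider = env.get("NVSHMEM_LIBFABRIC_PROVIDER", "")
    if provider.upper() == "EFA" and env.get("FI_EFA_ENABLE_SHM_TRANSFER") != "0":
        # The launcher may initialize the EFA provider before NVSHMEM sets this,
        # so it has to be present in the environment the launcher itself sees.
        problems.append(
            "FI_EFA_ENABLE_SHM_TRANSFER must be 0 before launch when using EFA"
        )
    return problems


if __name__ == "__main__":
    cmd, env = slurm_pmi2_launch("./my_nvshmem_app", num_pes=4)
    print(" ".join(cmd))
    env["NVSHMEM_LIBFABRIC_PROVIDER"] = "EFA"
    for problem in libfabric_env_problems(env):
        print("warning:", problem)
```

Returning the problems as a list rather than raising lets a wrapper script decide whether to abort or only warn.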

Deprecated Features

Support for libnvshmem.a is now deprecated.

Known Issues

Complex types, which are enabled by setting NVSHMEM_COMPLEX_SUPPORT at compile time, are not currently supported.

When enabling libfabric transport with NVSHMEM_LIBFABRIC_PROVIDER=EFA, the following operations are experimental and may cause the application kernel to hang:

Device side nvshmem_put/nvshmem_get with nvshmem_barrier
Host side nvshmem_put_on_stream/nvshmem_get_on_stream
When you enable UCX remote transport with NVSHMEM_REMOTE_TRANSPORT=UCX, you may observe a data mismatch when scaling to 32 PEs or more on the DGX-2 platform.

Notability

notability 3.0/10

Routine version release of NVSHMEM library.