Release · NVIDIA · published Jun 1, 2026

NVIDIA/warp v1.14.0


v1.14.0

Repository: NVIDIA/warp

Tag: v1.14.0

Published: 2026-06-01T15:29:00Z

Prerelease: no

Release notes:

Warp v1.14.0

Warp v1.14 expands serialized CPU capture support: captured graphs can now include backward launches, tiled kernels, richer launch arguments, and .wrp files with arrays nested inside structs or wp.indexedarray arguments. This release also adds multi-environment FEM support for batched simulations, reusable and batched linear solvers, pluggable logging, portable tile FFT and solver fallbacks, stable JAX integration APIs, and relaxed CPU/GPU array access for Heterogeneous Memory Management (HMM) and Address Translation Services (ATS) systems.

New features

API Capture expands to cover more workflows

Building on the initial API Capture serialization support in Warp v1.13, Warp v1.14 primarily broadens the set of CPU graph patterns that can be saved and replayed. CPU captures can now include forward execution and reverse-mode passes from wp.Tape().backward(), wp.launch_tiled() kernels, and scalar parameters of any size (#1431). The shared .wrp serialization format also now supports @wp.struct arguments that contain arrays and wp.indexedarray arguments that carry data, gradient, and index buffers.

> [!IMPORTANT]
> Upgrade impact for APIC users:
>
> - Recapture .wrp files saved by Warp v1.13. Warp v1.14 writes APIC format version 10 and rejects the previous format.
> - Update native C/C++ APIC handle declarations to explicit pointers such as APICState* and APICGraph*. Ownership and destroy calls are unchanged. See the [APIC migration diff](#rn-v114-apic-migration).

Saved APIC graphs can still be consumed from standalone C++ through the C API declared in warp/native/apic.h. Native replay behavior is unchanged apart from the explicit pointer spelling for APIC handles.

Key capabilities:

  • Reverse-mode replay on CPU: adjoint launches emitted by wp.Tape().backward() are recorded into the CPU APIC stream and replayed from live captures or loaded graphs.
  • Richer launch arguments: APIC now relocates array data pointers, gradient pointers, index pointers for wp.indexedarray arguments, and handles inside serialized launch value blobs.
  • Tiled kernels: CPU captures can replay kernels that use tiles, including reductions and scans that previously fell outside the captured operation set.

Known limitations:

  • wp.utils.array_scan() is still not recorded into CPU APIC and raises NotImplementedError in CPU capture.
  • Nonzero array.fill_() operations on CPU are not recorded.
  • APIC serializes wp.Mesh handles, but wp.Volume and wp.Bvh handles are not yet supported.
  • Loading CPU .wrp graphs requires the warp-clang backend and the companion _modules/ directory with the recorded CPU kernel objects.

Multi-environment warp.fem

warp.fem can now represent many independent simulation environments inside one geometry and one solve setup (#1407). Colocated Grid2D and Grid3D geometries expose an env_count, sparse Nanogrid and AdaptiveNanogrid geometries can pack per-environment voxels into one NanoVDB volume, and unstructured meshes can carry per-cell environment metadata through cell_env and env_count.

This feature changes two positional call signatures. See the [FEM migration diff](#rn-v114-fem-migration) if your code passes requires_grad, device, or temporary_store positionally.

```python
import warp as wp
import warp.fem as fem

geo = fem.Grid3D(res=(8, 8, 8), bounds_lo=wp.vec3(0.0), bounds_hi=wp.vec3(1.0), env_count=4)
pressure_space = fem.make_polynomial_space(geo, degree=0, discontinuous=True)
partition = fem.make_space_partition(space_topology=pressure_space.topology, environment_first=True)

# For scalar pressure spaces, these node offsets can be passed directly to
# warp.optim.linear.LinearOperator(batch_offsets=...).
pressure_batch_offsets = partition.env_offsets
```

Environment-aware lookup keeps colocated environments from interacting accidentally. When a geometry has more than one environment, pass an environment index to fem.lookup() and pass env_indices to fem.PicQuadrature when particles are binned from world-space positions. The new `warp/examples/fem/example_apic_fluid_multi_env.py` example uses these APIs to run colocated APIC fluid environments (here APIC is the affine particle-in-cell method, not API Capture) with environment-aware particle quadrature and batched pressure solves.
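
The idea behind environment-aware binning can be sketched in plain NumPy. This is a conceptual illustration only, not how warp.fem implements lookup: `bin_particles` is a hypothetical helper, and the flat `env * cells_per_env + cell` indexing is an assumption chosen to make the separation visible.

```python
import numpy as np


def bin_particles(positions, env_indices, lo, hi, res):
    """Map world-space points to a flat (environment, cell) index on a shared grid."""
    positions = np.asarray(positions, dtype=float)
    env_indices = np.asarray(env_indices, dtype=int)
    res_arr = np.asarray(res, dtype=int)

    # Cell coordinates inside the shared bounds; clip so points on the upper
    # boundary still land in the last cell.
    scaled = (positions - lo) / (np.asarray(hi, dtype=float) - lo) * res_arr
    ijk = np.clip(np.floor(scaled).astype(int), 0, res_arr - 1)

    # Flatten the per-axis cell coordinates, then offset by environment.
    cell = np.ravel_multi_index(tuple(ijk.T), tuple(int(r) for r in res_arr))
    cells_per_env = int(np.prod(res_arr))
    return env_indices * cells_per_env + cell


# Two particles share a world position but belong to different environments.
positions = [[0.1, 0.1], [0.1, 0.1], [0.9, 0.6]]
env_indices = [0, 1, 1]
flat = bin_particles(positions, env_indices, lo=0.0, hi=1.0, res=(2, 2))
```

The first two particles occupy the same world-space cell yet receive different flat indices, which is the property that keeps colocated environments from seeing each other's particles.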

Key capabilities:

  • Colocated environments: grid environments can overlap in world coordinates while remaining topologically independent.
  • Sparse packed environments: Nanogrid.from_environment_voxels() and AdaptiveNanogrid.from_environment_voxels() build packed sparse grids with per-cell cell_env metadata and hidden offsets for packed grids.
  • Mesh environments: FEM mesh constructors can use cell_env and env_count so grouped BVH lookup only traverses the requested environment.
  • Batched solves: make_space_partition(..., environment_first=True) exposes env_offsets that line up with the new linear-solver batch_offsets support for scalar spaces.

Known limitations:

  • environment_first=True does not support halo nodes for partitions that do not cover a whole geometry.
  • Mesh environment indices are lookup and partition metadata, so callers must still provide disconnected mesh topology for independent mesh environments.
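
To make the environment-first ordering concrete, the following NumPy sketch shows one way such a layout could be derived from per-node environment labels. It is an illustration under assumptions, not warp.fem's implementation: `environment_first_layout` is a hypothetical helper, and the offsets are assumed to delimit contiguous ranges `[offsets[e], offsets[e + 1])`.

```python
import numpy as np


def environment_first_layout(node_env, env_count):
    """Return a node permutation grouped by environment, plus range offsets."""
    node_env = np.asarray(node_env, dtype=int)
    # A stable sort keeps the original relative order inside each environment.
    order = np.argsort(node_env, kind="stable")
    counts = np.bincount(node_env, minlength=env_count)
    # offsets has env_count + 1 entries; environment e owns [offsets[e], offsets[e + 1]).
    offsets = np.concatenate(([0], np.cumsum(counts)))
    return order, offsets


node_env = [1, 0, 1, 2, 0, 2, 2]
order, offsets = environment_first_layout(node_env, env_count=3)
```

With this layout every environment owns one contiguous slice of the node vector, which is what lets a batched solver treat each slice as an independent system.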

Reusable and batched linear solvers

The iterative solvers in warp.optim.linear can now preallocate their temporary buffers and reuse them across compatible solves (#1391). Passing run=False to cg(), cr(), bicgstab(), or gmres() returns a solver object that can be called repeatedly with replacement operands that have the same shape, dtype, device, and batch layout.

```python
import numpy as np
import warp as wp
from warp.optim.linear import aslinearoperator, cg

diag = wp.array([2.0, 2.0, 5.0, 5.0], dtype=float, device="cpu")
b = wp.array([2.0, 4.0, 10.0, 15.0], dtype=float, device="cpu")
x = wp.zeros_like(b)
offsets = wp.array(np.array([0, 2, 4], dtype=np.int32), dtype=int, device="cpu")

A = aslinearoperator(diag, batch_offsets=offsets)
state = cg(A, b, x, maxiter=100, run=False)
state()  # solve the original system

b2 = wp.array([4.0, 2.0, 15.0, 10.0], dtype=float, device="cpu")
x2 = wp.zeros_like(b2)
```

…
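
For intuition about what a batched solve computes, here is a plain NumPy reference that runs conjugate gradient separately on each batch of a diagonal system. It is a sketch, not Warp's solver: `batched_cg_diag` is a hypothetical helper, and it assumes that consecutive `batch_offsets` entries delimit independent contiguous blocks, which matches the `[0, 2, 4]` offsets used with the four-entry system above.

```python
import numpy as np


def batched_cg_diag(diag, b, offsets, maxiter=100, tol=1e-10):
    """Solve diag * x = b independently on each [offsets[i], offsets[i + 1]) block."""
    x = np.zeros_like(b)
    for start, end in zip(offsets[:-1], offsets[1:]):
        d, rhs = diag[start:end], b[start:end]
        xb = np.zeros_like(rhs)
        r = rhs - d * xb  # initial residual for this batch
        p = r.copy()
        rr = r @ r
        for _ in range(maxiter):
            if np.sqrt(rr) <= tol:
                break
            Ap = d * p  # a diagonal operator acts elementwise
            alpha = rr / (p @ Ap)
            xb += alpha * p
            r -= alpha * Ap
            rr_new = r @ r
            p = r + (rr_new / rr) * p
            rr = rr_new
        x[start:end] = xb
    return x


diag = np.array([2.0, 2.0, 5.0, 5.0])
b = np.array([2.0, 4.0, 10.0, 15.0])
offsets = np.array([0, 2, 4])
x_ref = batched_cg_diag(diag, b, offsets)
```

Each block keeps its own residual norm and step sizes, so one batch converging early or late does not affect the others. For this diagonal system the result is simply `b / diag` on every block.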

