NVIDIA/warp v1.14.0
Repository: NVIDIA/warp
Tag: v1.14.0
Published: 2026-06-01T15:29:00Z
Prerelease: no
Release notes:
Warp v1.14.0
Warp v1.14 expands serialized CPU capture support: captured graphs can now include backward launches, tiled kernels, richer launch arguments, and .wrp files with arrays nested inside structs or wp.indexedarray arguments. This release also adds multi-environment FEM support for batched simulations, reusable and batched linear solvers, pluggable logging, portable tile FFT and solver fallbacks, stable JAX integration APIs, and relaxed CPU/GPU array access for Heterogeneous Memory Management (HMM) and Address Translation Services (ATS) systems.
New features
API Capture expands to cover more workflows
Building on the initial API Capture serialization support in Warp v1.13, Warp v1.14 primarily broadens the set of CPU graph patterns that can be saved and replayed. CPU captures can now include forward execution and reverse-mode passes from wp.Tape().backward(), wp.launch_tiled() kernels, and scalar parameters of any size (#1431). The shared .wrp serialization format also now supports @wp.struct arguments that contain arrays and wp.indexedarray arguments that carry data, gradient, and index buffers.
> [!IMPORTANT]
> Upgrade impact for APIC users:
>
> - Recapture .wrp files saved by Warp v1.13. Warp v1.14 writes APIC format version 10 and rejects the previous format.
> - Update native C/C++ APIC handle declarations to explicit pointers such as APICState* and APICGraph*. Ownership and destroy calls are unchanged. See the [APIC migration diff](#rn-v114-apic-migration).
Saved APIC graphs can still be consumed from standalone C++ through the C API declared in warp/native/apic.h. Native replay behavior is unchanged apart from the explicit pointer spelling for APIC handles.
Key capabilities:
- Reverse-mode replay on CPU: adjoint launches emitted by `wp.Tape().backward()` are recorded into the CPU APIC stream and replayed from live captures or loaded graphs.
- Richer launch arguments: APIC now relocates array data pointers, gradient pointers, index pointers for `wp.indexedarray` arguments, and handles inside serialized launch value blobs.
- Tiled kernels: CPU captures can replay kernels that use tiles, including reductions and scans that previously fell outside the captured operation set.
Known limitations:
- `wp.utils.array_scan()` is still not recorded into CPU APIC and raises `NotImplementedError` in CPU capture.
- Nonzero `array.fill_()` operations on CPU are not recorded.
- APIC serializes `wp.Mesh` handles, but `wp.Volume` and `wp.Bvh` handles are not yet supported.
- Loading CPU `.wrp` graphs requires the warp-clang backend and the companion `_modules/` directory with the recorded CPU kernel objects.
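
The record-then-replay idea behind the reverse-mode capability listed above can be sketched in plain Python. This is a toy illustration only: `ToyTape`, `record`, `backward`, and `run_example` are hypothetical names, and nothing here reflects Warp's actual APIC stream layout. It shows forward steps being recorded in order and their adjoints being replayed in reverse order.

```python
# Toy sketch of record-then-replay for reverse-mode differentiation.
# All names are hypothetical; this is not Warp's APIC implementation.

class ToyTape:
    def __init__(self):
        self.adjoints = []  # adjoint callbacks, stored in forward order

    def record(self, adjoint_fn):
        self.adjoints.append(adjoint_fn)

    def backward(self):
        # Adjoints replay in the reverse of the forward recording order.
        for adjoint_fn in reversed(self.adjoints):
            adjoint_fn()


def run_example(x_value):
    tape = ToyTape()
    grads = {"x": 0.0, "y": 0.0, "z": 1.0}  # seed with dz/dz = 1

    # Forward step 1: y = x * x, with adjoint dx += 2 * x * dy
    y = x_value * x_value

    def adjoint_square():
        grads["x"] += 2.0 * x_value * grads["y"]

    tape.record(adjoint_square)

    # Forward step 2: z = 3 * y, with adjoint dy += 3 * dz
    z = 3.0 * y

    def adjoint_scale():
        grads["y"] += 3.0 * grads["z"]

    tape.record(adjoint_scale)

    tape.backward()
    return z, grads["x"]


z, dz_dx = run_example(2.0)  # z = 3 * x^2, so dz/dx = 6 * x
```

Because the recorded adjoints are ordinary replayable steps, a stream that stores them alongside the forward steps can in principle be saved and replayed later, which is the property the release notes describe for CPU captures.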
Multi-environment warp.fem
warp.fem can now represent many independent simulation environments inside one geometry and one solve setup (#1407). Colocated Grid2D and Grid3D geometries expose an env_count, sparse Nanogrid and AdaptiveNanogrid geometries can pack per-environment voxels into one NanoVDB volume, and unstructured meshes can carry per-cell environment metadata through cell_env and env_count.
This feature changes two positional call signatures. See the [FEM migration diff](#rn-v114-fem-migration) if your code passes requires_grad, device, or temporary_store positionally.
```python
import warp as wp
import warp.fem as fem

geo = fem.Grid3D(res=(8, 8, 8), bounds_lo=wp.vec3(0.0), bounds_hi=wp.vec3(1.0), env_count=4)
pressure_space = fem.make_polynomial_space(geo, degree=0, discontinuous=True)
partition = fem.make_space_partition(space_topology=pressure_space.topology, environment_first=True)

# For scalar pressure spaces, these node offsets can be passed directly to
# warp.optim.linear.LinearOperator(batch_offsets=...).
pressure_batch_offsets = partition.env_offsets
```
Environment-aware lookup keeps colocated environments from interacting accidentally. When a geometry has more than one environment, pass an environment index to fem.lookup() and pass env_indices to fem.PicQuadrature when particles are binned from world-space positions. The new `warp/examples/fem/example_apic_fluid_multi_env.py` example uses these APIs to run colocated APIC fluid environments with environment-aware particle quadrature and batched pressure solves.
Key capabilities:
- Colocated environments: grid environments can overlap in world coordinates while remaining topologically independent.
- Sparse packed environments: `Nanogrid.from_environment_voxels()` and `AdaptiveNanogrid.from_environment_voxels()` build packed sparse grids with per-cell `cell_env` metadata and hidden offsets for packed grids.
- Mesh environments: FEM mesh constructors can use `cell_env` and `env_count` so grouped BVH lookup only traverses the requested environment.
- Batched solves: `make_space_partition(..., environment_first=True)` exposes `env_offsets` that line up with the new linear-solver `batch_offsets` support for scalar spaces.
- Known limitations: `environment_first=True` does not support halo nodes for partitions that do not cover a whole geometry. Mesh environment indices are lookup and partition metadata, so callers must still provide disconnected mesh topology for independent mesh environments.
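
A minimal plain-Python sketch of why environment-first ordering yields offsets that a batched solver can consume. It assumes that environment-first numbering places all nodes of environment 0 first, then environment 1, and so on, and that the offsets follow the common prefix-sum convention with `env_count + 1` entries. Both points are assumptions about the layout, not statements about Warp's internals, and the helper name `env_offsets_from_node_envs` is hypothetical.

```python
def env_offsets_from_node_envs(node_envs, env_count):
    """Group node indices by environment (hypothetical helper).

    Returns (permutation, offsets) where permutation[k] is the original
    index of the node placed at position k, and the half-open range
    offsets[e]:offsets[e + 1] holds the nodes of environment e.
    """
    counts = [0] * env_count
    for env in node_envs:
        counts[env] += 1

    # Prefix sum of per-environment node counts.
    offsets = [0]
    for count in counts:
        offsets.append(offsets[-1] + count)

    # sorted() is stable, so the original order is kept inside each environment.
    permutation = sorted(range(len(node_envs)), key=lambda i: node_envs[i])
    return permutation, offsets


node_envs = [1, 0, 1, 2, 0, 2]  # environment index of each node
permutation, offsets = env_offsets_from_node_envs(node_envs, env_count=3)
```

With this layout each environment owns one contiguous block of unknowns, so a solver can treat the blocks as independent systems without any gather or scatter step.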
Reusable and batched linear solvers
The iterative solvers in warp.optim.linear can now preallocate their temporary buffers and reuse them across compatible solves (#1391). Passing run=False to cg(), cr(), bicgstab(), or gmres() returns a solver object that can be called repeatedly with replacement operands that have the same shape, dtype, device, and batch layout.
```python
import numpy as np
import warp as wp
from warp.optim.linear import aslinearoperator, cg

diag = wp.array([2.0, 2.0, 5.0, 5.0], dtype=float, device="cpu")
b = wp.array([2.0, 4.0, 10.0, 15.0], dtype=float, device="cpu")
x = wp.zeros_like(b)
offsets = wp.array(np.array([0, 2, 4], dtype=np.int32), dtype=int, device="cpu")

A = aslinearoperator(diag, batch_offsets=offsets)
state = cg(A, b, x, maxiter=100, run=False)
state()  # solve the original system

b2 = wp.array([4.0, 2.0, 15.0, 10.0], dtype=float, device="cpu")
x2 = wp.zeros_like(b2)
# …
```
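
The role of the offsets array in the example above can be illustrated without Warp. The sketch below assumes `batch_offsets` uses the prefix-sum convention, where batch `i` owns entries `offsets[i]` up to but not including `offsets[i + 1]`, so `[0, 2, 4]` describes two independent systems of two unknowns each. `solve_diagonal_batches` is a hypothetical helper that solves each diagonal block directly. It stands in for the iterative solver only to show the data layout.

```python
def solve_diagonal_batches(diag, rhs, offsets):
    """Solve diag[i] * x[i] = rhs[i] batch by batch (hypothetical helper).

    Returns one solution list per batch, using half-open ranges
    offsets[k]:offsets[k + 1] to delimit each batch.
    """
    solutions = []
    for start, end in zip(offsets[:-1], offsets[1:]):
        solutions.append([rhs[i] / diag[i] for i in range(start, end)])
    return solutions


diag = [2.0, 2.0, 5.0, 5.0]
offsets = [0, 2, 4]

# Same operator, two different right-hand sides with the same batch layout.
first = solve_diagonal_batches(diag, [2.0, 4.0, 10.0, 15.0], offsets)
second = solve_diagonal_batches(diag, [4.0, 2.0, 15.0, 10.0], offsets)
```

The second call reuses the operator and the offsets unchanged and replaces only the right-hand side, which mirrors the reuse pattern the release notes describe for solver objects created with `run=False`.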
Excerpt shown — open the source for the full document.