What does this release signal mean?

NVIDIA published v0.3.0 of NVIDIA/nvalchemi-toolkit-ops, a GPU-accelerated library of chemical science toolkit operations. This release signal is evidence of what shipped, changed, or was packaged for users. Key details: tag v0.3.0, published 2026-03-16T17:23:56Z, not a prerelease. Headline additions include GPU-accelerated molecular dynamics integrators and geometry optimization, JAX bindings, and Damped Shifted Force (DSF) electrostatics, alongside a breaking bindings API refactor. onlylabs links this event to 1 captured evidence page and 6 related release signals.

NVIDIA Release: NVIDIA/nvalchemi-toolkit-ops v0.3.0

Captured source

GitHub: github.com/NVIDIA/nvalchemi-toolkit-ops

Published Mar 16, 2026 · seen Jun 6 · captured Jun 11 · HTTP 200 · method: plain

v0.3.0

Repository: NVIDIA/nvalchemi-toolkit-ops

Tag: v0.3.0

Published: 2026-03-16T17:23:56Z

Prerelease: no

Release notes:

v0.3.0 Release Notes

New Features

Molecular Dynamics Integrators and Geometry Optimization (#17)

GPU-accelerated MD integrators and geometry optimization are now available:

Velocity Verlet (NVE) — microcanonical ensemble
Langevin dynamics (NVT) — BAOAB splitting for optimal stability
Nosé-Hoover Chain (NVT) — deterministic thermostat
NPT/NPH ensembles — Martyna-Tobias-Klein barostat with dynamic cell fluctuations
Velocity rescaling — direct temperature control
FIRE2 optimizer — adaptive timestep geometry optimization with variable cell support

All integrators support both single-system and batched computation modes.
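As a reference point for the NVE entry above, velocity Verlet is a kick-drift-kick scheme that fits in a few lines. The sketch below is a minimal single-particle, 1D version assuming a harmonic force; it is not the nvalchemiops API, whose kernels run the update over many atoms and, in batched mode, many systems. The names velocity_verlet, harmonic_force, and total_energy are hypothetical.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate a single 1D particle with velocity Verlet (NVE).

    force(x) returns the force at position x; returns the final (x, v).
    """
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # first half kick
        x = x + dt * v_half                # drift with the half-step velocity
        f = force(x)                       # recompute force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half kick
    return x, v


# Harmonic oscillator: spring constant k, mass m (illustrative values).
k, m = 1.0, 1.0


def harmonic_force(x):
    return -k * x


def total_energy(x, v):
    return 0.5 * m * v * v + 0.5 * k * x * x


x0, v0 = 1.0, 0.0
x1, v1 = velocity_verlet(x0, v0, harmonic_force, m, dt=0.01, n_steps=10_000)
print(total_energy(x0, v0), total_energy(x1, v1))  # energies stay close
```

The total energy oscillates in a narrow band instead of drifting, which is the property that makes velocity Verlet the usual choice for microcanonical runs.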

JAX Bindings (#24)

A new nvalchemiops.jax namespace provides JAX JIT-compatible wrappers for neighbor lists, electrostatics (Ewald, PME), DFT-D3 dispersion, splines, and batch utilities. Warp kernels are exposed via warp.jax_experimental.jax_kernel.

Note: Cell list operations that require runtime-dependent array allocation are not yet JIT-compatible. JAX autograd (differentiation) support is not yet implemented.
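The JIT caveat follows from XLA's requirement that array shapes be known at trace time: an operation whose output size depends on how many neighbors it finds cannot allocate that output inside a jitted function. A common workaround is a fixed-capacity, padded neighbor matrix whose shape depends only on N and a chosen max_neighbors. The pure-Python sketch below shows the idea; dense_neighbor_matrix and the fill_value = N padding convention are illustrative assumptions, not the nvalchemiops.jax API.

```python
def dense_neighbor_matrix(positions, cutoff, max_neighbors, fill_value):
    """Non-periodic neighbor search with a fixed [N, max_neighbors] output.

    Unused slots hold fill_value. If an atom has more neighbors than
    max_neighbors, the extras are dropped but still counted, so the caller
    can detect overflow and retry with a larger capacity.
    """
    n = len(positions)
    cutoff_sq = cutoff * cutoff
    matrix = [[fill_value] * max_neighbors for _ in range(n)]
    num_neighbors = [0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d_sq = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            if d_sq < cutoff_sq:
                if num_neighbors[i] < max_neighbors:
                    matrix[i][num_neighbors[i]] = j
                num_neighbors[i] += 1
    return matrix, num_neighbors


positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
matrix, counts = dense_neighbor_matrix(positions, cutoff=1.5, max_neighbors=3,
                                       fill_value=len(positions))
# matrix == [[1, 2, 4], [0, 2, 4], [0, 1, 4], [4, 4, 4]]; counts == [2, 2, 2, 0]
```

The output shape is 4 x 3 no matter how many neighbors exist, which is what a traced function needs; the cost is choosing a capacity up front, the role an estimate such as estimate_max_neighbors plays.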

Damped Shifted Force (DSF) Electrostatics (#21)

A new O(N) electrostatics method is available via nvalchemiops.torch.dsf and nvalchemiops.jax.dsf:

Full PBC and virial stress tensor support
CSR and dense neighbor matrix formats
Charge gradient support via straight-through estimator for MLIP training
Benchmarks show ~80× speedup over PME at 54k atoms; scales to 195k atoms where Ewald/PME encounter memory limits
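The pairwise form behind DSF, following Fennell and Gezelter's damped shifted force potential, is short enough to write down directly. The sketch below assumes a Coulomb constant of 1 and omits the method's per-atom self-energy term. It shows the two properties that make the method cheap: the interaction is truncated at a cutoff, so with a neighbor list the cost grows as O(N), and both energy and force are shifted to vanish at that cutoff. Function names are hypothetical, not the nvalchemiops.torch.dsf API. The full-list sum also shows why a 0.5 prefactor is needed when every pair is stored twice, the issue fixed for Coulomb energy-only kernels in #21.

```python
import math


def dsf_pair_energy(qi, qj, r, alpha, cutoff):
    """Damped shifted force (Fennell-Gezelter) pair energy.

    Units with a Coulomb constant of 1. The energy and its derivative
    with respect to r both vanish at r = cutoff.
    """
    if r >= cutoff:
        return 0.0
    erfc_rc = math.erfc(alpha * cutoff)
    # Negated slope of the damped kernel erfc(alpha*r)/r, evaluated at the cutoff.
    shift_slope = (erfc_rc / cutoff**2
                   + 2.0 * alpha / math.sqrt(math.pi)
                   * math.exp(-(alpha * cutoff) ** 2) / cutoff)
    return qi * qj * (math.erfc(alpha * r) / r - erfc_rc / cutoff
                      + shift_slope * (r - cutoff))


def dsf_energy_full_list(charges, neighbors, alpha, cutoff):
    """Pair energy summed over a full neighbor list (self term omitted).

    neighbors[i] maps each neighbor j to the distance r_ij. A full list
    stores every pair twice (i->j and j->i), hence the 0.5 prefactor.
    """
    total = 0.0
    for i, nbrs in enumerate(neighbors):
        for j, r in nbrs.items():
            total += dsf_pair_energy(charges[i], charges[j], r, alpha, cutoff)
    return 0.5 * total


charges = [1.0, -1.0]
neighbors = [{1: 3.0}, {0: 3.0}]
print(dsf_energy_full_list(charges, neighbors, alpha=0.2, cutoff=10.0))
```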

Differentiable Virial for Ewald and PME (#20)

Ewald and PME now support analytical stress tensor computation via a new compute_virial parameter. Virial is computed in the forward pass and attached to the autograd tape, enabling stress-based loss functions for MLIP training. Typical overhead: +5–12% for Ewald, +8–27% for PME.
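For a simple pair potential, the virial that a stress-based loss consumes is the derivative of the energy with respect to a homogeneous strain. The sketch below is a minimal non-periodic Lennard-Jones example, not the Ewald/PME expressions the release implements. It computes that tensor analytically and checks it against a finite-difference strain of the coordinates. All names are hypothetical, and because sign and volume-normalization conventions vary between codes, the convention used here is stated in the docstring.

```python
import math


def lj_phi(r):
    """Lennard-Jones pair energy with epsilon = sigma = 1."""
    return 4.0 * (r**-12 - r**-6)


def lj_dphi_dr(r):
    return 4.0 * (-12.0 * r**-13 + 6.0 * r**-7)


def pair_energy(positions, phi):
    n = len(positions)
    return sum(phi(math.dist(positions[i], positions[j]))
               for i in range(n) for j in range(i + 1, n))


def pair_virial(positions, dphi_dr):
    """Analytical virial W[a][b] = sum over pairs of phi'(r) * r_a * r_b / r.

    Defined here as the strain derivative dE/d(eps_ab) of a non-periodic
    cluster. Codes differ in sign and in whether they divide by the volume.
    """
    w = [[0.0] * 3 for _ in range(3)]
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            rij = [positions[i][k] - positions[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in rij))
            scale = dphi_dr(r) / r
            for a in range(3):
                for b in range(3):
                    w[a][b] += scale * rij[a] * rij[b]
    return w


def strain(positions, a, b, eps):
    """Apply the homogeneous deformation x_a -> x_a + eps * x_b to every atom."""
    out = []
    for p in positions:
        q = list(p)
        q[a] += eps * p[b]
        out.append(q)
    return out


positions = [(0.0, 0.0, 0.0), (1.1, 0.2, 0.0), (0.3, 1.0, 0.4)]
w = pair_virial(positions, lj_dphi_dr)
h = 1e-6
fd_xy = (pair_energy(strain(positions, 0, 1, h), lj_phi)
         - pair_energy(strain(positions, 0, 1, -h), lj_phi)) / (2 * h)
print(w[0][1], fd_xy)  # analytical and finite-difference values agree closely
```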

compute_charge_gradients in ewald_summation (#27)

The high-level ewald_summation() now accepts compute_charge_gradients=True and forwards it to both the ewald_real_space() and ewald_reciprocal_space() components. Previously, this option was only available through the low-level APIs.
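Because the Ewald energy combines a real-space and a reciprocal-space component, its gradient with respect to each charge is the sum of both components' charge gradients; that is why the flag has to reach both ewald_real_space() and ewald_reciprocal_space(). For the bare Coulomb energy the charge gradient has a simple closed form, the electrostatic potential at each atom, which the sketch below computes and checks by finite differences. It is a non-periodic illustration with hypothetical names, not the ewald_summation() implementation.

```python
import math


def coulomb_energy(charges, positions):
    """Bare non-periodic Coulomb energy over unique pairs (Coulomb constant = 1)."""
    n = len(charges)
    return sum(charges[i] * charges[j] / math.dist(positions[i], positions[j])
               for i in range(n) for j in range(i + 1, n))


def coulomb_charge_gradient(charges, positions):
    """dE/dq_i = sum over j != i of q_j / r_ij: the electrostatic potential at atom i."""
    n = len(charges)
    return [sum(charges[j] / math.dist(positions[i], positions[j])
                for j in range(n) if j != i)
            for i in range(n)]


charges = [0.4, -0.7, 0.3]
positions = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 0.5)]
grad = coulomb_charge_gradient(charges, positions)

# Finite-difference check on the first charge.
h = 1e-6
plus = list(charges)
plus[0] += h
minus = list(charges)
minus[0] -= h
fd = (coulomb_energy(plus, positions) - coulomb_energy(minus, positions)) / (2 * h)
print(grad[0], fd)
```

When charges come from a learned model, these are the gradients that carry an energy-based loss back into the charge predictions.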

GPU-Side Batch Rebuild Detection and Selective Skip (#38)

Per-system neighbor list rebuild detection now runs entirely on the GPU with no CPU–GPU synchronization:

batch_neighbor_list_needs_rebuild() and batch_cell_list_needs_rebuild() return per-system boolean flags
All neighbor list APIs (naive, batch_naive, cell_list, batch_cell_list) accept an optional rebuild_flags tensor to skip non-rebuilding systems at the kernel level
Available in PyTorch and JAX bindings
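The release notes do not state which rebuild criterion is used. A common choice is the Verlet-skin test: build the list with cutoff + skin, then rebuild only when some atom has moved more than skin/2, because two atoms that each moved less than that cannot have closed a gap larger than skin. The sketch below evaluates that test per system for a flat, batched atom layout. batch_needs_rebuild, batch_idx, and skin are illustrative assumptions; a GPU version would parallelize over atoms, and the resulting flags are the kind of per-system mask that a rebuild_flags argument could consume.

```python
def batch_needs_rebuild(positions, ref_positions, batch_idx, num_systems, skin):
    """Per-system neighbor list rebuild flags (Verlet-skin criterion).

    batch_idx[k] is the system that atom k belongs to. A system is flagged
    when any of its atoms has moved more than skin / 2 since ref_positions
    were recorded at the last build.
    """
    flags = [False] * num_systems
    limit_sq = (0.5 * skin) ** 2
    for pos, ref, system in zip(positions, ref_positions, batch_idx):
        disp_sq = sum((a - b) ** 2 for a, b in zip(pos, ref))
        if disp_sq > limit_sq:
            flags[system] = True
    return flags


ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
now = [(0.1, 0.0, 0.0), (1.0, 0.05, 0.0), (0.0, 0.0, 0.0), (2.0, 0.0, 0.3)]
batch_idx = [0, 0, 1, 1]
print(batch_needs_rebuild(now, ref, batch_idx, num_systems=2, skin=0.4))
# [False, True]
```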

---

Breaking Changes

Bindings API Refactor (#15)

This release introduces a significant internal restructuring to enable multi-framework support:

wp_ prefix removed from all Warp-level launcher functions (e.g., wp_neighbor_list → neighbor_list)
New nvalchemiops.torch namespace — PyTorch bindings are now organized under nvalchemiops.torch.*; the top-level imports remain as backwards-compatible aliases
PyTorch is now optional — the core Warp layer has no PyTorch dependency
A v0.3.0 migration guide is included in the documentation

NPT/NPH Kernel Interface (#38)

NPT and NPH kernel parameters (dt, dt_half, num_atoms, ndof, target_temp, tau) were changed from Python scalars to wp.array types to fix type mismatches in Warp kernels. Callers passing bare Python floats/ints to these kernels must update to wrapped arrays.

---

Bug Fixes

NPT/NPH scalar type mismatch — kernel parameters now correctly typed as wp.array (#38)
Naive PBC neighbor list wrapping — positions are now pre-wrapped before distance computation, fixing incorrect neighbor detection for unwrapped coordinates (#38)
Batch Ewald k-cutoff — k_cutoff is now reduced to the shared maximum across systems in batch mode, preventing per-system over-allocation (#41)
Coulomb matrix double-counting — corrected missing 0.5 prefactor in energy-only kernels when using full neighbor lists (#21)
Negative cell volume — bindings now raise exceptions when a zero or negative cell volume is detected (#6)
Neighbor list dispatch — improved validation and auto-selection logic for neighbor_list() (#26)
estimate_max_neighbors defaults — corrected default values for better out-of-box behavior (#25)
Warp warp.vec deprecation warning — updated symbol usage for compatibility with current Warp (#23)
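The naive PBC wrapping fix (#38) can be illustrated with a small search that, like many naive implementations, only tries image shifts of -1, 0, and +1 box lengths. That shift set finds the minimum image only for positions inside the box, so an atom stored with an unwrapped coordinate two boxes away looks isolated until positions are wrapped first. This is a pure-Python sketch assuming an orthorhombic box; it is not the nvalchemiops kernel, whose image handling the notes do not describe.

```python
import itertools
import math


def wrap(position, box):
    """Wrap a position into the orthorhombic box [0, L) along each axis."""
    return tuple(x % length for x, length in zip(position, box))


def pairs_within_cutoff(positions, box, cutoff):
    """Naive O(N^2) PBC pair search that checks image shifts in {-1, 0, +1}^3.

    The shift set only covers the minimum image when positions already lie
    inside the box; unwrapped coordinates far outside it can be missed.
    """
    found = set()
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            for shift in itertools.product((-1, 0, 1), repeat=3):
                image = [positions[j][k] + shift[k] * box[k] for k in range(3)]
                if math.dist(positions[i], image) < cutoff:
                    found.add((i, j))
    return found


box = (10.0, 10.0, 10.0)
raw = [(0.5, 5.0, 5.0), (29.0, 5.0, 5.0)]  # second atom is two boxes away
print(pairs_within_cutoff(raw, box, cutoff=2.0))                          # set()
print(pairs_within_cutoff([wrap(p, box) for p in raw], box, cutoff=2.0))  # {(0, 1)}
```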

---

Performance Improvements

No-allocation Warp launchers (#29): wp.zeros allocations were removed from the Warp launcher functions. Callers must now pre-allocate output arrays, which enables deterministic GPU memory management and eliminates allocation overhead in hot loops.
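The caller-allocates convention can be shown without Warp. In the sketch below a hypothetical kinetic_energy_into writes into a buffer the caller created once, so the loop performs no result allocations. With Warp, the buffer would typically be a wp.array created up front (for example with wp.zeros) and passed to the launcher. This illustrates the pattern only and does not reproduce any nvalchemiops signature.

```python
from array import array


def kinetic_energy_into(out, masses, velocities):
    """Write per-atom kinetic energies 0.5 * m * v**2 into the caller's buffer.

    Nothing is allocated for the result: the caller owns `out` and can reuse
    it on every step of a loop.
    """
    if len(out) != len(masses) or len(velocities) != len(masses):
        raise ValueError("buffer and input lengths must match")
    for i, (m, v) in enumerate(zip(masses, velocities)):
        out[i] = 0.5 * m * v * v
    return out


masses = array("d", [1.0, 2.0, 4.0])
velocities = array("d", [2.0, 1.0, 0.5])
ke = array("d", [0.0] * len(masses))  # allocated once, before the loop

for step in range(3):
    kinetic_energy_into(ke, masses, velocities)  # same buffer every step
    for i in range(len(velocities)):
        velocities[i] *= 0.5                     # in-place stand-in for an integrator update
print(list(ke))  # [0.125, 0.0625, 0.03125]
```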

---

Documentation

Sphinx user guides expanded with tabbed PyTorch/JAX code examples (#39)
DFT-D3 parameter documentation and padding atom guidance fixed (#36)
Cross-linking to JAX, NumPy, PyTorch, and Warp API references via intersphinx (#39)
Examples and benchmarks updated for dynamics, DSF, and virial APIs (#40, #42)

Full Changelog: https://github.com/NVIDIA/nvalchemi-toolkit-ops/compare/v0.2.0...v0.3.0

Notability

notability 3.0/10

Routine toolkit release, low traction.