NVIDIA/nvalchemi-toolkit-ops v0.3.0
Repository: NVIDIA/nvalchemi-toolkit-ops
Tag: v0.3.0
Published: 2026-03-16T17:23:56Z
Prerelease: no
Release notes:
v0.3.0 Release Notes
New Features
Molecular Dynamics Integrators and Geometry Optimization (#17)
GPU-accelerated MD integrators and geometry optimization are now available:
- Velocity Verlet (NVE) — microcanonical ensemble
- Langevin dynamics (NVT) — BAOAB splitting for optimal stability
- Nosé-Hoover Chain (NVT) — deterministic thermostat
- NPT/NPH ensembles — Martyna-Tobias-Klein barostat with dynamic cell fluctuations
- Velocity rescaling — direct temperature control
- FIRE2 optimizer — adaptive timestep geometry optimization with variable cell support
All integrators support both single-system and batched computation modes.
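As a rough illustration of the first scheme in the list above, the following is a minimal pure-Python sketch of velocity Verlet for a single particle in one dimension. It is not the toolkit's GPU implementation: the function names, the harmonic test force, and the parameter values are all made up for this example.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance one 1D particle with the velocity Verlet scheme (illustrative only)."""
    f = force(x)
    for _ in range(n_steps):
        v += 0.5 * dt * f / mass  # first half-step velocity update
        x += dt * v               # full-step position update
        f = force(x)              # force at the new position
        v += 0.5 * dt * f / mass  # second half-step velocity update
    return x, v


SPRING_K = 1.0  # hypothetical spring constant for the test system


def harmonic_force(x):
    return -SPRING_K * x


def total_energy(x, v, mass=1.0):
    return 0.5 * mass * v * v + 0.5 * SPRING_K * x * x


x0, v0 = 1.0, 0.0
e0 = total_energy(x0, v0)
x1, v1 = velocity_verlet(x0, v0, harmonic_force, mass=1.0, dt=0.01, n_steps=10_000)
e1 = total_energy(x1, v1)
print(f"energy drift after 10,000 steps: {abs(e1 - e0):.2e}")
```

Because the scheme is symplectic, the total energy of this NVE test system stays bounded close to its initial value instead of drifting steadily.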
JAX Bindings (#24)
A new nvalchemiops.jax namespace provides JAX JIT-compatible wrappers for neighbor lists, electrostatics (Ewald, PME), DFT-D3 dispersion, splines, and batch utilities. Warp kernels are exposed via warp.jax_experimental.jax_kernel.
Note: Cell list operations that require runtime-dependent array allocation are not yet JIT-compatible. JAX autograd (differentiation) support is not yet implemented.
Damped Shifted Force (DSF) Electrostatics (#21)
A new O(N) electrostatics method is available via nvalchemiops.torch.dsf and nvalchemiops.jax.dsf:
- Full PBC and virial stress tensor support
- CSR and dense neighbor matrix formats
- Charge gradient support via straight-through estimator for MLIP training
- Benchmarks show ~80× speedup over PME at 54k atoms; scales to 195k atoms where Ewald/PME encounter memory limits
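To make the O(N) claim concrete: DSF replaces the long-range sum with a damped pair term that is shifted so both energy and force vanish at the cutoff, which means only neighbors within the cutoff contribute. The sketch below evaluates the commonly cited Fennell and Gezelter form of the pair energy in pure Python. Whether nvalchemiops uses exactly this expression is an assumption, as are the unit convention (Coulomb constant of 1), the function name, and the parameter names.

```python
import math


def dsf_pair_energy(qi, qj, r, alpha, r_cut):
    """Damped shifted force pair energy (illustrative sketch, Coulomb constant = 1)."""
    if r >= r_cut:
        return 0.0
    # Damped Coulomb energy at the cutoff, per unit charge product.
    energy_rc = math.erfc(alpha * r_cut) / r_cut
    # Magnitude of the damped Coulomb force at the cutoff, per unit charge product.
    force_rc = (
        energy_rc / r_cut
        + (2.0 * alpha / math.sqrt(math.pi)) * math.exp(-((alpha * r_cut) ** 2)) / r_cut
    )
    # Shift the energy and subtract a linear term so the force is also zero at r_cut.
    return qi * qj * (math.erfc(alpha * r) / r - energy_rc + force_rc * (r - r_cut))


alpha, r_cut = 0.2, 10.0  # hypothetical damping parameter and cutoff
near_cutoff = dsf_pair_energy(1.0, -1.0, r_cut - 1e-6, alpha, r_cut)
close_pair = dsf_pair_energy(1.0, -1.0, 1.0, alpha, r_cut)
print(near_cutoff, close_pair)
```

The energy goes smoothly to zero at the cutoff, so truncating the neighbor list there introduces no discontinuity in energy or force.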
Differentiable Virial for Ewald and PME (#20)
Ewald and PME now support analytical stress tensor computation via a new compute_virial parameter. Virial is computed in the forward pass and attached to the autograd tape, enabling stress-based loss functions for MLIP training. Typical overhead: +5–12% for Ewald, +8–27% for PME.
compute_charge_gradients in ewald_summation (#27)
The high-level ewald_summation() now accepts compute_charge_gradients=True, forwarding it to both ewald_real_space() and ewald_reciprocal_space() components. Previously only available via the low-level APIs.
GPU-Side Batch Rebuild Detection and Selective Skip (#38)
Per-system neighbor list rebuild detection now runs entirely on the GPU with no CPU–GPU synchronization:
- batch_neighbor_list_needs_rebuild() and batch_cell_list_needs_rebuild() return per-system boolean flags
- All neighbor list APIs (naive, batch_naive, cell_list, batch_cell_list) accept an optional rebuild_flags tensor to skip non-rebuilding systems at the kernel level
- Available in PyTorch and JAX bindings
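The per-system flag pattern described above can be sketched on the CPU as follows. The release notes do not state which rebuild criterion is used; this example assumes the common rule that a list is stale once any atom has moved more than half the skin distance since the last build. All function names here are hypothetical and are not the nvalchemiops API.

```python
def system_needs_rebuild(ref_positions, positions, skin):
    """True if any atom moved more than skin / 2 since the reference snapshot."""
    threshold_sq = (0.5 * skin) ** 2
    for ref, cur in zip(ref_positions, positions):
        dist_sq = sum((c - r) ** 2 for c, r in zip(cur, ref))
        if dist_sq > threshold_sq:
            return True
    return False


def rebuild_flags_sketch(ref_batch, cur_batch, skin):
    """One boolean per system in the batch."""
    return [system_needs_rebuild(ref, cur, skin) for ref, cur in zip(ref_batch, cur_batch)]


# Two single-atom systems: the first barely moved, the second moved 0.6 along x.
ref_batch = [[(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)]]
cur_batch = [[(0.1, 0.0, 0.0)], [(0.6, 0.0, 0.0)]]

flags = rebuild_flags_sketch(ref_batch, cur_batch, skin=1.0)

# Only flagged systems are rebuilt; the others keep their existing lists.
rebuilt = [index for index, flag in enumerate(flags) if flag]
print(flags, rebuilt)
```

In the release itself the flags stay on the GPU and the skip happens inside the kernel, so no host-side loop like the one above is involved.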
---
Breaking Changes
Bindings API Refactor (#15)
This release introduces a significant internal restructuring to enable multi-framework support:
- wp_ prefix removed from all Warp-level launcher functions (e.g., wp_neighbor_list → neighbor_list)
- New nvalchemiops.torch namespace — PyTorch bindings are now organized under nvalchemiops.torch.*; the top-level imports remain as backwards-compatible aliases
- PyTorch is now optional — the core Warp layer has no PyTorch dependency
- A v0.3.0 migration guide is included in the documentation
NPT/NPH Kernel Interface (#38)
- NPT and NPH kernel parameters (dt, dt_half, num_atoms, ndof, target_temp, tau) were changed from Python scalars to wp.array types to fix type mismatches in Warp kernels. Callers passing bare Python floats/ints to these kernels must update to wrapped arrays.
---
Bug Fixes
- NPT/NPH scalar type mismatch — kernel parameters now correctly typed as wp.array (#38)
- Naive PBC neighbor list wrapping — positions are now pre-wrapped before distance computation, fixing incorrect neighbor detection for unwrapped coordinates (#38)
- Batch Ewald k-cutoff — k_cutoff is now reduced to the shared maximum across systems in batch mode, preventing per-system over-allocation (#41)
- Coulomb matrix double-counting — corrected missing 0.5 prefactor in energy-only kernels when using full neighbor lists (#21)
- Negative cell volume — bindings now raise exceptions when a zero or negative cell volume is detected (#6)
- Neighbor list dispatch — improved validation and auto-selection logic for neighbor_list() (#26)
- estimate_max_neighbors defaults — corrected default values for better out-of-box behavior (#25)
- Warp warp.vec deprecation warning — updated symbol usage for compatibility with current Warp (#23)
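The Coulomb double-counting fix in the list above comes down to a standard bookkeeping rule: a full neighbor list stores each pair twice (i→j and j→i), so a pair-energy sum over it needs a 0.5 prefactor, while a half list does not. A small pure-Python check of that rule, with made-up positions and charges and hypothetical function names:

```python
import math


def coulomb_energy_full(positions, charges, full_neighbors):
    """Sum over a full neighbor list; each pair appears twice, hence the 0.5."""
    energy = 0.0
    for i, neighbors in enumerate(full_neighbors):
        for j in neighbors:
            r = math.dist(positions[i], positions[j])
            energy += 0.5 * charges[i] * charges[j] / r
    return energy


def coulomb_energy_half(positions, charges, half_neighbors):
    """Sum over a half neighbor list; each pair appears once, no prefactor."""
    energy = 0.0
    for i, neighbors in enumerate(half_neighbors):
        for j in neighbors:
            r = math.dist(positions[i], positions[j])
            energy += charges[i] * charges[j] / r
    return energy


positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
charges = [1.0, -1.0, 0.5]
full_neighbors = [[1, 2], [0, 2], [0, 1]]  # every j != i
half_neighbors = [[1, 2], [2], []]         # only j > i

e_full = coulomb_energy_full(positions, charges, full_neighbors)
e_half = coulomb_energy_half(positions, charges, half_neighbors)
print(e_full, e_half)
```

Dropping the 0.5 in the full-list version would report exactly twice the correct energy, which is the symptom the fix addresses.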
---
Performance Improvements
- No-allocation Warp launchers (#29): wp.zeros allocations removed from Warp launcher functions. Callers are now required to pre-allocate output arrays, enabling deterministic GPU memory management and eliminating allocation overhead in hot loops.
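The calling convention this implies is the familiar "write into a caller-supplied buffer" pattern. The sketch below shows it with the standard-library array type rather than Warp arrays. The function name and data are hypothetical, and real Warp launchers take their own argument lists.

```python
from array import array


def scale_into(values, factor, out):
    """Write values * factor into the caller-supplied buffer `out` (no allocation)."""
    if len(out) != len(values):
        raise ValueError("output buffer has the wrong length")
    for i, value in enumerate(values):
        out[i] = value * factor


values = array("d", [1.0, 2.0, 3.0])
out = array("d", [0.0]) * len(values)  # allocated once, outside the hot loop

for step in range(3):
    scale_into(values, float(step), out)  # reuses the same buffer every iteration

print(list(out))
```

Allocating the buffer once and reusing it keeps memory usage fixed across iterations, which is the property the release notes highlight for GPU hot loops.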
---
Documentation
- Sphinx user guides expanded with tabbed PyTorch/JAX code examples (#39)
- DFT-D3 parameter documentation and padding atom guidance fixed (#36)
- Cross-linking to JAX, NumPy, PyTorch, and Warp API references via intersphinx (#39)
- Examples and benchmarks updated for dynamics, DSF, and virial APIs (#40, #42)
Full Changelog: https://github.com/NVIDIA/nvalchemi-toolkit-ops/compare/v0.2.0...v0.3.0