NVIDIA/nvalchemi-toolkit-ops
Description: ALCHEMI Toolkit-Ops is a collection of optimized batch kernels to accelerate computational chemistry and material science workflows.
Language: Python
License: NOASSERTION
Stars: 196
Forks: 28
Open issues: 11
Created: 2025-12-01T18:34:52Z
Pushed: 2026-06-10T21:00:56Z
Default branch: main
Fork: no
Archived: no
README:
NVIDIA ALCHEMI Toolkit-Ops
 
High-performance NVIDIA Warp primitives for computational chemistry
NVIDIA ALCHEMI Toolkit-Ops is a collection of GPU-optimized, batched primitives for accelerating atomistic simulations. The high-performance compute kernels are written in NVIDIA `warp-lang`.
Key Features
- Molecular Dynamics kernels: Velocity Verlet (NVE), Langevin (NVT),
Nosé-Hoover Chain (NVT), NPT/NPH ensembles, velocity rescaling
- Geometry optimization: FIRE and FIRE2 with optional unit cell
optimization
- Neighbor lists: naive $O(N^2)$ and cell list $O(N)$ algorithms
- Dispersion corrections via Becke-Johnson damped DFT-D3
- Electrostatic interactions: Ewald, particle mesh Ewald (PME), and
damped shifted force (DSF) algorithms
- Differentiable physics: analytical stress tensor (virial) support
for Ewald and PME, enabling stress-based MLIP training
- NVIDIA Warp core with optional, JIT-compatible PyTorch and JAX
bindings, including autograd support
Kernels are designed to scale to large systems (>100,000 atoms) and are generally optimized for high-throughput operation on GPUs (on the order of several microseconds per atom), with batching support.
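To make the difference between the naive $O(N^2)$ and cell list $O(N)$ neighbor algorithms concrete, here is a minimal pure-Python sketch. It is not the toolkit's Warp implementation: the function names `naive_pairs` and `cell_list_pairs` are hypothetical, the system has open (non-periodic) boundaries, and no shift vectors are produced. It only illustrates why binning atoms into cells of edge length `cutoff` limits each atom's search to the 27 surrounding cells.

```python
import itertools
import math
import random
from collections import defaultdict


def naive_pairs(positions, cutoff):
    """O(N^2) reference: test every unordered pair of atoms."""
    pairs = set()
    for i, j in itertools.combinations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) < cutoff:
            pairs.add((i, j))
    return pairs


def cell_list_pairs(positions, cutoff):
    """Bin atoms into cubic cells of edge `cutoff`, then compare each
    atom only against atoms in its own and the 26 adjacent cells."""
    cells = defaultdict(list)
    for idx, pos in enumerate(positions):
        key = tuple(math.floor(c / cutoff) for c in pos)
        cells[key].append(idx)

    pairs = set()
    offsets = list(itertools.product((-1, 0, 1), repeat=3))
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in offsets:
            # .get avoids creating empty cells while iterating
            neighbors = cells.get((cx + dx, cy + dy, cz + dz), ())
            for i in members:
                for j in neighbors:
                    if i < j and math.dist(positions[i], positions[j]) < cutoff:
                        pairs.add((i, j))
    return pairs


# Hypothetical test system: 300 random atoms in a 20 x 20 x 20 box.
random.seed(0)
positions = [tuple(random.uniform(0.0, 20.0) for _ in range(3)) for _ in range(300)]
cutoff = 3.0

# Both strategies must find exactly the same set of pairs.
assert naive_pairs(positions, cutoff) == cell_list_pairs(positions, cutoff)
```

At roughly uniform density the number of atoms per cell stays constant as the system grows, which is where the linear scaling of the cell list approach comes from.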
Use Cases
We currently see three primary use cases for nvalchemi-toolkit-ops within the ecosystem:
- Library maintainers and developers are encouraged to benchmark and explore
integrating functionality like neighbor list computation to accelerate existing workflows;
- Researchers and model developers ideally should be able to rely on
this package (and not implement their own!) for neighbor list computation, interatomic interactions, and so on during method development;
- Engineers looking to build applications that involve molecular dynamics,
interatomic potentials, and the like can take advantage of optimized and maintained low-level kernels. The `warp-lang` kernels should be sufficiently modular to allow a high degree of flexibility and reusability.
The combination of being GPU-first and batched should enable the kernels contained in nvalchemi-toolkit-ops to be ready for a wide range of research and production applications.
Example Snippets
We encourage interested readers to browse our hosted documentation. The short snippets below highlight the API and typical use cases with PyTorch; see the hosted documentation for JAX details.
Neighbor list in a 2D unit cell with 50,000 atoms
This example uses PyTorch:
import torch
from nvalchemiops.torch.neighbors import neighbor_list
torch.set_default_dtype(torch.float32)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.set_default_device(device)
NUM_ATOMS = 50_000
# arbitrarily scale positions
positions = torch.randn((NUM_ATOMS, 3)) * 10.0
cell = torch.eye(3, dtype=torch.float32).unsqueeze(0)
pbc = torch.tensor([True, True, False], dtype=torch.bool)
cutoff = 6.0
# use padded matrix representation for neighbors, optimal for
# compiled applications that need constant shapes
neighbor_matrix, num_neighbors, shift_matrix = neighbor_list(
    positions,
    cutoff,
    cell=cell,
    pbc=pbc,
    method="cell_list",
)
# ...or pass `return_neighbor_list=True` for the familiar COO
# `edge_index` format. When `method` is omitted, the neighbor
# algorithm is chosen automatically based on system size
edge_index, neighbor_ptr, shifts = neighbor_list(
    positions,
    cutoff,
    cell=cell,
    pbc=pbc,
    return_neighbor_list=True,
)
DFT-D3(BJ) corrections on a batch of molecules
This example assumes you already have concatenated a set of molecules into combined tensors, and have computed some form of neighborhood using the neighbor_list API. Here, we'll demonstrate using the matrix representation:
import torch
from nvalchemiops.torch.interactions.dispersion import dftd3
from nvalchemiops.torch.neighbors import neighbor_list
# the following parameters need to be constructed ahead of time
positions = ...  # [num_atoms, 3]
atomic_numbers = ...  # [num_atoms]
cell = ...  # [num_systems, 3, 3]
pbc = ...  # [num_systems, 3]
batch_idx = ...  # [num_atoms]
batch_ptr = ...  # [num_systems + 1]
# construct neighbor matrix
neighbor_matrix, num_neighbors, shift_matrix = neighbor_list(
    positions,
    cutoff=...,  # on the order of ~20 Angstroms
    cell=cell,
    pbc=pbc,
    batch_idx=batch_idx,
    batch_ptr=batch_ptr
)
# DFT-D3 parameters need to be provided, which comprises reference C6 parameters.
# Refer to the user documentation to see the expected structure and data source.
d3_params = ...
# pass everything to the functional interface
d3_energies, d3_forces, coord_nums, d3_virials = dftd3(
    positions=positions,
    numbers=atomic_numbers,
    neighbor_matrix=neighbor_matrix,
    neighbor_matrix_shifts=shift_matrix,
    batch_idx=batch_idx,
    # functional specific DFT-D3 parameters (PBE shown)
    a1=0.4289,
    a2=4.4407,
    s8=0.7875,
    d3_params=d3_params,
    compute_virial=True
)
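For readers unfamiliar with the `a1`, `a2`, and `s8` arguments, the sketch below evaluates the commonly published two-body D3(BJ) energy for a single atom pair. It is an illustration only: `d3bj_pair_energy` is a hypothetical helper rather than part of nvalchemiops, the `c6` and `c8` values are made-up placeholders instead of real reference coefficients, three-body terms are omitted, and whether the toolkit's kernel matches this expression term by term is an assumption. All inputs are taken to share one consistent unit system.

```python
import math


def d3bj_pair_energy(r, c6, c8, a1, a2, s8, s6=1.0):
    """Two-body dispersion energy with Becke-Johnson (rational) damping.

    No unit conversion is performed; all inputs must be consistent.
    """
    r0 = math.sqrt(c8 / c6)   # pair-specific critical radius
    damp = a1 * r0 + a2       # Becke-Johnson damping length
    e6 = -s6 * c6 / (r**6 + damp**6)
    e8 = -s8 * c8 / (r**8 + damp**8)
    return e6 + e8


# PBE damping parameters as used in the example above.
pbe = dict(a1=0.4289, a2=4.4407, s8=0.7875)
# Placeholder dispersion coefficients (hypothetical values).
c6, c8 = 20.0, 600.0

# The damped energy stays finite even at zero separation...
e_contact = d3bj_pair_energy(0.0, c6, c8, **pbe)
# ...and approaches the undamped -C6 / r^6 limit at large separation.
e_far = d3bj_pair_energy(50.0, c6, c8, **pbe)
```

The damping length in the denominators is what keeps the correction from diverging at short range, where the density functional itself already describes the interaction.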
Electrostatics via particle mesh Ewald
This example shows how to compute the per-atom and system energies as well as the forces using the particle mesh Ewald interface.
import torch
from nvalchemiops.torch.interactions.electrostatics import particle_mesh_ewald
from nvalchemiops.torch.neighbors import neighbor_list
# the following parameters need to be constructed ahead of time
positions = ...  # [num_atoms, 3]
atomic_numbers = ...  # [num_atoms]
cell = ...  # [num_systems, 3, 3]
pbc = ...  # [num_systems, 3]
atomic_charges = ...  # [num_atoms]
# construct neighbor matrix
neighbor_matrix, num_neighbors, shift_matrix = neighbor_list(
    positions,
    cutoff=...,  # on the order of ~20 Angstroms
    cell=cell,
    pbc=pbc,
)
# call PME, using automatic parameter tuning
atom_energies, atom_forces = particle_mesh_ewald(
    positions=positions,
    charges=atomic_charges,
    cell=cell,…
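As background for the Ewald-family methods, the following sketch shows the standard splitting of the Coulomb kernel $1/r$ into a short-range part, which is summed over neighbors within a cutoff, and a smooth long-range part, which is evaluated in reciprocal space (on a mesh in the case of PME). It uses only the Python standard library; `coulomb_split` and the value of `alpha` are hypothetical choices for illustration and do not reflect how the toolkit tunes its parameters.

```python
import math


def coulomb_split(r, alpha):
    """Split 1/r into short-range and long-range contributions.

    short = erfc(alpha * r) / r   decays quickly with distance
    long  = erf(alpha * r) / r    smooth everywhere, slowly varying
    """
    short = math.erfc(alpha * r) / r
    long_ = math.erf(alpha * r) / r
    return short, long_


alpha = 0.3  # hypothetical splitting parameter (inverse length units)
for r in (1.0, 5.0, 10.0, 20.0):
    short, long_ = coulomb_split(r, alpha)
    # The two parts always add back up to the bare Coulomb kernel.
    assert math.isclose(short + long_, 1.0 / r, rel_tol=1e-12)
    print(f"r={r:5.1f}  short={short:.3e}  long={long_:.3e}")
```

A larger `alpha` makes the short-range part decay faster, allowing a smaller real-space cutoff at the cost of more work in reciprocal space.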
Excerpt shown — open the source for the full document.