NVIDIA/cuEquivariance v0.10.0

Repository: NVIDIA/cuEquivariance

Tag: v0.10.0

Published: 2026-04-22T01:20:18Z

Prerelease: no

Release notes:

Added

  • Python 3.14 support finalized, including a fix for stale tuple hashes in SegmentedTensorProduct after in-place operand mutation, and updated CI matrix (#272)
  • [Torch/JAX] cuet.triangle_attention/cuex.triangle_attention: new, faster sm100f (CC 10.0/10.3) forward kernel for hidden_dim ≤ 256 (backward: hidden_dim ≤ 128). Under sm100f, bias is cast to the q/k/v dtype instead of always float32. Non-contiguous input tensors are handled internally, so no manual contiguity assertion is required as long as shape requirements are met. Docstrings updated. Only available on cu13 builds (#260)
  • [JAX] MACE flax.nnx example restructured to use nnx.split + @jax.jit on (graphdef, state) instead of @nnx.jit on the module, removing the Python-side nnx graph traversal overhead from each training/inference step (#261)
  • [JAX] NVTX markers added to the MACE examples to make step boundaries visible in nsys profiles (#266)
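
The stale-hash fix listed under the Python 3.14 item concerns a general Python pitfall: an object that caches a hash derived from a tuple of its contents must drop that cache when the contents are mutated in place. The sketch below illustrates the pattern with a hypothetical Descriptor class. It is not the SegmentedTensorProduct implementation, and every class and method name in it is an assumption made for illustration.

```python
class Descriptor:
    """Hypothetical container that caches its hash (illustration only)."""

    def __init__(self, operands):
        self._operands = list(operands)
        self._hash = None  # lazily computed cache

    def _key(self):
        # Tuple snapshot of the current contents, used for equality and hashing.
        return tuple(self._operands)

    def __eq__(self, other):
        return isinstance(other, Descriptor) and self._key() == other._key()

    def __hash__(self):
        if self._hash is None:
            self._hash = hash(self._key())
        return self._hash

    def set_operand(self, index, value):
        """Mutate in place and drop the cached hash so it cannot go stale."""
        self._operands[index] = value
        self._hash = None


a = Descriptor([1, 2, 3])
b = Descriptor([1, 2, 4])
_ = hash(a)            # populate the cache
a.set_operand(2, 4)    # in-place mutation invalidates it
same_hash = hash(a) == hash(b)  # equal objects must hash equally
```

Without the reset in set_operand, a and b would compare equal while still hashing differently. Invalidation only keeps __hash__ consistent with __eq__ for later lookups: an object mutated after it has been inserted into a dict or set remains unsafe as a key.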

Bug fixes

  • [Torch] SegmentedPolynomial checkpoint portability: GPU-saved models now load correctly on CPU. Implemented via __reduce__ on SegmentedPolynomialFromUniform1dJit, SegmentedPolynomialFusedTP, SegmentedPolynomialIndexedLinear, and SegmentedPolynomial, plus graceful fallback when specific cuequivariance_ops_torch extensions (e.g. uniform_1d) are unavailable (#270)
  • [Torch] Replaced deprecated is_fx_tracing with is_fx_symbolic_tracing (#270)
  • [JAX] Restrict PTX 88 to sm_121 for CUDA 12.9+, avoiding breakage on other architectures (addresses the known issue noted in the 0.9.0 release) (#250)
  • [Torch/JAX] cuet.attention_pair_bias/cuex.attention_pair_bias: fixed incorrect results when the hidden dimension is not a multiple of 32; the previous torch fallback for these cases is removed as the kernel now handles them correctly

Notes

  • [Torch] The CUEQ_TORCH_COMPILE environment variable (experimental) enables torch.compile for cuet.triangle_attention; useful for non-contiguous tensor inputs on Ampere/Hopper architectures

Documentation

  • Fixed tutorial format issues (#274)

What's Changed

  • Fix __eq__ / __lt__ on segmented polynomial types for Python and JAX compatibility by @mariogeiger in https://github.com/NVIDIA/cuEquivariance/pull/258
  • [merge after release] Restrict PTX 88 to sm_121 for CUDA 12.9+ by @hsadasivan in https://github.com/NVIDIA/cuEquivariance/pull/250
  • doc string and api update for triattn by @hsadasivan in https://github.com/NVIDIA/cuEquivariance/pull/260
  • api: remove dim_order from triangle_attention by @hsadasivan in https://github.com/NVIDIA/cuEquivariance/pull/262
  • Removed dim_order from triAttn by @phiandark in https://github.com/NVIDIA/cuEquivariance/pull/263
  • nnx.split by @mariogeiger in https://github.com/NVIDIA/cuEquivariance/pull/261
  • add nvtx marker by @paulz-nv in https://github.com/NVIDIA/cuEquivariance/pull/266
  • Fixing some torch segmented_polynomial support by @phiandark in https://github.com/NVIDIA/cuEquivariance/pull/270
  • Add Python 3.14 support and fix CI setup by @mariogeiger in https://github.com/NVIDIA/cuEquivariance/pull/272
  • Fix tutorials doc format issues by @LiamZhang100 in https://github.com/NVIDIA/cuEquivariance/pull/274
  • Add skill.md files by @mariogeiger in https://github.com/NVIDIA/cuEquivariance/pull/269
  • Release 0.10.0 by @mariogeiger in https://github.com/NVIDIA/cuEquivariance/pull/275

New Contributors

  • @paulz-nv made their first contribution in https://github.com/NVIDIA/cuEquivariance/pull/266

Full Changelog: https://github.com/NVIDIA/cuEquivariance/compare/v0.9.0...v0.10.0
