Release · NVIDIA · published May 21, 2026

NVIDIA/torch-harmonics v0.9.1

Repository: NVIDIA/torch-harmonics

Tag: v0.9.1

Published: 2026-05-21T12:03:14Z

Prerelease: no

Release notes:

  • Fourier-Bessel filter basis; Hann window basis with per-type init factors via get_init_factors
  • Standardized L2 normalization on the unit disk (harmonic, Zernike, Fourier-Bessel); on a disk of radius R the norm equals R via the Jacobian
  • New DISCO basis normalization modes modal (mean-subtracted, reduces spectral leakage) and geometric (spherical cap area measure)
  • Deprecated basis_norm_mode="individual" → "nodal" and "area ratio" → "geometric" (old names emit DeprecationWarning)
  • Faster DISCO sparsity-pattern setup; OpenMP forward/backward kernels with up to ~55x speedup in some configurations
  • Cross-attention (key != value != query) in AttentionS2, NeighborhoodAttentionS2, and DistributedNeighborhoodAttentionS2
  • Serial attention upsampling when nlon_out % nlon_in == 0: CPU/CUDA/torch upsample kernels and matching reference
  • DistributedNeighborhoodAttentionS2 for self-attention and downsampling (distributed upsample not yet implemented)
  • Optional per-head QK RMS norm (use_qknorm) for AttentionS2 and NeighborhoodAttentionS2; shape checks across attention layers
  • Fixed Q/K/V projection gain when input dim != embedding dim
  • Breaking: default NeighborhoodAttentionS2 scale changed from 1/sqrt(k_channels) to 1/sqrt(k_channels // num_heads) to match standard MHA head-dim scaling (num_heads > 1)
  • Faster Legendre coefficient precomputation for SHT layers
  • Differentiable polar_halo_exchange and get_group_neighbors for distributed attention
  • More robust distributed transpose; _reduce clones before all_reduce for torch.compile compatibility
  • Fixed Galewsky initial condition NaN from overflow; convolution adapter for mismatched residual channel counts
  • Midpoint rule for filter-basis L2 norm integration (O(h^2)); improved _precompute_convolution_tensor_s2 docstring
  • Expanded attention tests (including upsample); new tests/test_filter_basis.py; broader layer integrity coverage
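
The breaking change to the default NeighborhoodAttentionS2 scale can be checked with plain arithmetic. The sketch below is not library code: the helper names old_default_scale and new_default_scale are hypothetical, and the channel counts are arbitrary illustration values. It only evaluates the two formulas quoted in the notes to show how the default changes when num_heads > 1.

```python
import math


def old_default_scale(k_channels: int) -> float:
    """Previous default quoted in the notes: 1/sqrt(k_channels)."""
    return 1.0 / math.sqrt(k_channels)


def new_default_scale(k_channels: int, num_heads: int) -> float:
    """New default quoted in the notes: 1/sqrt(k_channels // num_heads)."""
    return 1.0 / math.sqrt(k_channels // num_heads)


# Hypothetical configuration, chosen so num_heads divides k_channels.
k_channels, num_heads = 64, 4

old = old_default_scale(k_channels)             # 1/sqrt(64)
new = new_default_scale(k_channels, num_heads)  # 1/sqrt(16)
ratio = new / old                               # sqrt(num_heads) here

print(f"old={old}, new={new}, ratio={ratio}")
```

With num_heads = 1 the two formulas coincide, so single-head layers are unaffected. For multi-head layers the attention logits grow by roughly sqrt(num_heads), so checkpoints trained under the old default may behave differently; how to supply the previous value explicitly should be checked against the library documentation.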
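
The claim that a basis function with unit L2 norm on the unit disk has norm R on a disk of radius R follows from the substitution r = R·ρ, whose Jacobian contributes a factor R² to the squared norm. A minimal numerical sketch, assuming a radially symmetric test function and a hand-written midpoint rule in the radial direction; disk_l2_norm and unit_norm_profile are hypothetical helpers, not part of torch-harmonics:

```python
import math


def disk_l2_norm(f, radius: float, n: int = 1000) -> float:
    """L2 norm over a disk of a radial profile f(rho), with rho = r / radius.

    Integrates |f|^2 * r dr with the midpoint rule (error O(h^2)) and
    multiplies by 2*pi for the angular integral.
    """
    h = radius / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h  # midpoint of the i-th radial cell
        total += f(r / radius) ** 2 * r * h
    return math.sqrt(2.0 * math.pi * total)


def unit_norm_profile(rho: float) -> float:
    """f(rho) = sqrt(2/pi) * rho, which has L2 norm 1 on the unit disk."""
    return math.sqrt(2.0 / math.pi) * rho


norm_unit = disk_l2_norm(unit_norm_profile, 1.0)  # close to 1
norm_r3 = disk_l2_norm(unit_norm_profile, 3.0)    # close to 3

print(norm_unit, norm_r3)
```

The small residual error in both values comes from the midpoint rule and shrinks quadratically as n grows, which matches the O(h^2) behaviour mentioned in the notes.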
