NVIDIA/recsys-examples v26.03


Repository: NVIDIA/recsys-examples

Tag: v26.03

Published: 2026-04-14T10:05:41Z

Prerelease: no

Release notes:

What's Changed

Features & Enhancements

  • Add Torch export for HSTU model by @jensenhwa in https://github.com/NVIDIA/recsys-examples/pull/327
  • [Feature] dynamicemb table fusion and expansion by @jiashuy in https://github.com/NVIDIA/recsys-examples/pull/343
  • feat(benchmark): HSTU E2E training benchmark suite with progressive optimizations by @JacoCheung in https://github.com/NVIDIA/recsys-examples/pull/340
  • Add HSTU inference benchmark results on B200 by @geoffreyQiu in https://github.com/NVIDIA/recsys-examples/pull/338
  • Relax alignment requirements(remove pow of 2) in dynamicemb by @jiashuy in https://github.com/NVIDIA/recsys-examples/pull/312
  • perf: avoid D2H sync in _Split2DJaggedFunction by precomputing split lengths by @JacoCheung in https://github.com/NVIDIA/recsys-examples/pull/318
  • refactor: migrate to fbgemm_gpu_hstu, remove legacy HSTU compat layer by @JacoCheung in https://github.com/NVIDIA/recsys-examples/pull/321
  • Optimize balancer and setup debug logger. by @JacoCheung in https://github.com/NVIDIA/recsys-examples/pull/308
  • fix: align DynamicEmb capacity to bucket_capacity instead of DEMB_TABLE_ALIGN_SIZE by @JacoCheung in https://github.com/NVIDIA/recsys-examples/pull/329
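Two of the entries above concern how a table capacity is rounded: one drops the power-of-two requirement, the other aligns capacity to bucket_capacity. As a rough illustration of why that matters, the sketch below compares rounding up to the next power of two with rounding up to a multiple of a bucket size. The helper names and the numbers (`requested`, `bucket_capacity = 128`) are hypothetical and chosen for illustration only; this is not the dynamicemb implementation.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def align_up(n: int, multiple: int) -> int:
    """Smallest multiple of `multiple` that is >= n."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    # Ceiling division written with integer arithmetic only.
    return -(-n // multiple) * multiple


# Hypothetical values, not taken from the library.
requested = 1_000_000
bucket_capacity = 128

pow2_capacity = next_power_of_two(requested)           # power-of-two rounding
bucket_aligned = align_up(requested, bucket_capacity)  # bucket-multiple rounding

print(pow2_capacity - requested)   # rows of padding with power-of-two rounding
print(bucket_aligned - requested)  # rows of padding with bucket alignment
```

Under these assumed numbers, rounding to a bucket multiple pads far fewer rows than rounding to a power of two, which is the general motivation for relaxing such a constraint.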

Bug Fixes

  • fix missing import by @gameofdimension in https://github.com/NVIDIA/recsys-examples/pull/320
  • refactor: remove redundant apply_optimizer_in_backward in sharding.py by @ShaobinChen-AH in https://github.com/NVIDIA/recsys-examples/pull/330
  • error handling for empty kv list by @gameofdimension in https://github.com/NVIDIA/recsys-examples/pull/331
  • Fix docker, cmake and imports after torch export support by @geoffreyQiu in https://github.com/NVIDIA/recsys-examples/pull/358
  • Make table_ptrs_dev persistent by @jiashuy in https://github.com/NVIDIA/recsys-examples/pull/356
  • Create DynamicEmbStorage when zero local hbm; reset _prefetch_outstanding_keys only in reset_cache_states by @jiashuy in https://github.com/NVIDIA/recsys-examples/pull/354
  • Fix empty batch hang fundamentally by @jiashuy in https://github.com/NVIDIA/recsys-examples/pull/349
  • [bugfix] fix hang issue when fed empty batch by @gameofdimension in https://github.com/NVIDIA/recsys-examples/pull/342
  • Fix optimizer states dim(ckpt) of rowwise adagrad by @jiashuy in https://github.com/NVIDIA/recsys-examples/pull/305
  • Refactor test for alignment; add get_sharded_table_capacity by @jiashuy in https://github.com/NVIDIA/recsys-examples/pull/348

Misc

  • fix(pipeline): drain eval pipeline naturally to prevent batch leak by @JacoCheung in https://github.com/NVIDIA/recsys-examples/pull/314
  • Fix NVE dependency by @geoffreyQiu in https://github.com/NVIDIA/recsys-examples/pull/323
  • refactor: move HSTU build to devel stage by @shijieliu in https://github.com/NVIDIA/recsys-examples/pull/325
  • Upgrade to Torch 2.11 with Cuda 13.1 by @geoffreyQiu in https://github.com/NVIDIA/recsys-examples/pull/347
  • Update HSTU inference README file by @geoffreyQiu in https://github.com/NVIDIA/recsys-examples/pull/360

New Contributors

  • @jensenhwa made their first contribution in https://github.com/NVIDIA/recsys-examples/pull/327

Full Changelog: https://github.com/NVIDIA/recsys-examples/compare/v26.01...v26.03

Notability: 4.0/10

Routine examples release, no major traction.