NVIDIA/recsys-examples v26.03
Repository: NVIDIA/recsys-examples
Tag: v26.03
Published: 2026-04-14T10:05:41Z
Prerelease: no
Release notes:
What's Changed
Features & Enhancements
- Add Torch export for HSTU model by @jensenhwa in https://github.com/NVIDIA/recsys-examples/pull/327
- [Feature] dynamicemb table fusion and expansion by @jiashuy in https://github.com/NVIDIA/recsys-examples/pull/343
- feat(benchmark): HSTU E2E training benchmark suite with progressive optimizations by @JacoCheung in https://github.com/NVIDIA/recsys-examples/pull/340
- Add HSTU inference benchmark results on B200 by @geoffreyQiu in https://github.com/NVIDIA/recsys-examples/pull/338
- Relax alignment requirements (remove power-of-2 constraint) in dynamicemb by @jiashuy in https://github.com/NVIDIA/recsys-examples/pull/312
- perf: avoid D2H sync in _Split2DJaggedFunction by precomputing split lengths by @JacoCheung in https://github.com/NVIDIA/recsys-examples/pull/318
- refactor: migrate to fbgemm_gpu_hstu, remove legacy HSTU compat layer by @JacoCheung in https://github.com/NVIDIA/recsys-examples/pull/321
- Optimize balancer and set up debug logger by @JacoCheung in https://github.com/NVIDIA/recsys-examples/pull/308
- fix: align DynamicEmb capacity to bucket_capacity instead of DEMB_TABLE_ALIGN_SIZE by @JacoCheung in https://github.com/NVIDIA/recsys-examples/pull/329
Bug Fixes
- fix missing import by @gameofdimension in https://github.com/NVIDIA/recsys-examples/pull/320
- refactor: remove redundant apply_optimizer_in_backward in sharding.py by @ShaobinChen-AH in https://github.com/NVIDIA/recsys-examples/pull/330
- error handling for empty kv list by @gameofdimension in https://github.com/NVIDIA/recsys-examples/pull/331
- Fix docker, cmake and imports after torch export support by @geoffreyQiu in https://github.com/NVIDIA/recsys-examples/pull/358
- Make table_ptrs_dev persistent by @jiashuy in https://github.com/NVIDIA/recsys-examples/pull/356
- Create DynamicEmbStorage when zero local hbm; reset _prefetch_outstanding_keys only in reset_cache_states by @jiashuy in https://github.com/NVIDIA/recsys-examples/pull/354
- Fix empty batch hang fundamentally by @jiashuy in https://github.com/NVIDIA/recsys-examples/pull/349
- [bugfix] fix hang issue when fed empty batch by @gameofdimension in https://github.com/NVIDIA/recsys-examples/pull/342
- Fix optimizer states dim(ckpt) of rowwise adagrad by @jiashuy in https://github.com/NVIDIA/recsys-examples/pull/305
- Refactor test for alignment; add get_sharded_table_capacity by @jiashuy in https://github.com/NVIDIA/recsys-examples/pull/348
Misc
- fix(pipeline): drain eval pipeline naturally to prevent batch leak by @JacoCheung in https://github.com/NVIDIA/recsys-examples/pull/314
- Fix NVE dependency by @geoffreyQiu in https://github.com/NVIDIA/recsys-examples/pull/323
- refactor: move HSTU build to devel stage by @shijieliu in https://github.com/NVIDIA/recsys-examples/pull/325
- Upgrade to Torch 2.11 with CUDA 13.1 by @geoffreyQiu in https://github.com/NVIDIA/recsys-examples/pull/347
- Update HSTU inference README file by @geoffreyQiu in https://github.com/NVIDIA/recsys-examples/pull/360
New Contributors
- @jensenhwa made their first contribution in https://github.com/NVIDIA/recsys-examples/pull/327
Full Changelog: https://github.com/NVIDIA/recsys-examples/compare/v26.01...v26.03
Notability
4.0/10. Routine examples release, no major traction.