NVIDIA/TileGym v1.3.0
Repository: NVIDIA/TileGym
Tag: v1.3.0
Published: 2026-05-19T06:26:33Z
Prerelease: no
Release notes:
What's Changed
- perf(rmsnorm): skip bounds checking in gather kernel when tile evenly divides N by @liqiangxl in https://github.com/NVIDIA/TileGym/pull/111
- experimental/swa_attention_cutile by @DevTechJr in https://github.com/NVIDIA/TileGym/pull/107
- feat: migrate cuda.tile_experimental.autotune_launch → cuda.tile.tune.exhaustive_search & other updates by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/114
- Add cutile-python programming skill by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/115
- Update cutile kernels & other updates by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/116
- add multi_wave_cached rms norm by @liqiangxl in https://github.com/NVIDIA/TileGym/pull/113
- Add cuTile autotune disable policy & refactor(attention): update _fmha_autotune_configs & other updates by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/117
- skills: update .agents/skills/ license by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/119
- Update improve-cutile-kernel-perf skill & Add liger kernels & other updates by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/120
- skills: Skip non-SKILL.md skill docs in SPDX header check by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/122
- Refine tilegym ops kernel style by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/121
- Refactor and release Transformers patching and auto-kernelize loop & other updates by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/123
- Bump setup.py version from 1.2.0 to 1.3.0 by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/124
- Update attention and matmul tests & other updates by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/125
- [benchmark] add torch.profiler based bench utils with L2 cache flush by @liqiangxl in https://github.com/NVIDIA/TileGym/pull/118
- bench(softmax): use profile_with_l2flush for more realistic bandwidth measurements by @liqiangxl in https://github.com/NVIDIA/TileGym/pull/126
- [tests] test_gemma_attention: guard set_backend in test_op with try/skip & other updates by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/127
New Contributors
- @liqiangxl made their first contribution in https://github.com/NVIDIA/TileGym/pull/111
- @DevTechJr made their first contribution in https://github.com/NVIDIA/TileGym/pull/107
Full Changelog: https://github.com/NVIDIA/TileGym/compare/v1.2.0...v1.3.0