Release, NVIDIA, published May 19, 2026

NVIDIA/TileGym v1.3.0

v1.3.0

Repository: NVIDIA/TileGym

Tag: v1.3.0

Published: 2026-05-19T06:26:33Z

Prerelease: no

Release notes:

What's Changed

  • perf(rmsnorm): skip bounds checking in gather kernel when tile evenly divides N by @liqiangxl in https://github.com/NVIDIA/TileGym/pull/111
  • experimental/swa_attention_cutile by @DevTechJr in https://github.com/NVIDIA/TileGym/pull/107
  • feat: migrate cuda.tile_experimental.autotune_launch → cuda.tile.tune.exhaustive_search & other updates by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/114
  • Add cutile-python programming skill by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/115
  • Update cutile kernels & other updates by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/116
  • add multi_wave_cached rms norm by @liqiangxl in https://github.com/NVIDIA/TileGym/pull/113
  • Add cuTile autotune disable policy & refactor(attention): update _fmha_autotune_configs & other updates by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/117
  • skills: update .agents/skills/ license by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/119
  • Update improve-cutile-kernel-perf skill & Add liger kernels & other updates by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/120
  • skills: Skip non-SKILL.md skill docs in SPDX header check by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/122
  • Refine tilegym ops kernel style by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/121
  • Refactor and release Transformers patching and auto-kernelize loop & other updates by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/123
  • Bump setup.py version from 1.2.0 to 1.3.0 by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/124
  • Update attention and matmul tests & other updates by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/125
  • [benchmark] add torch.profiler based bench utils with L2 cache flush by @liqiangxl in https://github.com/NVIDIA/TileGym/pull/118
  • bench(softmax): use profile_with_l2flush for more realistic bandwidth measurements by @liqiangxl in https://github.com/NVIDIA/TileGym/pull/126
  • [tests] test_gemma_attention: guard set_backend in test_op with try/skip & other updates by @hannahli-nv in https://github.com/NVIDIA/TileGym/pull/127
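
The rmsnorm entry above (pull/111) mentions skipping bounds checking when the tile evenly divides N. As a rough illustration of that general idea only, the pure-Python sketch below normalizes one row tile by tile and enables the per-element bounds check only when the row length is not a multiple of the tile size. This is not TileGym's kernel: the function names `rms_norm_reference` and `rms_norm_tiled`, the `tile` and `eps` parameters, and the returned `check_bounds` flag are all hypothetical and exist only for this example.

```python
import math


def rms_norm_reference(row, weight, eps=1e-6):
    """Plain RMSNorm over one row: x * w / sqrt(mean(x^2) + eps)."""
    mean_sq = sum(v * v for v in row) / len(row)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(row, weight)]


def rms_norm_tiled(row, weight, tile, eps=1e-6):
    """Tile-by-tile RMSNorm (hypothetical sketch, not TileGym code).

    Returns (output, check_bounds) so the caller can see which path ran.
    """
    n = len(row)
    num_tiles = -(-n // tile)  # ceiling division
    # Decided once per call: if tile divides n, no index can run past the
    # end of the row, so the per-element guard below is never needed.
    check_bounds = (n % tile) != 0

    # Pass 1: accumulate the sum of squares.
    sum_sq = 0.0
    for t in range(num_tiles):
        base = t * tile
        for i in range(tile):
            idx = base + i
            if check_bounds and idx >= n:
                continue  # masked-out lane in the ragged last tile
            sum_sq += row[idx] * row[idx]
    scale = 1.0 / math.sqrt(sum_sq / n + eps)

    # Pass 2: scale each element and apply the weight.
    out = [0.0] * n
    for t in range(num_tiles):
        base = t * tile
        for i in range(tile):
            idx = base + i
            if check_bounds and idx >= n:
                continue
            out[idx] = row[idx] * scale * weight[idx]
    return out, check_bounds


if __name__ == "__main__":
    row = [float(v) for v in range(1, 9)]  # N = 8
    weight = [1.0] * len(row)
    _, checked = rms_norm_tiled(row, weight, tile=4)
    print("tile=4, N=8, bounds check enabled:", checked)
    _, checked = rms_norm_tiled(row[:6], weight[:6], tile=4)
    print("tile=4, N=6, bounds check enabled:", checked)
```

In a real GPU kernel the equivalent decision would typically be made at compile or launch time so the unchecked variant contains no masking instructions at all; the runtime flag here only mimics that choice.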

New Contributors

  • @liqiangxl made their first contribution in https://github.com/NVIDIA/TileGym/pull/111
  • @DevTechJr made their first contribution in https://github.com/NVIDIA/TileGym/pull/107

Full Changelog: https://github.com/NVIDIA/TileGym/compare/v1.2.0...v1.3.0

Notability

notability 4.0/10

Minor version update of NVIDIA's TileGym GPU kernel library, covering cuTile kernel updates, benchmark utilities, and agent skills.