NVIDIA/Megatron-LM core_v0.17.0
NVIDIA Megatron Core 0.17.0
Repository: NVIDIA/Megatron-LM
Tag: core_v0.17.0
Published: 2026-04-16T19:59:42Z
Prerelease: no
Release notes: Changelog Details
- Fix two minor bugs in MTP implementation for hybrid models by @deepakn94 :: PR: #3194
- Update README.md by @mvirts :: PR: #2111
- mRoPE for MTP by @BestJuly :: PR: #3114
- Fix bug in SFTDataset by @duncanriach :: PR: #3185
- Fix several syntax error by @HollowMan6 :: PR: #3004
- Fix for RL Test by @wdykas :: PR: #3148
- Fix latent moe flops and backward_dw by @buptzyb :: PR: #2977
- Use global user buffer when the bucket size does not fit FixedPoolAllocator by @shengf-nv :: PR: #2857
- ci: Checkpoint retention by @ko3n1g :: PR: #3205
- Add unit test for LatentMoE by @venmugil :: PR: #2892
- ci: Enable unit tests on merge-queue by @ko3n1g :: PR: #3186
- Fix seq pack flag in get_logprobs by @mathemakitten :: PR: #3206
- ci(fix): Parse unit tests in merge-queue by @ko3n1g :: PR: #3224
- Fix TE 2.12 AllGather CI failure by @BestJuly :: PR: #3101
- ci(hotfix): Pin uv by @ko3n1g :: PR: #3233
- Add a unit test to check that RL get_logprobs will reuse training cudagraphed forward pass by @mathemakitten :: PR: #3209
- Do not offload grad buffers when training graphs are enabled by @mathemakitten :: PR: #3231
- Fix missing PackedSeqParams import by @parthmannan :: PR: #3214
- Synchronize the request counts for EP inference with strict matching by @santhnm2 :: PR: #3033
- Fix coordinator address collision check in flask by @tdene :: PR: #3208
- Do not let requests fail silently inside inference engine by @tdene :: PR: #3228
- torch saver inference model offload by @wdykas :: PR: #3170
- enable cuda graph ut by @Autumn1998 :: PR: #3197
- Support EP with HSDP by @wplf :: PR: #2840
- [Main] Add the missing part to support 1F1B overlap for Qwen3-Next by @BestJuly :: PR: #2997
- Missing import fix by @parthmannan :: PR: #3241
- Miscellaneous inference cleanup (Replay of !2955) by @santhnm2 :: PR: #3232
- Add DistributedInitConfig by @maanug-nv :: PR: #3173
- Fix checkpoint converter missing parallel group initialization by @yashaswikarnati :: PR: #3217
- Skip empty sequences and chunks in MTP tensor roll by @BestJuly :: PR: #3035
- Implement get_parameters for ChainedOptimizer by @nschank :: PR: #3201
- ci(fix): Create main/dev image tags by @ko3n1g :: PR: #3252
- Reapply "Add MTP support for hybrid models (#2363)" by @sancha :: PR: #3207
- Fix uv install for GH actions by @Phlip79 :: PR: #3259
- Update the project structure in README by @janEbert :: PR: #3251
- Cherry-pick: Fix mtp_num_layers and clip_qk issues (#2581, #2776) by @BestJuly :: PR: #3075
- RL: training cudagraphs functional test by @mathemakitten :: PR: #3235
- [Main] fix cg missing wgrad hook by @Wohox :: PR: #3074
- Avoid .cuda call on meta device in LanguageModel by @nschank :: PR: #3202
- fix checkpointing error message by @dimapihtar :: PR: #3203
- Nano QAT/D fix with sft tokenizer and datasets by @ChenhanYu :: PR: #3254
- Revert "fix checkpointing error message (#3203)" by @ko3n1g :: PR: #3283
- Reapply "fix checkpointing error message (#3203)" (#3283) by @ko3n1g :: PR: #3285
- docs: Add changelog for 0.15.3 by @ko3n1g :: PR: #3286
- ci: Set throughput tests as flaky by @chtruong814 :: PR: #3301
- chore: Move GB200 tests to nightly by @ko3n1g :: PR: #3302
- Ensure type-checker understands use of Submodules in bert_model by @nschank :: PR: #3256
- Override extra_repr instead of __repr__ by @nschank :: PR: #3200
- Replace ModuleSpec with Protocols for LayerNorm submodules by @nschank :: PR: #3090
- Non colocated refit by @wdykas :: PR: #3213
- Fuse permute+pad and unpermute+unpad ops for FP8/FP4 training by @xiaoxi-wangfj :: PR: #2763
- Add check to prevent MFSDP from numeric issue in gradient accumulate fusion by @shjwudp :: PR: #2904
- update get_embedding_ranks and get_position_embedding_ranks docstrings by @c1lovez1 :: PR: #3223
- Param offset in _ParamAndGradBucket should be aligned by @skydoorkai :: PR: #3007
- ci: Add secrets detector by @chtruong814 :: PR: #3180
- Ensure type-checker understands use of Submodules in llava_model by @nschank :: PR: #3257
- updates to support modelopt EAGLE training with CP by @yeyu-nvidia :: PR: #3147
- fully remove legacy tokenizer system by @dimapihtar :: PR: #2946
- M-FSDP: Remove redundant stream waits in HSDP to prevent CG fail by @shjwudp :: PR: #2941
- General README and pyproject fixes by @ahmadki :: PR: #2907
- chore: More aggressive checkpointing by @ko3n1g :: PR: #3315
- ci: Pin down setuptools to lt 82 by @ko3n1g :: PR: #3313
- fix: numpy overflow by @ko3n1g :: PR: #3306
- fix: T5 dataset by @ko3n1g :: PR: #3307
- ci: Revert "ci: Add secrets detector (#3180)" by @chtruong814 :: PR: #3330
- ci: Add more tests, run on merge-queue by @ko3n1g :: PR: #3317
- ci: Remove merge-gate environment check by @chtruong814 :: PR: #3331
- Use FP4 context for mamba by @kwyss-nvidia :: PR: #2604
- ci: Ensure we run all functional tests in merge group by @chtruong814 :: PR: #3332
- Replace ModuleSpec with Protocols for inputs to MLP by @nschank :: PR: #3084
- ci: Fix merge queue functional tests by @chtruong814 :: PR: #3337
- ci: skip queue in merge-gate by @ko3n1g :: PR: #3343
- ci: Timeout for functional tests by @ko3n1g :: PR: #3346
- update checkpointing documentation by @dimapihtar :: PR: #3347
- Update golden values to reflect improvements by @tdene :: PR: #3350
- BUGFIX: gpt vs hybrid model mtp naming mismatch by @sancha :: PR: #3334
- Disable flaky test by @tdene :: PR: #3354
- re-enable gpt grpo tests by @jon-barker :: PR: #3348
- Fix SFT Pipeline when TP>1 by @asolergi-nv :: PR: #3268
- Fixes for KD mode by @AAnoosheh :: PR: #3342
- chore: Update codeowners file by @ko3n1g :: PR: #3365
- Siddharth/fix inference functional tests by @sidsingh-nvidia :: PR: #3357
- Switch oncall by @janEbert :: PR: #3360
- Add missing RMSNorm to llama train script by @AAnoosheh :: PR: #3314
- Fix inference for MTP models by @tdene :: PR: #3297
- Add a logprobs test with real gpt model. by @yobibyte :: PR: #2870
- Add simple GRPO functional test by @tdene :: PR: #3323
- ci: Concurrency control for merge-queue by @ko3n1g :: PR: #3353
- ci: Update golden value download script to work with Github by @chtruong814 :: PR: #3335
- fix: correct typos 'seperated' and 'recieved' by @thecaptain789 :: PR: #3305
- Improved PyTorch profiler and added PyTorch execution trace by @shengf-nv :: PR: #3273
- Removing etc from main index page, shifted…