What does this release signal mean?

NVIDIA published Megatron Core 0.17.0 (tag core_v0.17.0) in the NVIDIA/Megatron-LM repository, its framework for scaling transformers. This release signal is evidence of what shipped, changed, or was packaged for users. It is a full release (not a prerelease), published 2026-04-16 with a changelog. onlylabs links this event to 1 captured evidence page and 6 related release signals.

NVIDIA Release: NVIDIA/Megatron-LM core_v0.17.0

Captured source

Source: GitHub (github.com/NVIDIA/Megatron-LM)

published Apr 16, 2026 · seen Jun 6 · captured Jun 11 · HTTP 200 · method plain

NVIDIA Megatron Core 0.17.0

Repository: NVIDIA/Megatron-LM

Tag: core_v0.17.0

Published: 2026-04-16T19:59:42Z

Prerelease: no
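The metadata fields above can be checked programmatically. A minimal Python sketch, assuming the `core_vMAJOR.MINOR.PATCH` tag convention holds beyond this single tag (it is inferred from `core_v0.17.0`, not documented here). `parse_core_tag` and `parse_published` are hypothetical helpers, not part of Megatron-LM. The comparison target 0.15.3 is used because this release adds its changelog (PR #3286).

```python
import re
from datetime import datetime, timezone

# Assumed tag convention, inferred from the single tag "core_v0.17.0".
TAG_PATTERN = re.compile(r"^core_v(\d+)\.(\d+)\.(\d+)$")


def parse_core_tag(tag: str) -> tuple[int, int, int]:
    """Return (major, minor, patch) for a tag such as 'core_v0.17.0'."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognized tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def parse_published(timestamp: str) -> datetime:
    """Parse a UTC timestamp with a trailing 'Z', such as '2026-04-16T19:59:42Z'."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


# Values copied from the release metadata above.
release = {
    "tag": "core_v0.17.0",
    "published": "2026-04-16T19:59:42Z",
    "prerelease": False,
}

version = parse_core_tag(release["tag"])
published = parse_published(release["published"])
# Tuples compare element by element, so (0, 17, 0) > (0, 15, 3).
is_newer_than_0_15_3 = version > (0, 15, 3)
print(version, published.date(), is_newer_than_0_15_3)
# prints: (0, 17, 0) 2026-04-16 True
```

Comparing integer tuples rather than raw strings avoids the lexical trap where "0.9.0" would sort after "0.17.0".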

Release notes: Changelog

Fix two minor bugs in MTP implementation for hybrid models by @deepakn94 :: PR: #3194
Update README.md by @mvirts :: PR: #2111
mRoPE for MTP by @BestJuly :: PR: #3114
Fix bug in SFTDataset by @duncanriach :: PR: #3185
Fix several syntax errors by @HollowMan6 :: PR: #3004
Fix for RL Test by @wdykas :: PR: #3148
Fix latent moe flops and backward_dw by @buptzyb :: PR: #2977
Use global user buffer when the bucket size does not fit FixedPoolAllocator by @shengf-nv :: PR: #2857
ci: Checkpoint retention by @ko3n1g :: PR: #3205
Add unit test for LatentMoE by @venmugil :: PR: #2892
ci: Enable unit tests on merge-queue by @ko3n1g :: PR: #3186
Fix seq pack flag in get_logprobs by @mathemakitten :: PR: #3206
ci(fix): Parse unit tests in merge-queue by @ko3n1g :: PR: #3224
Fix TE 2.12 AllGather CI failure by @BestJuly :: PR: #3101
ci(hotfix): Pin uv by @ko3n1g :: PR: #3233
Add a unit test to check that RL get_logprobs will reuse training cudagraphed forward pass by @mathemakitten :: PR: #3209
Do not offload grad buffers when training graphs are enabled by @mathemakitten :: PR: #3231
Fix missing PackedSeqParams import by @parthmannan :: PR: #3214
Synchronize the request counts for EP inference with strict matching by @santhnm2 :: PR: #3033
Fix coordinator address collision check in flask by @tdene :: PR: #3208
Do not let requests fail silently inside inference engine by @tdene :: PR: #3228
torch saver inference model offload by @wdykas :: PR: #3170
enable cuda graph ut by @Autumn1998 :: PR: #3197
Support EP with HSDP by @wplf :: PR: #2840
[Main] Add the missing part to support 1F1B overlap for Qwen3-Next by @BestJuly :: PR: #2997
Missing import fix by @parthmannan :: PR: #3241
Miscellaneous inference cleanup (Replay of !2955) by @santhnm2 :: PR: #3232
Add DistributedInitConfig by @maanug-nv :: PR: #3173
Fix checkpoint converter missing parallel group initialization by @yashaswikarnati :: PR: #3217
Skip empty sequences and chunks in MTP tensor roll by @BestJuly :: PR: #3035
Implement get_parameters for ChainedOptimizer by @nschank :: PR: #3201
ci(fix): Create main/dev image tags by @ko3n1g :: PR: #3252
Reapply "Add MTP support for hybrid models (#2363)" by @sancha :: PR: #3207
Fix uv install for GH actions by @Phlip79 :: PR: #3259
Update the project structure in README by @janEbert :: PR: #3251
Cherry-pick: Fix mtp_num_layers and clip_qk issues (#2581, #2776) by @BestJuly :: PR: #3075
RL: training cudagraphs functional test by @mathemakitten :: PR: #3235
[Main] fix cg missing wgrad hook by @Wohox :: PR: #3074
Avoid .cuda call on meta device in LanguageModel by @nschank :: PR: #3202
fix checkpointing error message by @dimapihtar :: PR: #3203
Nano QAT/D fix with sft tokenizer and datasets by @ChenhanYu :: PR: #3254
Revert "fix checkpointing error message (#3203)" by @ko3n1g :: PR: #3283
Reapply "fix checkpointing error message (#3203)" (#3283) by @ko3n1g :: PR: #3285
docs: Add changelog for 0.15.3 by @ko3n1g :: PR: #3286
ci: Set throughput tests as flaky by @chtruong814 :: PR: #3301
chore: Move GB200 tests to nightly by @ko3n1g :: PR: #3302
Ensure type-checker understands use of Submodules in bert_model by @nschank :: PR: #3256
Override extra_repr instead of __repr__ by @nschank :: PR: #3200
Replace ModuleSpec with Protocols for LayerNorm submodules by @nschank :: PR: #3090
Non colocated refit by @wdykas :: PR: #3213
Fuse permute+pad and unpermute+unpad ops for FP8/FP4 training by @xiaoxi-wangfj :: PR: #2763
Add check to prevent MFSDP from numeric issue in gradient accumulate fusion by @shjwudp :: PR: #2904
update get_embedding_ranks and get_position_embedding_ranks docstrings by @c1lovez1 :: PR: #3223
Param offset in _ParamAndGradBucket should be aligned by @skydoorkai :: PR: #3007
ci: Add secrets detector by @chtruong814 :: PR: #3180
Ensure type-checker understands use of Submodules in llava_model by @nschank :: PR: #3257
updates to support modelopt EAGLE training with CP by @yeyu-nvidia :: PR: #3147
fully remove legacy tokenizer system by @dimapihtar :: PR: #2946
M-FSDP: Remove redundant stream waits in HSDP to prevent CG fail by @shjwudp :: PR: #2941
General README and pyproject fixes by @ahmadki :: PR: #2907
chore: More aggressive checkpointing by @ko3n1g :: PR: #3315
ci: Pin down setuptools to lt 82 by @ko3n1g :: PR: #3313
fix: numpy overflow by @ko3n1g :: PR: #3306
fix: T5 dataset by @ko3n1g :: PR: #3307
ci: Revert "ci: Add secrets detector (#3180)" by @chtruong814 :: PR: #3330
ci: Add more tests, run on merge-queue by @ko3n1g :: PR: #3317
ci: Remove merge-gate environment check by @chtruong814 :: PR: #3331
Use FP4 context for mamba by @kwyss-nvidia :: PR: #2604
ci: Ensure we run all functional tests in merge group by @chtruong814 :: PR: #3332
Replace ModuleSpec with Protocols for inputs to MLP by @nschank :: PR: #3084
ci: Fix merge queue functional tests by @chtruong814 :: PR: #3337
ci: skip queue in merge-gate by @ko3n1g :: PR: #3343
ci: Timeout for functional tests by @ko3n1g :: PR: #3346
update checkpointing documentation by @dimapihtar :: PR: #3347
Update golden values to reflect improvements by @tdene :: PR: #3350
BUGFIX: gpt vs hybrid model mtp naming mismatch by @sancha :: PR: #3334
Disable flaky test by @tdene :: PR: #3354
re-enable gpt grpo tests by @jon-barker :: PR: #3348
Fix SFT Pipeline when TP>1 by @asolergi-nv :: PR: #3268
Fixes for KD mode by @AAnoosheh :: PR: #3342
chore: Update codeowners file by @ko3n1g :: PR: #3365
Siddharth/fix inference functional tests by @sidsingh-nvidia :: PR: #3357
Switch oncall by @janEbert :: PR: #3360
Add missing RMSNorm to llama train script by @AAnoosheh :: PR: #3314
Fix inference for MTP models by @tdene :: PR: #3297
Add a logprobs test with real gpt model. by @yobibyte :: PR: #2870
Add simple GRPO functional test by @tdene :: PR: #3323
ci: Concurrency control for merge-queue by @ko3n1g :: PR: #3353
ci: Update golden value download script to work with Github by @chtruong814 :: PR: #3335
fix: correct typos 'seperated' and 'recieved' by @thecaptain789 :: PR: #3305
Improved PyTorch profiler and added PyTorch execution trace by @shengf-nv :: PR: #3273
Removing etc from main index page, shifted...

Excerpt shown; the full changelog is in the source release notes.
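Each changelog entry above follows the shape `<title> by @<author> :: PR: #<number>`. A minimal Python sketch for tallying entries, assuming that shape holds across the full list (it is inferred from the visible excerpt, not from a documented format). `parse_entry`, `is_ci_entry`, and the `ci:` / `ci(...):` prefix heuristic are illustrative assumptions, not a GitHub or onlylabs API. The sample lines are copied from the list; the last one is the truncated final entry, which the parser skips.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional

# Entry shape inferred from the excerpt: "<title> by @<author> :: PR: #<number>".
ENTRY_PATTERN = re.compile(r"^(?P<title>.+) by @(?P<author>[\w-]+) :: PR: #(?P<pr>\d+)$")


class Entry(NamedTuple):
    title: str
    author: str
    pr: int


def parse_entry(line: str) -> Optional[Entry]:
    """Parse one changelog line, or return None if it does not fit the shape."""
    match = ENTRY_PATTERN.match(line.strip())
    if match is None:
        return None  # e.g. the truncated final line of the excerpt
    return Entry(match["title"], match["author"], int(match["pr"]))


def is_ci_entry(entry: Entry) -> bool:
    # Heuristic (an assumption): CI changes use a "ci:" or "ci(...):" prefix.
    return re.match(r"^ci(\([^)]*\))?:", entry.title) is not None


sample = [
    "ci: Checkpoint retention by @ko3n1g :: PR: #3205",
    "ci(hotfix): Pin uv by @ko3n1g :: PR: #3233",
    "Support EP with HSDP by @wplf :: PR: #2840",
    'Revert "fix checkpointing error message (#3203)" by @ko3n1g :: PR: #3283',
    "Removing etc from main index page, shifted...",
]

entries = [e for e in (parse_entry(line) for line in sample) if e is not None]
per_author = Counter(e.author for e in entries)
ci_prs = [e.pr for e in entries if is_ci_entry(e)]
print(len(entries), per_author.most_common(1), ci_prs)
# prints: 4 [('ko3n1g', 3)] [3205, 3233]
```

The greedy `title` group anchors on the last ` by @` in the line, so a title that itself contains the word "by" still parses correctly.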

Notability

notability 6.0/10

Significant library update from NVIDIA