What does this release signal mean?

NVIDIA published Megatron Core 0.18.0 (tag core_v0.18.0) in the NVIDIA/Megatron-LM repository on 2026-06-23T00:16:28Z as a full release, not a prerelease. Megatron Core is NVIDIA's core library for efficient large-scale transformer training. A release signal is evidence of what shipped, changed, or was packaged for users. onlylabs links this event to 1 captured evidence page and 6 related release signals.

NVIDIA Release: NVIDIA/Megatron-LM core_v0.18.0

Captured source

GitHub: github.com/NVIDIA/Megatron-LM

Published Jun 23, 2026 · seen 3d · captured 3d · HTTP 200 · method plain

NVIDIA Megatron Core 0.18.0

Repository: NVIDIA/Megatron-LM

Tag: core_v0.18.0

Published: 2026-06-23T00:16:28Z

Prerelease: no

Release notes: Changelog Details

fix(ci): replace actions/setup-python with apt-get to avoid 429 rate limits by @ko3n1g :: PR: #4072
ci: Fix package name for code-freeze workflow by @ko3n1g :: PR: #4077
chore: bump _code_freeze workflow to v0.86.0 by @ko3n1g :: PR: #4078
Fix checkpoint inspector by @janEbert :: PR: #4079
Update docs to conform to NVIDIA style guides by @megnvidia :: PR: #4068
Miscellaneous inference fixes by @santhnm2 :: PR: #4030
fix fine_grained_callables with fused rmsnorm residual by @CarlosGomes98 :: PR: #4026
[Main][feat] Support overlapping A2A Combine backprop with wgrad GEMM by @Wohox :: PR: #3795
Modify mfsdp default data-parallel-sharding-strategy by @wplf :: PR: #3691
Fix fsdp_dtensor conversion for pretrained-only checkpoints by @DAISY-gh :: PR: #3912
Guard NVshmem issues by @wdykas :: PR: #4093
m-fsdp: wire use_precision_aware_optimizer from ddp_config to ParamAn… by @rapatel :: PR: #4024
Megatron-FSDP: Add MXFP8 transpose helper buffer for Hybrid FSDP by @shjwudp :: PR: #3918
feat(fsdp): use TE general_gemm for mixed-precision wgrad in FSDP path by @Victarry :: PR: #3822
Megatron-FSDP: Fix insufficient double buffers during gradient reduce by @shjwudp :: PR: #4054
Fix M-FSDP MXFP8 related BUGs by @shjwudp :: PR: #3991
Megatron-FSDP: Make _pre_forward_param_unshard and _register_post_backward_hook formal by @shjwudp :: PR: #4029
FIX: Use decoupled gradients for precision-aware M-FSDP grad norm by @XueSongTap :: PR: #3746
Align chat completions endpoint with vLLM by @santhnm2 :: PR: #4063
[Megatron-FSDP] Fix compatibility with frozen parameters and add unit tests by @shjwudp :: PR: #3287
[M-FSDP] Refactor uneven dtensor to full tensor and add UT by @shjwudp :: PR: #3190
Add agent instruction files by @Phlip79 :: PR: #4102
Bump eopt version by @skyw :: PR: #4100
Refactor emerging optimizer integration by @skyw :: PR: #4113
Fix over provisioning of Mamba state memory when max_requests is set by @santhnm2 :: PR: #4114
base strategy simplification by @dimapihtar :: PR: #4001
add support for DCP and FSDP async save by @dimapihtar :: PR: #4027
Add more emerging optimizers (#3907) by @skyw :: PR: #4119
Fix FSDP checkpoint conversion and loading for Qwen3.5-VL by @DAISY-gh :: PR: #3936
docs: update mcore optimizer docstrings to google style by @Akshat8510 :: PR: #2799
Set tensor-parallel attributes irrespective of perform_initialization by @ilml :: PR: #4084
docs: add developer-guide skill with CI/CD and failure navigation guidance by @ko3n1g :: PR: #4035
chore: Move skills by @ko3n1g :: PR: #4136
ci: Let Claude react to comment by @ko3n1g :: PR: #4135
Nemotron3 Super GB200 release config by @maanug-nv :: PR: #4118
Enable CUDA graph for ADAM optimizer by @vasunvidia :: PR: #3429
Claude review should recommend testing by @Phlip79 :: PR: #4137
cleanup: remove unused scatter_gather_tensors_in_pipeline argument by @Phlip79 :: PR: #4140
fix: Remove fail-fast (-x) and guard distributed teardown against deadlock by @ko3n1g :: PR: #4139
Claude: add respond-to-issue skill by @Phlip79 :: PR: #4141
Fix muon getter backward compatability by @skyw :: PR: #4157
Audit of user guide by @megnvidia :: PR: #4098
Fix RerunStateMachine crash (TypeError: 'NoneType' object is not subscriptable) by not saving a checkpoint after a transient NaN / Inf by @yezhengmao1 :: PR: #3981
Preserve type of decorated methods/classes by @nschank :: PR: #4062
update muon test case to use new interface by @skyw :: PR: #4163
[M-FSDP] Fix Tensor Parallel mode detection by @shjwudp :: PR: #3191
fix: remove weights_only=False for multimodal example by @faradawn :: PR: #4104
Cudagraphs: Fix sequence packing segfault more generally by @mathemakitten :: PR: #4162
Make MTP work with materialize_only_last_token_logits by @santhnm2 :: PR: #4166
Add unit test for Mamba EP inference (eager fallback with mixed CUDA graphs) by @santhnm2 :: PR: #4085
update docs in respect to async changes by @dimapihtar :: PR: #4177
update checkpointing docs in respect to async changes by @dimapihtar :: PR: #4208
chore: improve build-and-test skill with trigger rules and dependency workflow by @ko3n1g :: PR: #4199
Fix layerwise optimizer with expt_dp_size=1 and contention with element-wise distributed optimizer by @skyw :: PR: #4138
ci: add --cluster-a100/h100/gb200 args to trigger_internal_ci.py by @ko3n1g :: PR: #4195
ci: Update golden values for nightly tests by @chtruong814 :: PR: #4215
rename async_allgather to overlap_param_gather by @skyw :: PR: #4217
Fix Slack sync for users with GitHub email privacy enabled by @Phlip79 :: PR: #4220
Miscellaneous MTP inference fixes by @santhnm2 :: PR: #4191
Move inference guards out of arguments.py by @mathemakitten :: PR: #4210
Fix: enable fine-grained activation offloading for Mamba model. by @fanshiqing :: PR: #4173
bump NVRx by @dimapihtar :: PR: #4178
Update tokenizer args for Nemotron3 release config by @maanug-nv :: PR: #4239
build: add dynamic git-versioning and drop rc0 pre-release tag by @ko3n1g :: PR: #4212
Fix unnecessary permute padding for non-quantized MoE dispatch by @xiaoxi-wangfj :: PR: #4038
Fix split state dict main by @kunlunl :: PR: #3676
Add /split-pr Claude Code command for splitting PRs by CODEOWNERS by @Phlip79 :: PR: #4160
Enable FP8 DPA for MXFP8 recipe by @vasunvidia :: PR: #4066
Enable AG/RS overlap with explicit process group passing by @jeffnvidia :: PR: #3249
Enable cpu_offloading with Full iteration CUDA graph by @vasunvidia :: PR: #3969
Fix TransformerConfig validation for mixed dense/MoE upcycling by @rkteddy :: PR: #3647
Remove cross-rank synchronization during checkpoint load & deprecate torch.distributed.checkpoint.state_dict_loader.load_state_dict by @asolergi-nv :: PR: #2864
Fix incorrectly set decoupled_grad and DistOpt mechanics for MFSDP. by @cspades :: PR: #4133
Refit Miscelaneous by @wdykas :: PR: #3973
Add conditions_embeddings argument to TransformerBlock, TransformerLayer for DiT (diffusion transformer) by @huvunvidia :: PR: #4134
Fix build_sequences_per_dataset output path arg usage by @DhineshPonnarasan :: PR: #4144
ci: Flush pending CUDA work before the barrier in destroy_model_parallel by @chtruong814 :: PR: #4259
Update oncall schedule...

Excerpt shown — open the source for the full document.
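
The changelog entries above share a regular shape: a title, then `by @author`, then `:: PR: #number`. The following is a minimal sketch, assuming that shape holds for every complete entry, of how the list could be parsed to group changes by author or look up a PR number. The names `ENTRY_RE`, `parse_entry`, and `SAMPLE` are illustrative and are not part of Megatron-LM or any GitHub tooling.

```python
import re
from collections import Counter

# Assumed entry shape, inferred from the excerpt: "<title> by @<author> :: PR: #<number>".
# The greedy title group makes the match bind to the LAST " by @" on the line,
# so titles that themselves contain " by " are kept whole.
ENTRY_RE = re.compile(
    r"^(?P<title>.+) by @(?P<author>[A-Za-z0-9-]+) :: PR: #(?P<pr>\d+)$"
)


def parse_entry(line):
    """Return a dict with title, author, and pr, or None if the line does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return {
        "title": match["title"],
        "author": match["author"],
        "pr": int(match["pr"]),
    }


# A few lines copied from the excerpt, including the truncated final one.
SAMPLE = [
    "Fix checkpoint inspector by @janEbert :: PR: #4079",
    "Miscellaneous inference fixes by @santhnm2 :: PR: #4030",
    "Align chat completions endpoint with vLLM by @santhnm2 :: PR: #4063",
    "Add /split-pr Claude Code command for splitting PRs by CODEOWNERS by @Phlip79 :: PR: #4160",
    "Update oncall schedule...",
]

# Truncated or malformed lines are skipped rather than guessed at.
entries = [e for e in map(parse_entry, SAMPLE) if e is not None]
by_author = Counter(e["author"] for e in entries)
```

Returning `None` for lines that do not match, such as the truncated last entry, keeps incomplete data out of any counts instead of filling in a missing author or PR number.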

Notability

notability 3.0/10

Routine version release; no major breakthrough indicated.