What does this repo signal mean?

NVIDIA published NVIDIA/Megatron-LM (Python). This repository signal surfaces tooling, eval, infrastructure, or model-adjacent work that may not yet have appeared in a launch post. High-signal details: repo NVIDIA/Megatron-LM · language Python. onlylabs links this event to 1 captured evidence page and 6 related repo signals, and maps it to Infrastructure in the data-business radar.

NVIDIA Repo: NVIDIA/Megatron-LM

Captured source

GitHub · github.com/NVIDIA/Megatron-LM

NVIDIA/Megatron-LM repository metadata

published Mar 21, 2019 · seen 5d · captured 8h · http 200 · method plain

NVIDIA/Megatron-LM

Description: Ongoing research training transformer models at scale

Language: Python

License: NOASSERTION

Stars: 16659

Forks: 4062

Open issues: 880

Created: 2019-03-21T16:15:52Z

Pushed: 2026-06-11T03:34:03Z

Default branch: main

Fork: no

Archived: no
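
The fields listed above have the same names and shapes as keys in the GitHub REST API repository object (`stargazers_count`, `forks_count`, `open_issues_count`, `created_at`, `pushed_at`, `license.spdx_id`). The capture does not say how it was collected, so treating it as such a payload is an assumption. Under that assumption, here is a minimal sketch that summarizes a hand-copied sample; `SAMPLE`, `parse_ts`, and `summarize_repo` are hypothetical names, and nothing is fetched from the network.

```python
from datetime import datetime, timezone

# Values hand-copied from the metadata listed above (assumed key names).
SAMPLE = {
    "full_name": "NVIDIA/Megatron-LM",
    "license": {"spdx_id": "NOASSERTION"},
    "stargazers_count": 16659,
    "forks_count": 4062,
    "open_issues_count": 880,
    "created_at": "2019-03-21T16:15:52Z",
    "pushed_at": "2026-06-11T03:34:03Z",
}


def parse_ts(value: str) -> datetime:
    """Parse a UTC timestamp of the form 2019-03-21T16:15:52Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def summarize_repo(repo: dict) -> dict:
    """Reduce a repository payload to a few comparable numbers."""
    created = parse_ts(repo["created_at"])
    pushed = parse_ts(repo["pushed_at"])
    return {
        "name": repo["full_name"],
        # "license" may be missing or null in a payload, so guard the lookup.
        "license": (repo.get("license") or {}).get("spdx_id"),
        "stars": repo["stargazers_count"],
        "forks": repo["forks_count"],
        "open_issues": repo["open_issues_count"],
        "days_from_creation_to_last_push": (pushed - created).days,
    }


if __name__ == "__main__":
    for key, value in summarize_repo(SAMPLE).items():
        print(f"{key}: {value}")
```

A `License: NOASSERTION` value is the SPDX placeholder for "no license could be asserted", so the license terms should be read from the repository itself rather than inferred from this field.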

README:

Megatron-LM and Megatron Core
=============================

GPU-optimized library for training transformer models at scale

About

This repository contains two components: Megatron-LM and Megatron Core.

Megatron-LM is a reference example that includes Megatron Core plus pre-configured training scripts, ideal for research teams, learning distributed training, and quick experimentation.

Megatron Core is a composable library with GPU-optimized building blocks for custom training frameworks. It provides transformer building blocks, advanced parallelism strategies (TP, PP, DP, EP, and CP), mixed precision support (FP16, BF16, FP8, and FP4), and model architectures, ideal for framework developers and ML engineers building custom training pipelines.
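
The abbreviations above stand for tensor (TP), pipeline (PP), data (DP), expert (EP), and context (CP) parallelism. To make their interaction concrete, the sketch below works through the GPU-count arithmetic under a common convention, assumed here rather than taken from this README: the total number of GPUs factorizes as TP × PP × CP × DP, so the data-parallel size is whatever remains after the other three are chosen. EP is left out because how expert groups are laid out depends on configuration. This is plain arithmetic, not a Megatron Core API; `data_parallel_size` is a hypothetical helper.

```python
def data_parallel_size(world_size: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """Return the data-parallel size implied by world_size = tp * pp * cp * dp.

    Assumes the factorization convention stated in the lead-in; raises
    ValueError when the chosen sizes cannot tile the available GPUs.
    """
    if min(world_size, tp, pp, cp) < 1:
        raise ValueError("all sizes must be positive integers")
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp*cp={model_parallel}"
        )
    return world_size // model_parallel


if __name__ == "__main__":
    # 64 GPUs with 8-way tensor and 2-way pipeline parallelism
    # leave 4 data-parallel replicas.
    print(data_parallel_size(64, tp=8, pp=2))
```

The divisibility check is the useful part in practice: a layout such as 64 GPUs with TP=8 and PP=3 is rejected up front instead of failing later at process-group creation.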

[Megatron Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) provides bidirectional Hugging Face ↔ Megatron checkpoint conversion with production-ready recipes.

Getting Started

Install from PyPI:

uv pip install megatron-core

Or clone and install from source:

git clone https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM
uv pip install -e .

> Note: Building from source can use a lot of memory. If the build runs out of memory, limit parallel compilation jobs by setting MAX_JOBS (for example, MAX_JOBS=4 uv pip install -e .).
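
When the source install is driven from a Python script (for example in a provisioning step), the same `MAX_JOBS` limit can be passed through the subprocess environment. The following is a sketch only: `build_install_command` and `install_from_source` are hypothetical helpers, and it assumes `uv` is on `PATH` and that the working directory is the cloned repository.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional, Tuple


def build_install_command(
    max_jobs: int, base_env: Optional[Mapping[str, str]] = None
) -> Tuple[List[str], Dict[str, str]]:
    """Return the install command and an environment with MAX_JOBS set."""
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")
    # Copy the environment so the caller's mapping is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    env["MAX_JOBS"] = str(max_jobs)
    # Same command as in the note above, split into argv form.
    return ["uv", "pip", "install", "-e", "."], env


def install_from_source(repo_dir: str, max_jobs: int = 4) -> None:
    """Run the editable install inside repo_dir with limited build parallelism."""
    cmd, env = build_install_command(max_jobs)
    subprocess.run(cmd, cwd=repo_dir, env=env, check=True)
```

Calling `install_from_source("Megatron-LM", max_jobs=4)` would then be equivalent to running `MAX_JOBS=4 uv pip install -e .` inside the clone.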

For NVIDIA GPU Cloud (NGC) container setup and all installation options, review the [Installation Guide](https://docs.nvidia.com/megatron-core/developer-guide/latest/get-started/install.html).
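
After either install path, a quick way to confirm what ended up in the environment is to ask the standard library for the installed distribution's version. This sketch uses only `importlib.metadata`; the distribution name `megatron-core` is taken from the PyPI command above, and `installed_version` is a hypothetical helper.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "megatron-core") -> Optional[str]:
    """Return the installed version of dist_name, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("megatron-core is not installed in this environment")
    else:
        print(f"megatron-core {version}")
```

Querying package metadata avoids importing the library itself, so the check works even on a machine without a GPU.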

[Your First Training Run](https://docs.nvidia.com/megatron-core/developer-guide/latest/get-started/quickstart.html) - End-to-end training examples with data preparation
[Parallelism Strategies](https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/parallelism-guide.html) - Scale training across GPUs with TP, PP, DP, EP, and CP
[Contribution Guide](https://docs.nvidia.com/megatron-core/developer-guide/latest/developer/contribute.html) - How to contribute to Megatron Core

Latest News

[2026/05] [DeepSeek-V4 initial support](https://github.com/NVIDIA/Megatron-LM/issues/4468) - Megatron Core's dev branch includes the initial DeepSeek-V4 implementation; Megatron Bridge provides conversion, inference, and pretraining recipes.
[2026/04] [Advancing Emerging Optimizers for Accelerated LLM Training with NVIDIA Megatron](https://developer.nvidia.com/blog/advancing-emerging-optimizers-for-accelerated-llm-training-with-nvidia-megatron/) - Muon and other emerging optimizers are now supported in Megatron Core via the new [Emerging-Optimizers](https://github.com/NVIDIA-NeMo/Emerging-Optimizers) library.
[2026/03] [Scalable Training of Mixture-of-Experts Models with Megatron Core](https://arxiv.org/abs/2603.07685) - Technical report on scaling MoE training with integrated optimizations for memory, communication, and computation.
[2026/03] [Implementing Falcon-H1 Hybrid Architecture in Megatron Core](https://developer.nvidia.com/blog/implementing-falcon-h1-hybrid-architecture-in-nvidia-megatron-core/) - Technology Innovation Institute (TII) contributes Falcon-H1 hybrid transformer-Mamba architecture and BitNet ternary quantization support to Megatron Core.
[2026/03] [Megatron Core Roadmap](https://github.com/NVIDIA/Megatron-LM/issues/4003) - Roadmap for upcoming Megatron Core features and improvements.
[2026/03] Deprecating Python 3.10 support: The upcoming 0.17.0 release drops Python 3.10 support. Downstream applications must raise their minimum supported Python version to 3.12 to stay compatible with Megatron Core.
[2026/01] [Dynamic Context Parallelism](https://developer.nvidia.com/blog/speeding-up-variable-length-training-with-dynamic-context-parallelism-and-nvidia-megatron-core/) - Up to 1.48x speedup for variable-length sequence training with adaptive CP sizing.
[2025/12] Megatron Core development has moved to GitHub. All development and CI now happen in the open, and community contributions are welcome.
[2025/10] [Megatron Dev Branch](https://github.com/NVIDIA/Megatron-LM/tree/dev) - Early access branch with experimental features.
[2025/10] [Megatron Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) - Bidirectional converter for interoperability between Hugging Face and Megatron checkpoints, featuring production-ready recipes for popular models.
[2025/08] [Mixture of Experts (MoE) Q3–Q4 2025 Roadmap](https://github.com/NVIDIA/Megatron-LM/issues/1729) - Comprehensive roadmap for MoE features including DeepSeek-V3, Qwen3, advanced parallelism strategies, FP8 optimizations, and Blackwell performance enhancements.
[2025/08] [GPT-OSS Model](https://github.com/NVIDIA/Megatron-LM/issues/1739) - Megatron Core integrates advanced features including YaRN RoPE scaling, attention sinks, and custom activation functions.
[2025/06] [Megatron MoE Model Zoo](https://github.com/yanring/Megatron-MoE-ModelZoo) - Best practices and optimized configurations for training DeepSeek-V3, Mixtral, and Qwen3 MoE models with performance benchmarking and checkpoint conversion tools.
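
The Python 3.10 deprecation item in the list above means downstream projects need Python 3.12 or newer once 0.17.0 is released. A downstream application could fail early with a guard like the one below. It is a sketch: `MIN_PYTHON` is taken from that news item, and `meets_minimum` is a hypothetical helper, not part of Megatron Core.

```python
import sys
from typing import Sequence, Tuple

# Minimum interpreter version stated in the 2026/03 news item above.
MIN_PYTHON: Tuple[int, int] = (3, 12)


def meets_minimum(
    version_info: Sequence[int] = sys.version_info,
    minimum: Tuple[int, int] = MIN_PYTHON,
) -> bool:
    """Return True when (major, minor) of version_info is at least minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not meets_minimum():
        current = ".".join(str(part) for part in sys.version_info[:3])
        required = ".".join(str(part) for part in MIN_PYTHON)
        raise SystemExit(f"Python {required}+ is required, found {current}")
    print("Python version is new enough")
```

Comparing `(major, minor)` tuples rather than version strings avoids the classic mistake of ranking "3.9" above "3.12" lexicographically.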

[Previous News](docs/discussions/README.md#previous-news)

Project Structure

Megatron-LM/
├── megatron/
│ ├── core/ # Megatron Core (kernels, parallelism, building blocks)
│ │ ├── models/ # Transformer models
│ │ ├── transformer/ # Transformer building blocks
│ │ ├── tensor_parallel/ # Tensor parallelism
│ │ ├── pipeline_parallel/ # Pipeline parallelism
│ │ ├── distributed/ # Distributed training (FSDP, DDP)
│ │ ├── optimizer/ # Optimizers
│ │ ├── datasets/ # Dataset loaders
│ │ ├── inference/ # Inference engines and server
│ │ └── export/ # Model export (example: TensorRT-LLM)
│ ├── training/ # Training scripts
│ ├── legacy/ # Legacy components
│ ├── post_training/ # Post-training (quantization, distillation, pruning, etc.)
│ └── rl/ # Reinforcement learning (including RLHF)
├── examples/ # Ready-to-use training examples
├── tools/ # Utility tools
├── tests/ #…

Excerpt shown — open the source for the full document.