NVIDIA/Megatron-LM
Description: Ongoing research training transformer models at scale
Language: Python
License: NOASSERTION
Stars: 16659
Forks: 4062
Open issues: 880
Created: 2019-03-21T16:15:52Z
Pushed: 2026-06-11T03:34:03Z
Default branch: main
Fork: no
Archived: no
README:
Megatron-LM and Megatron Core
=============================
GPU-optimized library for training transformer models at scale
About
This repository contains two components: Megatron-LM and Megatron Core.
Megatron-LM is a reference example that includes Megatron Core plus pre-configured training scripts, ideal for research teams, learning distributed training, and quick experimentation.
Megatron Core is a composable library with GPU-optimized building blocks for custom training frameworks. It provides transformer building blocks, advanced parallelism strategies (TP, PP, DP, EP, and CP), mixed precision support (FP16, BF16, FP8, and FP4), and model architectures, ideal for framework developers and ML engineers building custom training pipelines.
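As a rough illustration of how the parallelism dimensions listed above combine, the sketch below assumes the common convention that the total GPU count factors as TP × PP × CP × DP, so the data-parallel size is whatever remains once the model-parallel dimensions are fixed. `ParallelLayout` and `data_parallel_size` are hypothetical names made up for this example, not Megatron Core APIs, and EP is left out because how expert-parallel ranks are arranged depends on the configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical helper for illustration only; not part of Megatron Core."""

    tensor: int = 1    # TP: ranks that split individual layers
    pipeline: int = 1  # PP: ranks that hold consecutive groups of layers
    context: int = 1   # CP: ranks that split the sequence dimension

    def data_parallel_size(self, world_size: int) -> int:
        """Return the DP size, assuming world_size = TP * PP * CP * DP."""
        model_parallel = self.tensor * self.pipeline * self.context
        if world_size % model_parallel != 0:
            raise ValueError(
                f"world_size={world_size} is not divisible by "
                f"TP*PP*CP={model_parallel}"
            )
        return world_size // model_parallel


layout = ParallelLayout(tensor=4, pipeline=2, context=2)
print(layout.data_parallel_size(world_size=64))  # 64 // (4 * 2 * 2) = 4
```

A check like this can catch an inconsistent configuration before a job is launched; the Parallelism Strategies guide remains the authoritative reference for how the groups are actually formed.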
[Megatron Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) provides bidirectional Hugging Face ↔ Megatron checkpoint conversion with production-ready recipes.
Getting Started
Install from PyPI:
uv pip install megatron-core
Or clone and install from source:
git clone https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM
uv pip install -e .
> Note: Building from source can use a lot of memory. If the build runs out of memory, limit parallel compilation jobs by setting MAX_JOBS (for example, MAX_JOBS=4 uv pip install -e .).
For NVIDIA GPU Cloud (NGC) container setup and all installation options, review the [Installation Guide](https://docs.nvidia.com/megatron-core/developer-guide/latest/get-started/install.html).
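After either install path, one way to confirm that the package is visible to the current Python environment is to query its distribution metadata. The sketch below uses only the standard library and assumes the distribution is registered under the name `megatron-core`, as in the PyPI command above; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "megatron-core"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("megatron-core is not installed in this environment")
else:
    print(f"megatron-core {found} is installed")
```

This checks only that the distribution metadata is present; it does not verify GPU availability or that compiled extensions load correctly.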
- [Your First Training Run](https://docs.nvidia.com/megatron-core/developer-guide/latest/get-started/quickstart.html) - End-to-end training examples with data preparation
- [Parallelism Strategies](https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/parallelism-guide.html) - Scale training across GPUs with TP, PP, DP, EP, and CP
- [Contribution Guide](https://docs.nvidia.com/megatron-core/developer-guide/latest/developer/contribute.html) - How to contribute to Megatron Core
Latest News
- [2026/05] [DeepSeek-V4 initial support](https://github.com/NVIDIA/Megatron-LM/issues/4468) - Megatron Core's dev branch includes the initial DeepSeek-V4 implementation; Megatron Bridge provides conversion, inference, and pretraining recipes.
- [2026/04] [Advancing Emerging Optimizers for Accelerated LLM Training with NVIDIA Megatron](https://developer.nvidia.com/blog/advancing-emerging-optimizers-for-accelerated-llm-training-with-nvidia-megatron/) - Muon and other emerging optimizers are now supported in Megatron Core via the new [Emerging-Optimizers](https://github.com/NVIDIA-NeMo/Emerging-Optimizers) library.
- [2026/03] [Scalable Training of Mixture-of-Experts Models with Megatron Core](https://arxiv.org/abs/2603.07685) - Technical report on scaling MoE training with integrated optimizations for memory, communication, and computation.
- [2026/03] [Implementing Falcon-H1 Hybrid Architecture in Megatron Core](https://developer.nvidia.com/blog/implementing-falcon-h1-hybrid-architecture-in-nvidia-megatron-core/) - Technology Innovation Institute (TII) contributes Falcon-H1 hybrid transformer-Mamba architecture and BitNet ternary quantization support to Megatron Core.
- [2026/03] [Megatron Core Roadmap](https://github.com/NVIDIA/Megatron-LM/issues/4003) - Roadmap for upcoming Megatron Core features and improvements.
- [2026/03] Deprecating Python 3.10 support: The upcoming 0.17.0 release drops Python 3.10 support. Downstream applications must raise their minimum supported Python version to 3.12 to stay compatible with Megatron Core.
- [2026/01] [Dynamic Context Parallelism](https://developer.nvidia.com/blog/speeding-up-variable-length-training-with-dynamic-context-parallelism-and-nvidia-megatron-core/) - Up to 1.48x speedup for variable-length sequence training with adaptive CP sizing.
- [2025/12] Megatron Core development has moved to GitHub. All development and CI now happen in the open, and community contributions are welcome.
- [2025/10] [Megatron Dev Branch](https://github.com/NVIDIA/Megatron-LM/tree/dev) - Early access branch with experimental features.
- [2025/10] [Megatron Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) - Bidirectional converter for interoperability between Hugging Face and Megatron checkpoints, featuring production-ready recipes for popular models.
- [2025/08] [Mixture of Experts (MoE) Q3–Q4 2025 Roadmap](https://github.com/NVIDIA/Megatron-LM/issues/1729) - Comprehensive roadmap for MoE features including DeepSeek-V3, Qwen3, advanced parallelism strategies, FP8 optimizations, and Blackwell performance enhancements.
- [2025/08] [GPT-OSS Model](https://github.com/NVIDIA/Megatron-LM/issues/1739) - Megatron Core integrates advanced features including YaRN RoPE scaling, attention sinks, and custom activation functions.
- [2025/06] [Megatron MoE Model Zoo](https://github.com/yanring/Megatron-MoE-ModelZoo) - Best practices and optimized configurations for training DeepSeek-V3, Mixtral, and Qwen3 MoE models with performance benchmarking and checkpoint conversion tools.
[Previous News](docs/discussions/README.md#previous-news)
Project Structure
Megatron-LM/
├── megatron/
│   ├── core/                    # Megatron Core (kernels, parallelism, building blocks)
│   │   ├── models/              # Transformer models
│   │   ├── transformer/         # Transformer building blocks
│   │   ├── tensor_parallel/     # Tensor parallelism
│   │   ├── pipeline_parallel/   # Pipeline parallelism
│   │   ├── distributed/         # Distributed training (FSDP, DDP)
│   │   ├── optimizer/           # Optimizers
│   │   ├── datasets/            # Dataset loaders
│   │   ├── inference/           # Inference engines and server
│   │   └── export/              # Model export (example: TensorRT-LLM)
│   ├── training/                # Training scripts
│   ├── legacy/                  # Legacy components
│   ├── post_training/           # Post-training (quantization, distillation, pruning, etc.)
│   └── rl/                      # Reinforcement learning (including RLHF)
├── examples/                    # Ready-to-use training examples
├── tools/                       # Utility tools
├── tests/                       #…