sarvamai/Megatron-LM
forked from NVIDIA/Megatron-LM
Description: Ongoing research training transformer models at scale
Language: Python
License: NOASSERTION
Stars: 0
Forks: 0
Open issues: 0
Created: 2026-01-05T05:57:56Z
Pushed: 2026-01-05T10:55:27Z
Default branch: main
Fork: yes
Parent repository: NVIDIA/Megatron-LM
Archived: no
README:
Megatron-LM & Megatron Core
GPU-optimized library for training transformer models at scale
⚡ Quick Start
    # 1. Install Megatron Core with required dependencies
    pip install megatron-core
    pip install --no-build-isolation transformer-engine[pytorch]

    # 2. Clone repository for examples
    git clone https://github.com/NVIDIA/Megatron-LM.git
    cd Megatron-LM

→ [Complete Installation Guide](#installation) - Docker, pip variants (dev, lts, etc.), source installation, and system requirements
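After running the install commands above, it can help to confirm which versions actually landed in the environment. The following is a minimal sketch using only the Python standard library. The helper name `installed_version` is hypothetical, and the distribution names are assumed to match the names passed to `pip install` above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: these distribution names mirror the pip commands above.
    for name in ("megatron-core", "transformer-engine"):
        version = installed_version(name)
        print(f"{name}: {version if version else 'not installed'}")
```

A `None` result means the package is not visible to the current interpreter, which usually points to a different virtual environment or a failed install.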
Latest News
- 📣 NEW! [DeepSeek & MoE Training with FP8](https://github.com/yanring/Megatron-MoE-ModelZoo) examples are now available, including optimized configurations for DeepSeek-V3, Qwen2 and Mixtral models with FP8 precision support.
- [2025/05] Megatron Core v0.11.0 brings new capabilities for multi-data center LLM training (blog).
Previous News
- [2024/07] Megatron Core v0.7 improves scalability and training resiliency and adds support for multimodal training (blog).
- [2024/06] Megatron Core added support for Mamba-based models. Check out our paper An Empirical Study of Mamba-based Language Models and code example.
- [2024/01 Announcement] NVIDIA has released the core capabilities in Megatron-LM into **Megatron Core** in this repository. Megatron Core expands upon Megatron-LM's GPU-optimized techniques with more cutting-edge innovations in system-level optimizations, featuring composable and modular APIs. Explore the [Megatron Core intro](#megatron-core-composable-library) for more details.
Table of Contents
Getting Started
- [Quick Start](#-quick-start)
- [Latest News](#latest-news)
- [Megatron Overview](#megatron-overview)
- [Project Structure](#project-structure)
- [Megatron-LM: Reference Implementation](#megatron-lm-reference-implementation)
- [Megatron Core: Composable Library](#megatron-core-composable-library)
- [Installation](#installation)
- [Docker (Recommended)](#-docker-recommended)
- [Pip Installation](#-pip-installation)
- [Source Installation](#-source-installation)
- [System Requirements](#system-requirements)
Core Features
- [Performance Benchmarking](#performance-benchmarking)
- [Weak Scaling Results](#weak-scaling-results)
- [Strong Scaling Results](#strong-scaling-results)
- [Ecosystem Libraries](#ecosystem-libraries)
Training
- [Training](#training)
- [Getting Started](#getting-started)
- [Data Preparation](#data-preparation)
- [Parallelism Strategies](#parallelism-strategies)
- [Data Parallelism (DP)](#data-parallelism-dp)
- [Tensor Parallelism (TP)](#tensor-parallelism-tp)
- [Pipeline Parallelism (PP)](#pipeline-parallelism-pp)
- [Context Parallelism (CP)](#context-parallelism-cp)
- [Expert Parallelism (EP)](#expert-parallelism-ep)
- [Parallelism Selection Guide](#parallelism-selection-guide)
- [Performance Optimizations](#performance-optimizations)
Resources
- [Examples](./examples/) - Training scripts and tutorials
- Documentation - Official docs
- [Community & Support](#-community--support) - Get help and contribute
- [Getting Help](#getting-help)
- [Contributing](#contributing)
- [Citation](#citation)
Megatron Overview
Project Structure
    Megatron-LM/
    ├── megatron/
    │   ├── core/                    # Megatron Core (kernels, parallelism, building blocks)
    │   │   ├── models/              # Transformer models
    │   │   ├── transformer/         # Transformer building blocks
    │   │   ├── tensor_parallel/     # Tensor parallelism
    │   │   ├── pipeline_parallel/   # Pipeline parallelism
    │   │   ├── distributed/         # Distributed training (FSDP, DDP)
    │   │   ├── optimizer/           # Optimizers
    │   │   ├── datasets/            # Dataset loaders
    │   │   ├── inference/           # Inference engines
    │   │   └── export/              # Model export (e.g. TensorRT-LLM)
    │   ├── training/                # Training scripts
    │   ├── inference/               # Inference server
    │   ├── legacy/                  # Legacy components
    │   └── post_training/           # Post-training (RLHF, etc.)
    ├── examples/                    # Ready-to-use training examples
    ├── tools/                       # Utility tools
    ├── tests/                       # Comprehensive test suite
    └── docs/                        # Documentation
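To sanity-check a fresh clone against the layout above, a small script can report which of the listed top-level directories are missing. This is only a sketch: `EXPECTED_DIRS` and `missing_dirs` are hypothetical names, and the directory list is a subset copied from the tree above, which may differ between branches or forks.

```python
from pathlib import Path

# Assumption: a subset of the directories named in the tree above,
# given relative to the repository root.
EXPECTED_DIRS = (
    "megatron/core",
    "megatron/training",
    "examples",
    "tools",
    "tests",
    "docs",
)


def missing_dirs(repo_root: str) -> list[str]:
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Run from inside the cloned repository.
    missing = missing_dirs(".")
    print("layout OK" if not missing else f"missing: {missing}")
```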
Megatron-LM: Reference Implementation
Reference implementation that includes Megatron Core plus everything needed to train models.
Best for:
- Training state-of-the-art foundation models at scale with cutting-edge performance on the latest NVIDIA hardware
- Research teams exploring new architectures and training techniques
- Learning distributed training concepts and best practices
- Quick experimentation with proven model configurations
What you get:
- Pre-configured training scripts for GPT, LLaMA, DeepSeek, Qwen, and more
- End-to-end examples from data prep to evaluation
- Research-focused tools and utilities
Megatron Core: Composable Library
Composable library with GPU-optimized building blocks for custom training frameworks.
Best for:
- Framework developers building on top of modular and optimized components
- Research teams needing custom training loops, optimizers, or data pipelines
- ML engineers requiring fault-tolerant training pipelines
What you get:
- Composable transformer building blocks (attention, MLP, etc.)
- Advanced parallelism strategies (TP, PP, DP, EP, CP)
- Pipeline schedules and distributed optimizers
- Mixed precision support (FP16, BF16, FP8)
- GPU-optimized kernels and memory management
- High-performance dataloaders and dataset utilities
- Model architectures (LLaMA, Qwen, GPT, Mixtral, Mamba, etc.)
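The parallelism dimensions listed above have to fit together on a fixed number of GPUs. The sketch below illustrates the arithmetic under a common convention: the world size factors as TP × PP × CP × DP, so the data-parallel size is whatever remains after the model-parallel dimensions are chosen. The class `ParallelLayout` is hypothetical and not part of Megatron Core. Expert parallelism is left out because how it composes with the other dimensions depends on configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical helper: how one job's GPUs are split across dimensions."""

    world_size: int    # total number of GPUs (ranks) in the job
    tensor: int = 1    # tensor-parallel size (TP)
    pipeline: int = 1  # pipeline-parallel size (PP)
    context: int = 1   # context-parallel size (CP)

    @property
    def data(self) -> int:
        """Data-parallel size (DP) implied by the other dimensions."""
        model_parallel = self.tensor * self.pipeline * self.context
        if self.world_size % model_parallel != 0:
            raise ValueError(
                f"world_size={self.world_size} is not divisible by "
                f"TP*PP*CP={model_parallel}"
            )
        return self.world_size // model_parallel


if __name__ == "__main__":
    layout = ParallelLayout(world_size=64, tensor=8, pipeline=2)
    print(layout.data)  # 64 // (8 * 2 * 1) = 4
```

Raising on a non-divisible world size surfaces a misconfiguration early instead of silently leaving GPUs unused.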
Ecosystem Libraries
Libraries used by Megatron Core:
- [Megatron Energon](https://github.com/NVIDIA/Megatron-Energon) 📣 NEW! - Multi-modal data loader (text, images, video, audio) with distributed loading and dataset blending
- **[Transformer…
Excerpt shown — open the source for the full document.
Notability: 2.0/10 (routine fork of existing repo)