Fork · Sarvam AI · published Jan 5, 2026

sarvamai/Megatron-LM

forked from NVIDIA/Megatron-LM

Description: Ongoing research training transformer models at scale

Language: Python

License: NOASSERTION

Stars: 0

Forks: 0

Open issues: 0

Created: 2026-01-05T05:57:56Z

Pushed: 2026-01-05T10:55:27Z

Default branch: main

Fork: yes

Parent repository: NVIDIA/Megatron-LM

Archived: no

README:

Megatron-LM & Megatron Core
===========================

GPU-optimized library for training transformer models at scale

⚡ Quick Start

# 1. Install Megatron Core with required dependencies
pip install megatron-core
pip install --no-build-isolation transformer-engine[pytorch]

# 2. Clone repository for examples
git clone https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM

→ [Complete Installation Guide](#installation) - Docker, pip variants (dev, lts, etc.), source installation, and system requirements
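
After running the install commands above, one way to confirm that both packages landed in the active environment is to query their installed versions. This is a minimal sketch using only the Python standard library. The distribution names are taken from the pip commands above, and the helper name `installed_version` is illustrative, not part of Megatron Core.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names assumed from the Quick Start pip commands.
for name in ("megatron-core", "transformer-engine"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

A `not installed` line for either package suggests the corresponding pip step did not complete in this environment.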

Latest News

  • 📣 NEW! [DeepSeek & MoE Training with FP8](https://github.com/yanring/Megatron-MoE-ModelZoo) examples are now available, including optimized configurations for DeepSeek-V3, Qwen2 and Mixtral models with FP8 precision support.
  • [2025/05] Megatron Core v0.11.0 brings new capabilities for multi-data center LLM training (blog).

Previous News

  • [2024/07] Megatron Core v0.7 improves scalability and training resiliency and adds support for multimodal training (blog).
  • [2024/06] Megatron Core added support for Mamba-based models. Check out our paper An Empirical Study of Mamba-based Language Models and code example.
  • [2024/01 Announcement] NVIDIA has released the core capabilities in Megatron-LM into **Megatron Core** in this repository. Megatron Core expands upon Megatron-LM's GPU-optimized techniques with more cutting-edge innovations on system-level optimizations, featuring composable and modular APIs. Explore the [Megatron Core intro](#megatron-core-composable-library) for more details.

Table of Contents

Getting Started

  • [Quick Start](#-quick-start)
  • [Latest News](#latest-news)
  • [Megatron Overview](#megatron-overview)
  • [Project Structure](#project-structure)
  • [Megatron-LM: Reference Implementation](#megatron-lm-reference-implementation)
  • [Megatron Core: Composable Library](#megatron-core-composable-library)
  • [Installation](#installation)
  • [Docker (Recommended)](#-docker-recommended)
  • [Pip Installation](#-pip-installation)
  • [Source Installation](#-source-installation)
  • [System Requirements](#system-requirements)

Core Features

  • [Performance Benchmarking](#performance-benchmarking)
  • [Weak Scaling Results](#weak-scaling-results)
  • [Strong Scaling Results](#strong-scaling-results)
  • [Ecosystem Libraries](#ecosystem-libraries)

Training

  • [Training](#training)
  • [Getting Started](#getting-started)
  • [Data Preparation](#data-preparation)
  • [Parallelism Strategies](#parallelism-strategies)
  • [Data Parallelism (DP)](#data-parallelism-dp)
  • [Tensor Parallelism (TP)](#tensor-parallelism-tp)
  • [Pipeline Parallelism (PP)](#pipeline-parallelism-pp)
  • [Context Parallelism (CP)](#context-parallelism-cp)
  • [Expert Parallelism (EP)](#expert-parallelism-ep)
  • [Parallelism Selection Guide](#parallelism-selection-guide)
  • [Performance Optimizations](#performance-optimizations)

Resources

  • [Examples](./examples/) - Training scripts and tutorials
  • Documentation - Official docs
  • [Community & Support](#-community--support) - Get help and contribute
  • [Getting Help](#getting-help)
  • [Contributing](#contributing)
  • [Citation](#citation)

Megatron Overview

Project Structure

Megatron-LM/
├── megatron/
│ ├── core/ # Megatron Core (kernels, parallelism, building blocks)
│ │ ├── models/ # Transformer models
│ │ ├── transformer/ # Transformer building blocks
│ │ ├── tensor_parallel/ # Tensor parallelism
│ │ ├── pipeline_parallel/ # Pipeline parallelism
│ │ ├── distributed/ # Distributed training (FSDP, DDP)
│ │ ├── optimizer/ # Optimizers
│ │ ├── datasets/ # Dataset loaders
│ │ ├── inference/ # Inference engines
│ │ └── export/ # Model export (e.g. TensorRT-LLM)
│ ├── training/ # Training scripts
│ ├── inference/ # Inference server
│ ├── legacy/ # Legacy components
│ └── post_training/ # Post-training (RLHF, etc.)
├── examples/ # Ready-to-use training examples
├── tools/ # Utility tools
├── tests/ # Comprehensive test suite
└── docs/ # Documentation

Megatron-LM: Reference Implementation

Reference implementation that includes Megatron Core plus everything needed to train models.

Best for:

  • Training state-of-the-art foundation models at scale with cutting-edge performance on the latest NVIDIA hardware
  • Research teams exploring new architectures and training techniques
  • Learning distributed training concepts and best practices
  • Quick experimentation with proven model configurations

What you get:

  • Pre-configured training scripts for GPT, LLaMA, DeepSeek, Qwen, and more
  • End-to-end examples from data prep to evaluation
  • Research-focused tools and utilities

Megatron Core: Composable Library

Composable library with GPU-optimized building blocks for custom training frameworks.

Best for:

  • Framework developers building on top of modular and optimized components
  • Research teams needing custom training loops, optimizers, or data pipelines
  • ML engineers requiring fault-tolerant training pipelines

What you get:

  • Composable transformer building blocks (attention, MLP, etc.)
  • Advanced parallelism strategies (TP, PP, DP, EP, CP)
  • Pipeline schedules and distributed optimizers
  • Mixed precision support (FP16, BF16, FP8)
  • GPU-optimized kernels and memory management
  • High-performance dataloaders and dataset utilities
  • Model architectures (LLaMA, Qwen, GPT, Mixtral, Mamba, etc.)
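
The parallelism strategies listed above are combined by factoring the total GPU count. The sketch below assumes the common convention that the world size equals TP × PP × CP × DP, so the data-parallel size is whatever remains once the other dimensions are chosen. Expert parallelism is left out because how it is carved out of the remaining ranks depends on the configuration. The function name and arguments are illustrative, not a Megatron Core API.

```python
def data_parallel_size(world_size, tp=1, pp=1, cp=1):
    """Derive the data-parallel size, assuming world_size = tp * pp * cp * dp."""
    if min(world_size, tp, pp, cp) < 1:
        raise ValueError("all sizes must be positive integers")
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp*cp={model_parallel}"
        )
    return world_size // model_parallel


# Example: 64 GPUs with 8-way tensor and 2-way pipeline parallelism
print(data_parallel_size(64, tp=8, pp=2))  # -> 4
```

Checking divisibility up front catches a mismatched layout before any job is launched, which is cheaper than discovering it at process-group setup.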

Ecosystem Libraries

Libraries used by Megatron Core:

  • [Megatron Energon](https://github.com/NVIDIA/Megatron-Energon) 📣 NEW! - Multi-modal data loader (text, images, video, audio) with distributed loading and dataset blending
  • **[Transformer…
