sarvamai/Megatron-Bridge
forked from NVIDIA-NeMo/Megatron-Bridge
Description: HuggingFace conversion and training library for Megatron-based models
Language: Python
License: Apache-2.0
Stars: 0
Forks: 0
Open issues: 0
Created: 2025-12-28T11:09:08Z
Pushed: 2026-03-27T12:05:06Z
Default branch: main
Fork: yes
Parent repository: NVIDIA-NeMo/Megatron-Bridge
Archived: no
README:
📣 News
- [03/12/2026] Deprecating Python 3.10 support: We're officially dropping Python 3.10 support with the upcoming 0.4.0 release. Downstream applications must raise their lower boundary to 3.12 to stay compatible with Megatron-Bridge.
- [12/16/2025] Mind Lab successfully used Megatron-Bridge and VeRL to train GRPO LoRA for a trillion-parameter model on 64 H800 GPUs. See their tech blog.
- [12/15/2025] Day 0 support for NVIDIA-NeMotron-3-Nano-30B-A3B-FP8! Reproducible code and custom NGC container: nvcr.io/nvidia/nemo:25.11.nemotron_3_nano
Overview
NeMo Megatron Bridge is a PyTorch-native library within the NeMo Framework that provides pretraining, SFT and LoRA for popular LLM and VLM models. It serves as a powerful bridge, conversion, and verification layer between 🤗 Hugging Face and Megatron Core. It provides bidirectional checkpoint conversion between these formats, enabling other projects to leverage Megatron Core's parallelism capabilities or export models for various inference engines. The bridge includes built-in verification mechanisms to ensure conversion accuracy and checkpoint integrity across different model formats.
On top of the bridge, NeMo Megatron Bridge provides a performant and scalable PyTorch-native training loop that leverages Megatron Core to deliver state-of-the-art training throughput. It supports pretraining and fine-tuning with features like tensor and pipeline parallelism, and mixed precision (FP8, BF16, FP4, etc.). Users can either use existing 🤗 Hugging Face models or define custom PyTorch model definitions for flexible end-to-end workflows.
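The parallelism settings mentioned above interact through simple arithmetic: in Megatron-style training, the GPUs not consumed by tensor and pipeline parallelism form data-parallel replicas. The following is a minimal sketch of that relationship, assuming only tensor and pipeline parallelism are in use (context and expert parallelism are ignored). `data_parallel_size` is a hypothetical helper written for illustration, not a Megatron Bridge API.

```python
def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return the number of data-parallel replicas for a given GPU count.

    Hypothetical helper for illustration; not part of Megatron Bridge.
    Assumes only tensor and pipeline parallelism are configured.
    """
    # Each model replica occupies tensor_parallel * pipeline_parallel GPUs.
    gpus_per_replica = tensor_parallel * pipeline_parallel
    if world_size % gpus_per_replica != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tensor_parallel * pipeline_parallel = {gpus_per_replica}"
        )
    return world_size // gpus_per_replica
```

For example, 8 GPUs with tensor parallel size 2 and pipeline parallel size 2 would leave room for 2 data-parallel replicas, while 6 GPUs with tensor parallel size 4 would be rejected.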
NeMo Megatron Bridge is a refactor of the previous NeMo training stack that adopts a PyTorch-native training loop to provide greater flexibility and customizability for developers.

🔧 Installation
🐳 NeMo Framework container
The best experience, highest performance, and full feature support are provided by the NeMo Framework container. Fetch the most recent $TAG and run the following to start a container:
docker run --rm -it -w /workdir -v $(pwd):/workdir \
  --entrypoint bash \
  --gpus all \
  nvcr.io/nvidia/nemo:${TAG}
For development installation and additional details, please refer to our Contribution guide.
⚡ Quickstart
To get started, install Megatron Bridge or download a NeMo Framework container as described [above](#-installation).
Log in to Hugging Face Hub:
huggingface-cli login --token <your token>
Conversion-only quickstart (✅ Core):
from megatron.bridge import AutoBridge
# 1) Create a bridge from a Hugging Face model (hub or local path)
bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3.2-1B", trust_remote_code=True)
# 2) Get a Megatron provider and configure parallelism before instantiation
provider = bridge.to_megatron_provider()
provider.tensor_model_parallel_size = 1
provider.pipeline_model_parallel_size = 1
provider.finalize()
# 3) Materialize Megatron Core model(s)
model = provider.provide_distributed_model(wrap_with_ddp=False)
# 4a) Export Megatron → Hugging Face (full HF folder with config/tokenizer/weights)
bridge.save_hf_pretrained(model, "./hf_exports/llama32_1b")
# 4b) Or stream only weights (Megatron → HF)
for name, weight in bridge.export_hf_weights(model, cpu=True):
    print(name, tuple(weight.shape))
Training quickstart using pre-configured recipes:
from megatron.bridge.recipes.llama import llama32_1b_pretrain_config
from megatron.bridge.training.gpt_step import forward_step
from megatron.bridge.training.pretrain import pretrain
if __name__ == "__main__":
    # The recipe uses the Llama 3.2 1B model configuration from HuggingFace
    cfg = llama32_1b_pretrain_config(seq_length=1024)
    # Override training parameters
    cfg.train.train_iters = 10
    cfg.scheduler.lr_decay_iters = 10000
    cfg.model.vocab_size = 8192
    cfg.tokenizer.vocab_size = cfg.model.vocab_size
    pretrain(cfg, forward_step)
You can launch the above script with:
torchrun --nproc-per-node=<num devices> /path/to/script.py
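When a script is launched with torchrun, each worker process receives its position in the job through the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables. The sketch below shows one way a script might read them for logging or debugging. `read_torchrun_env` is a hypothetical helper, and the single-process defaults are an assumption for the case where the script is run without torchrun.

```python
import os


def read_torchrun_env(env=None) -> dict:
    """Collect the rank information torchrun exports to each worker process.

    Hypothetical helper for illustration. Falls back to single-process
    defaults (an assumption) when the variables are not set.
    """
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),              # global rank across all nodes
        "local_rank": int(env.get("LOCAL_RANK", 0)),  # rank within this node
        "world_size": int(env.get("WORLD_SIZE", 1)),  # total number of processes
    }


if __name__ == "__main__":
    print(read_torchrun_env())
```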
More examples:
- Conversion scripts overview
- Import/Export checkpoints
- Generation with bridge
- Multi-GPU loading from HF
- Compare HF vs Megatron outputs
- Toy RLHF with Bridge (HF inference + Megatron training)
For a deeper dive into conversion design and advanced usage, see the models README.
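As a complement to the weight-streaming loop in the conversion quickstart, the sketch below shows one way to sanity-check streamed `(name, weight)` pairs against a table of expected shapes. It is not the library's built-in verification: `diff_weight_shapes` and the demo data are hypothetical, and the only assumption about each weight is that it exposes a `.shape` attribute.

```python
from types import SimpleNamespace


def diff_weight_shapes(exported, reference):
    """Compare streamed (name, weight) pairs against expected shapes.

    Hypothetical helper for illustration; not part of Megatron Bridge.
    `exported` is any iterable of (name, obj) where obj has a `.shape`;
    `reference` maps parameter names to expected shape tuples.
    """
    seen = {name: tuple(weight.shape) for name, weight in exported}
    shared = seen.keys() & reference.keys()
    return {
        "missing": sorted(reference.keys() - seen.keys()),
        "unexpected": sorted(seen.keys() - reference.keys()),
        "mismatched": sorted(n for n in shared if seen[n] != tuple(reference[n])),
    }


# Stand-in objects for the demo; real weights would come from the bridge.
demo_exported = [
    ("embed.weight", SimpleNamespace(shape=(8, 4))),
    ("lm_head.weight", SimpleNamespace(shape=(8, 4))),
]
demo_reference = {"embed.weight": (8, 4), "norm.weight": (4,)}
demo_report = diff_weight_shapes(demo_exported, demo_reference)
```

With a real model, the first argument could be the iterator returned by `bridge.export_hf_weights(model, cpu=True)` from the quickstart; the reference shapes would have to be supplied by the caller.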
🚀 Key Features
- Bridge with 🤗 Hugging Face: Seamless bidirectional conversion between 🤗 Hugging Face and Megatron formats for interoperability (model bridges, [auto…
Notability
2.0/10: Routine fork with no traction