NousResearch/Megatron-Bridge
forked from NVIDIA-NeMo/Megatron-Bridge
Captured source
Source published: May 27, 2026 · seen: 5d · captured: 9h · HTTP: 200 · method: plain
Description: Training library for Megatron-based models with bidirectional Hugging Face conversion capability
License: Apache-2.0
Stars: 5
Forks: 0
Open issues: 0
Created: 2026-05-27T12:17:08Z
Pushed: 2026-05-27T12:36:35Z
Default branch: main
Fork: yes
Parent repository: NVIDIA-NeMo/Megatron-Bridge
Archived: no
README:
📣 News
- [05/20/2026] **DeepSeek V4** is now merged on main! See the examples README for conversion and inference details.
- [05/20/2026] **Nemotron-3 Nano Omni** day-0 branch support is now merged on main! The 30B-A3B MoE multimodal model supports image, video, audio, and text workflows with checkpoint conversion, inference, SFT, and PEFT (LoRA) examples. Read the NVIDIA Blog and see the examples README for the full walkthrough.
- [05/19/2026] **Nemotron-Labs Diffusion** is now supported on main with autoregressive-to-diffusion conversion, continuous pretraining, checkpoint conversion, and inference workflows. Read the NVIDIA Research blog for the tri-mode language model overview.
- [05/06/2026] **Gemma 4 VL 26B-A4B** is now supported! Checkpoint conversion, SFT, and PEFT (LoRA) recipes for Google's MoE vision-language model (26B total / 4B active params, 128 experts top-k=8, dual sliding/global attention with K=V tying on full-attention layers) are available on main. See the examples README for the full walkthrough.
- [04/28/2026] Day 0 support for **Nemotron-3 Nano Omni**, a 30B-A3B MoE multimodal model that jointly processes image, video, audio, and text. Checkpoint conversion, SFT, and LoRA recipes are available on main — see the examples README for the full walkthrough.
- [04/19/2026] **Qwen3.6-35B-A3B** is now supported! Qwen3.6 uses the same architecture as Qwen3.5 VL MoE (`Qwen3_5MoeForConditionalGeneration`) and works with the existing Qwen3.5-VL bridge out of the box, with no code changes needed. HF→Megatron conversion and inference verified.
- [04/16/2026] Megatron Bridge 0.4.0 released! New model support (Kimi 2.5, Nemotron 3 Super, Qwen 3.5 VL, MiniMax M2, Sarvam, MiMo, and more), diffusion model collection, sequence-packing improvements, FP8 export, pruning & quantization, Transformers 5.x compatibility, and Python 3.12 migration. Huge thanks to our community contributors: @HollowMan6, @shaltielshmid, @jaeminh, @pavelgein, @ShiftyBlock, @erictang000, @eternally-z, @Hayak3, and @mohit-sarvam! See the full release notes.
- [04/12/2026] **MiniMax-M2.5 / M2.7** are now supported! Both models share the same architecture as MiniMax-M2 and work with the existing bridge out of the box — checkpoint conversion and inference verified on real FP8 checkpoints.
- [04/10/2026] **Qwen3-ASR** is now supported! Checkpoint conversion and inference for Qwen3's ASR model are available on main.
- [04/09/2026] **Bailing MoE V2** is now supported! Checkpoint conversion and inference for the Bailing MoE V2 model are available on main. Thank you to @ccclyu for the community contribution!
- [04/07/2026] Megatron Bridge’s PEFT support was featured in a talk at PyTorch Conference Europe 2026.
- [04/01/2026] **Kimi K2.5 VL** is now supported! Checkpoint conversion, inference, and training recipes for Moonshot AI’s Kimi-K2.5-VL vision-language model are available on main.
- [03/31/2026] Agent Skills for Megatron Bridge! We've added a `skills/` directory with structured guides that AI coding agents (Cursor, Claude Code, Codex, etc.) can use to help you add model support, set up dev environments, tune performance, and more. Try them out, and PRs to improve or add new skills are very welcome!
- [03/26/2026] **Nemotron 3 Super** is now on main! Checkpoint conversion and SFT/LoRA recipes (120B-A12B) are available in the main branch. Read the blog post.
- [03/12/2026] Deprecating Python 3.10 support: We're officially dropping Python 3.10 support with the upcoming 0.4.0 release. Downstream applications must raise their lower…
Excerpt shown — open the source for the full document.
Notability
Notability: 2.0/10. Routine fork with minimal stars.