Fork by Arcee AI, published Apr 23, 2024

arcee-ai/Pai-Megatron-Patch

forked from alibaba/Pai-Megatron-Patch


Description: The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.

Language: Python

License: Apache-2.0

Stars: 0

Forks: 0

Open issues: 0

Created: 2024-04-23T17:38:21Z

Pushed: 2024-07-12T20:03:27Z

Default branch: main

Fork: yes

Parent repository: alibaba/Pai-Megatron-Patch

Archived: no

README:

Quick Start

| | Megatron-LM-Dense | Megatron-Core-Dense | Megatron-Core-MoE | MegaBlocks-MoE |
|:------------|:-----------------:|:-------------------:|:-----------------:|:--------------:|
| LLama3 | ReadMe | ReadMe | N/A | N/A |
| LLama2 | ReadMe | ReadMe | N/A | N/A |
| Mistral | ReadMe | ReadMe | ReadMe | N/A |
| Qwen2 | N/A | ReadMe | ReadMe | N/A |
| Qwen1.5 | ReadMe | ReadMe | ReadMe | ReadMe |
| DeepSeek-V2 | N/A | N/A | ReadMe | N/A |

Introduction

English | [简体中文](./README_zh-CN.md)

Pai-Megatron-Patch (https://github.com/alibaba/Pai-Megatron-Patch) is a deep learning training toolkit that helps developers train LLMs and VLMs, and run inference with them, using the Megatron framework. As LLMs continue to develop, model structure and scale are evolving rapidly. Although these models can be conveniently trained with the Transformers or DeepSpeed training frameworks, training efficiency is comparatively low, and the gap becomes more severe once the model scale exceeds 10 billion parameters. The primary objective of Pai-Megatron-Patch is to make effective use of GPU compute for LLM training. The tool allows convenient training of commonly used LLMs with all the acceleration techniques provided by Megatron-LM.

What's New:

  • Support training qwen2 moe models by using Megatron-Core. [🔥🔥 2024.06.19]
  • Support training qwen2 dense models by using Megatron-Core. [🔥🔥 2024.06.12]
  • Support training deepseek-v2-moe models by using Megatron-Core. [🔥🔥 2024.05.30]
  • Support training qwen1.5-moe models by using Megatron-Core. [🔥🔥 2024.05.13]
  • Support training llama3 models by using Megatron-LM and Megatron-Core. [🔥🔥 2024.04.21]
  • Support training qwen1.5 models by using Megatron-Core. [🔥🔥 2024.03.20]
  • Support training qwen1.5 models by using Megatron-LM. [🔥🔥 2024.02.28]
  • Support training mixtral-8x7b moe model by using Megatron-Core. [🔥🔥 2024.01.26]
  • Support training qwen-vl multimodal model by using Megatron-LM. [🔥🔥 2023.12.15]
  • Support training LLava multimodal model by using Megatron-LM. [🔥🔥 2023.12.01]
  • Support training deepseek model by using Megatron-LM. [🔥🔥 2023.11.24]
  • Support training qwen-72B model by using Megatron-LM. [🔥🔥 2023.11.23]
  • Support training Mistral-7B, Yi-6B and Codellama-34B [🔥🔥 2023.11.16]
  • Upgrade Megatron-LM for Llama2, qwen and baichuan2 to use Transformer Engine and FP8. [🔥🔥 2023.10.19]
  • Support training qwen-14B and baichuan2-13B model by using Megatron-LM. [🔥🔥 2023.10.08]

Highlights

Pai-Megatron-Patch is developed by the Alibaba Cloud Machine Learning Platform (PAI) algorithm team. The tool aims to help developers get started quickly with Lingjun products and complete the entire LLM development pipeline, including efficient distributed training, supervised fine-tuning, and offline model inference or verification. Its main strengths are:

  • Support for multiple commonly used LLMs such as llama, llama-2, codellama, deepseek, baichuan, qwen, Falcon, GLM, Starcoder, Bloom, chatglm, etc.
  • Support for model weight conversion: Mapping operator namespaces between Huggingface, Megatron, and Transformer Engine.
  • Support for FP8 training acceleration in Flash Attention 2.0 and Transformer Engine modes, ensuring training convergence.
  • Rich and user-friendly usage examples, offering best practices for the entire workflow of LLM pre-training, fine-tuning, evaluation, and inference, as well as reinforcement learning.
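The weight-conversion point above, mapping parameter names between Huggingface-style and Megatron-style checkpoints, can be illustrated with a minimal sketch. The rename rules, `rename_key`, and `convert_state_dict` below are hypothetical names and patterns chosen for illustration; they are not the mapping tables or functions shipped with Pai-Megatron-Patch. A real converter typically also has to reshape, merge, or shard tensors, which this sketch leaves out.

```python
import re

# Illustrative rename rules only. These patterns are assumptions made for this
# sketch, not the actual mapping used by Pai-Megatron-Patch.
HF_TO_MEGATRON_RULES = [
    (r"^model\.embed_tokens\.weight$", "embedding.word_embeddings.weight"),
    (r"^model\.layers\.(\d+)\.self_attn\.o_proj\.weight$",
     r"decoder.layers.\1.self_attention.linear_proj.weight"),
    (r"^model\.layers\.(\d+)\.mlp\.down_proj\.weight$",
     r"decoder.layers.\1.mlp.linear_fc2.weight"),
    (r"^model\.norm\.weight$", "decoder.final_layernorm.weight"),
]


def rename_key(name, rules=HF_TO_MEGATRON_RULES):
    """Return the target-side name for one parameter, using the first matching rule."""
    for pattern, target in rules:
        new_name, count = re.subn(pattern, target, name)
        if count:
            return new_name
    # Fail loudly so that unmapped parameters are never silently dropped.
    raise KeyError(f"no rename rule for parameter {name!r}")


def convert_state_dict(state_dict, rules=HF_TO_MEGATRON_RULES):
    """Rename every key of a state dict; the values (tensors) pass through unchanged."""
    return {rename_key(name, rules): tensor for name, tensor in state_dict.items()}
```

Raising on an unmatched name is a deliberate choice in this sketch: a conversion that quietly skips a parameter tends to surface much later as a convergence problem rather than as an immediate error.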

Framework

The design philosophy of Pai-Megatron-Patch is to avoid invasive modifications to the source code of Megatron-LM. In other words, it does not add new modules directly to Megatron-LM. Instead, the functions that need extension or improvement are provided as patches. This decoupling ensures that users can continue to follow LLM best practices without being affected by Megatron-LM upgrades.
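One common way to realize this kind of non-invasive extension is to subclass an upstream component and swap it in at runtime, leaving the upstream source untouched. The sketch below shows that general pattern under stated assumptions: `upstream` is a stand-in module built inline, and `Attention`, `build_model`, `PatchedAttention`, `apply_patch`, and `remove_patch` are hypothetical names. It does not describe how Pai-Megatron-Patch itself is implemented.

```python
import types

# Stand-in for an upstream package that must not be edited (hypothetical).
upstream = types.ModuleType("upstream")


class _UpstreamAttention:
    def forward(self, x):
        return [v * 2 for v in x]


def _build_model():
    # Looks the class up on the module at call time, so a swap takes effect.
    return upstream.Attention()


upstream.Attention = _UpstreamAttention
upstream.build_model = _build_model


# Patch side: extend the behavior in a separate namespace.
class PatchedAttention(upstream.Attention):
    def forward(self, x):
        out = super().forward(x)      # reuse the upstream logic
        return [v + 1 for v in out]   # add the extension on top


def apply_patch():
    """Swap in the patched class and return the original so it can be restored."""
    original = upstream.Attention
    upstream.Attention = PatchedAttention
    return original


def remove_patch(original):
    """Restore the upstream class."""
    upstream.Attention = original
```

Calling `apply_patch()` before `upstream.build_model()` yields a model that uses `PatchedAttention`, while `remove_patch(original)` returns to stock behavior. Because the extension lives outside the upstream package, an upstream upgrade only requires rechecking the patched classes.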

Pai-Megatron-Patch includes key components for building LLM training, such as a model library, tokenizers, model converters, reinforcement learning, offline text generation, usage examples, and toolkits. The model…
