What does this fork signal mean?

Arcee AI forked NVIDIA/Megatron-LM as arcee-ai/Megatron-LM. This fork signal points to upstream code the lab may be inspecting, patching, or building on. High-signal details: repo arcee-ai/Megatron-LM · parent NVIDIA/Megatron-LM · Routine fork of a training library. onlylabs links this event to 1 captured evidence page and 6 related fork signals.

Arcee AI Fork: arcee-ai/Megatron-LM

Captured source

source ↗

GitHub/github.com/arcee-ai/Megatron-LM

arcee-ai/Megatron-LM repository metadata

published Apr 18, 2024 · seen Jun 5 · captured Jun 11 · http 200 · method plain

arcee-ai/Megatron-LM

Description: domain adapted MOE training

Language: Python

License: NOASSERTION

Stars: 0

Forks: 0

Open issues: 2

Created: 2024-04-18T23:08:16Z

Pushed: 2024-07-01T18:47:45Z

Default branch: main

Fork: yes

Parent repository: NVIDIA/Megatron-LM

Archived: no
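The Created and Pushed fields above are ISO 8601 timestamps in UTC. As a minimal sketch, assuming only the two values shown in this capture, the gap between fork creation and the last push can be computed as follows. The helper name `parse_github_timestamp` is hypothetical and chosen for illustration.

```python
from datetime import datetime, timedelta


def parse_github_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp such as '2024-04-18T23:08:16Z'.

    Hypothetical helper for illustration. The trailing 'Z' is rewritten to
    '+00:00' because older Python versions reject 'Z' in fromisoformat().
    """
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Values copied from the captured metadata above.
created = parse_github_timestamp("2024-04-18T23:08:16Z")
pushed = parse_github_timestamp("2024-07-01T18:47:45Z")

# Time between creating the fork and the most recent push.
active: timedelta = pushed - created

if __name__ == "__main__":
    print(f"Last push came {active.days} full days after the fork was created")
```

A last push well after the creation date suggests the fork received commits rather than sitting untouched, although the capture alone does not show what changed.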

README:

Megatron-LM & Megatron-Core

GPU optimized techniques for training transformer models at-scale

Latest News

[2024/1 Announcement] NVIDIA has released the core capabilities in Megatron-LM into **Megatron-Core** in this repository. Megatron-Core expands upon Megatron-LM's GPU-optimized techniques with more cutting-edge innovations on system-level optimizations, featuring composable and modular APIs. Explore the [Megatron-Core intro](#megatron-core) for more details.

[Megatron Overview](#megatron-overview)
[Megatron-LM](#megatron-lm)
[Megatron-Core](#megatron-core)
[Training Speed and Scalability](#training-speed-and-scalability)
[Setup](#setup)
[Downloading Checkpoints](#downloading-checkpoints)
[Usage](#usage)
[Training](#training)
[Data Preprocessing](#data-preprocessing)
[BERT Pretraining](#bert-pretraining)
[GPT Pretraining](#gpt-pretraining)
[T5 Pretraining](#t5-pretraining)
[Distributed Pretraining](#distributed-pretraining)
[Activation Checkpointing and Recomputation](#activation-checkpointing-and-recomputation)
[Distributed Optimizer](#distributed-optimizer)
[FlashAttention](#flashattention)
[GPT-3 Example](#gpt-3-example)
[Retro and InstructRetro](#retro-and-instructretro)
[Evaluation and Tasks](#evaluation-and-tasks)
[GPT Text Generation](#gpt-text-generation)
[GPT Evaluation](#gpt-evaluation)
[WikiText Perplexity Evaluation](#wikitext-perplexity-evaluation)
[LAMBADA Cloze Accuracy](#lambada-cloze-accuracy)
[BERT Task Evaluation](#bert-task-evaluation)
[RACE Evaluation](#race-evaluation)
[MNLI Evaluation](#mnli-evaluation)
[Llama-2 Inference and Finetuning](#llama-2-inference-and-finetuning)
[Datasets](#datasets)
[Collecting Wikipedia Training Data](#collecting-wikipedia-training-data)
[Collecting GPT Webtext Data](#collecting-gpt-webtext-data)
[Reproducibility](#reproducibility)
[Projects using Megatron](#projects-using-megatron)

Megatron Overview

This repository comprises two essential components: Megatron-LM and Megatron-Core. Megatron-LM serves as a research-oriented framework leveraging Megatron-Core for large language model (LLM) training. Megatron-Core, on the other hand, is a library of GPU optimized training techniques that comes with formal product support including versioned APIs and regular releases. You can use Megatron-Core alongside Megatron-LM or NVIDIA NeMo Framework for an end-to-end and cloud-native solution. Alternatively, you can integrate Megatron-Core's building blocks into your preferred training framework.

Megatron-LM

First introduced in 2019, Megatron (1, 2, and 3) sparked a wave of innovation in the AI community, enabling researchers and developers to use the underpinnings of this library to further LLM advancements. Today, many of the most popular LLM developer frameworks were inspired by, or built directly on, the open-source Megatron-LM library, spurring a wave of foundation models and AI startups. Some of the most popular LLM frameworks built on top of Megatron-LM include Colossal-AI, HuggingFace Accelerate, and NVIDIA NeMo Framework. A list of projects that have directly used Megatron can be found [here](#projects-using-megatron).

Megatron-Core

Megatron-Core is a newly released open-source PyTorch-based library that further expands the collections of GPU optimized techniques inherited from Megatron-LM with more cutting-edge innovations on system-level optimizations. It abstracts them into composable and modular APIs, allowing full flexibility for developers and model researchers to train custom transformers at-scale on NVIDIA accelerated computing infrastructure. This library is compatible with all NVIDIA Tensor Core GPUs, including FP8 acceleration support for NVIDIA Hopper architectures.

Megatron-Core offers core building blocks such as attention mechanisms, transformer blocks and layers, normalization layers, and embedding techniques. Additional functionality, such as activation recomputation and distributed checkpointing, is also built into the library. The building blocks and functionality are all GPU optimized, and can be combined with advanced parallelization strategies for optimal training speed and stability on NVIDIA Accelerated Computing Infrastructure. Another key component of the Megatron-Core library is its set of advanced model parallelism techniques (tensor, sequence, and pipeline). Currently, popular LLM architectures based on Decoder (e.g. GPT, Llama), Encoder (e.g. BERT), Encoder-Decoder (e.g. T5), Retrieval Enhanced Transformers (e.g. RETRO), and Mixture of Experts (MoE) can easily be built with performance and efficiency at large compute scales. Developers can also use Megatron-Core's transformer blocks and functional APIs to build their own custom layers.
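As a rough illustration of how the tensor and pipeline model parallelism mentioned above combine with data parallelism, the sketch below splits a fixed number of GPUs into parallel groups. The function name `data_parallel_size` and the example sizes are hypothetical and are not part of Megatron-Core's API. The only assumption is the usual relation that the GPU count equals the product of the tensor, pipeline, and data parallel sizes.

```python
def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return how many model replicas fit on `world_size` GPUs.

    Hypothetical helper for illustration only; not a Megatron-Core function.
    """
    if min(world_size, tensor_parallel, pipeline_parallel) < 1:
        raise ValueError("all sizes must be positive integers")

    # Each model replica occupies tensor_parallel * pipeline_parallel GPUs.
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError(
            "world_size must be divisible by tensor_parallel * pipeline_parallel"
        )

    # The remaining factor is the number of data-parallel replicas.
    return world_size // model_parallel


if __name__ == "__main__":
    # Made-up example: 64 GPUs, 8-way tensor and 4-way pipeline parallelism.
    print(data_parallel_size(64, tensor_parallel=8, pipeline_parallel=4))
```

Raising an error on a non-divisible GPU count, rather than rounding down, keeps a misconfigured layout from silently leaving GPUs idle.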

Training Speed and Scalability

Our codebase is capable of efficiently training very large (hundreds of billions of parameters) language models with both model and data parallelism. To demonstrate how the code scales with multiple GPUs and model sizes, we consider GPT models from 1 billion all the way to 1 trillion parameters. All models use a vocabulary size of 51,200 and a sequence length of 2048. We vary hidden size, number of attention heads, and number of layers to arrive at a specific model size. As the model size increases, we also modestly increase the batch size. We leverage [NVIDIA's Selene...

Excerpt shown — open the source for the full document.
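The excerpt says model size is set by varying hidden size, attention heads, and layer count at a fixed vocabulary size of 51,200 and sequence length of 2048. A minimal sketch of that arithmetic follows, assuming a standard GPT-style block with biases plus learned token and position embeddings. The function name `gpt_param_count` and the example configuration (24 layers, hidden size 2304) are illustrative assumptions, not values taken from the excerpt.

```python
def gpt_param_count(num_layers: int, hidden_size: int,
                    vocab_size: int = 51_200, seq_length: int = 2048) -> int:
    """Approximate parameter count of a GPT-style transformer.

    Illustrative sketch under stated assumptions: standard multi-head
    attention, a 4x MLP expansion, biases everywhere, and the final
    LayerNorm ignored. The number of attention heads does not appear
    because it does not change the count in this layout.
    """
    h = hidden_size
    attention = 4 * h * h + 4 * h   # QKV and output projections, with biases
    mlp = 8 * h * h + 5 * h         # h -> 4h -> h, with biases
    layer_norms = 4 * h             # two LayerNorms, weight and bias each
    per_layer = attention + mlp + layer_norms  # 12h^2 + 13h

    # Token embeddings plus learned position embeddings.
    embeddings = (vocab_size + seq_length) * h
    return num_layers * per_layer + embeddings


if __name__ == "__main__":
    # Hypothetical configuration, chosen only to show the arithmetic.
    total = gpt_param_count(num_layers=24, hidden_size=2304)
    print(f"{total / 1e9:.2f}B parameters")
```

Because the per-layer term grows with the square of the hidden size, widening the model increases the parameter count much faster than adding layers at a fixed width.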

Notability

notability 1.0/10

Routine fork of a training library.