togethercomputer/gpt-neox
forked from EleutherAI/gpt-neox
Description: An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
Language: Python
License: Apache-2.0
Stars: 2
Forks: 0
Open issues: 0
Created: 2023-04-24T11:26:48Z
Pushed: 2023-04-24T16:59:51Z
Default branch: main
Fork: yes
Parent repository: EleutherAI/gpt-neox
Archived: no
README:
GPT-NeoX
This repository records EleutherAI's library for training large-scale language models on GPUs. Our current framework is based on NVIDIA's Megatron Language Model and has been augmented with techniques from DeepSpeed as well as some novel optimizations. We aim to make this repo a centralized and accessible place to gather techniques for training large-scale autoregressive language models, and accelerate research into large-scale training.
For those looking for a TPU-centric codebase, we recommend Mesh Transformer JAX.
If you are not looking to train models with billions of parameters from scratch, this is likely the wrong library to use. For generic inference needs, we recommend the Hugging Face `transformers` library instead, which supports GPT-NeoX models.
GPT-NeoX 2.0
Prior to 3/9/2023, GPT-NeoX relied on DeeperSpeed, which was based on an old version of DeepSpeed (0.3.15). In order to migrate to the latest upstream DeepSpeed version while allowing users to access the old versions of GPT-NeoX and DeeperSpeed, we have introduced two versioned releases for both libraries:
- Version 1.0 of GPT-NeoX and DeeperSpeed maintain snapshots of the old stable versions that GPT-NeoX-20B and the Pythia Suite were trained on.
- Version 2.0 of GPT-NeoX and DeeperSpeed are the latest versions built on the latest DeepSpeed, and will be maintained going forward.
Contents
- [Quick Start](#quick-start)
  - [Environment and Dependencies](#environment-and-dependencies)
  - [Usage](#usage)
- [Configuration](#configuration)
- [Datasets](#datasets)
  - [Preconfigured Datasets](#preconfigured-datasets)
  - [Using Custom Data](#using-custom-data)
- [Training and Finetuning](#training-and-finetuning)
  - [Select Pretrained Models](#pretrained-models)
    - [GPT-NeoX-20B](#gpt-neox-20b)
    - [Pythia](#pythia)
    - [Polyglot](#polyglot)
    - [Fill-in-the-Middle](#fill-in-the-middle)
- [Inference](#inference)
- [Evaluation](#evaluation)
- [Exporting to Hugging Face](#exporting-to-hugging-face)
- [Monitoring](#monitoring)
  - [Weights & Biases](#wandb)
  - [TensorBoard](#tensorboard)
- [Administrative Notes](#administrative-notes)
  - [Citing GPT-NeoX](#citing-gpt-neox)
  - [Licensing](#licensing)
  - [Publications](#publications)
  - [Acknowledgements](#acknowledgements)
Quick Start
Environment and Dependencies
Host Setup
First make sure you are in an environment with Python 3.8 with an appropriate version of PyTorch 1.8 or later installed. Note: Some of the libraries that GPT-NeoX depends on have not been updated to be compatible with Python 3.10+. Python 3.9 appears to work, but this codebase has been developed and tested for Python 3.8.
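As a quick sanity check before installing anything, the version guidance above can be expressed as a small script. This is a minimal sketch, not part of GPT-NeoX: the function names `classify_python` and `torch_version_ok` and the status labels are hypothetical, and the version string is passed in by hand so the sketch runs even where PyTorch is not yet installed.

```python
import sys


def classify_python(version=None):
    """Classify a (major, minor) Python version against the README guidance.

    The labels are illustrative only; they are not produced by GPT-NeoX.
    """
    major, minor = tuple(version or sys.version_info)[:2]
    if (major, minor) == (3, 8):
        return "tested"        # the codebase is developed and tested on 3.8
    if (major, minor) == (3, 9):
        return "appears-to-work"
    if (major, minor) >= (3, 10):
        return "unsupported"   # some dependencies are not 3.10+ compatible
    return "unknown"           # the README says nothing about older versions


def torch_version_ok(version_string, minimum=(1, 8)):
    """Return True if a PyTorch version string is at least `minimum`."""
    core = version_string.split("+")[0]  # drop build suffixes such as "+cu117"
    major_minor = tuple(int(part) for part in core.split(".")[:2])
    return major_minor >= minimum


if __name__ == "__main__":
    print("Python:", classify_python())
    # With PyTorch installed, torch.__version__ could be passed here instead.
    print("PyTorch 1.13.1+cu117 ok:", torch_version_ok("1.13.1+cu117"))
```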
To install the remaining basic dependencies, run:
pip install -r requirements/requirements.txt
python ./megatron/fused_kernels/setup.py install # optional if not using fused kernels
from the repository root.
Warning: Our codebase relies on DeeperSpeed, our fork of the DeepSpeed library with some added changes. We strongly recommend using Anaconda, a virtual machine, or some other form of environment isolation before continuing. Failure to do so may cause other repositories that rely on DeepSpeed to break.
Flash Attention
To use Flash-Attention, install the additional dependencies in ./requirements/requirements-flashattention.txt and set the attention type in your configuration accordingly (see [configs](./configs/)). This can provide significant speed-ups over regular attention on certain GPU architectures, including Ampere GPUs (such as A100s); see the repository for more details.
Containerized Setup
We also provide a Dockerfile if you prefer to run NeoX in a container. To use this option, first build an image named gpt-neox from the repository root directory with `docker build -t gpt-neox -f Dockerfile .` (note the trailing dot, which sets the build context to the current directory). We also host pre-built images on Docker Hub at `leogao2/gpt-neox`.
You can then run a container based on this image. For instance, the below snippet mounts the cloned repository (gpt-neox) directory to /gpt-neox in the container and uses nvidia-docker to make four GPUs (numbers 0-3) accessible to the container. As noted by the NCCL documentation, both --shm-size=1g and --ulimit memlock=-1 are important to prevent Docker from allocating too little shared memory.
nvidia-docker run --rm -it -e NVIDIA_VISIBLE_DEVICES=0,1,2,3 --shm-size=1g --ulimit memlock=-1 --mount type=bind,src=$PWD,dst=/gpt-neox gpt-neox
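If you launch containers from a script, the same invocation can be assembled programmatically. The following is a minimal sketch that only rebuilds the command shown above as an argument list; the helper name `docker_run_command` and its parameters are hypothetical and not part of GPT-NeoX.

```python
import os
import shlex


def docker_run_command(gpu_ids, repo_dir, image="gpt-neox"):
    """Build the nvidia-docker argument list for the given GPUs and repo path."""
    visible = ",".join(str(gpu) for gpu in gpu_ids)
    return [
        "nvidia-docker", "run", "--rm", "-it",
        "-e", f"NVIDIA_VISIBLE_DEVICES={visible}",
        "--shm-size=1g",            # shared-memory size, per the NCCL guidance
        "--ulimit", "memlock=-1",   # lift the locked-memory limit
        "--mount", f"type=bind,src={repo_dir},dst=/gpt-neox",
        image,
    ]


if __name__ == "__main__":
    # Print the command for GPUs 0-3 with the current directory mounted.
    command = docker_run_command(range(4), os.getcwd())
    print(" ".join(shlex.quote(arg) for arg in command))
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run` without shell quoting concerns.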
Usage
All functionality, inference included, should be launched using deepy.py, a wrapper around the deepspeed launcher.
We currently offer three main functions:
1. train.py is used for training and finetuning models.
2. evaluate.py is used to evaluate a trained model using the language model evaluation harness.
3. generate.py is used to sample text from a trained model.

These can be launched with:
./deepy.py [script.py] [./path/to/config_1.yml] [./path/to/config_2.yml] ... [./path/to/config_n.yml]
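The launch pattern above (one script followed by one or more config files) can be sketched as a small helper for use in automation. This is an illustration under stated assumptions: `deepy_command` is a hypothetical name, the config paths are the placeholders from the usage line rather than real files, and the script check simply mirrors the three entry points listed above.

```python
KNOWN_SCRIPTS = {"train.py", "evaluate.py", "generate.py"}


def deepy_command(script, configs, launcher="./deepy.py"):
    """Return the argument list for `./deepy.py [script.py] [config.yml] ...`."""
    if script not in KNOWN_SCRIPTS:
        raise ValueError(f"unknown script: {script!r}")
    configs = list(configs)
    if not configs:
        raise ValueError("at least one config file is required")
    return [launcher, script, *configs]


if __name__ == "__main__":
    # Placeholder paths; substitute your own configuration files.
    print(deepy_command("train.py", ["./path/to/config_1.yml", "./path/to/config_2.yml"]))
```

The resulting list could be handed to `subprocess.run(..., check=True)` from the repository root.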
For example, to generate text unconditionally with the GPT-NeoX-20B model, you can use the following:
./deepy.py…