Fork · Together AI · published May 19, 2023

togethercomputer/FT_Redpajama

forked from NVIDIA/FasterTransformer


Description: Transformer related optimization, including BERT, GPT

Language: C++

License: Apache-2.0

Stars: 1

Forks: 1

Open issues: 0

Created: 2023-05-19T07:43:05Z

Pushed: 2023-07-20T06:34:52Z

Default branch: main

Fork: yes

Parent repository: NVIDIA/FasterTransformer

Archived: no

README:

Deploy FasterTransformer (FT) Inference of RedPajama Models on TogetherCompute Infra

Build the docker image:

sudo docker build -t ft_redpajama --file Redpajama-Together-Dockerfile .

Convert the RedPajama model to FT format:

  • Download the RedPajama model checkpoint from Hugging Face (e.g., RedPajama-INCITE-Chat-7B-v0.1):

git lfs clone https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-7B-v0.1

  • Start the ft_redpajama container:

sudo nvidia-docker run --ipc=host --network=host --name test_ft_redpajama -ti -v /PATH_TO_PARENT_DIR_OF_DOWNLOADED_HF_WEIGHTS:/workspace/FasterTransformer/build/model ft_redpajama bash

  • Run the conversion script inside the container:

python /workspace/FasterTransformer/examples/pytorch/gptneox/utils/huggingface_gptneox_convert.py -i /workspace/FasterTransformer/build/model/RedPajama-INCITE-Chat-7B-v0.1 -o /workspace/FasterTransformer/build/model/ft-RedPajama-INCITE-Chat-7B-v0.1 -i_g 1 -m_n RedPajama-INCITE-Chat-7B-v0.1 -weight_data_type fp16
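The container paths in the conversion command follow from the volume mount: the parent directory of the downloaded checkpoint is mounted at /workspace/FasterTransformer/build/model, so the checkpoint appears there under its own directory name. The sketch below only assembles the strings for another model name; it does not run Docker or the converter. The helper names (docker_volume_arg, convert_command) and the host path in the usage note are hypothetical, the "ft-" output prefix simply copies the naming used above, and reading -i_g as the number of inference GPUs is an assumption.

```python
from pathlib import PurePosixPath
from typing import List

# Paths taken from the commands above.
CONTAINER_MODEL_DIR = PurePosixPath("/workspace/FasterTransformer/build/model")
CONVERT_SCRIPT = PurePosixPath(
    "/workspace/FasterTransformer/examples/pytorch/gptneox/utils/"
    "huggingface_gptneox_convert.py"
)


def docker_volume_arg(host_checkpoint_dir: str) -> str:
    """Return the value for `-v`: the checkpoint's PARENT dir mapped into the container."""
    parent = PurePosixPath(host_checkpoint_dir).parent
    return f"{parent}:{CONTAINER_MODEL_DIR}"


def convert_command(model_name: str, infer_gpus: int = 1, dtype: str = "fp16") -> List[str]:
    """Build the converter argument list for a checkpoint directory named `model_name`.

    Assumption: -i_g is the number of GPUs used for inference.
    """
    src = CONTAINER_MODEL_DIR / model_name            # where the mount exposes the HF weights
    dst = CONTAINER_MODEL_DIR / f"ft-{model_name}"    # output naming copied from the README
    return [
        "python", str(CONVERT_SCRIPT),
        "-i", str(src),
        "-o", str(dst),
        "-i_g", str(infer_gpus),
        "-m_n", model_name,
        "-weight_data_type", dtype,
    ]


if __name__ == "__main__":
    # Hypothetical host location of the cloned checkpoint.
    print(docker_volume_arg("/data/hf/RedPajama-INCITE-Chat-7B-v0.1"))
    print(" ".join(convert_command("RedPajama-INCITE-Chat-7B-v0.1")))
```

For the example model, the second printed line should match the conversion command above; if it does not, the mount path and the -i argument are out of sync.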

To deploy the model:

  • Inside the container, start the together node:

/usr/local/bin/together-node start

  • Inside the container, start the worker process (you will probably need to change some arguments to support other models):

python /workspace/FasterTransformer/examples/pytorch/gptneox/serving_redpajama_single_gpu.py
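Before running the two deployment commands, it can help to confirm that both files exist inside the container. The following is a minimal sketch of such a check, using only the two paths shown above; the function name missing_prerequisites is hypothetical, and the check says nothing about whether the node or the worker will start successfully.

```python
import os
from typing import List

# Paths taken from the deployment commands above.
TOGETHER_NODE = "/usr/local/bin/together-node"
WORKER_SCRIPT = (
    "/workspace/FasterTransformer/examples/pytorch/gptneox/"
    "serving_redpajama_single_gpu.py"
)


def missing_prerequisites(paths: List[str]) -> List[str]:
    """Return the subset of `paths` that does not exist on this filesystem."""
    return [p for p in paths if not os.path.exists(p)]


if __name__ == "__main__":
    missing = missing_prerequisites([TOGETHER_NODE, WORKER_SCRIPT])
    if missing:
        print("Missing inside the container:", ", ".join(missing))
    else:
        print("Both the together-node binary and the worker script are present.")
```

Run it inside the container; on the host, both paths are expected to be reported as missing.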