What does this fork signal mean?

Together AI's repository togethercomputer/FT_Redpajama is a fork of NVIDIA/FasterTransformer. The fork points to upstream code the lab may be inspecting, patching, or building on. High-signal details: repo togethercomputer/FT_Redpajama · parent NVIDIA/FasterTransformer. onlylabs links this event to 1 captured evidence page and 6 related fork signals.

Together AI Fork: togethercomputer/FT_Redpajama

Captured source

GitHub · github.com/togethercomputer/FT_Redpajama

togethercomputer/FT_Redpajama repository metadata

published May 19, 2023 · seen 5d · captured 8h · http 200 · method plain

togethercomputer/FT_Redpajama

Description: Transformer related optimization, including BERT, GPT

Language: C++

License: Apache-2.0

Stars: 1

Forks: 1

Open issues: 0

Created: 2023-05-19T07:43:05Z

Pushed: 2023-07-20T06:34:52Z

Default branch: main

Fork: yes

Parent repository: NVIDIA/FasterTransformer

Archived: no

README:

Deploy FT Inference of RedPajama Models Under TogetherCompute Infra

Build the docker image:

sudo docker build -t ft_redpajama --file Redpajama-Together-Dockerfile .

Convert the RedPajama model to FT format:

Download the RedPajama model checkpoint from Hugging Face (e.g., RedPajama-INCITE-Chat-7B-v0.1):

git lfs clone https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-7B-v0.1
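
Before converting, it can help to confirm that the clone actually fetched the weight files rather than leaving Git LFS pointer stubs behind. The following is a minimal sketch, not part of the repository: the helper names `is_lfs_pointer` and `unresolved_lfs_files` are hypothetical, and the check relies only on the fact that an LFS pointer file begins with the line `version https://git-lfs.github.com/spec/v1`.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line (LFS pointer spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` is an unresolved LFS pointer stub."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def unresolved_lfs_files(checkpoint_dir):
    """List files under `checkpoint_dir` (relative paths) that are still pointers.

    Hypothetical helper: the .git directory is skipped because it holds
    repository bookkeeping rather than checkpoint files.
    """
    root = Path(checkpoint_dir)
    stubs = []
    for p in root.rglob("*"):
        rel = p.relative_to(root)
        if p.is_file() and ".git" not in rel.parts and is_lfs_pointer(p):
            stubs.append(str(rel))
    return sorted(stubs)
```

Calling `unresolved_lfs_files("RedPajama-INCITE-Chat-7B-v0.1")` from the directory that holds the clone should return an empty list when all weights were downloaded. A non-empty list suggests running `git lfs pull` inside the checkout.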

Start the ft_redpajama container:

sudo nvidia-docker run --ipc=host --network=host --name test_ft_redpajama -ti -v /PATH_TO_PARENT_DIR_OF_DOWNLOADED_HF_WEIGHTS:/workspace/FasterTransformer/build/model ft_redpajama bash
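
The placeholder /PATH_TO_PARENT_DIR_OF_DOWNLOADED_HF_WEIGHTS has to be replaced with the host directory that contains the cloned checkpoint folder. As a sketch of how that substitution fits into the command, the hypothetical helper below assembles the same argument list from a host path; `build_run_command` and the example path `/data/hf` are assumptions for illustration, not part of the repository. It rejects relative paths because Docker interprets a non-absolute source in `-v` as a named volume rather than a bind mount.

```python
import shlex
from pathlib import PurePosixPath

# Mount target taken from the README command above.
CONTAINER_MODEL_DIR = "/workspace/FasterTransformer/build/model"


def build_run_command(host_weights_parent,
                      image="ft_redpajama",
                      name="test_ft_redpajama"):
    """Return the nvidia-docker argv for the README's run step.

    `host_weights_parent` is the parent directory of the downloaded
    Hugging Face checkpoint on the host (hypothetical parameter name).
    """
    host = PurePosixPath(host_weights_parent)
    if not host.is_absolute():
        # A relative source would be treated as a named volume by Docker.
        raise ValueError(f"host path must be absolute: {host_weights_parent!r}")
    return [
        "sudo", "nvidia-docker", "run",
        "--ipc=host", "--network=host",
        "--name", name,
        "-ti",
        "-v", f"{host}:{CONTAINER_MODEL_DIR}",
        image, "bash",
    ]


if __name__ == "__main__":
    # /data/hf is an illustrative host path only.
    print(shlex.join(build_run_command("/data/hf")))
```

With this mapping, a checkpoint cloned to /data/hf/RedPajama-INCITE-Chat-7B-v0.1 on the host appears inside the container at /workspace/FasterTransformer/build/model/RedPajama-INCITE-Chat-7B-v0.1, which is the input path the conversion step expects.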

Run the conversion script inside the container:

python /workspace/FasterTransformer/examples/pytorch/gptneox/utils/huggingface_gptneox_convert.py -i /workspace/FasterTransformer/build/model/RedPajama-INCITE-Chat-7B-v0.1 -o /workspace/FasterTransformer/build/model/ft-RedPajama-INCITE-Chat-7B-v0.1 -i_g 1 -m_n RedPajama-INCITE-Chat-7B-v0.1 -weight_data_type fp16
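
The conversion command repeats the model name in three places (input directory, output directory, and `-m_n`), so converting a different RedPajama checkpoint means editing all three consistently. The sketch below builds the same argument list from a single model name. The helper `build_convert_command` and its parameter names are hypothetical; the flags are exactly those in the command above, and reading `-i_g` as the number of inference GPUs is an assumption based on the flag name, not something the README states.

```python
import shlex
from pathlib import PurePosixPath

# Paths inside the container, copied from the README command above.
MODEL_ROOT = PurePosixPath("/workspace/FasterTransformer/build/model")
CONVERT_SCRIPT = PurePosixPath(
    "/workspace/FasterTransformer/examples/pytorch/gptneox/utils/"
    "huggingface_gptneox_convert.py"
)


def build_convert_command(model_name, infer_gpus=1, weight_dtype="fp16"):
    """Return the argv for converting `model_name` to FT format.

    `infer_gpus` maps to -i_g (assumed to mean inference GPU count);
    the output directory follows the README's "ft-<model name>" pattern.
    """
    return [
        "python", str(CONVERT_SCRIPT),
        "-i", str(MODEL_ROOT / model_name),
        "-o", str(MODEL_ROOT / f"ft-{model_name}"),
        "-i_g", str(infer_gpus),
        "-m_n", model_name,
        "-weight_data_type", weight_dtype,
    ]


if __name__ == "__main__":
    print(shlex.join(build_convert_command("RedPajama-INCITE-Chat-7B-v0.1")))
```

For the default arguments this reproduces the README command verbatim. Whether other values for `-i_g` or `-weight_data_type` work with the single-GPU serving script is not covered by the README and would need checking against the script itself.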

To deploy the model:

Inside the container, start the together node:

/usr/local/bin/together-node start

Inside the container, start the worker process (you will probably need to change some arguments to support different models):

python /workspace/FasterTransformer/examples/pytorch/gptneox/serving_redpajama_single_gpu.py