Nous Research · published Nov 22, 2023

NousResearch/StripedHyenaTrainer

Python

Language: Python

License: Apache-2.0

Stars: 68

Forks: 11

Open issues: 0

Created: 2023-11-22T19:44:23Z

Pushed: 2023-12-08T20:26:46Z

Default branch: main

Fork: no

Archived: no

README: This is the training code used to train StripedHyena-Nous-7B.

First, tokenize your data:

python tokenization.py \
--dataset your-super-cool-sharegpt-format-dataset \
--tokenizer togethercomputer/StripedHyena-Hessian-7B \
--output tokenized \
--num-proc 32 \
--pad-to-length 4096 \
--truncate
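
Below is a rough sketch of what --pad-to-length 4096 and --truncate plausibly do to each tokenized example. pad_or_truncate is a hypothetical helper, not a function from tokenization.py. It works on a plain list of token IDs instead of calling the StripedHyena tokenizer, and it does not parse the ShareGPT conversation format. The pad token ID is an assumption, as is the -100 label value for padded positions (the usual Hugging Face convention for tokens the loss ignores). The README also does not say what happens to over-long examples when --truncate is omitted, so this sketch simply raises an error in that case.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # assumption: label value that excludes a position from the loss


def pad_or_truncate(
    input_ids: List[int],
    pad_to_length: int,
    pad_token_id: int,
    truncate: bool = True,
) -> Dict[str, List[int]]:
    """Hypothetical helper: force one tokenized example to exactly pad_to_length tokens."""
    if len(input_ids) > pad_to_length:
        if not truncate:
            raise ValueError(f"example has {len(input_ids)} tokens, limit is {pad_to_length}")
        # Keep the first pad_to_length tokens and drop the tail.
        input_ids = input_ids[:pad_to_length]
    num_pad = pad_to_length - len(input_ids)
    return {
        # Real tokens followed by padding up to the fixed length.
        "input_ids": list(input_ids) + [pad_token_id] * num_pad,
        # 1 marks real tokens, 0 marks padding.
        "attention_mask": [1] * len(input_ids) + [0] * num_pad,
        # Padding positions are masked out of the loss with IGNORE_INDEX.
        "labels": list(input_ids) + [IGNORE_INDEX] * num_pad,
    }


short = pad_or_truncate([5, 6, 7], pad_to_length=5, pad_token_id=0)
print(short["input_ids"])  # [5, 6, 7, 0, 0]
print(short["labels"])     # [5, 6, 7, -100, -100]

clipped = pad_or_truncate(list(range(10)), pad_to_length=4, pad_token_id=0)
print(clipped["input_ids"])  # [0, 1, 2, 3]
```

Padding every example to the same length produces fixed-shape batches, at the cost of spending compute on pad tokens for short conversations.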

Make sure you have run accelerate config first (we used the provided DeepSpeed configuration). Then, train!

accelerate launch finetune.py \
--model togethercomputer/StripedHyena-Hessian-7B \
--dataset tokenized \
--output-dir output \
--epochs 4 \
--batch-size 12 \
--gradient-accumulate-every 12 \
--warmup-steps 350 \
--learning-rate 0.000004 \
--lr-schedule linear \
--weight-decay 0.1 \
--checkpointing-steps 1000 \
--no-decay poles residues
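
The batching and schedule flags interact in ways that are easy to misread, so a small worked sketch follows. It makes several assumptions the README does not confirm. First, --batch-size is taken as the per-device micro-batch, so one optimizer step consumes batch-size * gradient-accumulate-every sequences on each process. Second, --lr-schedule linear is taken to mean a linear warmup from 0 over --warmup-steps followed by a linear decay to 0, which is the shape of get_linear_schedule_with_warmup in Hugging Face transformers. The process count and the total number of steps (which depends on dataset size) are hypothetical placeholders.

```python
def sequences_per_optimizer_step(batch_size: int, grad_accum: int, num_processes: int) -> int:
    """Sequences consumed per optimizer step, assuming batch_size is per device."""
    return batch_size * grad_accum * num_processes


def linear_warmup_lr(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at a given optimizer step under an assumed linear warmup + linear decay."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to peak_lr over the first warmup_steps steps.
        return peak_lr * step / max(1, warmup_steps)
    # Afterwards, decay linearly from peak_lr to 0 at total_steps (never below 0).
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# README values; num_processes and total_steps are hypothetical placeholders.
num_processes = 1
total_steps = 10_000

seqs = sequences_per_optimizer_step(batch_size=12, grad_accum=12, num_processes=num_processes)
print(seqs)         # 144
print(seqs * 4096)  # 589824 tokens per step at --pad-to-length 4096

for step in (0, 175, 350, 5_000, total_steps):
    print(step, linear_warmup_lr(step, peak_lr=4e-6, warmup_steps=350, total_steps=total_steps))
```

Under these assumptions, each process contributes 144 padded sequences (589,824 tokens at length 4096) per optimizer step. Multiply by the number of processes to get the global figure. The learning rate reaches its 4e-6 peak at step 350.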

The --no-decay option disables weight decay for *only* the specified parameters. For StripedHyena, we found that disabling weight decay on the Hyena operator's poles and residues parameters improves performance. There is also a --frozen option that completely freezes selected parameter groups.
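
One way to implement name-based weight-decay exclusion and freezing is sketched below. It assumes parameters are matched by substring of their names; the README does not show how finetune.py actually matches --no-decay and --frozen values. build_param_groups is a hypothetical helper, and the parameter names in the example are made-up placeholders, not StripedHyena's real module names. The helper returns a list of dicts in the {"params": ..., "weight_decay": ...} shape that PyTorch optimizers accept as parameter groups.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    weight_decay: float,
    no_decay: Sequence[str] = ("poles", "residues"),
    frozen: Sequence[str] = (),
) -> List[Dict[str, Any]]:
    """Hypothetical helper: split parameters into optimizer groups by name substring."""
    decay, skip_decay = [], []
    for name, param in named_params:
        if any(key in name for key in frozen):
            continue  # frozen: never handed to the optimizer, so never updated
        if any(key in name for key in no_decay):
            skip_decay.append(param)  # e.g. Hyena poles/residues: no weight decay
        else:
            decay.append(param)
    groups = [
        {"params": decay, "weight_decay": weight_decay},
        {"params": skip_decay, "weight_decay": 0.0},
    ]
    # Drop empty groups so the optimizer only sees groups that hold parameters.
    return [g for g in groups if g["params"]]


# Placeholder names and values standing in for real (name, tensor) pairs.
params = [
    ("blocks.0.filter.poles", "p0"),
    ("blocks.0.filter.residues", "r0"),
    ("blocks.0.mlp.weight", "w0"),
    ("embedding.weight", "e0"),
]
for group in build_param_groups(params, weight_decay=0.1, frozen=("embedding",)):
    print(group["weight_decay"], group["params"])
# 0.1 ['w0']
# 0.0 ['p0', 'r0']
```

In a PyTorch script you would pass model.named_parameters() and hand the result to torch.optim.AdamW(groups, lr=4e-6). Leaving frozen parameters out of the optimizer only stops them from being updated. Also calling param.requires_grad_(False) on them avoids computing their gradients at all.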