NousResearch/StripedHyenaTrainer
Python
Language: Python
License: Apache-2.0
Stars: 68
Forks: 11
Open issues: 0
Created: 2023-11-22T19:44:23Z
Pushed: 2023-12-08T20:26:46Z
Default branch: main
Fork: no
Archived: no
README: This is the training code used to train StripedHyena-Nous-7B.
First, tokenize your data
python tokenization.py \
    --dataset your-super-cool-sharegpt-format-dataset \
    --tokenizer togethercomputer/StripedHyena-Hessian-7B \
    --output tokenized \
    --num-proc 32 \
    --pad-to-length 4096 \
    --truncate
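To illustrate what --pad-to-length and --truncate plausibly do to each tokenized sample, here is a minimal sketch. It is not taken from tokenization.py: the function name `pad_or_truncate`, right-side padding, the attention mask, and raising an error when truncation is disabled are all assumptions made for illustration.

```python
def pad_or_truncate(token_ids, length, pad_id, truncate=True):
    """Fit a list of token ids to exactly `length` entries.

    Hypothetical helper: the real script's behaviour may differ.
    Returns (ids, attention_mask), where the mask is 1 for real tokens
    and 0 for padding.
    """
    token_ids = list(token_ids)
    if len(token_ids) > length:
        if not truncate:
            # Assumption: without truncation, over-long samples are rejected.
            raise ValueError(f"sample has {len(token_ids)} tokens, limit is {length}")
        token_ids = token_ids[:length]  # keep the first `length` tokens
    n_pad = length - len(token_ids)
    attention_mask = [1] * len(token_ids) + [0] * n_pad
    return token_ids + [pad_id] * n_pad, attention_mask


# Short sample: padded on the right up to the target length.
short_ids, short_mask = pad_or_truncate([5, 6, 7], length=5, pad_id=0)

# Long sample: cut down to the target length.
long_ids, long_mask = pad_or_truncate(range(10), length=4, pad_id=0)
```

With --pad-to-length 4096, every sample would come out as exactly 4096 token ids under this scheme, which keeps batch shapes fixed during training.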
Make sure you have run accelerate config first; we used the provided DeepSpeed configuration. Then, train!
accelerate launch finetune.py \
    --model togethercomputer/StripedHyena-Hessian-7B \
    --dataset tokenized \
    --output-dir output \
    --epochs 4 \
    --batch-size 12 \
    --gradient-accumulate-every 12 \
    --warmup-steps 350 \
    --learning-rate 0.000004 \
    --lr-schedule linear \
    --weight-decay 0.1 \
    --checkpointing-steps 1000 \
    --no-decay poles residues
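The arithmetic behind these flags can be sketched as follows. This is a rough illustration, not the code in finetune.py: it assumes --batch-size is a per-device value, that the linear schedule ramps up from zero during warmup and then decays linearly to zero, and it uses a hypothetical device count and total step count, since neither is stated in the README.

```python
def effective_batch_size(per_device_batch, grad_accum, num_devices):
    """Samples contributing to one optimizer step."""
    return per_device_batch * grad_accum * num_devices


def linear_lr(step, peak_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to peak_lr, then linear decay to 0 (assumed shape)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Values from the command above; num_devices and total_steps are hypothetical.
samples_per_step = effective_batch_size(per_device_batch=12, grad_accum=12, num_devices=8)

peak = 0.000004
lr_at_start = linear_lr(0, peak, warmup_steps=350, total_steps=1000)
lr_mid_warmup = linear_lr(175, peak, warmup_steps=350, total_steps=1000)
lr_at_peak = linear_lr(350, peak, warmup_steps=350, total_steps=1000)
lr_at_end = linear_lr(1000, peak, warmup_steps=350, total_steps=1000)
```

Under these assumptions, each optimizer step sees 12 × 12 samples per device, and the learning rate reaches 0.000004 only at step 350.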
The --no-decay option disables weight decay on *only* the specified parameters. For StripedHyena, we've found that disabling weight decay on the Hyena operator's poles and residues parameters improves performance. There is also a --frozen option that completely freezes the selected parameter groups.
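One common way to implement options like --no-decay and --frozen is to split parameters into optimizer groups by name. The sketch below shows that pattern under stated assumptions: the helper `build_param_groups`, the substring matching rule, and the parameter names in the example are all hypothetical, and the repository's own implementation may differ.

```python
from types import SimpleNamespace


def build_param_groups(named_params, weight_decay, no_decay=(), frozen=()):
    """Split (name, param) pairs into decay and no-decay optimizer groups.

    A parameter matches a keyword if the keyword is a substring of its name
    (an assumed matching rule). Frozen parameters are excluded from both
    groups and have requires_grad set to False.
    """
    decay, skip_decay = [], []
    for name, param in named_params:
        if any(key in name for key in frozen):
            param.requires_grad = False  # excluded from training entirely
            continue
        if any(key in name for key in no_decay):
            skip_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": skip_decay, "weight_decay": 0.0},
    ]


# Stand-in parameters with made-up names; with a real model you would pass
# model.named_parameters() instead.
params = {
    "blocks.0.filter.poles": SimpleNamespace(requires_grad=True),
    "blocks.0.filter.residues": SimpleNamespace(requires_grad=True),
    "blocks.0.mlp.weight": SimpleNamespace(requires_grad=True),
    "embedding.weight": SimpleNamespace(requires_grad=True),
}
groups = build_param_groups(
    params.items(),
    weight_decay=0.1,
    no_decay=("poles", "residues"),
    frozen=("embedding",),
)
```

The resulting list has the per-group format accepted by PyTorch optimizers such as torch.optim.AdamW, so the poles and residues parameters would train with weight decay 0.0 while everything else uses 0.1.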