Fork · Arcee AI · published Feb 6, 2025

arcee-ai/spectrum

forked from QuixiAI/spectrum

Description: for spectral randomness

License: Apache-2.0

Stars: 0

Forks: 0

Open issues: 0

Created: 2025-02-06T01:50:19Z

Pushed: 2025-02-06T01:58:34Z

Default branch: main

Fork: yes

Parent repository: QuixiAI/spectrum

Archived: no

README:

Spectrum

This repository contains the implementation of Spectrum, as detailed in the paper Spectrum: Targeted Training on Signal to Noise Ratio.

Overview

Spectrum is a tool for scanning and evaluating the Signal-to-Noise Ratio (SNR) of layers in large language models. By identifying the top n% of layers with the highest SNR, you can restrict training to those layers and keep the rest frozen, which improves training efficiency.

Features

  • Model Scanning: Scan models to determine the SNR of each layer.
  • Top n% Layer Identification: Identify and sort the top n% of layers based on their SNR.
  • Unfrozen Parameters Configuration: Generate configuration files for unfreezing specific layers in Axolotl or other libraries.
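The "top n%" step can be illustrated with a minimal sketch. It assumes the scan has already produced one SNR value per layer, held in a plain dictionary. The function name, the round-up rule, and the sample values are hypothetical, and this README does not say whether Spectrum ranks all layers together or separately per module type, so the sketch ranks a single flat group.

```python
import math


def top_percent_layers(snr_by_layer, top_percent):
    """Return the names of the highest-SNR layers, best first."""
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in the range (0, 100]")
    # Sort layer names by their SNR value, highest first.
    ranked = sorted(snr_by_layer, key=snr_by_layer.get, reverse=True)
    # Assumption: round up and always keep at least one layer.
    keep = max(1, math.ceil(len(ranked) * top_percent / 100))
    return ranked[:keep]


# Hypothetical SNR values, for illustration only.
example_snr = {
    "model.layers.0.mlp.down_proj": 0.8,
    "model.layers.1.mlp.down_proj": 2.5,
    "model.layers.2.mlp.down_proj": 1.7,
    "model.layers.3.mlp.down_proj": 0.3,
}

selected = top_percent_layers(example_snr, 50)
print(selected)
```

With four layers and a top percentage of 50, the sketch keeps the two layers with the highest values.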

Installation

To use Spectrum, you need to have Python installed. Clone this repository and install the necessary dependencies:

git clone https://github.com/cognitivecomputations/spectrum.git
cd spectrum
pip install -r requirements.txt

Usage

To use Spectrum, run the following command:

python spectrum.py --model-name <local path or Hugging Face repo> --top-percent <percentage>

  • --model-name: The local model path or the Hugging Face repository.
  • --top-percent: The top percentage of layers, ranked by SNR, that you want to retrieve.

Spectrum checks the model_snr_results folder to see whether the model has already been scanned. If it has not, Spectrum prompts you for the batch size to use for scanning. Once the scan is complete, it writes the SNR values to the model_snr_results folder and provides a list of layers sorted from highest to lowest SNR, along with an unfrozen parameters YAML file.

Example Command

python spectrum.py --model-name meta-llama/Meta-Llama-3-8B-Instruct --top-percent 50

Scanning New Models

Spectrum first checks the model_snr_results folder to see whether the model has already been scanned (we invite you to add your own scans for models we don't have via PR). If it has, Spectrum gives you the top n% of layers from the stored results. Otherwise, it asks what batch size you want to scan at. We've been able to use a batch_size of 4 for 70B models on an 8xH100 node. Spectrum then loads the model and presents all available modules to scan. We typically select only the MLP and attention layers, but if you're doing continued pretraining or language tasks, it wouldn't hurt to include all available modules.

Spectrum then scans the model and writes the SNR values to the model_snr_results folder. It also outputs a list sorted from highest to lowest SNR, along with an unfrozen parameters YAML file. The YAML matches the Axolotl config format, so you can copy and paste it directly into your Axolotl YAML. That's it!

Integration with Axolotl

If you're using Axolotl, the generated YAML file can be directly integrated.
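As a rough illustration of what such a YAML fragment could look like, the sketch below renders a list of layer names under a top-level `unfrozen_parameters` key. The key name is an assumption borrowed from the attribute used in the helper script in this README, and the function name is hypothetical. The file Spectrum actually writes may differ in detail (for example, in comments or in how names are written), so treat this as a sketch rather than the tool's output format.

```python
def unfrozen_parameters_yaml(layer_names):
    """Render layer names as a YAML list under an assumed top-level key."""
    # Assumption: the key is `unfrozen_parameters`, one entry per line.
    lines = ["unfrozen_parameters:"]
    lines.extend(f"- {name}" for name in layer_names)
    return "\n".join(lines) + "\n"


# Two layer names taken from the example list in this README.
snippet = unfrozen_parameters_yaml([
    "model.layers.62.mlp.down_proj",
    "model.layers.63.mlp.down_proj",
])
print(snippet)
```

The fragment is built as plain text so the sketch has no dependency on a YAML library.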

Integration with Other Libraries

For integration with other libraries, we provide a simple script to freeze and unfreeze parameters:

def _freeze_and_unfreeze_parameters(self):
    # Freeze all parameters
    for param in self.model.parameters():
        param.requires_grad = False
    # Unfreeze Spectrum parameters
    for name, param in self.model.named_parameters():
        if any(unfrozen_param in name for unfrozen_param in self.unfrozen_parameters):
            param.requires_grad = True

unfrozen_parameters = [
'model.layers.62.mlp.down_proj', 'model.layers.63.mlp.down_proj', 'model.layers.66.mlp.down_proj',
'model.layers.65.mlp.down_proj', 'model.layers.64.mlp.down_proj', 'model.layers.67.mlp.down_proj',
'model.layers.68.mlp.down_proj', 'model.layers.60.mlp.down_proj', 'model.layers.31.mlp.down_proj',
'model.layers.69.mlp.down_proj', 'model.layers.61.mlp.down_proj', 'model.layers.59.mlp.down_proj',
'model.layers.70.mlp.down_proj', 'model.layers.30.mlp.down_proj', 'model.layers.76.mlp.down_proj',
'model.layers.72.mlp.down_proj', 'model.layers.77.mlp.down_proj', 'model.layers.71.mlp.down_proj',
'model.layers.29.mlp.down_proj', 'model.layers.58.mlp.down_proj', 'model.layers.78.mlp.gate_proj',
'model.layers.77.mlp.gate_proj', 'model.layers.76.mlp.gate_proj', 'model.layers.79.mlp.gate_proj',
'model.layers.75.mlp.gate_proj', 'model.layers.74.mlp.gate_proj', 'model.layers.73.mlp.gate_proj',
'model.layers.70.mlp.gate_proj', 'model.layers.72.mlp.gate_proj', 'model.layers.71.mlp.gate_proj',
'model.layers.69.mlp.gate_proj', 'model.layers.54.mlp.gate_proj', 'model.layers.68.mlp.gate_proj',
'model.layers.57.mlp.gate_proj', 'model.layers.63.mlp.gate_proj', 'model.layers.49.mlp.gate_proj',
'model.layers.55.mlp.gate_proj', 'model.layers.53.mlp.gate_proj', 'model.layers.44.mlp.gate_proj',
'model.layers.46.mlp.gate_proj', 'model.layers.69.mlp.up_proj', 'model.layers.70.mlp.up_proj',
'model.layers.71.mlp.up_proj', 'model.layers.68.mlp.up_proj', 'model.layers.67.mlp.up_proj',
'model.layers.66.mlp.up_proj', 'model.layers.46.mlp.up_proj', 'model.layers.63.mlp.up_proj',
'model.layers.72.mlp.up_proj', 'model.layers.64.mlp.up_proj', 'model.layers.62.mlp.up_proj',
'model.layers.45.mlp.up_proj', 'model.layers.65.mlp.up_proj', 'model.layers.73.mlp.up_proj',
'model.layers.47.mlp.up_proj', 'model.layers.44.mlp.up_proj', 'model.layers.49.mlp.up_proj',
'model.layers.48.mlp.up_proj', 'model.layers.53.mlp.up_proj', 'model.layers.74.mlp.up_proj',
'model.layers.79.self_attn.k_proj', 'model.layers.36.self_attn.k_proj', 'model.layers.35.self_attn.k_proj',
'model.layers.74.self_attn.k_proj', 'model.layers.34.self_attn.k_proj', 'model.layers.78.self_attn.k_proj',
'model.layers.77.self_attn.k_proj', 'model.layers.37.self_attn.k_proj', 'model.layers.39.self_attn.k_proj',
'model.layers.41.self_attn.k_proj', 'model.layers.38.self_attn.k_proj', 'model.layers.33.self_attn.k_proj',
'model.layers.69.self_attn.k_proj', 'model.layers.42.self_attn.k_proj', 'model.layers.32.self_attn.k_proj',
'model.layers.25.self_attn.k_proj', 'model.layers.70.self_attn.k_proj', 'model.layers.22.self_attn.k_proj',
'model.layers.63.self_attn.k_proj', 'model.layers.29.self_attn.k_proj', 'model.layers.14.self_attn.o_proj',
'model.layers.39.self_attn.o_proj', 'model.layers.19.self_attn.o_proj', 'model.layers.16.self_attn.o_proj',
'model.layers.17.self_attn.o_proj', 'model.layers.15.self_attn.o_proj', 'model.layers.69.self_attn.o_proj',
'model.layers.12.self_attn.o_proj',…
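The helper above unfreezes a parameter whenever one of the listed strings appears as a substring of the parameter's name. That matching rule can be checked without loading a model. The sketch below applies the same substring test to plain strings; the function name and the sample parameter names (including the `.weight` suffix) are hypothetical.

```python
def trainable_names(parameter_names, unfrozen_parameters):
    """Return the parameter names the substring test would unfreeze."""
    return [
        name
        for name in parameter_names
        # Same rule as the helper: any listed string contained in the name.
        if any(pattern in name for pattern in unfrozen_parameters)
    ]


# Hypothetical parameter names, for illustration only.
parameter_names = [
    "model.layers.61.mlp.down_proj.weight",
    "model.layers.62.mlp.down_proj.weight",
    "model.layers.62.mlp.up_proj.weight",
    "model.layers.62.self_attn.k_proj.weight",
]
patterns = ["model.layers.62.mlp.down_proj", "model.layers.62.mlp.up_proj"]

matched = trainable_names(parameter_names, patterns)
print(matched)
```

Only the two names that contain a listed string are returned; every other parameter would stay frozen.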
