CompactifAI/Block_removal_through_constrained_binary_optimization
Python
Description: Repo for removing blocks from LLMs using constrained binary optimization
Language: Python
License: NOASSERTION
Stars: 1
Forks: 2
Open issues: 0
Created: 2026-02-02T11:31:21Z
Pushed: 2026-02-23T10:00:15Z
Default branch: main
Fork: no
Archived: no
README:
Code Repository
This repository contains the code used to generate the results in "Block removal for large language models through constrained binary optimization". The pipeline consists of data preparation, binary optimization for Hessian construction, energy computation, and model compression, followed by fine-tuning and benchmarking against alternative methods.
---
1. Data Preparation
First, prepare the dataset used throughout the experiments:
python src/prepare_data/prepare_data.py \
    --config_file configs/data_configs/llama_3.1_8B_data_config.yaml
This step preprocesses the data and stores it according to the paths specified in the configuration file.
---
2. Binary Optimization and Hessian Construction
All code related to binary optimization is located in:
src/binary_optimization/
2.1 Generate Binary Optimization Samples
Run:
python src/binary_optimization/prepare_binary_optimization.py \
    --config_file configs/cbo_configs/llama3_pruning8B.yaml
This script generates the samples required to construct the Hessian matrix and stores a PyTorch tensor A.pt in the directory specified by output_path in the configuration file (e.g., mnt/Amatrices).
---
2.2 Compute Energies
Next, compute the energies associated with removing a fixed number of blocks:
python src/solve_binary_optimization/compute_energies.py \
    -A_directory mnt/Amatrices \
    -ndel 8
This creates a directory:
mnt/Amatrices/energies_del8/
which contains all binary states corresponding to the removal of 8 blocks, along with their associated energies.
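As a rough illustration of what this step produces, the sketch below enumerates every binary state with a fixed number of removed blocks and assigns each an energy. It makes two assumptions that this README does not confirm: the energy is the quadratic form E(x) = x^T A x, and a bit value of 1 marks a removed block. The function names are hypothetical, and plain Python lists stand in for the stored A.pt tensor.

```python
import math
from itertools import combinations


def enumerate_states(n_blocks, n_del):
    """Yield every binary state of length n_blocks with exactly n_del ones.

    Assumed convention: 1 = block removed, 0 = block kept.
    """
    for removed in combinations(range(n_blocks), n_del):
        state = [0] * n_blocks
        for i in removed:
            state[i] = 1
        yield tuple(state)


def quadratic_energy(A, state):
    """Assumed energy E(x) = sum_ij x_i * A[i][j] * x_j (not taken from the repo)."""
    idx = [i for i, bit in enumerate(state) if bit]
    return sum(A[i][j] for i in idx for j in idx)


def compute_energies(A, n_del):
    """Return a list of (energy, state) pairs for all states with n_del removals."""
    return [(quadratic_energy(A, s), s) for s in enumerate_states(len(A), n_del)]


# Toy 4-block matrix, made up for illustration only.
A_toy = [
    [1.0, 0.2, 0.0, 0.1],
    [0.2, 0.5, 0.3, 0.0],
    [0.0, 0.3, 0.8, 0.4],
    [0.1, 0.0, 0.4, 0.3],
]
results = sorted(compute_energies(A_toy, n_del=2))

# Number of states for a 32-block model with 8 removals.
n_states_32_choose_8 = math.comb(32, 8)
```

Exhaustive enumeration grows combinatorially: a 32-block model with 8 removed blocks already has 10,518,300 candidate states, so the sorted list is only practical for moderate values of ndel.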
---
3. Model Compression
To generate compressed models from the computed energies, run:
python src/compression/compress_model.py \
    --filename mnt/Amatrices/energies_del8/ \
    --model_name models/Llama-3.1-8B-Instruct/ \
    --k 3
- model_name specifies the path to the original pretrained model.
- k determines how many models are stored.
- For example, k = 3 stores the ground state and the first two excited states.
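The selection controlled by k can be sketched as picking the k lowest-energy states and translating each into the block indices that remain in the model. This is a minimal sketch, not the repository's code: the (energy, state) pair layout, the helper names, and the convention that 1 marks a removed block are all assumptions, and the energies are made-up toy values.

```python
import heapq


def lowest_k_states(energy_states, k):
    """Return the k (energy, state) pairs with the lowest energy, in ascending order."""
    return heapq.nsmallest(k, energy_states, key=lambda pair: pair[0])


def kept_block_indices(state):
    """Indices of blocks that stay in the model (assumed convention: 1 = removed)."""
    return [i for i, bit in enumerate(state) if bit == 0]


# Toy energies for a hypothetical 4-block model with 2 blocks removed.
energy_states = [
    (1.9, (1, 1, 0, 0)),
    (1.8, (1, 0, 1, 0)),
    (1.5, (1, 0, 0, 1)),
    (0.8, (0, 1, 0, 1)),
]

selected = lowest_k_states(energy_states, k=3)
for rank, (energy, state) in enumerate(selected):
    label = "ground state" if rank == 0 else f"excited state {rank}"
    print(label, energy, kept_block_indices(state))
```

With k = 3 the loop reports one ground state and two excited states, each of which would correspond to one stored compressed model.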
---
4. Fine-tuning
To retrain the compressed models, run:
accelerate launch src/finetuning/finetune_kd.py \
    -config_file configs/finetuning_configs/llama3_8B_finetuning_kd.yaml
---
5. Baselines and Benchmark Methods
We benchmark against several alternative compression strategies. To do the evaluations, run:
python src/finetuning/finetune_kd.py \
    --model_path model_path
Nemotron
The code for the NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 calculations can be found in
src/nemotron_files/
5.1 Block Influence Method
python src/block_influence/compression_multiblock.py \
    --config_file configs/BI_configs/llama8.yaml
---
5.2 Norm Ratio Method
python src/norm_ratio/compression_multiblock.py \
    --config_file configs/norm_ratio_configs/llama8.yaml
---
5.3 Sliding Window Method
To generate models using sliding windows, run:
python src/Sliding_windows/SLM_adapted.py \
    --model_path models/Llama-3.1-8B-Instruct/ \
    --model_name output/Llama-3.1-8B-Instruct \
    --threshold 0.40 \
    --dataset \
    --target_count 2048
---
6. Hyperparameter Settings
The following ratios were used in the experiments:
- LLaMA-3.1-8B
  - 16 blocks removed: ratio = 0.12
  - 8 blocks removed: ratio = 0.36
- Qwen
  - 12 blocks removed: ratio = 0.64
  - 8 blocks removed: ratio = 0.77
---
License
Patent Pending. The intended use is strictly limited to research and non-commercial projects. If you find these results useful, please cite:
@article{jansen2026block,
title={Block removal for large language models through constrained binary optimization},
author={Jansen, David and Rausch, Roman and Montero, David and Orus, Roman},
journal={arXiv preprint arXiv:2602.00161},
year={2026}
}
Notes
- All paths, hyperparameters, and preprocessing steps are specified via configuration files to ensure reproducibility.
- Because some of the energies are nearly degenerate, numerical rounding errors can slightly change the ordering of the states in some cases. Take this into account when evaluating the models.
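One way to account for near-degenerate energies is to group states whose energies agree within a tolerance and treat each group as a set of equally ranked candidates. The sketch below is an assumption-laden illustration, not part of the repository: the helper name, the tolerance values, and the toy energies are all hypothetical.

```python
import math


def group_near_degenerate(energy_states, rel_tol=1e-6, abs_tol=1e-9):
    """Group (energy, state) pairs whose energies are close to their neighbor's.

    States are sorted by energy; a state joins the current group when its
    energy is within tolerance of the previous state's energy.
    """
    ordered = sorted(energy_states, key=lambda pair: pair[0])
    groups = []
    for energy, state in ordered:
        if groups and math.isclose(
            energy, groups[-1][-1][0], rel_tol=rel_tol, abs_tol=abs_tol
        ):
            groups[-1].append((energy, state))
        else:
            groups.append([(energy, state)])
    return groups


# Toy values: the first two energies differ only at the 1e-10 level.
toy = [
    (1.5, (1, 0, 0, 1)),
    (0.8000000001, (0, 0, 1, 1)),
    (0.8, (0, 1, 0, 1)),
]
groups = group_near_degenerate(toy)
```

Here the two lowest states fall into one group, so either could appear as the "ground state" depending on rounding; evaluating every member of a group avoids depending on that order.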