OpenBMB/BMCook
Description: Model Compression for Big Models
Language: Python
License: Apache-2.0
Stars: 169
Forks: 25
Open issues: 6
Created: 2022-03-09T07:51:28Z
Pushed: 2023-06-30T08:57:51Z
Default branch: main
Fork: no
Archived: no
README:
What's New
- 2023/5/27 Support structured pruning of decoder-only models and compression of CPM-Live models.
- 2022/5/17 Support PLMs in model-center.
- 2022/3/29 (BMCook 0.1.0) We publicly released the first version of BMCook.
Overview
BMCook is a model compression toolkit for large-scale pre-trained language models (PLMs) that integrates multiple model compression methods, which you can combine in any way to achieve the desired speedup. Specifically, it implements four compression methods: knowledge distillation, model pruning, model quantization, and model MoEfication. Its main features are:
- Various Supported Methods. Compared to existing compression toolkits, BMCook supports all mainstream acceleration methods for pre-trained language models.
- User Friendly. With BMCook, different compression methods can be applied with just a few lines of code.
- Combination in Any Way. Because the methods are implemented independently of one another, they can be combined in any way to achieve extreme acceleration.
Documentation
Our documentation provides more information about the package.
Installation
To use BMCook, first install BMTrain.
From PyPI (Recommended)
$ pip install bmtrain
From Source
$ git clone https://github.com/OpenBMB/BMTrain.git
$ cd BMTrain
$ python3 setup.py install
Please refer to the installation guide of BMTrain for more details.
Then, install BMCook.
From PyPI (Recommended)
$ pip install bmcook
From source
$ git clone git@github.com:OpenBMB/BMCook.git
$ cd BMCook
$ python3 setup.py install
Usage
1. Design your BMCook config.
Provide a JSON file that specifies your compression strategy. Each "[placehold]" entry stands for module names in your model, and mask_method should be set to one of "m4n2_1d", "m4n2_2d", or "sprune".
{
  "distillation": {
    "ce_scale": 0,
    "ce_temp": 1,
    "mse_hidn_scale": 0,
    "mse_hidn_module": ["[placehold]"],
    "mse_hidn_proj": false,
    "mse_att_scale": 0,
    "mse_att_module": ["[placehold]"]
  },
  "pruning": {
    "is_pruning": false,
    "pruning_mask_path": null,
    "pruned_module": ["[placehold]"],
    "mask_method": "m4n2_1d/m4n2_2d/sprune",
    "sprune": {
      "criterion": "l0",
      "training_mask": ["[placehold]"],
      "fixed_mask_path": "",
      "mask_mode": "train_mask",
      "target_sparsity": 0.5
    }
  },
  "quantization": {
    "is_quant": false,
    "quantized_module": []
  },
  "MoEfication": {
    "is_moefy": false,
    "first_FFN_module": ["[placehold]"]
  }
}
Notes:
is_moefy, is_quant, and is_pruning are switch parameters: when one of them is false, the other parameters in its section are ignored. mask_method works similarly: when it is "m4n2_1d" or "m4n2_2d", BMCook performs unstructured pruning, and when it is "sprune", the sprune field is activated. For distillation, the corresponding distillation mode is switched on when ce_scale or mse_hidn_scale is greater than 0.
- We do not recommend using MoEfication and distillation at the same time.
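To make the switch rules in these notes concrete, here is a minimal sketch of a hypothetical helper, enabled_methods (not part of BMCook), that reports which methods a config would turn on. It assumes the config has already been parsed into a Python dict, and the example config is trimmed for brevity; BMCook's own ConfigParser may expect the full set of fields. Building the config in Python and serializing it with json.dumps also guarantees valid JSON (double quotes, false, null).

```python
import json
import warnings


def enabled_methods(cfg):
    """Hypothetical helper (not part of BMCook): list the compression methods
    a cook config would switch on, following the README's switch rules."""
    methods = []
    distill = cfg.get("distillation", {})
    if distill.get("ce_scale", 0) > 0:
        methods.append("distillation:ce")
    if distill.get("mse_hidn_scale", 0) > 0:
        methods.append("distillation:mse_hidn")

    pruning = cfg.get("pruning", {})
    if pruning.get("is_pruning", False):  # other pruning fields matter only if True
        mask_method = pruning.get("mask_method")
        if mask_method in ("m4n2_1d", "m4n2_2d", "sprune"):
            methods.append("pruning:" + mask_method)
        else:
            raise ValueError(f"unknown mask_method: {mask_method!r}")

    if cfg.get("quantization", {}).get("is_quant", False):
        methods.append("quantization")
    if cfg.get("MoEfication", {}).get("is_moefy", False):
        methods.append("moefication")

    if "moefication" in methods and any(m.startswith("distillation") for m in methods):
        warnings.warn("combining MoEfication and distillation is not recommended")
    return methods


# Illustrative, trimmed config: quantization plus CE distillation.
# json.dumps writes Python False/None as the JSON literals false/null.
config = {
    "distillation": {"ce_scale": 1, "ce_temp": 1, "mse_hidn_scale": 0},
    "pruning": {"is_pruning": False, "pruning_mask_path": None},
    "quantization": {"is_quant": True, "quantized_module": []},
    "MoEfication": {"is_moefy": False},
}
text = json.dumps(config, indent=2)
print(enabled_methods(json.loads(text)))  # ['distillation:ce', 'quantization']
```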
2. Basic usage in your code.
BMCook provides a unified interface, CookTrainer. Distillation, pruning, and MoEfication may add extra terms to the model outputs, and you can use CookTrainer to manage both your model and these modifications.
from bmcook import CookTrainer
from bmcook.utils.config import ConfigParser

# prepare your model, dataloader and optimizer...
...

# set up your BMCook strategy
CookTrainer.set_compression(cookconfig, model, optimizer, model_distill)

# train
for data in dataloader:
    targets = ...
    ...
    outputs = CookTrainer.forward(model, loss_func, targets, *your_model_inputs, **your_model_kwinputs)
    [loss, model_outputs, lag_loss, sparsity, distill_loss] = outputs
The returned loss is the sum of model_loss, lag_loss, and distill_loss, so to measure the model's own performance, subtract the other two terms. Note that if sprune is not set, lag_loss will be None; the same applies to distill_loss when distillation is not used.
model_loss = loss - lag_loss - distill_loss  # both sprune and distillation enabled
model_loss = loss - distill_loss  # only distillation enabled
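A minimal sketch that covers every combination, including runs where neither sprune nor distillation is set, is a hypothetical helper (not a BMCook API) that unpacks the five-element output of CookTrainer.forward and treats a None term as absent. It relies only on subtraction, so it should work with plain floats as well as loss tensors.

```python
def model_loss_from_outputs(outputs):
    """Hypothetical helper (not a BMCook API): recover the task loss from
    [loss, model_outputs, lag_loss, sparsity, distill_loss]."""
    loss, _model_outputs, lag_loss, _sparsity, distill_loss = outputs
    model_loss = loss
    if lag_loss is not None:  # only present when sprune is enabled
        model_loss = model_loss - lag_loss
    if distill_loss is not None:  # only present when distillation is enabled
        model_loss = model_loss - distill_loss
    return model_loss


print(model_loss_from_outputs([3.5, None, 0.5, 0.4, 1.0]))  # 2.0
```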
BMCook also provides separate interfaces for initializing each compression setting, which you can use to build a Trainer tailored to your own needs. If you define your own Trainer, keep its output format the same as CookTrainer's. For an example of extending CookTrainer, see CPMAntTrainer.
from bmcook import BMDistill

# Define your own Trainer.
Trainer = ...

# Set up distillation on top of the existing forward function.
Trainer.forward = BMDistill.set_forward(model, teacher, Trainer.forward, cook_config)
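The call to BMDistill.set_forward replaces Trainer.forward with a version that also involves the teacher. The sketch below is not BMDistill's implementation; it only illustrates that wrapping pattern in plain Python, under stated assumptions. The names with_distillation, base_forward, kd_loss, scale, and temp are hypothetical, and the distillation term is a Hinton-style temperature-scaled KL divergence, a common choice that may differ from what ce_scale and ce_temp control in BMCook. The point to notice is that the wrapped function still returns the five-element list that CookTrainer.forward returns.

```python
import math


def softmax(logits, temp):
    # Temperature-scaled softmax; subtracting the max keeps exp() stable.
    scaled = [x / temp for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temp):
    # KL(teacher || student) on softened distributions, scaled by temp**2.
    p_t = softmax(teacher_logits, temp)
    p_s = softmax(student_logits, temp)
    return temp ** 2 * sum(t * (math.log(t) - math.log(s)) for t, s in zip(p_t, p_s))


def with_distillation(forward, teacher, scale, temp):
    """Hypothetical wrapper (not BMDistill): add a distillation term while keeping
    the [loss, model_outputs, lag_loss, sparsity, distill_loss] format."""
    def wrapped(model, loss_func, targets, *inputs, **kwinputs):
        loss, outputs, lag_loss, sparsity, _ = forward(model, loss_func, targets, *inputs, **kwinputs)
        distill_loss = scale * kd_loss(outputs, teacher(*inputs, **kwinputs), temp)
        return [loss + distill_loss, outputs, lag_loss, sparsity, distill_loss]
    return wrapped


def base_forward(model, loss_func, targets, *inputs, **kwinputs):
    # A plain forward that already follows the five-element output format.
    outputs = model(*inputs, **kwinputs)
    return [loss_func(outputs, targets), outputs, None, None, None]


# Toy "models" that map a scalar input to two logits.
def student(x):
    return [x, 0.0]


def teacher(x):
    return [0.0, x]


def constant_loss(outputs, targets):
    return 0.25


forward = with_distillation(base_forward, teacher, scale=1.0, temp=2.0)
loss, outputs, lag_loss, sparsity, distill_loss = forward(student, constant_loss, None, 1.0)
```

Because the wrapper only adds to the loss and fills the last slot, any code that consumes CookTrainer-style outputs keeps working unchanged.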
3. How to run your code
Run your code as usual, but pass the path to your cook config:
torchrun --nnodes=1 --nproc_per_node=1 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost train.py \
    --save-dir ... \
    --model ... \
    --start-lr ... \
    --cook-config ...    # path to your cook config
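How train.py receives these flags is up to your script. As a hypothetical sketch (BMCook's example scripts may parse arguments differently), the standard-library argparse module can accept the same four flags and load the cook config from the given path; argparse turns --cook-config into the attribute args.cook_config.

```python
import argparse
import json


def build_parser():
    # Hypothetical parser mirroring the flags in the torchrun command.
    parser = argparse.ArgumentParser(description="training with a BMCook config")
    parser.add_argument("--save-dir", required=True)
    parser.add_argument("--model", required=True)
    parser.add_argument("--start-lr", type=float, default=1e-4)
    parser.add_argument("--cook-config", required=True, help="path to the cook config JSON")
    return parser


def load_cook_config(path):
    # Read the JSON compression strategy from disk into a dict.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


args = build_parser().parse_args([
    "--save-dir", "results/gpt2-int8",
    "--model", "gpt2-base",
    "--start-lr", "1e-4",
    "--cook-config", "configs/gpt2-int8.json",
])
print(args.cook_config)  # configs/gpt2-int8.json
```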
Quick Start
The examples folder provides pruning examples based on CPM-Live, GPT2-Base, and T5-large; see examples for more details.
Taking GPT-2 as an example:
Quantization-aware training:
torchrun --nnodes=1 --nproc_per_node=1 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost train.py \
    --save-dir results/gpt2-int8 \
    --model gpt2-base \
    --start-lr 1e-4 \
    --cook-config configs/gpt2-int8.json
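Quantization-aware training generally simulates low-precision rounding in the forward pass so that the model learns to tolerate it. As an illustration only, here is a symmetric, per-tensor int8 "fake quantization" (quantize, then immediately dequantize) in plain Python. The function name fake_quant_int8 is hypothetical, BMCook's actual scheme (granularity, scale choice, rounding) may differ, and the gradient handling used during training (commonly a straight-through estimator) is not shown.

```python
def fake_quant_int8(values):
    """Quantize to symmetric int8 codes and immediately dequantize (forward pass only)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in values]
    scale = max_abs / 127.0  # one scale for the whole tensor (per-tensor)
    codes = [max(-127, min(127, round(v / scale))) for v in values]  # int8 codes
    return [c * scale for c in codes]  # dequantized values seen by the next layer


weights = [0.3, -1.0, 0.77, 0.001]
print(fake_quant_int8(weights))
```

Each dequantized value differs from the original by at most half a quantization step (scale / 2), which is the error the model adapts to during quantization-aware training.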
Quantization-aware training with knowledge distillation:
torchrun --nnodes=1 --nproc_per_node=1 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost train.py \
    --save-dir results/gpt2-int8-kd \
    --model gpt2-base \
    --start-lr 1e-4 \
    --cook-config configs/gpt2-int8-kd.json
Model pruning:
torchrun --nnodes=1 --nproc_per_node=1 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost train.py \
    --save-dir results/gpt2-prune \
    --model gpt2-base \
    --start-lr 1e-4 \
    --cook-config configs/gpt2-prune.json
In this case, we only prune the input embedding layer. You can include more modules by changing the…
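The mask_method names m4n2_1d and m4n2_2d suggest an N:M pattern in which n=2 of every m=4 consecutive weights are kept. Assuming that reading, a one-dimensional mask over a flat row of weights can be sketched as below. The function nm_mask is hypothetical and keeps the largest-magnitude weights in each group; BMCook's own mask computation, and what distinguishes the 2d variant, may differ.

```python
def nm_mask(weights, n=2, m=4):
    """Hypothetical N:M mask: in each group of m weights, keep the n with the
    largest magnitude (1 = keep, 0 = prune)."""
    if len(weights) % m != 0:
        raise ValueError("length must be a multiple of m")
    mask = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        keep = set(sorted(range(m), key=lambda i: abs(group[i]), reverse=True)[:n])
        mask.extend(1 if i in keep else 0 for i in range(m))
    return mask


row = [0.1, -0.9, 0.5, 0.05, 1.0, 2.0, -3.0, 0.0]
mask = nm_mask(row)
pruned = [w * k for w, k in zip(row, mask)]
print(mask)  # [0, 1, 1, 0, 0, 1, 1, 0]
```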