Repo: Arcee AI, published Aug 21, 2023

arcee-ai/mergekit

Python



Description: Tools for merging pretrained large language models.

Language: Python

License: LGPL-3.0

Stars: 7132

Forks: 729

Open issues: 267

Created: 2023-08-21T03:50:04Z

Pushed: 2026-05-06T05:15:16Z

Default branch: main

Fork: no

Archived: no

README:

mergekit

mergekit is a toolkit for merging pre-trained language models. mergekit uses an out-of-core approach to perform unreasonably elaborate merges in resource-constrained situations. Merges can be run entirely on CPU or accelerated with as little as 8 GB of VRAM. Many merging algorithms are supported, with more coming as they catch my attention.

Contents

  • [Why Merge Models?](#why-merge-models)
  • [Features](#features)
  • [Installation](#installation)
  • [Community & Support](#community--support)
  • [Contributing](#contributing)
  • [Community Tools](#community-tools)
  • [Usage](#usage)
  • [Merge Configuration](#merge-configuration)
  • [Parameter Specification](#parameter-specification)
  • [Tokenizer Configuration](#tokenizer-configuration)
  • [Chat Template Configuration](#chat-template-configuration)
  • [Examples](#examples)
  • [Merge Methods](#merge-methods)
  • [LoRA Extraction](#lora-extraction)
  • [Mixture of Experts Merging](#mixture-of-experts-merging)
  • [Evolutionary Merge Methods](#evolutionary-merge-methods)
  • [Multi-Stage Merging (mergekit-multi)](#multi-stage-merging-mergekit-multi)
  • [Raw PyTorch Model Merging (mergekit-pytorch)](#raw-pytorch-model-merging-mergekit-pytorch)
  • [Tokenizer Transplantation (mergekit-tokensurgeon)](#tokenizer-transplantation-mergekit-tokensurgeon)
  • [Citation](#citation)

Why Merge Models?

Model merging is a powerful technique that allows combining the strengths of different models without the computational overhead of ensembling or the need for additional training. By operating directly in the weight space of models, merging can:

  • Combine multiple specialized models into a single versatile model
  • Transfer capabilities between models without access to training data
  • Find optimal trade-offs between different model behaviors
  • Improve performance while maintaining inference costs
  • Create new capabilities through creative model combinations

Unlike traditional ensembling, which requires running multiple models, a merged model has the same inference cost as a single model while often achieving comparable or superior performance.
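To make "operating directly in the weight space" concrete, here is a toy sketch of the simplest kind of merge: a weighted average of matching parameters. It is only an illustration. The helper name `average_weights` and the use of plain Python lists in place of tensors are assumptions made for this example, not mergekit's implementation.

```python
from typing import Dict, List, Sequence

# Parameter name -> flat list of values (a toy stand-in for real tensors).
Weights = Dict[str, List[float]]


def average_weights(models: Sequence[Weights], coeffs: Sequence[float]) -> Weights:
    """Weighted average of same-sized parameters across models (hypothetical helper)."""
    if not models:
        raise ValueError("need at least one model")
    if len(models) != len(coeffs):
        raise ValueError("need exactly one coefficient per model")
    total = sum(coeffs)
    if total == 0:
        raise ValueError("coefficients must not sum to zero")

    merged: Weights = {}
    for name, first in models[0].items():
        # Every model must provide this parameter with the same number of values.
        for model in models:
            if name not in model or len(model[name]) != len(first):
                raise ValueError(f"parameter {name!r} is missing or has a different size")
        # Normalise by the coefficient sum so the result stays on the same scale.
        merged[name] = [
            sum(c * model[name][i] for c, model in zip(coeffs, models)) / total
            for i in range(len(first))
        ]
    return merged


model_a = {"layer0.weight": [1.0, 3.0]}
model_b = {"layer0.weight": [3.0, 5.0]}
merged = average_weights([model_a, model_b], [0.5, 0.5])
print(merged)
```

The merged dictionary has the same keys and sizes as either input, which is why a merged model costs no more to run than one of its parents.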

Features

Key features of mergekit include:

  • Supports Llama, Mistral, GPT-NeoX, StableLM, and more
  • Many [merge methods](#merge-methods)
  • GPU or CPU execution
  • Lazy loading of tensors for low memory use
  • Interpolated gradients for parameter values (inspired by Gryphe's BlockMerge_Gradient script)
  • Piecewise assembly of language models from layers ("Frankenmerging")
  • [Mixture of Experts merging](#mixture-of-experts-merging)
  • [LoRA extraction](#lora-extraction)
  • [Evolutionary merge methods](#evolutionary-merge-methods)
  • [Multi-stage merging](#multi-stage-merging-mergekit-multi) for complex workflows
  • [Merging of raw PyTorch models (mergekit-pytorch)](#raw-pytorch-model-merging-mergekit-pytorch)
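As a rough illustration of the "interpolated gradients" feature listed above, the sketch below spreads a short list of anchor values evenly across a model's layers and interpolates linearly between them. This is one plausible reading of the idea, not a description of mergekit's actual code; the function name `interpolate_gradient` and its signature are assumptions for this example.

```python
from typing import List, Sequence


def interpolate_gradient(anchors: Sequence[float], num_layers: int) -> List[float]:
    """Linearly interpolate anchor values across num_layers layers (hypothetical helper)."""
    if not anchors:
        raise ValueError("need at least one anchor value")
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if len(anchors) == 1 or num_layers == 1:
        return [float(anchors[0])] * num_layers

    values = []
    for layer in range(num_layers):
        # Position of this layer on the anchor axis, from 0 to len(anchors) - 1.
        pos = layer * (len(anchors) - 1) / (num_layers - 1)
        # Index of the anchor to the left, clamped so that lo + 1 stays in range.
        lo = min(int(pos), len(anchors) - 2)
        frac = pos - lo
        values.append(anchors[lo] * (1 - frac) + anchors[lo + 1] * frac)
    return values


print(interpolate_gradient([0.0, 1.0], 5))
print(interpolate_gradient([0.0, 1.0, 0.0], 5))
```

With two anchors the value ramps from the first layer to the last; with three it rises to the middle layer and falls again.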

Installation

git clone https://github.com/arcee-ai/mergekit.git
cd mergekit

pip install -e . # install the package and make scripts available

If the above fails with the error:

ERROR: File "setup.py" or "setup.cfg" not found. Directory cannot be installed in editable mode:
(A "pyproject.toml" file was found, but editable mode currently requires a setuptools-based build.)

You may need to upgrade pip to version 21.3 or newer with the command python3 -m pip install --upgrade pip.
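If you want to check the installed pip before attempting the editable install, a small script along these lines may help. It is a sketch under the assumption that 21.3 is the first pip release able to do editable installs from a pyproject.toml-only project; the helper names are made up for this example.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

MIN_PIP = (21, 3)  # assumed minimum for pyproject.toml-only editable installs


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '21.3.1'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def pip_needs_upgrade(version: str) -> bool:
    """True if the given pip version is older than MIN_PIP."""
    return parse_major_minor(version) < MIN_PIP


def installed_pip_version() -> Optional[str]:
    """Return the installed pip version, or None if pip is not installed."""
    try:
        return metadata.version("pip")
    except metadata.PackageNotFoundError:
        return None


current = installed_pip_version()
if current is None:
    print("pip does not appear to be installed in this environment")
elif pip_needs_upgrade(current):
    print(f"pip {current} is too old; run: python3 -m pip install --upgrade pip")
else:
    print(f"pip {current} should be able to run: pip install -e .")
```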

Community & Support

Contributing

We welcome contributions to mergekit! If you have ideas for new merge methods, features, or other improvements, please check out our [contributing guide](CONTRIBUTING.md) for details on how to get started.

Community Tools

  • [FrankensteinAI](https://frankenstein-ai.com/): For those who prefer a browser-based experience without local setup or hardware wrangling, the team at FrankensteinAI has built a hosted platform powered by mergekit. It also features a community gallery and leaderboard for sharing and comparing merged models.

Usage

The script mergekit-yaml is the main entry point for mergekit. It takes a YAML configuration file and an output path, like so:

mergekit-yaml path/to/your/config.yml ./output-model-directory [--cuda] [--lazy-unpickle] [--allow-crimes] [... other options]

This will run the merge and write your merged model to ./output-model-directory.

For more information on the arguments accepted by mergekit-yaml, run the command mergekit-yaml --help.
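If you drive merges from a Python script rather than a shell, one option is to assemble the same command line and hand it to `subprocess`. The sketch below uses only the positional arguments and flags shown above; the helper names `build_merge_command` and `run_merge` are assumptions for this example and are not part of mergekit.

```python
import shlex
import subprocess
from typing import List


def build_merge_command(
    config_path: str,
    output_dir: str,
    cuda: bool = False,
    lazy_unpickle: bool = False,
    allow_crimes: bool = False,
) -> List[str]:
    """Assemble a mergekit-yaml invocation as an argument list (hypothetical helper)."""
    cmd = ["mergekit-yaml", config_path, output_dir]
    if cuda:
        cmd.append("--cuda")
    if lazy_unpickle:
        cmd.append("--lazy-unpickle")
    if allow_crimes:
        cmd.append("--allow-crimes")
    return cmd


def run_merge(cmd: List[str]) -> None:
    """Run the merge; raises CalledProcessError if mergekit-yaml exits with an error."""
    subprocess.run(cmd, check=True)


cmd = build_merge_command("path/to/your/config.yml", "./output-model-directory", cuda=True)
print(shlex.join(cmd))
# Call run_merge(cmd) once mergekit is installed and the config file exists.
```

Passing an argument list instead of a single shell string avoids quoting problems when paths contain spaces.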

Uploading to Hugging Face

When you have a merged model you're happy with, you may want to share it on the Hugging Face Hub. mergekit generates a README.md for your merge with some basic information for a model card. You can edit it to include more details about your merge, like giving it a good name or explaining what it's good at; rewrite it entirely; or use the generated README.md as-is. It is also possible to edit your README.md online once it has been uploaded to the Hub.

Once you're happy with your model card and merged model, you can upload it to the Hugging Face Hub using the command-line tool that ships with the huggingface_hub Python library.

# log in to huggingface with an access token (must have write permission)
huggingface-cli login
# upload your model
huggingface-cli upload your_hf_username/my-cool-model ./output-model-directory .

The documentation for huggingface_hub goes into more detail about other options for uploading.
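The same upload can also be done from Python. The sketch below assumes the `huggingface_hub` package is installed and that you have already logged in, so a write token is stored locally. The wrapper `upload_merged_model` is a hypothetical helper for this example; `HfApi.create_repo` and `HfApi.upload_folder` are the library calls it relies on.

```python
from typing import Any, Optional


def upload_merged_model(folder_path: str, repo_id: str, api: Optional[Any] = None) -> None:
    """Create the model repo if needed, then upload the merge output (hypothetical helper)."""
    if api is None:
        # Imported lazily so the function can be defined without the library installed.
        from huggingface_hub import HfApi

        api = HfApi()
    # exist_ok=True makes this a no-op when the repository already exists.
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    # Uploads the whole directory, including the generated README.md.
    api.upload_folder(folder_path=folder_path, repo_id=repo_id, repo_type="model")


# Example call (needs network access and a token with write permission):
# upload_merged_model("./output-model-directory", "your_hf_username/my-cool-model")
```

The optional `api` argument exists only so that a stand-in object can be passed for testing without touching the network.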

Merge Configuration

Merge configurations are YAML documents specifying the operations to perform in order to produce your merged model. Below are the primary elements of a configuration file:

  • merge_method: Specifies the method to use for merging models. See [Merge Methods](#merge-methods) for a list.
  • slices: Defines slices of layers from different models to be used. This field is mutually exclusive with models.
  • models: Defines entire models to be used for merging. This field is mutually exclusive with slices.

-…
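The mutual-exclusion rule between slices and models can be checked before a merge is started. The sketch below works on a configuration that has already been loaded into a Python dictionary. The function name `check_merge_config` and the placeholder values are assumptions for this example; the structure of individual `models` or `slices` entries is not covered by this excerpt and is left as opaque placeholders.

```python
from typing import Any, Dict


def check_merge_config(config: Dict[str, Any]) -> str:
    """Report which of `slices`/`models` a config uses; reject configs that set both."""
    has_slices = "slices" in config
    has_models = "models" in config
    if has_slices and has_models:
        raise ValueError("`slices` and `models` are mutually exclusive; set only one")
    if has_slices:
        return "slices"
    if has_models:
        return "models"
    return "neither"


# Placeholder values only; real entries follow the schema in the full documentation.
example = {
    "merge_method": "<method name>",
    "models": ["<model A>", "<model B>"],
}
print(check_merge_config(example))
```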
