Repo: IBM (Granite), published Oct 2, 2025

ibm-granite/gguf

Jupyter Notebook


Description: CI/CD for IBM model GGUF conversions, quantizations and packagings for partner delivery

Language: Jupyter Notebook

License: Apache-2.0

Stars: 4

Forks: 0

Open issues: 1

Created: 2025-10-02T20:06:28Z

Pushed: 2025-10-17T16:08:16Z

Default branch: main

Fork: no

Archived: no

README:

gguf

This repository provides an automated CI/CD process to convert, test and deploy IBM Granite models in safetensors format from the `ibm-granite` organization to versioned IBM GGUF collections on the Hugging Face Hub under the `ibm-research` organization. This includes:

Topic index

  • [Target IBM models for format conversion](#target-ibm-models-for-format-conversion)
  • [Supported IBM Granite models (GGUF)](#supported-ibm-granite-models-gguf)
  • [Language](#language)
  • [Guardian](#guardian)
  • [Vision](#vision)
  • [Embedding](#embedding-dense)
  • [GGUF Conversion & Quantization](#gguf-conversion--quantization)
  • [GGUF Verification Testing](#gguf-verification-testing)
  • [References](#references)
  • [Releasing GGUF model conversions & quantizations](#releasing-gguf-model-conversions--quantizations)

---

Target IBM models for format conversion

Format conversions (i.e., GGUF) and quantizations will only be provided for model repositories canonically hosted in an official IBM Hugging Face organization.

Currently, this includes the following organizations:

  • https://huggingface.co/ibm-granite
  • https://huggingface.co/ibm-research

Additionally, only a select set of IBM models from these organizations will be converted, based on the following general criteria:

  • The IBM GGUF model needs to be referenced by an AI provider service as a "supported" model.
      • *For example, a local AI provider service such as Ollama or a hosted service such as Replicate.*
  • The GGUF model is referenced by a public blog, tutorial, demo, or other public use case.
      • In particular, this includes models referenced in an IBM Granite Snack Cookbook.
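The selection criteria above can be expressed as a simple predicate. This is a hypothetical sketch, not this repository's CI. The `ModelCandidate` fields and the `is_conversion_candidate` name are illustrative. It also assumes that meeting any one of the listed criteria is sufficient.

```python
from dataclasses import dataclass, field

# Official IBM Hugging Face organizations whose repos are eligible (from the list above).
OFFICIAL_ORGS = {"ibm-granite", "ibm-research"}


@dataclass
class ModelCandidate:
    """Hypothetical record describing a model considered for GGUF conversion."""

    repo_id: str  # e.g. "ibm-granite/granite-3.2-2b-instruct"
    supported_by_providers: list[str] = field(default_factory=list)  # e.g. ["Ollama"]
    public_references: list[str] = field(default_factory=list)  # blogs, tutorials, demos
    in_granite_cookbook: bool = False


def is_conversion_candidate(model: ModelCandidate) -> bool:
    """Return True if the model meets the (assumed any-of) criteria for conversion."""
    org = model.repo_id.split("/", 1)[0]
    if org not in OFFICIAL_ORGS:
        # Only repos canonically hosted in an official IBM org are converted.
        return False
    referenced_by_provider = bool(model.supported_by_providers)
    publicly_referenced = bool(model.public_references) or model.in_granite_cookbook
    return referenced_by_provider or publicly_referenced
```

Under this sketch, a fork of a Granite model hosted outside the two official organizations is rejected even if a provider lists it.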

Select quantizations will only be made available when:

  • A small form factor is justified:
      • *e.g., reduced model size intended for running locally on small form-factor devices such as watches and mobile devices.*
  • The performance gain is significant and does not compromise accuracy (or increase hallucination).
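To make the small form-factor criterion concrete, you can estimate the on-disk size of a quantized model as parameters × bits per weight ÷ 8. The sketch below uses the block layouts of the simple (non-K) llama.cpp quantization types. For example, Q8_0 stores 32 int8 weights plus one fp16 scale per block, which is 34 bytes per 32 weights, or 8.5 bits per weight.

The 2-billion parameter count is an illustrative assumption. Real files are somewhat larger, because they also carry metadata and some tensors kept at higher precision.

```python
# Bytes per block of 32 weights for the simple (non-K) llama.cpp quant types.
BLOCK_SIZE = 32
BYTES_PER_BLOCK = {
    "Q4_0": 2 + 16,          # fp16 scale + 32 x 4-bit weights
    "Q4_1": 2 + 2 + 16,      # fp16 scale + fp16 min + 32 x 4-bit weights
    "Q5_0": 2 + 4 + 16,      # fp16 scale + 32 fifth bits + 32 x 4-bit low bits
    "Q5_1": 2 + 2 + 4 + 16,  # as Q5_0 plus an fp16 min
    "Q8_0": 2 + 32,          # fp16 scale + 32 x int8 weights
    "fp16": 2 * 32,          # unquantized half precision
}


def bits_per_weight(quant: str) -> float:
    """Average storage bits per weight for a quantization type."""
    return BYTES_PER_BLOCK[quant] * 8 / BLOCK_SIZE


def estimated_size_gb(n_params: float, quant: str) -> float:
    """Rough lower bound on file size in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight(quant) / 8 / 1e9


# Compare a hypothetical 2B-parameter model across a few types.
for q in ("fp16", "Q8_0", "Q4_0"):
    print(f"{q}: {bits_per_weight(q):.1f} bpw, ~{estimated_size_gb(2e9, q):.2f} GB")
```

By this estimate, moving from fp16 to Q4_0 cuts the weight storage to a little over a quarter (4.5 vs. 16 bits per weight).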

Supported IBM Granite models (GGUF)

Specifically, the following Granite model repositories are currently supported in GGUF format, grouped by collection with their supported quantizations listed:

###### Language

Typically, this model category includes "instruct" models.

| Source Repo. ID | HF (llama.cpp) Architecture | Target HF Org. |
| --- | --- | --- |
| ibm-granite/granite-3.2-2b-instruct | GraniteForCausalLM (gpt2) | ibm-research |
| ibm-granite/granite-3.2-8b-instruct | GraniteForCausalLM (gpt2) | ibm-research |

  • Supported quantizations: fp16, Q2_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K_M, Q5_K_S, Q6_K, Q8_0

###### Guardian

| Source Repo. ID | HF (llama.cpp) Architecture | Target HF Org. |
| --- | --- | --- |
| ibm-granite/granite-guardian-3.2-3b-a800m | GraniteMoeForCausalLM (granitemoe) | ibm-research |
| ibm-granite/granite-guardian-3.2-5b | GraniteMoeForCausalLM (granitemoe) | ibm-research |

  • Supported quantizations: fp16, Q4_K_M, Q5_K_M, Q6_K, Q8_0

###### Vision

| Source Repo. ID | HF (llama.cpp) Architecture | Target HF Org. |
| --- | --- | --- |
| ibm-granite/granite-vision-3.2-2b | GraniteForCausalLM (granite), LlavaNextForConditionalGeneration | ibm-research |

  • Supported quantizations: fp16, Q4_K_M, Q5_K_M, Q8_0

###### Embedding (dense)

| Source Repo. ID | HF (llama.cpp) Architecture | Target HF Org. |
| --- | --- | --- |
| ibm-granite/granite-embedding-30m-english | Roberta (roberta-bpe) | ibm-research |
| ibm-granite/granite-embedding-125m-english | Roberta (roberta-bpe) | ibm-research |
| ibm-granite/granite-embedding-107m-multilingual | Roberta (roberta-bpe) | ibm-research |
| ibm-granite/granite-embedding-278m-multilingual | Roberta (roberta-bpe) | ibm-research |

  • Supported quantizations: fp16, Q8_0

Note: The sparse model architecture (i.e., HF RobertaForMaskedLM) is not currently supported; therefore, ibm-granite/granite-embedding-30m-sparse is not converted.
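Taken together, the tables above form a per-collection support matrix. A release job could check a requested model and quantization against it before doing any work. This is a hypothetical sketch: the dictionary layout and the `check_request` helper are illustrative, not this repository's actual CI configuration. The entries are transcribed from the tables.

```python
# Supported quantizations per collection, transcribed from the tables above.
SUPPORTED_QUANTS = {
    "language": {"fp16", "Q2_K", "Q3_K_L", "Q3_K_M", "Q3_K_S", "Q4_0", "Q4_1",
                 "Q4_K_M", "Q4_K_S", "Q5_0", "Q5_1", "Q5_K_M", "Q5_K_S", "Q6_K", "Q8_0"},
    "guardian": {"fp16", "Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0"},
    "vision": {"fp16", "Q4_K_M", "Q5_K_M", "Q8_0"},
    "embedding": {"fp16", "Q8_0"},
}

# Source repo -> collection, transcribed from the tables above.
COLLECTION_OF = {
    "ibm-granite/granite-3.2-2b-instruct": "language",
    "ibm-granite/granite-3.2-8b-instruct": "language",
    "ibm-granite/granite-guardian-3.2-3b-a800m": "guardian",
    "ibm-granite/granite-guardian-3.2-5b": "guardian",
    "ibm-granite/granite-vision-3.2-2b": "vision",
    "ibm-granite/granite-embedding-30m-english": "embedding",
    "ibm-granite/granite-embedding-125m-english": "embedding",
    "ibm-granite/granite-embedding-107m-multilingual": "embedding",
    "ibm-granite/granite-embedding-278m-multilingual": "embedding",
}


def check_request(repo_id: str, quant: str) -> None:
    """Raise ValueError if the source repo or quantization is not supported."""
    collection = COLLECTION_OF.get(repo_id)
    if collection is None:
        # e.g. the sparse embedding model, which has no conversion.
        raise ValueError(f"{repo_id} is not a supported source repo")
    if quant not in SUPPORTED_QUANTS[collection]:
        allowed = ", ".join(sorted(SUPPORTED_QUANTS[collection]))
        raise ValueError(f"{quant} not supported for {collection} models; choose from: {allowed}")
```

Failing fast here avoids spending CI time converting a model only to publish a quantization the collection does not offer.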

###### RAG LoRA support

  • LoRA support is currently planned (no target date).

---

GGUF Conversion & Quantization

The GGUF format is defined in the GGUF specification. The specification describes the structure of the file, how it is encoded, and what information is included.
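Per the GGUF specification, every file starts with a fixed header, little-endian by default. It contains:

  • the 4-byte magic `GGUF`
  • a `uint32` format version
  • a `uint64` tensor count
  • a `uint64` metadata key/value count (the two counts are 64-bit as of version 2)

Below is a minimal Python sketch that reads this header, useful as a cheap sanity check on converted files. The `read_gguf_header` name is illustrative, and the sketch does not handle big-endian files.

```python
import struct
from typing import BinaryIO, NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(f: BinaryIO) -> GGUFHeader:
    """Parse the fixed-size GGUF header (assumes little-endian, version 2 or later)."""
    magic = f.read(4)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    # '<' = little-endian, no padding; I = uint32 version; Q = uint64 counts.
    version, tensor_count, kv_count = struct.unpack("<IQQ", f.read(20))
    if version < 2:
        raise ValueError(f"unsupported GGUF version {version}")
    return GGUFHeader(version, tensor_count, kv_count)
```

For example, `with open("output_file.gguf", "rb") as f: print(read_gguf_header(f))` reports the version and counts without loading any tensors.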

Currently, the primary means of converting from the HF safetensors format to GGUF is the canonical llama.cpp script `convert_hf_to_gguf.py` (formerly named `convert-hf-to-gguf.py`).

For example:

```shell
python llama.cpp/convert_hf_to_gguf.py ./ --outfile output_file.gguf --outtype q8_0
```
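The llama.cpp converter script (named `convert_hf_to_gguf.py` in current llama.cpp) only emits a few output types directly, such as `f16`, `bf16` and `q8_0`. The K-quant variants listed above are normally produced afterwards from an f16 GGUF with llama.cpp's `llama-quantize` tool. The driver below builds both sets of command lines. It is a hedged sketch: the paths, the output naming scheme and the `build_commands`/`run_all` helpers are hypothetical, not this repository's CI.

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical locations; adjust to your llama.cpp checkout and CMake build directory.
LLAMA_CPP = Path("llama.cpp")
CONVERT = LLAMA_CPP / "convert_hf_to_gguf.py"
QUANTIZE = LLAMA_CPP / "build" / "bin" / "llama-quantize"


def build_commands(model_dir: str, name: str, quants: list[str]) -> list[list[str]]:
    """Return command lines: convert HF safetensors to f16 GGUF, then quantize it."""
    f16_path = f"{name}-f16.gguf"
    cmds = [["python", str(CONVERT), model_dir, "--outfile", f16_path, "--outtype", "f16"]]
    for q in quants:
        # llama-quantize <input.gguf> <output.gguf> <type>
        cmds.append([str(QUANTIZE), f16_path, f"{name}-{q}.gguf", q])
    return cmds


def run_all(cmds: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it too when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failing step
```

Converting once to f16 and quantizing from that file keeps every quantization derived from the same full-precision GGUF.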

Alternatives

##### Ollama CLI (future)

  • https://github.com/ollama/ollama/blob/main/docs/import.md#quantizing-a-model
```shell
$ ollama create --quantize q4_K_M mymodel
transferring model data
quantizing F16 model to Q4_K_M
creating new layer sha256:735e246cc1abfd06e9cdcf95504d6789a6cd1ad7577108a70d9902fef503c1bd
creating new layer sha256:0853f0ad24e5865173bbf9ffcc7b0f5d56b66fd690ab1009867e45e7d2c4db0f
writing manifest
success
```

Note: The Ollama CLI tool only supports a subset of quantizations:

  • Non-K-means (rounding-based): q4_0, q4_1, q5_0, q5_1, q8_0
  • K-means: q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q5_K_M, q6_K
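Comparing this Ollama subset with the quantizations published per collection shows which artifacts the Ollama CLI could not produce on its own. For the language collection, only Q2_K falls outside it.

In this small sketch:

  • the collection lists are transcribed from this README
  • names are upper-cased so the two lists compare cleanly
  • `fp16` is left out, since it is the unquantized source rather than a quantization

```python
# Quantizations the Ollama CLI can produce, as listed above (normalized to upper case).
OLLAMA_QUANTS = {q.upper() for q in [
    "q4_0", "q4_1", "q5_0", "q5_1", "q8_0",
    "q3_K_S", "q3_K_M", "q3_K_L", "q4_K_S", "q4_K_M", "q5_K_S", "q5_K_M", "q6_K",
]}

# Published quantizations per collection (excluding the fp16 source).
COLLECTION_QUANTS = {
    "language": ["Q2_K", "Q3_K_L", "Q3_K_M", "Q3_K_S", "Q4_0", "Q4_1", "Q4_K_M",
                 "Q4_K_S", "Q5_0", "Q5_1", "Q5_K_M", "Q5_K_S", "Q6_K", "Q8_0"],
    "guardian": ["Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0"],
    "vision": ["Q4_K_M", "Q5_K_M", "Q8_0"],
    "embedding": ["Q8_0"],
}


def not_covered_by_ollama(collection: str) -> list[str]:
    """Quantizations this repo publishes that the Ollama CLI cannot create."""
    return [q for q in COLLECTION_QUANTS[collection] if q.upper() not in OLLAMA_QUANTS]


for name in COLLECTION_QUANTS:
    print(name, not_covered_by_ollama(name))
```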

##### Hugging Face endorsed tool "ggml-org/gguf-my-repo"

  • https://huggingface.co/spaces/ggml-org/gguf-my-repo

Note:

  • Similar to the Ollama CLI, the web UI supports only a subset of quantizations.

---

GGUF Verification Testing

As a baseline, each converted model MUST run successfully in the following providers:

##### llama.cpp testing

llama.cpp: As the core implementation of the GGUF format, llama.cpp is either a direct dependency or the basis of forked code in almost all downstream GGUF providers, so testing against it is essential. Specifically, tests verify that the model can be hosted using the llama-server service.

  • *See the specific…
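A llama-server check can be scripted against a running server, for example one started with `llama-server -m model.gguf --port 8080`. This is an illustrative sketch, not this repository's test suite. It assumes llama-server's `GET /health` endpoint and its native `POST /completion` endpoint (with `prompt` and `n_predict` fields). The host, port, prompt and pass criterion are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080"  # hypothetical host/port of a running llama-server


def health_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for the server's health endpoint."""
    return urllib.request.Request(f"{base_url}/health", method="GET")


def completion_request(prompt: str, n_predict: int = 32,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for llama-server's native /completion endpoint."""
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_test(base_url: str = BASE_URL, timeout: float = 60.0) -> bool:
    """Return True if the server reports healthy and generates non-empty text.

    urlopen raises urllib.error.HTTPError on non-2xx replies
    (e.g. 503 while the model is still loading).
    """
    with urllib.request.urlopen(health_request(base_url), timeout=timeout) as resp:
        if json.load(resp).get("status") != "ok":
            return False
    req = completion_request("The capital of France is", base_url=base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return bool(json.load(resp).get("content", "").strip())
```

In CI, a check like this would run after starting llama-server with the converted GGUF. A model that fails to load or produces no text then fails the job.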
