RepoOpenBMB (MiniCPM)OpenBMB (MiniCPM)published Jan 29, 2024seen 5d

OpenBMB/MiniCPM

Jupyter Notebook

Open original ↗

Captured source

source ↗
published Jan 29, 2024seen 5dcaptured 13hhttp 200method plain

OpenBMB/MiniCPM

Description: MiniCPM5-1B: A SOTA 1B on-device LLM, small yet powerful.

Language: Jupyter Notebook

License: Apache-2.0

Stars: 9421

Forks: 620

Open issues: 18

Created: 2024-01-29T08:21:15Z

Pushed: 2026-05-31T09:32:22Z

Default branch: main

Fork: no

Archived: no

README:

中文 | English

MiniCPM Tech Report | MiniCPM Wiki (in Chinese) | MiniCPM-V Repo | UltraData

Join our discord and Feishu/Lark | Join Us

> [!NOTE] > ### 🏆 2026 Sparse Operator Acceleration & Race (SOAR) is Now Live! > > The MiniCPM-SALA architecture is just the beginning. Realizing its full potential requires deep system-level synergy and cross-layer compilation optimization. > > OpenBMB, in collaboration with SGLang and NVIDIA, invites global geeks to tackle the limits of 9B-scale, 1M-token inference on a dedicated NVIDIA 6000D environment. > > * 💰 Prize Pool: >$100,000 USD (Top Prize: $89,000) > * 🚀 Goal: Optimize single and multi-batch performance via cross-layer compilation. > > 👉 [Learn more and Register](https://soar.openbmb.cn/)

✨ Highlights

We are releasing MiniCPM5-1B, the first model in the MiniCPM5 series. It is a dense 1B Transformer built for on-device, local deployment, and resource-constrained scenarios, reaching 1B-class open-source SOTA.

🏆 1B-class open-source SOTA: MiniCPM5-1B reaches an average score of 42.57 across reasoning, knowledge, code, instruction-following, math, logic and agentic benchmarks, above the highest average score of 35.61 among strong open-source models in the same size class; its strengths are most visible in agentic tool use, code, and competition math.

![MiniCPM5-1B capability comparison by domain](./assets/minicpm5/public_leaderboard_radar_en.png)

🧠 Hybrid Reasoning: built-in ` chat template, switch via enable_thinking`. The same checkpoint serves as both a fast assistant and a deliberate reasoner.

🛠️ Deployment / Fine-tuning Agent Skills: the repo provides single-page cookbooks for major inference backends and fine-tuning frameworks, each paired with an [Agent Skill](./skills/) to help developers reproduce deployment and fine-tuning workflows.

🐱 Desktop Pet: a local-LLM desktop pet driven by MiniCPM5-1B — see [Desktop Pet](#desktop-pet) below.

🔥 Changelog

  • 📌 [2026.05.19] [MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) is released: a compact 1B-class dense model for on-device and resource-constrained use, paired with deployment / fine-tuning [Agent Skills](./skills/).
  • [2026.02.11] [MiniCPM-SALA](https://huggingface.co/openbmb/MiniCPM-SALA) is released: a sparse-and-linear hybrid attention model for million-token context modeling and efficient inference.
  • [2025.09.05] [MiniCPM4.1 series](https://huggingface.co/collections/openbmb/minicpm-4-6841ab29d180257e940baa9b) is released: a trainable sparse-attention model with hybrid reasoning.
  • [2025.06.06] [MiniCPM4](https://huggingface.co/collections/openbmb/minicpm-4-6841ab29d180257e940baa9b) is released: an end-side model with over 5x generation acceleration on typical edge chips.

Older entries (2024 + InfLLM-V2 paper)

  • [2025.09.29] [InfLLM-V2 paper](https://arxiv.org/abs/2509.24663) is released! We can train a sparse attention model with only 5B long-text tokens.
  • [2024.09.05] We release [MiniCPM3-4B](https://huggingface.co/openbmb/MiniCPM3-4B)! This model outperforms Phi-3.5-mini-instruct and GPT-3.5-Turbo-0125 and is comparable to several models with 7B-9B parameters like Llama3.1-8B-Instruct, Qwen2-7B-Instruct, and GLM-4-9B-Chat.
  • [2024.07.05] Released [MiniCPM-S-1B](https://huggingface.co/openbmb/MiniCPM-S-1B-sft)! This model achieves an average sparsity of 87.89% in the FFN layer, reducing FFN FLOPs by 84%, while maintaining downstream task performance.
  • [2024.04.11] Released [MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k), [MiniCPM-MoE-8x2B](https://huggingface.co/openbmb/MiniCPM-MoE-8x2B) and [MiniCPM-1B](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16)! Click here to read our technical blog.
  • [2024.02.01] Released [MiniCPM-2B](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16)! This model performs similarly to Mistral-7B on public benchmarks (with better performance in Chinese, math, and code abilities) and overall outperforms models like Llama2-13B, MPT-30B, and Falcon-40B.

🧭 Quick Links

  • [✨ Highlights](#-highlights)
  • [🔥 Changelog](#-changelog)
  • [📦 Model Downloads](#-model-downloads)
  • [🚀 MiniCPM5-1B](#-minicpm5-1b)
  • [Introduction](#introduction)
  • [Evaluation Results](#evaluation-results)
  • [Training Recipe](#training-recipe)
  • [What does RL + OPD bring?](#what-does-rl--opd-bring)
  • [Quickstart](#quickstart)
  • [Deployment and Fine-tuning Cookbooks and Agent Skills](#deployment-and-fine-tuning-cookbooks-and-agent-skills)
  • [Other Supported Frameworks](#other-supported-frameworks)
  • [Desktop Pet](#desktop-pet)
  • [🧪 MiniCPM-SALA](#-minicpm-sala)
  • [⚡ MiniCPM4 & MiniCPM4.1 Series](#-minicpm4-and-minicpm41-series)
  • [Legacy topics →](./docs/README-legacy.md): BitCPM4 quantization, MiniCPM4 applications
  • [📄 LICENSE](#-license) · [🏛 Institutions](#-institutions) · [📚 Citation](#-citation)

📦 Model Downloads

Current release: MiniCPM5-1B (BF16, GGUF, MLX):

| HuggingFace | ModelScope | |---|---| | MiniCPM5-1B | MiniCPM5-1B | | MiniCPM5-1B-SFT | MiniCPM5-1B-SFT | | MiniCPM5-1B-Base | MiniCPM5-1B-Base | | MiniCPM5-1B-GGUF | MiniCPM5-1B-GGUF | | MiniCPM5-1B-MLX | MiniCPM5-1B-MLX |

Other key releases:

| HuggingFace | ModelScope | |---|---| | MiniCPM-SALA | MiniCPM-SALA | | MiniCPM4.1-8B | MiniCPM4.1-8B | |…

Excerpt shown — open the source for the full document.