OpenBMB/MiniCPM
Jupyter Notebook
Captured source
source ↗OpenBMB/MiniCPM
Description: MiniCPM5-1B: A SOTA 1B on-device LLM, small yet powerful.
Language: Jupyter Notebook
License: Apache-2.0
Stars: 9421
Forks: 620
Open issues: 18
Created: 2024-01-29T08:21:15Z
Pushed: 2026-05-31T09:32:22Z
Default branch: main
Fork: no
Archived: no
README:
中文 | English
MiniCPM Tech Report | MiniCPM Wiki (in Chinese) | MiniCPM-V Repo | UltraData
Join our discord and Feishu/Lark | Join Us
> [!NOTE] > ### 🏆 2026 Sparse Operator Acceleration & Race (SOAR) is Now Live! > > The MiniCPM-SALA architecture is just the beginning. Realizing its full potential requires deep system-level synergy and cross-layer compilation optimization. > > OpenBMB, in collaboration with SGLang and NVIDIA, invites global geeks to tackle the limits of 9B-scale, 1M-token inference on a dedicated NVIDIA 6000D environment. > > * 💰 Prize Pool: >$100,000 USD (Top Prize: $89,000) > * 🚀 Goal: Optimize single and multi-batch performance via cross-layer compilation. > > 👉 [Learn more and Register](https://soar.openbmb.cn/)
✨ Highlights
We are releasing MiniCPM5-1B, the first model in the MiniCPM5 series. It is a dense 1B Transformer built for on-device, local deployment, and resource-constrained scenarios, reaching 1B-class open-source SOTA.
🏆 1B-class open-source SOTA: MiniCPM5-1B reaches an average score of 42.57 across reasoning, knowledge, code, instruction-following, math, logic and agentic benchmarks, above the highest average score of 35.61 among strong open-source models in the same size class; its strengths are most visible in agentic tool use, code, and competition math.

🧠 Hybrid Reasoning: built-in ` chat template, switch via enable_thinking`. The same checkpoint serves as both a fast assistant and a deliberate reasoner.
🛠️ Deployment / Fine-tuning Agent Skills: the repo provides single-page cookbooks for major inference backends and fine-tuning frameworks, each paired with an [Agent Skill](./skills/) to help developers reproduce deployment and fine-tuning workflows.
🐱 Desktop Pet: a local-LLM desktop pet driven by MiniCPM5-1B — see [Desktop Pet](#desktop-pet) below.
🔥 Changelog
- 📌 [2026.05.19] [MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) is released: a compact 1B-class dense model for on-device and resource-constrained use, paired with deployment / fine-tuning [Agent Skills](./skills/).
- [2026.02.11] [MiniCPM-SALA](https://huggingface.co/openbmb/MiniCPM-SALA) is released: a sparse-and-linear hybrid attention model for million-token context modeling and efficient inference.
- [2025.09.05] [MiniCPM4.1 series](https://huggingface.co/collections/openbmb/minicpm-4-6841ab29d180257e940baa9b) is released: a trainable sparse-attention model with hybrid reasoning.
- [2025.06.06] [MiniCPM4](https://huggingface.co/collections/openbmb/minicpm-4-6841ab29d180257e940baa9b) is released: an end-side model with over 5x generation acceleration on typical edge chips.
Older entries (2024 + InfLLM-V2 paper)
- [2025.09.29] [InfLLM-V2 paper](https://arxiv.org/abs/2509.24663) is released! We can train a sparse attention model with only 5B long-text tokens.
- [2024.09.05] We release [MiniCPM3-4B](https://huggingface.co/openbmb/MiniCPM3-4B)! This model outperforms Phi-3.5-mini-instruct and GPT-3.5-Turbo-0125 and is comparable to several models with 7B-9B parameters like Llama3.1-8B-Instruct, Qwen2-7B-Instruct, and GLM-4-9B-Chat.
- [2024.07.05] Released [MiniCPM-S-1B](https://huggingface.co/openbmb/MiniCPM-S-1B-sft)! This model achieves an average sparsity of 87.89% in the FFN layer, reducing FFN FLOPs by 84%, while maintaining downstream task performance.
- [2024.04.11] Released [MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k), [MiniCPM-MoE-8x2B](https://huggingface.co/openbmb/MiniCPM-MoE-8x2B) and [MiniCPM-1B](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16)! Click here to read our technical blog.
- [2024.02.01] Released [MiniCPM-2B](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16)! This model performs similarly to Mistral-7B on public benchmarks (with better performance in Chinese, math, and code abilities) and overall outperforms models like Llama2-13B, MPT-30B, and Falcon-40B.
🧭 Quick Links
- [✨ Highlights](#-highlights)
- [🔥 Changelog](#-changelog)
- [📦 Model Downloads](#-model-downloads)
- [🚀 MiniCPM5-1B](#-minicpm5-1b)
- [Introduction](#introduction)
- [Evaluation Results](#evaluation-results)
- [Training Recipe](#training-recipe)
- [What does RL + OPD bring?](#what-does-rl--opd-bring)
- [Quickstart](#quickstart)
- [Deployment and Fine-tuning Cookbooks and Agent Skills](#deployment-and-fine-tuning-cookbooks-and-agent-skills)
- [Other Supported Frameworks](#other-supported-frameworks)
- [Desktop Pet](#desktop-pet)
- [🧪 MiniCPM-SALA](#-minicpm-sala)
- [⚡ MiniCPM4 & MiniCPM4.1 Series](#-minicpm4-and-minicpm41-series)
- [Legacy topics →](./docs/README-legacy.md): BitCPM4 quantization, MiniCPM4 applications
- [📄 LICENSE](#-license) · [🏛 Institutions](#-institutions) · [📚 Citation](#-citation)
📦 Model Downloads
Current release: MiniCPM5-1B (BF16, GGUF, MLX):
| HuggingFace | ModelScope | |---|---| | MiniCPM5-1B | MiniCPM5-1B | | MiniCPM5-1B-SFT | MiniCPM5-1B-SFT | | MiniCPM5-1B-Base | MiniCPM5-1B-Base | | MiniCPM5-1B-GGUF | MiniCPM5-1B-GGUF | | MiniCPM5-1B-MLX | MiniCPM5-1B-MLX |
Other key releases:
| HuggingFace | ModelScope | |---|---| | MiniCPM-SALA | MiniCPM-SALA | | MiniCPM4.1-8B | MiniCPM4.1-8B | |…
Excerpt shown — open the source for the full document.