Repo: OpenBMB (MiniCPM), published Jan 29, 2024

OpenBMB/MiniCPM-V

Python

Description: A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone

Language: Python

License: Apache-2.0

Stars: 25582

Forks: 2006

Open issues: 52

Created: 2024-01-29T05:30:33Z

Pushed: 2026-06-04T10:01:45Z

Default branch: main

Fork: no

Archived: no

README:

MiniCPM-V and MiniCPM-o are multimodal LLM series designed for strong performance and efficient deployment on devices. MiniCPM-V focuses on efficient vision-language understanding across image, video and text inputs. MiniCPM-o extends the family toward real-time end-to-end omnimodal interaction with streaming video and audio inputs plus text and speech outputs. The most notable models in the series currently include:

  • MiniCPM-V 4.6: 🔥🔥🔥 The latest and most efficient model in the MiniCPM-V series. With a total of 1.3B parameters, it outperforms larger models such as Gemma4-E2B-it while being more efficient than smaller models such as Qwen3.5-0.8B (~1.5x its token throughput). Powered by the latest intra-ViT early compression technique from LLaVA-UHD v4, MiniCPM-V 4.6 reduces visual encoding computation cost by more than 50% and supports mixed 4x/16x visual token compression rates for a more flexible performance-efficiency trade-off across tasks. The model can be deployed on common mobile platforms, including iOS, Android, and HarmonyOS, and its edge adaptation code is open-sourced.
  • MiniCPM-o 4.5: ⭐️⭐️⭐️ The latest and most capable model in the MiniCPM-o series. With a total of 9B parameters, this end-to-end model approaches Gemini 2.5 Flash in vision, speech, and full-duplex multimodal live streaming, making it one of the most versatile and performant models in the open-source community. The new full-duplex live streaming capability means that the output streams (speech and text) and the real-time input streams (video and audio) do not block each other. This lets MiniCPM-o 4.5 see, listen, and speak simultaneously in a real-time omnimodal conversation, and take proactive actions such as issuing reminders.
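To make the mixed 4x/16x compression rate concrete, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each image slice is first encoded into a hypothetical 1024 patch tokens (a 32x32 grid) and that compression simply divides that count by the chosen rate. The real patch counts, slicing scheme, and the point inside the ViT where compression happens are not described here, so `BASE_PATCH_TOKENS` and `visual_tokens` are hypothetical names.

```python
# Back-of-the-envelope sketch of mixed 4x/16x visual token compression.
# ASSUMPTION: BASE_PATCH_TOKENS and the slice counts are hypothetical
# illustration values, not MiniCPM-V 4.6's actual configuration.

BASE_PATCH_TOKENS = 1024  # hypothetical: a 32x32 patch grid per image slice
SUPPORTED_RATES = (4, 16)  # the two compression rates named in the README


def visual_tokens(num_slices: int, rate: int, base: int = BASE_PATCH_TOKENS) -> int:
    """Visual tokens handed to the LLM for num_slices slices at a given rate."""
    if rate not in SUPPORTED_RATES:
        raise ValueError(f"rate must be one of {SUPPORTED_RATES}, got {rate}")
    if base % rate != 0:
        raise ValueError("base patch count must be divisible by the rate")
    # Each slice keeps base / rate tokens after compression.
    return num_slices * (base // rate)


if __name__ == "__main__":
    for rate in SUPPORTED_RATES:
        n = visual_tokens(num_slices=4, rate=rate)
        print(f"{rate}x compression, 4 slices -> {n} visual tokens")
```

Under these assumptions the 16x setting gives the language model a quarter as many visual tokens as the 4x setting. That is the kind of trade-off the README describes: a task that needs fine visual detail could use 4x, while one that only needs the overall gist could use 16x and save compute.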
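"Full-duplex" here means that input ingestion and output generation run concurrently instead of taking turns. The toy asyncio sketch below illustrates only that scheduling idea and does not use MiniCPM-o or its API. A hypothetical `input_stream` task keeps appending simulated video/audio frames, while a hypothetical `output_stream` task keeps emitting output chunks conditioned on whatever has arrived so far; neither waits for the other to finish. The frame and chunk names are made up.

```python
import asyncio


async def input_stream(log: list[str], context: list[str], n_frames: int) -> None:
    """Hypothetical real-time input (video/audio chunks) that keeps arriving."""
    for i in range(n_frames):
        context.append(f"frame{i}")  # new input becomes visible to the output task
        log.append(f"in:frame{i}")
        await asyncio.sleep(0)  # yield so the output task can run meanwhile


async def output_stream(log: list[str], context: list[str], n_chunks: int) -> None:
    """Hypothetical speech/text output produced while input is still arriving."""
    for i in range(n_chunks):
        # Each output chunk is conditioned on the input seen so far.
        log.append(f"out:chunk{i} (saw {len(context)} frames)")
        await asyncio.sleep(0)  # yield so the input task can run meanwhile


async def full_duplex_session(n_frames: int = 3, n_chunks: int = 3) -> list[str]:
    log: list[str] = []
    context: list[str] = []
    # Run both streams concurrently; neither blocks the other.
    await asyncio.gather(
        input_stream(log, context, n_frames),
        output_stream(log, context, n_chunks),
    )
    return log


if __name__ == "__main__":
    for event in asyncio.run(full_duplex_session()):
        print(event)
```

A half-duplex (turn-based) loop would instead await the entire input before producing any output, so output could never start while frames were still arriving. In the concurrent version, the first output chunk is emitted before the last input frame is ingested.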

News

  • [2026.05.17] ⭐️⭐️⭐️ We release the [API service](./docs/api.md) of MiniCPM-V 4.6 and MiniCPM-o 4.5, and a public free API key for MiniCPM-V 4.6 is available for everyone!
  • [2026.05.11] 🔥🔥🔥 We open-source MiniCPM-V 4.6, with mixed 4x/16x visual token compression. Powered by strong encoding efficiency and its lightweight 1.3B scale, it is our most edge-deployment-friendly model to date, achieving ~1.5x the token throughput of Qwen3.5-0.8B. Try it now!
  • [2026.02.06] 🥳 🥳 🥳 We open-sourced a real-time web demo that you can deploy on your own hardware, such as a Mac or a GPU machine. [Try it now](#web-demo-deployment)!
  • [2026.02.03] 🔥🔥🔥 We open-source MiniCPM-o 4.5, which matches Gemini 2.5 Flash on vision and speech, and supports full-duplex multimodal live streaming. Try it now!
  • [2025.08.26] 🔥🔥🔥 We open-source MiniCPM-V 4.5, which outperforms GPT-4o-latest, Gemini-2.0 Pro, and Qwen2.5-VL 72B. It advances popular capabilities of MiniCPM-V, and brings useful new features. Try it now!
  • [2025.08.01] ⭐️⭐️⭐️ We open-sourced the MiniCPM-V & o Cookbook! It provides comprehensive guides for diverse user scenarios, paired with our new Docs Site for smoother onboarding.
  • [2025.03.01] 🚀🚀🚀 RLAIF-V, the alignment technique of MiniCPM-o, has been accepted by CVPR 2025 as a Highlight! The code, dataset, and paper are open-sourced!
  • [2025.01.19] ⭐️⭐️⭐️ MiniCPM-o tops GitHub Trending and reaches top-2 on Hugging Face Trending!
  • [2024.05.23] 🔥🔥🔥 MiniCPM-V tops GitHub Trending and Hugging Face Trending! Our demo, recommended by Hugging Face Gradio’s official account, is available here. Come and try it out!


  • [2026.05.07] 📢📢📢 We release the MiniCPM-o 4.5 technical report, introducing the key techniques behind its real-time full-duplex omni-modal interaction. Read it here.
  • [2026.02.05] 📢📢📢 We have noticed that the web demo may experience latency due to network conditions. We are actively working on a Docker image for local deployment of the real-time interactive demo and will release it as soon as possible. Please stay tuned!
  • [2025.09.18] 📢📢📢 MiniCPM-V 4.5 technical report is now released! See here.
  • [2025.09.01] ⭐️⭐️⭐️ MiniCPM-V 4.5 has been officially supported by llama.cpp, vLLM, and LLaMA-Factory. You are welcome to use it directly through these official channels! Support for additional frameworks such as Ollama and SGLang is actively in progress.
  • [2025.08.02] 🚀🚀🚀 We open-source MiniCPM-V 4.0, which outperforms GPT-4.1-mini-20250414 in image understanding. It advances popular features of MiniCPM-V 2.6 and greatly improves efficiency. We also open-source the iOS App for iPhone and iPad. Try it now!
  • [2025.06.20] ⭐️⭐️⭐️ Our official Ollama repository is released. Try our latest models with one click!
  • [2025.01.24] 📢📢📢 MiniCPM-o 2.6 technical report is released! See here.
  • [2025.01.23] 💡💡💡 MiniCPM-o 2.6 is now supported by Align-Anything, a framework by PKU-Alignment Team for aligning any-to-any modality large models with human intentions. It supports DPO and SFT fine-tuning on both vision and audio. Try it now!
  • [2025.01.19] 📢 ATTENTION! We are currently working on merging MiniCPM-o 2.6 into the official repositories of llama.cpp, Ollama, and vllm. Until the merge is complete, please USE OUR LOCAL FORKS of…
