OpenBMB (MiniCPM) · published May 30, 2023

Finally! The foundation model CPM-Bee is open-sourced now!

Ever since the OpenBMB open-source community was established, we have been steadfastly committed to the idea of “bringing large models into every household.” To fulfill this mission, we have been developing a full-process acceleration system for model development that efficiently supports the pre-training, fine-tuning, application, and inference of large models. We have also initiated the CPM-Live project for training billion-scale models through live broadcasts. Finally! The progress bar of CPM-Live’s second phase has reached 100%, and we are excited to usher in the second milestone of CPM-Live: the release and open-sourcing of CPM-Bee!

Fully upgraded, CPM-Bee hatched from CPM-Ant.

The CPM (Chinese Pretrained Model) series consists of large models developed in-house by our team, including the first Chinese large model CPM-1, the efficient and user-friendly large model CPM-2, and the controllable and sustainable large model CPM-3. The proposal for the CPM-Live project, which trains billion-scale models through live broadcasts, was released on May 26, 2022. Training of the first-phase model, CPM-Ant, officially commenced on May 29, 2022, and its report was published on September 16, 2022. As the second-phase model of CPM-Live, CPM-Bee started training on October 13, 2022, and improves significantly on CPM-Ant in both foundational capabilities and performance.

The CPM-Bee foundation model covers a wide range of functions with high accuracy in semantic understanding. It can efficiently complete various basic tasks, including but not limited to text completion, text generation, translation, Q&A, score prediction, and multiple-choice questions. To make the model easier to use, we designed its input and output in a structured JSON format during pre-training, so users can complete different tasks simply by adjusting the task fields:

"Text Generation": {"input": "今天天气很好,我和妈妈一起去公园,", "prompt": "往后写两句话", "": ""}
"Translation": {"input": "北京是中国的首都", "prompt": "中翻英", "": ""}
"Score Prediction": {"input":"之前多次聚餐都选择这里,有各种大小的包房同时能容纳很多人,环境好有特色还有表演,整体聚餐氛围一下被带动起来。现在由于炭火改成了电烤羊,口感真的不如从前,不过其他菜品都还是不错,烤羊剩下的拆骨肉最后还能再加工一下椒盐的也很好吃。","question":"评分是多少?(1-5)","":""}
""Multiple Choice"": {""input"": ""父母都希望自己的孩子诚实、勇敢、有礼貌。要想让孩子成为这样的人,父母首先得从自己做起,要是连自己都做不到,又怎能要求孩子做到呢?"", ""options"": {"">"": ""少提要求"", "">"": ""降低标准"", "">"": ""自己先做好"", "">"": ""让孩子拿主意""}, ""question"": ""教育孩子时,父母应该:"", "">"": """"}

CPM-Bee is a fully open-source, commercially available billion-parameter foundation model for both Chinese and English. It adopts an auto-regressive Transformer architecture and is pre-trained on a trillion-scale high-quality corpus, giving it powerful foundational capabilities. More specifically, CPM-Bee has the following features:

  • Open-source and commercially available: OpenBMB has always adhered to the open-source spirit of “bringing large models into every household.” The CPM-Bee foundation model will be fully open-sourced and commercially available to promote the development of large models. Commercial users can simply apply for and obtain an official authorization certificate to use the model for commercial purposes.
  • Excellent performance in Chinese and English: The pre-training corpus of the CPM-Bee foundation model has undergone rigorous selection and proportioning to achieve outstanding performance in both Chinese and English. Please refer to the evaluation tasks and results for specific details.
  • Massive-scale high-quality corpus: The CPM-Bee foundation model is trained on a trillion-scale corpus, making it one of the models trained on the most extensive corpora in the open-source community.…
