Repo: OpenBMB (MiniCPM), published Feb 7, 2024

OpenBMB/UltraLink

Python


OpenBMB/UltraLink

Description: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset

Language: Python

License: MIT

Stars: 28

Forks: 6

Open issues: 0

Created: 2024-02-07T03:15:24Z

Pushed: 2025-01-19T09:28:05Z

Default branch: main

Fork: no

Archived: no

README:

News

  • ❗️❗️ February 6, 2024: Released UltraLink, a multilingual, knowledge-grounded, data-augmented, multi-round dialogue dataset, and the model weights of UltraLink-LM.

Introduction

UltraLink

UltraLink is a multilingual, knowledge-grounded, data-augmented, multi-round dialogue dataset. It contains language-specific chat data, language-agnostic chat data, code data and math data in 5 languages: English, Chinese, Spanish, Russian, and French. It can be downloaded from this Hugging Face link. Unlike previous works that simply translate English instructions, we consider both the language-specific and the language-agnostic abilities of LLMs. First, we introduce a knowledge-grounded data augmentation approach to elicit more culture-specific knowledge from LLMs, improving their ability to serve users from different countries. Second, we find that modern LLMs possess strong cross-lingual transfer capabilities, so repeatedly learning identical content in various languages is not necessary. Consequently, we can substantially prune the language-agnostic SFT data without any performance degradation, making multilingual SFT more efficient.

UltraLink-LM

> UltraLink-LM is a massively multilingual generative language model that follows instructions in 5 languages: English, French, Russian, Spanish, and Chinese. The model is capable of generating text in these 5 languages with high quality and diversity.
>
> UltraLink-LM outperforms PolyLM-Chat-13b, [Guanaco](JosephusCheung/Guanaco), and Bloomz-7b1-mt in code, math and chat abilities in four languages, and generates high-quality, diverse text in all languages.
>
> UltraLink-LM is trained using UltraLink, UltraChat, Magicoder-Evol, Magicoder-OSS, MetaMathQA, and ShareGPT.
>
> We release the checkpoints under an MIT license to further our mission of multilingual technologies empowering a multilingual world. They can be downloaded from this Hugging Face link.

Performance

We report 6 evaluations in this section: multilingual HumanEval, MGSM, OMGEval, ARC, Hellaswag and MMLU. Natural language generation performance is evaluated by HumanEval, MGSM and OMGEval, while natural language understanding is evaluated by ARC, Hellaswag and MMLU. Evaluations of modern LLMs may be biased and affected by many factors, so we are also actively working on more comprehensive evaluation methods.

Multilingual HumanEval

HumanEval is a well-known benchmark for evaluating the code ability of LLMs. It executes the code snippets generated by the model and evaluates their correctness. Since there is no existing multilingual test set for code generation, we use GPT-3.5 with carefully designed prompts to translate HumanEval into other languages.
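The execution-based check described above can be illustrated with a minimal sketch. It is not the evaluation harness used for the numbers in this README; the helper names `passes_tests` and `pass_rate`, the timeout value, and the toy problem are all assumptions made for illustration. Each candidate is concatenated with its test code and run in a separate Python process, and it counts as correct only if the process exits cleanly.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_tests(candidate_code: str, test_code: str, timeout_s: float = 10.0) -> bool:
    """Return True if the candidate plus its tests run to completion without error.

    Hypothetical helper: the candidate and tests are written to a temporary
    script and executed in a child Python process, so a crash or an infinite
    loop in generated code does not take down the evaluator.
    """
    program = candidate_code + "\n\n" + test_code + "\n"
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(program, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # Treat a timeout as a failed sample.
            return False
    # A failing assert or any uncaught exception gives a non-zero exit code.
    return result.returncode == 0


def pass_rate(samples: list[tuple[str, str]]) -> float:
    """Percentage of (candidate_code, test_code) pairs that pass their tests."""
    if not samples:
        return 0.0
    passed = sum(passes_tests(code, tests) for code, tests in samples)
    return 100.0 * passed / len(samples)


if __name__ == "__main__":
    # Toy problem (an assumption, not taken from HumanEval).
    tests = "assert add(2, 3) == 5"
    good = "def add(a, b):\n    return a + b"
    bad = "def add(a, b):\n    return a - b"
    print(pass_rate([(good, tests), (bad, tests)]))
```

A child process with a timeout is only a light form of isolation. Model-generated code should be run in a proper sandbox before being trusted on a real machine.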

| Model                  | En   | Zh   | Es   | Ru   | Fr   | Avg  |
| ---------------------- | ---- | ---- | ---- | ---- | ---- | ---- |
| Aya-101                | 0.6  | 0    | 0    | 0    | 0    | 0.1  |
| Aya-5-languages*       | 6.1  | 9.8  | 6.1  | 8.5  | 4.3  | 7.0  |
| Bloomz-7b1-mt          | 8.5  | 7.3  | 6.1  | 8.5  | 6.1  | 7.3  |
| Phoenix-inst-chat-7b   | 11.0 | 10.4 | 8.5  | 1.2  | 13.4 | 12.2 |
| PolyLM-Multialpaca-13b | 8.5  | 7.3  | 6.1  | 6.1  | 6.1  | 6.8  |
| PolyLM-Chat-13b        | 10.4 | 7.9  | 6.1  | 7.3  | 8.5  | 8.1  |
| Chimera-inst-chat-13b  | 14.6 | 13.4 | 14.6 | 12.8 | 14.0 | 13.9 |
| Okapi-7b               | 12.2 | 11.0 | 8.5  | 8.5  | 8.5  | 9.8  |
| Guanaco-7b             | 9.2  | 6.7  | 11.0 | 9.8  | 12.8 | 9.9  |
| Guanaco-13b            | 18.3 | 15.9 | 9.8  | 8.5  | 14.6 | 12.2 |
| UltraLink-LM           | 60.4 | 43.9 | 40.9 | 49.4 | 39.6 | 46.8 |

\* Specifically, Aya-5-languages is obtained by selecting the 5 languages that UltraLink supports, randomly sampling 3M examples from them, and fine-tuning Llama-13b on that sample for 1 epoch.
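The filter-then-sample step in the footnote can be sketched as follows. This is an illustrative reconstruction, not the script used to build Aya-5-languages: the record layout (a `lang` field on each example), the language codes, the function name `sample_supported`, and the seed are all assumptions.

```python
import random

# The five languages named in this README; the two-letter codes are an assumption.
SUPPORTED_LANGS = frozenset({"en", "zh", "es", "ru", "fr"})


def sample_supported(records, n, seed=0, langs=SUPPORTED_LANGS):
    """Keep records in the supported languages, then draw at most n at random.

    `records` is any iterable of dicts with a hypothetical "lang" key.
    Sampling is without replacement and reproducible for a fixed seed.
    """
    kept = [r for r in records if r.get("lang") in langs]
    if len(kept) <= n:
        # Fewer candidates than requested: return everything that survived the filter.
        return kept
    return random.Random(seed).sample(kept, n)


if __name__ == "__main__":
    toy = [
        {"lang": "en", "text": "hello"},
        {"lang": "de", "text": "hallo"},
        {"lang": "zh", "text": "你好"},
        {"lang": "fr", "text": "bonjour"},
    ]
    for row in sample_supported(toy, n=2):
        print(row["lang"], row["text"])
```

For the setting in the footnote, `n` would be 3,000,000. Filtering before sampling means the sample size refers to examples in the five supported languages rather than to the whole source collection.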

MGSM

We employ MGSM, a multilingual benchmark, to evaluate math reasoning abilities. It compares the model's results with the correct answers to measure the model's ability to perform mathematical reasoning.

| Model                  | En   | Zh   | Es   | Ru   | Fr   | Avg  |
| ---------------------- | ---- | ---- | ---- | ---- | ---- | ---- |
| Aya-101                | 8.8  | 4    | 6    | 8    | 9.2  | 7.2  |
| Aya-5-languages        | 28.8 | 5.6  | 18   | 17.2 | 19.2 | 17.8 |
| Bloomz-7b1-mt          | 2.8  | 1.6  | 2.0  | 0.4  | 2.8  | 1.7  |
| Phoenix-inst-chat-7b   | 3.2  | 3.2  | 2.8  | 3.2  | 3.2  | 3.1  |
| PolyLM-Multialpaca-13b | 1.2  | 2.8  | 1.6  | 2.8  | 2.4  | 2.4  |
| PolyLM-Chat-13b        | 10.8 | 6.4  | 4.8  | 4.4  | 5.6  | 5.3  |
| Chimera-inst-chat-13b | 14.0 | 11.6 | 10.0 | 12.0 |…
