Repo · Tencent Hunyuan · published Oct 23, 2024

Tencent-Hunyuan/Tencent-Hunyuan-Large

Python

Captured source

Language: Python

License: NOASSERTION

Stars: 1585

Forks: 120

Open issues: 19

Created: 2024-10-23T06:43:41Z

Pushed: 2024-12-06T08:15:56Z

Default branch: main

Fork: no

Archived: no

README:

Chinese (中文) | English

🫣 Hugging Face | 🖥️ Official website | 🕖 HunyuanAPI | 🐳 Gitee

Technical Report | Demo | Tencent Cloud TI

Download Models

Models | Hugging Face Download URL | Tencent Cloud Download URL
Hunyuan-A52B-Instruct-FP8 | Hunyuan-A52B-Instruct-FP8 | Hunyuan-A52B-Instruct-FP8
Hunyuan-A52B-Instruct | Hunyuan-A52B-Instruct | Hunyuan-A52B-Instruct
Hunyuan-A52B-Pretrain | Hunyuan-A52B-Pretrain | Hunyuan-A52B-Pretrain

Model Introduction

With the rapid development of artificial intelligence technology, large language models (LLMs) have made significant progress in fields such as natural language processing, computer vision, and scientific tasks. However, as these models grow, reducing resource consumption while maintaining high performance has become a key challenge. To address it, we have explored Mixture of Experts (MoE) models. The newly released Hunyuan-Large (Hunyuan-MoE-A52B) model is currently the largest open-source Transformer-based MoE model in the industry, with 389 billion total parameters and 52 billion active parameters.

By open-sourcing the Hunyuan-Large model and sharing the related technical details, we hope to inspire more researchers with innovative ideas and to advance the progress and application of AI technology together. We welcome you to join our open-source community to explore and optimize future AI models!

Introduction to Technical Advantages

Model

  • High-Quality Synthetic Data: By enhancing training with synthetic data, Hunyuan-Large can learn richer representations, handle long-context inputs, and generalize better to unseen data.
  • KV Cache Compression: Utilizes Grouped Query Attention (GQA) and Cross-Layer Attention (CLA) strategies to significantly reduce memory usage and computational overhead of KV caches, improving inference throughput.
  • Expert-Specific Learning Rate Scaling: Sets different learning rates for different experts to ensure each sub-model effectively learns from the data and contributes to overall performance.
  • Long-Context Processing Capability: The pre-trained model supports text sequences up to 256K, and the Instruct model supports up to 128K, significantly enhancing the ability to handle long-context tasks.
  • Extensive Benchmarking: Conducts extensive experiments across various languages and tasks to validate the practical effectiveness and safety of Hunyuan-Large.
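
To make the KV Cache Compression point above concrete, here is a minimal back-of-the-envelope sketch of how GQA and CLA shrink the cache. The layer count, head counts, and head dimension are hypothetical values chosen only for illustration, not Hunyuan-Large's actual configuration, and the `kv_cache_bytes` helper is not part of this repository. The sketch assumes GQA reduces the number of cached KV heads and CLA lets adjacent layers share one cache.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, cla_share=1):
    """Bytes needed to cache keys and values for one sequence.

    cla_share is how many adjacent layers share one KV cache
    (1 means no cross-layer sharing). Hypothetical helper.
    """
    # Number of layers that store their own K/V tensors (ceiling division).
    kv_layers = -(-num_layers // cla_share)
    # The factor 2 covers keys plus values.
    return 2 * kv_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical configuration, for illustration only.
LAYERS, Q_HEADS, KV_HEADS, HEAD_DIM = 64, 80, 8, 128
SEQ_LEN = 128 * 1024  # a 128K-token context, stored in 2-byte BF16

mha = kv_cache_bytes(LAYERS, Q_HEADS, HEAD_DIM, SEQ_LEN)      # one KV head per query head
gqa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)     # grouped KV heads
gqa_cla = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN,
                         cla_share=2)                         # plus pairwise layer sharing

for name, size in [("MHA", mha), ("GQA", gqa), ("GQA + CLA", gqa_cla)]:
    print(f"{name:10s} {size / 2**30:8.1f} GiB")
```

Under these assumed numbers, sharing the cache between every two layers halves the GQA-only footprint, which matches the 50% KV-cache saving this README attributes to CLA.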

Inference Framework

  • This open-source release offers two inference backends tailored to the Hunyuan-Large model: the popular vLLM backend and the TensorRT-LLM (TRT-LLM) backend. Both include performance optimizations. For instance, the new CLA structure significantly reduces GPU memory usage, saving 50% of the KV cache and making long-text scenarios more efficient to handle. In addition, FP8 quantization reduces memory usage by 50% compared with unquantized FP16/BF16 weights while maintaining precision, and increases throughput by 70%. Because TRT-LLM builds on highly efficient core operators, the TRT-LLM solution outperforms the vLLM solution by more than 30%, and it is widely used in Tencent's Hunyuan project. In this release we are open-sourcing the vLLM solution first, with plans to release the TRT-LLM solution in the near future.
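
The 50% memory reduction from FP8 follows from bytes per parameter, as the rough sketch below shows. It uses the 389 billion total parameter count from the model introduction and assumes 2 bytes per BF16/FP16 weight and 1 byte per FP8 weight. It ignores activations, the KV cache, and quantization scale factors, so real deployments will differ.

```python
TOTAL_PARAMS = 389e9  # total parameters, as stated in the model introduction
BYTES_BF16 = 2        # bytes per weight in FP16/BF16
BYTES_FP8 = 1         # bytes per weight in FP8


def weight_gb(num_params, bytes_per_param):
    """Approximate weight storage in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


bf16_gb = weight_gb(TOTAL_PARAMS, BYTES_BF16)
fp8_gb = weight_gb(TOTAL_PARAMS, BYTES_FP8)
saving = 1 - fp8_gb / bf16_gb

print(f"BF16 weights: {bf16_gb:.0f} GB")
print(f"FP8 weights:  {fp8_gb:.0f} GB")
print(f"Saving:       {saving:.0%}")
```

The estimate uses total rather than active parameters because an MoE model normally keeps every expert resident in memory, even though only some experts run for each token.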

Training Framework

  • The Hunyuan-Large open-source model is fully compatible with the Hugging Face format, so researchers and developers can fine-tune it with the hf-deepspeed framework. We also support training acceleration through flash attention. To ease adoption, this release makes the corresponding training scripts and model implementations publicly available, so that further training and fine-tuning can build on them.
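
As an illustration of what a Hugging Face plus DeepSpeed fine-tuning setup typically involves, the sketch below builds a small DeepSpeed configuration and serializes it to JSON. The keys are standard DeepSpeed options, but every value is a placeholder assumption. This is not the configuration shipped in this repository, so consult the released training scripts for the real settings.

```python
import json

# Placeholder values for illustration only; not the repository's settings.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "gradient_clipping": 1.0,
    "bf16": {"enabled": True},        # train in bfloat16
    "zero_optimization": {
        "stage": 3,                   # shard parameters, gradients, optimizer state
        "overlap_comm": True,         # overlap communication with computation
    },
}

config_text = json.dumps(ds_config, indent=2)
print(config_text)
```

A JSON file like this is usually handed to the Hugging Face `Trainer` through the `deepspeed` field of `TrainingArguments`. Whether the scripts in this release follow that exact pattern is an assumption here.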

Related News

  • 2024.11.25 Our self-developed long-context benchmark, PenguinScrolls, has been officially released! You can explore the project on GitHub and access the dataset on Hugging Face.
  • 2024.11.18 Hunyuan-A52B-Instruct and Hunyuan-A52B-Instruct-FP8 model update.
  • 2024.11.5 The TI Platform has integrated the Hunyuan-Large model, so you can train and deploy it in just a few steps. Visit Chat with Hunyuan-Large to experience real-time conversations with the model, and explore Hunyuan-Large Best Practice on TI to create your own customized Hunyuan-Large model.
  • 2024.11.5 We have open-sourced Hunyuan-A52B-Pretrain, Hunyuan-A52B-Instruct, and Hunyuan-A52B-Instruct-FP8 on Hugging Face. We also released a technical report and a training and inference operations manual, providing detailed information on the model's capabilities and the procedures for training and inference.

Benchmark Evaluation

The Hunyuan-Large pre-trained model achieves the best overall performance compared with both Dense and MoE competitors of similar activated parameter size. On aggregated benchmarks such as MMLU, MMLU-Pro, and CMMLU, Hunyuan-Large consistently achieves the best performance, confirming its comprehensive abilities on aggregated tasks. Hunyuan-Large also shows superior performance in commonsense understanding and reasoning, and on classical NLP tasks such as QA and reading comprehension (e.g., CommonsenseQA, PIQA and TriviaQA). For the mathematics capability,...

Excerpt shown — open the source for the full document.

Notability

notability 7.0/10

Notable large model release with decent traction.