What does this repo signal mean?

Tencent Hunyuan published Tencent-Hunyuan/Hunyuan-0.5B (Python). This repository signal exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo Tencent-Hunyuan/Hunyuan-0.5B · language Python · Small model release with moderate traction. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Tencent Hunyuan Repo: Tencent-Hunyuan/Hunyuan-0.5B

Captured source


GitHub: github.com/Tencent-Hunyuan/Hunyuan-0.5B

Tencent-Hunyuan/Hunyuan-0.5B repository metadata


published Aug 4, 2025 · seen 5d · captured 8h · http 200 · method plain

Tencent-Hunyuan/Hunyuan-0.5B

Language: Python

License: NOASSERTION

Stars: 55

Forks: 6

Open issues: 3

Created: 2025-08-04T03:29:11Z

Pushed: 2025-08-05T05:00:25Z

Default branch: main

Fork: no

Archived: no

README:


🤗 Hugging Face | ModelScope | AngelSlim

🖥️ Official Website | 🕖 HunyuanAPI | 🕹️ Demo

GITHUB | cnb.cool | LICENSE

Model Introduction

Hunyuan is Tencent's open-source efficient large language model series, designed for versatile deployment across diverse computational environments. From edge devices to high-concurrency production systems, these models deliver optimal performance with advanced quantization support and ultra-long context capabilities.

We have released a series of Hunyuan dense models, comprising both pre-trained and instruction-tuned variants, with parameter scales of 0.5B, 1.8B, 4B, and 7B. These models adopt training strategies similar to the Hunyuan-A13B, thereby inheriting its robust performance characteristics. This comprehensive model family enables flexible deployment optimization - from resource-constrained edge computing with smaller variants to high-throughput production environments with larger models, all while maintaining strong capabilities across diverse scenarios.

Key Features and Advantages

Hybrid Reasoning Support: Supports both fast and slow thinking modes, allowing users to flexibly choose according to their needs.
Ultra-Long Context Understanding: Natively supports a 256K context window, maintaining stable performance on long-text tasks.
Enhanced Agent Capabilities: Optimized for agent tasks, achieving leading results on benchmarks such as BFCL-v3, τ-Bench and C3-Bench.
Efficient Inference: Utilizes Grouped Query Attention (GQA) and supports multiple quantization formats, enabling highly efficient inference.
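
To make the GQA point concrete, the sketch below estimates KV-cache size at the 256K context window. The layer count, head counts, and head dimension are hypothetical placeholders, not Hunyuan-0.5B's published configuration; only the sizing formula (one key and one value tensor per layer, one slice per KV head) is standard.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence.

    bytes_per_value=2 assumes a 16-bit dtype such as bfloat16.
    """
    # 2 = one key tensor plus one value tensor per layer
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical configuration, chosen only for illustration.
NUM_LAYERS = 24
NUM_QUERY_HEADS = 16
NUM_KV_HEADS = 8          # GQA: several query heads share one KV head
HEAD_DIM = 128
CONTEXT = 256 * 1024      # the 256K window mentioned in the README

# Standard multi-head attention keeps one KV head per query head.
mha = kv_cache_bytes(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM, CONTEXT)
# Grouped Query Attention keeps only NUM_KV_HEADS of them.
gqa = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, CONTEXT)

print(f"MHA cache: {mha / 2**30:.1f} GiB")
print(f"GQA cache: {gqa / 2**30:.1f} GiB")
print(f"GQA / MHA: {gqa / mha:.2f}")
```

The cache shrinks in proportion to the ratio of KV heads to query heads, which is why GQA matters most at long context lengths.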

Benchmark

Note: The following benchmarks were evaluated with the TRT-LLM backend on several base models.

| Model | Hunyuan-0.5B-Pretrain | Hunyuan-1.8B-Pretrain | Hunyuan-4B-Pretrain | Hunyuan-7B-Pretrain |
|:------------------:|:---------------:|:--------------:|:-------------:|:---------------:|
| MMLU | 54.02 | 64.62 | 74.01 | 79.82 |
| MMLU-Redux | 54.72 | 64.42 | 73.53 | 79 |
| MMLU-Pro | 31.15 | 38.65 | 51.91 | 57.79 |
| SuperGPQA | 17.23 | 24.98 | 27.28 | 30.47 |
| BBH | 45.92 | 74.32 | 75.17 | 82.95 |
| GPQA | 27.76 | 35.81 | 43.52 | 44.07 |
| GSM8K | 55.64 | 77.26 | 87.49 | 88.25 |
| MATH | 42.95 | 62.85 | 72.25 | 74.85 |
| EvalPlus | 39.71 | 60.67 | 67.76 | 66.96 |
| MultiPL-E | 21.83 | 45.92 | 59.87 | 60.41 |
| MBPP | 43.38 | 66.14 | 76.46 | 76.19 |
| CRUX-O | 30.75 | 36.88 | 56.5 | 60.75 |
| Chinese SimpleQA | 12.51 | 22.31 | 30.53 | 38.86 |
| SimpleQA (5-shot) | 2.38 | 3.61 | 4.21 | 5.69 |

| Topic | Bench | Hunyuan-0.5B-Instruct | Hunyuan-1.8B-Instruct | Hunyuan-4B-Instruct | Hunyuan-7B-Instruct |
|:-------------------:|:------------------:|:-------------:|:------------:|:-----------:|:---------------------:|
| Mathematics | AIME 2024 | 17.2 | 56.7 | 78.3 | 81.1 |
| Mathematics | AIME 2025 | 20 | 53.9 | 66.5 | 75.3 |
| Mathematics | MATH | 48.5 | 86 | 92.6 | 93.7 |
| Science | GPQA-Diamond | 23.3 | 47.2 | 61.1 | 60.1 |
| Science | OlympiadBench | 29.6 | 63.4 | 73.1 | 76.5 |
| Coding | Livecodebench | 11.1 | 31.5 | 49.4 | 57 |
| Coding | Fullstackbench | 20.9 | 42 | 54.6 | 56.3 |
| Reasoning | BBH | 40.3 | 64.6 | 83 | 87.8 |
| Reasoning | DROP | 52.8 | 76.7 | 78.2 | 85.9 |
| Reasoning | ZebraLogic | 34.5 | 74.6 | 83.5 | 85.1 |
| Instruction Following | IF-Eval | 49.7 | 67.6 | 76.6 | 79.3 |
| Instruction Following | SysBench | 28.1 | 55.5 | 68 | 72.7 |
| Agent | BFCL v3 | 49.8 | 58.3 | 67.9 | 70.8 |
| Agent | τ-Bench | 14.4 | 18.2 | 30.1 | 35.3 |
| Agent | ComplexFuncBench | 13.9 | 22.3 | 26.3 | 29.2 |
| Agent | C3-Bench | 45.3 | 54.6 | 64.3 | 68.5 |
| Long Context | PenguinScrolls | 53.9 | 73.1 | 83.1 | 82 |
| Long Context | longbench-v2 | 34.7 | 33.2 | 44.1 | 43 |
| Long Context | FRAMES | 41.9 | 55.6 | 79.2 | 78.6 |

Use with transformers

First, install transformers from the pinned commit below. Hunyuan support will be merged into the main branch later.

pip install git+https://github.com/huggingface/transformers@4970b23cedaf745f963779b4eae68da281e8c6ca

Our model uses slow-thinking reasoning by default. There are two ways to disable CoT reasoning:
1. Pass "enable_thinking=False" when calling apply_chat_template.
2. Add "/no_think" before the prompt to force the model not to perform CoT reasoning.
Similarly, adding "/think" before the prompt forces the model to perform CoT reasoning.
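
The prompt-prefix switch can be wrapped in a small helper, sketched below. The function name `with_thinking_switch` is hypothetical, and joining the switch directly to the prompt with no separator is an assumption; the README only says the switch goes "before the prompt".

```python
from typing import Optional


def with_thinking_switch(prompt: str, thinking: Optional[bool] = None) -> str:
    """Prefix a user prompt with the /think or /no_think switch.

    thinking=None leaves the prompt untouched, so the model keeps its
    default behaviour (slow thinking, per the README).
    """
    if thinking is None:
        return prompt
    switch = "/think" if thinking else "/no_think"
    # Assumption: the switch is concatenated directly in front of the prompt.
    return f"{switch}{prompt}"


messages = [
    {"role": "user", "content": with_thinking_switch("Why is seawater salty?", thinking=False)},
]
print(messages[0]["content"])
```

The resulting `messages` list can be passed to `apply_chat_template` in the usual way; this route is handy when the calling code cannot set `enable_thinking` itself.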

The following code snippet shows how to use the transformers library to load and apply the model. It also demonstrates how to enable and disable the reasoning mode, and how to parse the reasoning process along with the final output.

We use tencent/Hunyuan-7B-Instruct as the example.

from transformers import AutoModelForCausalLM, AutoTokenizer
import os
import re

model_name_or_path = "tencent/Hunyuan-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto") # You may want to use bfloat16 and/or move to GPU here
messages = [
    {"role": "user", "content": "Write a short summary of the benefits of regular exercise"},
]
tokenized_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    enable_thinking=True,  # Toggle thinking mode (default: True)
)

outputs = model.generate(tokenized_chat.to(model.device), max_new_tokens=2048)

output_text = tokenizer.decode(outputs[0])
print("output_text=",output_text)
# Reasoning is wrapped in <think>...</think>, the final reply in <answer>...</answer>
think_pattern = r'<think>(.*?)</think>'
think_matches = re.findall(think_pattern, output_text, re.DOTALL)

answer_pattern = r'<answer>(.*?)</answer>'
answer_matches = re.findall(answer_pattern, output_text, re.DOTALL)

# Fall back to an empty string when a tag pair is missing
think_content = think_matches[0].strip() if think_matches else ""
answer_content = answer_matches[0].strip() if answer_matches else ""
print(f"thinking_content:{think_content}\n\n")
print(f"answer_content:{answer_content}\n\n")
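
The parsing step can be tried without loading a model. The sketch below is a standalone helper run on a made-up output string; it assumes the reasoning is wrapped in `<think>` tags and the reply in `<answer>` tags, and the helper name `split_reasoning` and the sample text are hypothetical.

```python
import re
from typing import Tuple


def split_reasoning(output_text: str) -> Tuple[str, str]:
    """Return (thinking, answer) extracted from decoded model output."""
    think = re.search(r"<think>(.*?)</think>", output_text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", output_text, re.DOTALL)

    think_content = think.group(1).strip() if think else ""
    if answer:
        answer_content = answer.group(1).strip()
    else:
        # No <answer> tags: treat everything outside the think block as the reply.
        answer_content = re.sub(
            r"<think>.*?</think>", "", output_text, flags=re.DOTALL
        ).strip()
    return think_content, answer_content


# Made-up sample output, for illustration only.
sample = (
    "<think>\nList benefits, keep it short.\n</think>\n"
    "<answer>\nExercise improves mood and health.\n</answer>"
)
thinking, answer = split_reasoning(sample)
print("thinking:", thinking)
print("answer:", answer)
```

Unlike indexing the first `re.findall` match directly, this version does not raise when a tag pair is absent, which can happen when thinking is disabled or generation is cut off by `max_new_tokens`.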

We recommend using the following set of parameters for inference. Note that our model does not have a default system prompt.

{
  "do_sample": true,
  "top_k": 20,
  "top_p": 0.8,
  "repetition_penalty": 1.05,
  "temperature": 0.7
}
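
One way to apply these settings is to load the JSON into keyword arguments for `model.generate`. The sketch below only builds the dictionary, so it runs without a model; the helper name `generation_kwargs` is hypothetical, and the `max_new_tokens` default of 2048 is carried over from the example above rather than from the recommended set.

```python
import json

# The recommended sampling parameters, copied from the README.
RECOMMENDED_JSON = """
{
  "do_sample": true,
  "top_k": 20,
  "top_p": 0.8,
  "repetition_penalty": 1.05,
  "temperature": 0.7
}
"""


def generation_kwargs(max_new_tokens: int = 2048, **overrides):
    """Build keyword arguments for generate() from the recommended set."""
    kwargs = json.loads(RECOMMENDED_JSON)
    kwargs["max_new_tokens"] = max_new_tokens
    kwargs.update(overrides)  # per-call overrides win over the defaults
    return kwargs


gen_kwargs = generation_kwargs()
print(gen_kwargs)

# With `model` and `tokenized_chat` from the transformers example:
# outputs = model.generate(tokenized_chat.to(model.device), **gen_kwargs)
```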

Training Data Format

If you need to fine-tune our Instruct model, we recommend processing the data…

Excerpt shown — open the source for the full document.

Notability

notability 4.0/10

Small model release with moderate traction