What does this repo signal mean?

Tencent Hunyuan published the Tencent-Hunyuan/HunyuanVision repository. Repository signals like this can surface tooling, evaluation, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo Tencent-Hunyuan/HunyuanVision · New model release, low traction. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Tencent Hunyuan Repo: Tencent-Hunyuan/HunyuanVision

Captured source


GitHub: github.com/Tencent-Hunyuan/HunyuanVision

Tencent-Hunyuan/HunyuanVision repository metadata


Published Oct 6, 2025 · seen Jun 5 · captured Jun 11 · HTTP 200 · method: plain

Tencent-Hunyuan/HunyuanVision

Stars: 95

Forks: 1

Open issues: 1

Created: 2025-10-06T05:35:38Z

Pushed: 2025-10-21T05:17:13Z

Default branch: main

Fork: no

Archived: no

README:

📑🤗 Paper & Weights are coming | 💻 API | 💭 Direct Chat @ LMArena

We are excited to introduce Hunyuan-Vision-1.5, a vision-language model built on a hybrid Mamba-Transformer architecture that offers advanced multilingual multimodal understanding and reasoning.

News

October 6, 2025: hunyuan-vision-1.5-thinking ranked 3rd on LMArena, making it the top-ranked model from China.

📑 Open-source Plan

Hunyuan-Vision-1.5 (vision-language model)
[ ] Hunyuan-Vision-1.5 Technical Report
[ ] Hunyuan-Vision-1.5 Checkpoints (A56B, 4B)
[ ] Hunyuan-ViT-V1 Checkpoints
[ ] TRT Inference Support
[ ] "Thinking on Images" Support
[ ] vLLM Support

Highlights

⚙️ Hybrid Architecture: Built on a novel hybrid Mamba-Transformer architecture, Hunyuan-Vision-1.5 achieves state-of-the-art performance on multimodal understanding tasks while delivering highly efficient inference.
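The README does not say how the Mamba and Transformer layers are combined. As a rough illustration only, the sketch below assumes a hypothetical layout in which every fourth layer is attention and the rest are state-space layers, and shows a simplified, non-selective diagonal state-space recurrence (real Mamba layers make the recurrence parameters input-dependent). The point it illustrates: the recurrent state has a fixed size, so each new token costs the same compute and memory in those layers, whereas an attention layer's KV cache grows with context length. The function names and the layer ratio are assumptions, not Hunyuan's design.

```python
from typing import List, Sequence


def hybrid_layer_schedule(num_layers: int, attention_every: int) -> List[str]:
    """Hypothetical layout: every `attention_every`-th layer is attention,
    all other layers are Mamba-style state-space layers."""
    if num_layers < 1 or attention_every < 1:
        raise ValueError("num_layers and attention_every must be positive")
    return [
        "attention" if (i + 1) % attention_every == 0 else "mamba"
        for i in range(num_layers)
    ]


def diagonal_ssm_scan(
    xs: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> List[float]:
    """Simplified, non-selective diagonal SSM over a scalar input sequence:
    h_t = a * h_{t-1} + b * x_t (elementwise), y_t = sum(c * h_t).
    The state always has len(a) entries, regardless of sequence length."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("a, b and c must share the same state dimension")
    state = [0.0] * len(a)
    ys: List[float] = []
    for x in xs:
        # Each state channel decays by a_i and absorbs the input scaled by b_i.
        state = [ai * hi + bi * x for ai, hi, bi in zip(a, state, b)]
        # Read out the state with weights c_i.
        ys.append(sum(ci * hi for ci, hi in zip(c, state)))
    return ys


print(hybrid_layer_schedule(8, attention_every=4))
print(diagonal_ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))
```

With a single decaying channel, an impulse input produces `[1.0, 0.5, 0.25]`: the layer "remembers" past tokens only through its fixed-size state.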

🧩 Advanced "Thinking-on-Image" Reasoning: Beyond strong and robust multimodal reasoning, Hunyuan-Vision-1.5 supports more advanced thinking-with-image capabilities, using a novel visual reflection paradigm for deeper multimodal understanding and reasoning.

🌐 Versatility: Hunyuan-Vision-1.5 performs well on tasks ranging from image and video understanding to more advanced tasks such as visual reasoning and 3D spatial comprehension. It also offers a seamless multilingual experience in real-world applications, with consistent performance across languages and task contexts.

We will open-source Hunyuan-Vision-1.5. The model and technical report will be released in late October. Stay tuned for more updates!

Quickstart

Hunyuan-Vision-1.5 is now available on Tencent Cloud, so you can try our most advanced model right away. (The "Thinking on Images" mode will be available later.)

```python
from openai import OpenAI

# Set your Tencent Cloud API key here
API_KEY = ""

# The Hunyuan endpoint is called through the OpenAI Python client
client = OpenAI(
    api_key=API_KEY,
    base_url="https://api.hunyuan.cloud.tencent.com/v1",
)

MODEL_NAME = "hunyuan-t1-vision-20250916"

completion = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": "https://dscache.tencent-cloud.cn/upload/uploader/hunyuan-64b418fd052c033b228e04bc77bbc4b54fd7f5bc.png"},
                },
                {"type": "text", "text": "What is it?"},
            ],
        }
    ],
)

# Print the model's text answer
print(completion.choices[0].message.content)
```
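The Quickstart passes a public image URL. To send a local file instead, a common pattern with OpenAI-compatible chat APIs is to embed the image as a base64 data URL. Whether the Hunyuan endpoint accepts data URLs is an assumption here, not something the README confirms, and the helper names (`image_bytes_to_data_url`, `build_vision_messages`) are hypothetical. The message structure mirrors the Quickstart request.

```python
import base64
from typing import Any, Dict, List


def image_bytes_to_data_url(data: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL (assumed to be accepted by the API)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_vision_messages(image_url: str, question: str) -> List[Dict[str, Any]]:
    """Build a single-turn user message with one image part and one text part,
    using the same content layout as the Quickstart example."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ],
        }
    ]


# Usage with a local file (the file name is hypothetical; `client` and
# MODEL_NAME come from the Quickstart):
# with open("photo.png", "rb") as f:
#     messages = build_vision_messages(image_bytes_to_data_url(f.read()), "What is it?")
# completion = client.chat.completions.create(model=MODEL_NAME, messages=messages)

messages = build_vision_messages(image_bytes_to_data_url(b"abc"), "What is it?")
print(messages[0]["content"][0]["image_url"]["url"])  # data:image/png;base64,YWJj
```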

Our model is also available on LMArena Direct Chat. You can try it out by selecting hunyuan-vision-1.5-thinking from the model list in Direct Chat.

Model Capabilities

Multimodal Understanding

Hunyuan-Vision-1.5 is a vision-language model designed for general-purpose multimodal understanding and reasoning. It performs well on tasks ranging from image and video recognition, OCR, and diagram understanding to more advanced tasks such as visual reasoning and 3D spatial comprehension.

Multilingual

We aim to offer a seamless multilingual experience across real-world applications, with consistent performance across languages and task contexts. You can interact with the model in your preferred language.

Advanced Multimodal Thinking

Hunyuan-Vision-1.5 offers more advanced thinking-with-image capabilities that support deeper multimodal understanding and reasoning through a "thinking on images" paradigm. The model is optimized to use extra tools during its reasoning, either to modify the input image (cropping/zooming in, drawing points, lines, or boxes, etc.) or to acquire additional knowledge via web search.
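The README does not document the tool interface. The sketch below uses hypothetical helpers (`crop`, `zoom`, `draw_box`) on a plain grayscale pixel grid to illustrate the kind of image operations described; it is not the model's actual tool implementation. A harness could run functions like these when the model requests an edit and feed the resulting image back into the conversation.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixels, row-major: image[y][x]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def _check_box(image: Image, box: Box) -> None:
    left, top, right, bottom = box
    if not (0 <= left < right <= len(image[0]) and 0 <= top < bottom <= len(image)):
        raise ValueError("box is empty or outside the image")


def crop(image: Image, box: Box) -> Image:
    """Return the pixels inside `box`."""
    _check_box(image, box)
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def zoom(image: Image, factor: int) -> Image:
    """Nearest-neighbour upscale by an integer factor."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out: Image = []
    for row in image:
        wide = [px for px in row for _ in range(factor)]  # repeat each pixel horizontally
        out.extend(list(wide) for _ in range(factor))  # repeat each row vertically
    return out


def draw_box(image: Image, box: Box, value: int = 255) -> Image:
    """Return a copy of `image` with the outline of `box` drawn in `value`."""
    _check_box(image, box)
    left, top, right, bottom = box
    out = [row[:] for row in image]
    for x in range(left, right):  # top and bottom edges
        out[top][x] = value
        out[bottom - 1][x] = value
    for y in range(top, bottom):  # left and right edges
        out[y][left] = value
        out[y][right - 1] = value
    return out


img = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(zoom(crop(img, (1, 1, 3, 3)), 2))  # crop the bottom-right 2x2 region, then zoom in
```

A crop followed by a zoom is the "zoom in on a region" step; `draw_box` covers the "drawing boxes" case by marking a region on a copy of the image.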

Notability

Notability: 5.0/10 (new model release, low traction)