Tencent-Hunyuan/HunyuanVideo-Avatar
Language: Python
License: NOASSERTION
Stars: 2116
Forks: 343
Open issues: 73
Created: 2025-05-26T11:24:20Z
Pushed: 2025-12-16T12:32:46Z
Default branch: main
Fork: no
Archived: no
README:
HunyuanVideo-Avatar 🌅

> **HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters**
🔥🔥🔥 News!!
- Jun 06, 2025: 🔥 HunyuanVideo-Avatar now runs on a single GPU with only 10GB of VRAM, with TeaCache included. Huge thanks to Wan2GP.
- May 28, 2025: 🔥 HunyuanVideo-Avatar is available on Cloud-Native-Build (CNB) as HunyuanVideo-Avatar.
- May 28, 2025: 👋 We release the inference code and model weights of HunyuanVideo-Avatar. [Download](weights/README.md).
📑 Open-source Plan
- HunyuanVideo-Avatar
  - [x] Inference
  - [x] Checkpoints
  - [ ] ComfyUI
Contents
- [HunyuanVideo-Avatar 🌅](#HunyuanVideo-Avatar-)
- [🔥🔥🔥 News!!](#-news)
- [📑 Open-source Plan](#-open-source-plan)
- [Contents](#contents)
- [Abstract](#abstract)
- [HunyuanVideo-Avatar Overall Architecture](#HunyuanVideo-Avatar-overall-architecture)
- [🎉 HunyuanVideo-Avatar Key Features](#-HunyuanVideo-Avatar-key-features)
- [Multimodal Video customization](#multimodal-video-customization)
- [Various Applications](#various-applications)
- [📈 Comparisons](#-comparisons)
- [📜 Requirements](#-requirements)
- [🛠️ Dependencies and Installation](#️-dependencies-and-installation)
- [Installation Guide for Linux](#installation-guide-for-linux)
- [🧱 Download Pretrained Models](#-download-pretrained-models)
- [🚀 Parallel Inference on Multiple GPUs](#-parallel-inference-on-multiple-gpus)
- [🔑 Single-gpu Inference](#-single-gpu-inference)
- [Run with very low VRAM](#run-with-very-low-vram)
- [Run a Gradio Server](#run-a-gradio-server)
- [🔗 BibTeX](#-bibtex)
- [Acknowledgements](#acknowledgements)
---
Abstract
Recent years have witnessed significant progress in audio-driven human animation. However, critical challenges remain in (i) generating highly dynamic videos while preserving character consistency, (ii) achieving precise emotion alignment between characters and audio, and (iii) enabling multi-character audio-driven animation. To address these challenges, we propose HunyuanVideo-Avatar, a multimodal diffusion transformer (MM-DiT)-based model capable of simultaneously generating dynamic, emotion-controllable, and multi-character dialogue videos. Concretely, HunyuanVideo-Avatar introduces three key innovations: (i) A character image injection module is designed to replace the conventional addition-based character conditioning scheme, eliminating the inherent condition mismatch between training and inference. This ensures dynamic motion and strong character consistency; (ii) An Audio Emotion Module (AEM) is introduced to extract and transfer the emotional cues from an emotion reference image to the target generated video, enabling fine-grained and accurate emotion style control; (iii) A Face-Aware Audio Adapter (FAA) is proposed to isolate the audio-driven character with a latent-level face mask, enabling independent audio injection via cross-attention for multi-character scenarios. These innovations empower HunyuanVideo-Avatar to surpass state-of-the-art methods on benchmark datasets and a newly proposed wild dataset, generating realistic avatars in dynamic, immersive scenarios. The source code and model weights will be released publicly.
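To make the idea behind the Face-Aware Audio Adapter more concrete, here is a minimal sketch of mask-gated audio cross-attention in plain Python. It is not the authors' implementation: the function name `masked_audio_cross_attention`, the single-head attention without learned projections, and the residual update are all assumptions chosen for illustration. It only shows how a per-token face mask can restrict audio injection to the tokens of one character.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def masked_audio_cross_attention(latents, audio, face_mask):
    """Inject audio features only into latent tokens covered by a face mask.

    latents:   N x d list of video latent tokens (used as queries).
    audio:     M x d list of audio feature tokens (used as keys and values).
    face_mask: N floats in [0, 1]; 1 means the token belongs to the driven face.

    Hypothetical simplification: single head, no learned Q/K/V projections.
    """
    d = len(latents[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q, m in zip(latents, face_mask):
        # Scaled dot-product scores of this latent token against every audio token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in audio]
        weights = softmax(scores)
        # Attention-weighted sum of the audio tokens.
        attended = [sum(w * v[j] for w, v in zip(weights, audio)) for j in range(d)]
        # Residual update gated by the mask: tokens outside the face stay unchanged.
        out.append([qi + m * ai for qi, ai in zip(q, attended)])
    return out


latents = [[1.0, 0.0], [0.0, 1.0]]
audio = [[0.5, 0.5], [1.0, -1.0]]
result = masked_audio_cross_attention(latents, audio, face_mask=[1.0, 0.0])
```

Under these assumptions, a multi-character scene would call the function once per character, each time with that character's own mask and audio stream, so one speaker's audio never modifies another character's tokens.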
HunyuanVideo-Avatar Overall Architecture

We propose HunyuanVideo-Avatar, a multimodal diffusion transformer (MM-DiT)-based model capable of generating dynamic, emotion-controllable, and multi-character dialogue videos.
🎉 HunyuanVideo-Avatar Key Features

High-Dynamic and Emotion-Controllable Video Generation
HunyuanVideo-Avatar animates any input avatar image into a high-dynamic, emotion-controllable video using simple audio conditions. Specifically, it takes as input multi-style avatar images at arbitrary scales and resolutions. Supported styles include photorealistic, cartoon, 3D-rendered, and anthropomorphic characters, and generation spans portrait, upper-body, and full-body scales. The generated videos have highly dynamic foregrounds and backgrounds, achieving superior realism and naturalness. In addition, the system supports controlling the characters' facial emotions conditioned on the input audio.
Various Applications
HunyuanVideo-Avatar supports various downstream tasks and applications. For instance, the system generates talking avatar videos, which can be applied to e-commerce, online streaming, social media video production, etc. In addition, its multi-character animation feature extends its use to applications such as video content creation and editing.
📜 Requirements
- An NVIDIA GPU with CUDA support is required.
- The model is tested on a machine with 8 GPUs.
- Minimum: 24GB of GPU memory is required for 704px × 768px × 129 frames, but generation is very slow.
- Recommended: We recommend using a GPU with 96GB of memory for better generation quality.
- Tips: If OOM occurs when using a GPU with 80GB of memory, try reducing the image resolution.
- Tested operating system: Linux
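The memory figures above can be turned into a quick pre-flight check. The sketch below is only illustrative: the tier names and the helpers `vram_tier` and `detect_total_gb` are hypothetical, the 24GB and 96GB thresholds come from this section, and the 10GB threshold comes from the Wan2GP note in the News list. Detection uses PyTorch if it is installed and returns `None` otherwise.

```python
def vram_tier(total_gb: float) -> str:
    """Map a nominal GPU memory size (in GB) to the tiers described in the README."""
    if total_gb >= 96:
        return "recommended"
    if total_gb >= 24:
        return "minimum"       # works for 704px x 768px x 129 frames, but slowly
    if total_gb >= 10:
        return "low-vram"      # only via the low-VRAM path mentioned in the News list
    return "insufficient"


def detect_total_gb(device_index: int = 0):
    """Return the nominal memory of a CUDA device in GB, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    total_bytes = torch.cuda.get_device_properties(device_index).total_memory
    # Drivers report slightly less than the nominal size (e.g. 23.99 GiB on a
    # "24GB" card), so round to the nearest whole GiB before comparing.
    return round(total_bytes / 2**30)


if __name__ == "__main__":
    total = detect_total_gb()
    print("No CUDA GPU detected" if total is None else f"{total} GB: {vram_tier(total)}")
```

An 80GB card falls in the "minimum" tier here, which is consistent with the tip about reducing resolution if OOM occurs on such a card.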
🛠️ Dependencies and Installation
Begin by cloning the repository:
    git clone https://github.com/Tencent-Hunyuan/HunyuanVideo-Avatar.git
    cd HunyuanVideo-Avatar
Installation Guide for Linux
We recommend CUDA versions 12.4 or 11.8 for the manual installation.
Conda's installation instructions are available here.
    # 1. Create conda environment
    conda create -n HunyuanVideo-Avatar python==3.10.9

    # 2. Activate the environment
    conda activate HunyuanVideo-Avatar

    # 3. Install PyTorch and other dependencies using conda
    # For CUDA 11.8
    conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia
    # For CUDA 12.4
    conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia

    # 4. Install pip dependencies
    python -m pip install -r requirements.txt

    # 5. Install flash…
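After running the installation steps, it can help to confirm that the environment matches the pinned versions. The following is a small sketch using only the standard library. The helper names `check_package` and `check_environment` are hypothetical, and it assumes the conda `pytorch` package registers itself under the distribution name `torch`, which is the usual case but worth verifying in your own environment.

```python
import sys
from importlib import metadata

# Versions pinned in the conda commands above (conda's "pytorch" is "torch" to Python).
PINNED = {"torch": "2.4.0", "torchvision": "0.19.0", "torchaudio": "2.4.0"}
PINNED_PYTHON = (3, 10, 9)


def check_package(name: str, expected: str) -> str:
    """Return 'ok', 'missing', or a mismatch message for one distribution."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return "missing"
    # Ignore local build suffixes such as "+cu118" when comparing.
    base = installed.split("+", 1)[0]
    return "ok" if base == expected else f"mismatch ({installed})"


def check_environment() -> dict:
    """Compare the running interpreter and installed packages against the pins."""
    report = {name: check_package(name, version) for name, version in PINNED.items()}
    py = tuple(sys.version_info[:3])
    py_text = ".".join(str(part) for part in py)
    report["python"] = "ok" if py == PINNED_PYTHON else f"mismatch ({py_text})"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

A "mismatch" entry is not necessarily fatal, but it is a reasonable first thing to look at if inference fails right after setup.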
Notability
7.0/10. Notable video avatar repo with strong early stars.