Repo · Tencent Hunyuan · published May 26, 2025

Tencent-Hunyuan/HunyuanVideo-Avatar

Language: Python

License: NOASSERTION

Stars: 2116

Forks: 343

Open issues: 73

Created: 2025-05-26T11:24:20Z

Pushed: 2025-12-16T12:32:46Z

Default branch: main

Fork: no

Archived: no

README:

HunyuanVideo-Avatar 🌅

![image](assets/material/teaser.png)

> **HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters**

🔥🔥🔥 News!!

  • Jun 06, 2025: 🔥 HunyuanVideo-Avatar now runs on a single GPU with only 10GB of VRAM, with TeaCache included. Huge thanks to Wan2GP.
  • May 28, 2025: 🔥 HunyuanVideo-Avatar is available on Cloud-Native-Build (CNB) as HunyuanVideo-Avatar.
  • May 28, 2025: 👋 We release the inference code and model weights of HunyuanVideo-Avatar. [Download](weights/README.md).

📑 Open-source Plan

  • HunyuanVideo-Avatar
  • [x] Inference
  • [x] Checkpoints
  • [ ] ComfyUI

Contents

  • [HunyuanVideo-Avatar 🌅](#HunyuanVideo-Avatar-)
  • [🔥🔥🔥 News!!](#-news)
  • [📑 Open-source Plan](#-open-source-plan)
  • [Contents](#contents)
  • [Abstract](#abstract)
  • [HunyuanVideo-Avatar Overall Architecture](#HunyuanVideo-Avatar-overall-architecture)
  • [🎉 HunyuanVideo-Avatar Key Features](#-HunyuanVideo-Avatar-key-features)
  • [Multimodal Video customization](#multimodal-video-customization)
  • [Various Applications](#various-applications)
  • [📈 Comparisons](#-comparisons)
  • [📜 Requirements](#-requirements)
  • [🛠️ Dependencies and Installation](#️-dependencies-and-installation)
  • [Installation Guide for Linux](#installation-guide-for-linux)
  • [🧱 Download Pretrained Models](#-download-pretrained-models)
  • [🚀 Parallel Inference on Multiple GPUs](#-parallel-inference-on-multiple-gpus)
  • [🔑 Single-gpu Inference](#-single-gpu-inference)
  • [Run with very low VRAM](#run-with-very-low-vram)
  • [Run a Gradio Server](#run-a-gradio-server)
  • [🔗 BibTeX](#-bibtex)
  • [Acknowledgements](#acknowledgements)

---

Abstract

Recent years have witnessed significant progress in audio-driven human animation. However, critical challenges remain in (i) generating highly dynamic videos while preserving character consistency, (ii) achieving precise emotion alignment between characters and audio, and (iii) enabling multi-character audio-driven animation. To address these challenges, we propose HunyuanVideo-Avatar, a multimodal diffusion transformer (MM-DiT)-based model capable of simultaneously generating dynamic, emotion-controllable, and multi-character dialogue videos. Concretely, HunyuanVideo-Avatar introduces three key innovations: (i) A character image injection module is designed to replace the conventional addition-based character conditioning scheme, eliminating the inherent condition mismatch between training and inference. This ensures dynamic motion and strong character consistency; (ii) An Audio Emotion Module (AEM) is introduced to extract and transfer the emotional cues from an emotion reference image to the target generated video, enabling fine-grained and accurate emotion style control; (iii) A Face-Aware Audio Adapter (FAA) is proposed to isolate the audio-driven character with a latent-level face mask, enabling independent audio injection via cross-attention for multi-character scenarios. These innovations empower HunyuanVideo-Avatar to surpass state-of-the-art methods on benchmark datasets and a newly proposed wild dataset, generating realistic avatars in dynamic, immersive scenarios. The source code and model weights will be released publicly.

HunyuanVideo-Avatar Overall Architecture

![image](assets/material/method.png)

We propose HunyuanVideo-Avatar, a multimodal diffusion transformer (MM-DiT)-based model capable of generating dynamic, emotion-controllable, and multi-character dialogue videos.
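
The Face-Aware Audio Adapter described in the abstract injects audio through cross-attention only where a latent-level face mask marks the driven character. The sketch below illustrates that idea in plain Python. It is not the repository's implementation: the function name `masked_audio_cross_attention` is hypothetical, the learned query/key/value projections of a real attention layer are omitted, and tokens are plain lists of floats.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def masked_audio_cross_attention(latents, audio, face_mask):
    """Add audio context to latent tokens, gated by a per-token face mask.

    latents:   list of N video latent tokens (each a list of D floats)
    audio:     list of M audio feature tokens (each a list of D floats)
    face_mask: list of N values in [0, 1]; 1 marks the driven character's face
    (Hypothetical helper: identity projections stand in for learned weights.)
    """
    dim = len(latents[0])
    updated = []
    for token, gate in zip(latents, face_mask):
        # Scaled dot-product attention: the latent token queries the audio tokens.
        weights = softmax([dot(token, a) / math.sqrt(dim) for a in audio])
        context = [sum(w * a[d] for w, a in zip(weights, audio)) for d in range(dim)]
        # Residual update, applied only where the mask is non-zero.
        updated.append([t + gate * c for t, c in zip(token, context)])
    return updated


# Two latent tokens; only the first lies inside the face mask.
latents = [[1.0, 0.0], [0.0, 1.0]]
audio = [[0.5, 0.5]]
out = masked_audio_cross_attention(latents, audio, face_mask=[1.0, 0.0])
print(out)  # [[1.5, 0.5], [0.0, 1.0]]
```

Under this reading, a multi-character scene would call the function once per character, each time with that character's own audio features and face mask, so that one speaker's audio does not alter another character's tokens.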

🎉 HunyuanVideo-Avatar Key Features

![image](assets/material/demo.png)

High-Dynamic and Emotion-Controllable Video Generation

HunyuanVideo-Avatar animates any input avatar image into a high-dynamic, emotion-controllable video from a simple audio condition. It accepts multi-style avatar images at arbitrary scales and resolutions, including photorealistic, cartoon, 3D-rendered, and anthropomorphic characters, and supports multi-scale generation spanning portrait, upper-body, and full-body framing. The generated videos have highly dynamic foregrounds and backgrounds, with superior realism and naturalness. In addition, the system supports controlling the characters' facial emotions conditioned on the input audio.

Various Applications

HunyuanVideo-Avatar supports various downstream tasks and applications. For instance, the system generates talking avatar videos, which can be applied to e-commerce, online streaming, social media video production, and more. In addition, its multi-character animation feature extends its use to applications such as video content creation and editing.

📜 Requirements

  • An NVIDIA GPU with CUDA support is required.
  • The model is tested on a machine with 8 GPUs.
  • Minimum: The minimum GPU memory required is 24GB for 704px × 768px × 129 frames, but generation is very slow.
  • Recommended: We recommend using a GPU with 96GB of memory for better generation quality.
  • Tip: If an out-of-memory (OOM) error occurs on a GPU with 80GB of memory, try reducing the image resolution.
  • Tested operating system: Linux
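
The memory tiers above can be checked before launching inference. The following is a minimal sketch, not part of the repository: `classify_vram` and `detect_vram_gb` are hypothetical helper names, and the 1GB slack is an assumption added because a card sold as 24GB typically reports slightly less usable memory.

```python
# Thresholds quoted in the Requirements list (GB of GPU memory).
MIN_VRAM_GB = 24          # minimum for 704px x 768px x 129 frames, very slow
RECOMMENDED_VRAM_GB = 96  # recommended for better generation quality


def classify_vram(total_gb, slack_gb=1.0):
    """Map a GPU's total memory to the tiers listed in the Requirements.

    slack_gb is an assumed tolerance for cards that report slightly less
    memory than their nominal size.
    """
    if total_gb + slack_gb < MIN_VRAM_GB:
        return "below-minimum"
    if total_gb + slack_gb < RECOMMENDED_VRAM_GB:
        return "minimum"
    return "recommended"


def detect_vram_gb():
    """Return total memory of GPU 0 in GB, or None if it cannot be queried."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


if __name__ == "__main__":
    vram = detect_vram_gb()
    if vram is None:
        print("No CUDA GPU detected (or PyTorch is not installed).")
    else:
        print(f"GPU 0: {vram:.1f} GB -> {classify_vram(vram)}")
```

A "below-minimum" result does not rule out running the model: the News section mentions a 10GB single-GPU path via Wan2GP, which this sketch does not cover.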

🛠️ Dependencies and Installation

Begin by cloning the repository:

git clone https://github.com/Tencent-Hunyuan/HunyuanVideo-Avatar.git
cd HunyuanVideo-Avatar

Installation Guide for Linux

We recommend CUDA versions 12.4 or 11.8 for the manual installation.

Conda's installation instructions are available in the official Conda documentation.

# 1. Create conda environment
conda create -n HunyuanVideo-Avatar python==3.10.9

# 2. Activate the environment
conda activate HunyuanVideo-Avatar

# 3. Install PyTorch and other dependencies using conda
# For CUDA 11.8
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# For CUDA 12.4
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia

# 4. Install pip dependencies
python -m pip install -r requirements.txt
# 5. Install flash…
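
To confirm that steps 1 to 4 took effect, the installed versions can be compared with the pins used above. This is a minimal sketch rather than a script shipped with the repository: `installed_version` and `check_versions` are hypothetical helpers, and the check assumes the packages expose standard distribution metadata that `importlib.metadata` can read.

```python
import sys
from importlib import metadata

# Versions pinned by the conda install commands above.
EXPECTED = {"torch": "2.4.0", "torchvision": "0.19.0", "torchaudio": "2.4.0"}
EXPECTED_PYTHON = (3, 10)


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected=EXPECTED, lookup=installed_version):
    """Compare installed versions with the pins; return a list of problems."""
    problems = []
    for package, wanted in expected.items():
        found = lookup(package)
        if found is None:
            problems.append(f"{package}: not installed (expected {wanted})")
            continue
        # Strip local build tags such as "+cu118" before comparing.
        if found.split("+")[0] != wanted:
            problems.append(f"{package}: found {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    if sys.version_info[:2] != EXPECTED_PYTHON:
        print(f"Python {sys.version.split()[0]} found, expected 3.10.x")
    for problem in check_versions():
        print(problem)
```

An empty problem list only shows that the version pins match. It does not verify the CUDA build or any of the later installation steps.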
