Repo · Tencent Hunyuan · published Mar 4, 2025

Tencent-Hunyuan/HunyuanVideo-I2V

Python

Captured source

Tencent-Hunyuan/HunyuanVideo-I2V

Description: HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo

Language: Python

License: NOASSERTION

Stars: 1827

Forks: 191

Open issues: 57

Created: 2025-03-04T12:02:05Z

Pushed: 2026-04-07T06:13:09Z

Default branch: main

Fork: no

Archived: no

README:

[中文阅读](./README_zh.md)

HunyuanVideo-I2V 🌅

👋 Join our WeChat and Discord

-----

Following the successful open-sourcing of HunyuanVideo, we proudly present HunyuanVideo-I2V, a new image-to-video generation framework to accelerate open-source community exploration!

This repo contains the official PyTorch model definitions, pre-trained weights, and inference/sampling code. You can find more visualizations on our project page. We have also released LoRA training code for customizable special effects, which can be used to create more interesting video effects.

> **HunyuanVideo: A Systematic Framework For Large Video Generation Model**

🔥🔥🔥 News!!

  • Mar 13, 2025: 🚀 We release the parallel inference code for HunyuanVideo-I2V powered by xDiT.
  • Mar 11, 2025: 🎉 We have updated the LoRA training and inference code after fixing a bug.
  • Mar 07, 2025: 🔥 We have fixed a bug in our open-source version that caused ID changes. Please try the new HunyuanVideo-I2V model weights to ensure full visual consistency in the first frame and produce higher-quality videos.
  • Mar 06, 2025: 👋 We release the inference code and model weights of HunyuanVideo-I2V. Download.

🎥 Demo

I2V Demo

First Frame Consistency Demo

| Reference Image | Generated Video |
|:---------------:|:---------------:|

Customizable I2V LoRA Demo

| I2V LoRA Effect | Reference Image | Generated Video |
|:---------------:|:---------------:|:---------------:|
| Hair growth | | |
| Embrace | | |

🧩 Community Contributions

If you develop or use HunyuanVideo-I2V in your projects, please let us know.

📑 Open-source Plan

  • HunyuanVideo-I2V (Image-to-Video Model)
  • [x] Inference
  • [x] Checkpoints
  • [x] ComfyUI
  • [x] LoRA training scripts
  • [x] Multi-GPU sequence-parallel inference (faster inference on more GPUs)

Contents

  • [HunyuanVideo-I2V 🌅](#hunyuanvideo-i2v-)
  • [🔥🔥🔥 News!!](#-news)
  • [🎥 Demo](#-demo)
  • [I2V Demo](#i2v-demo)
  • [First Frame Consistency Demo](#first-frame-consistency-demo)
  • [Customizable I2V LoRA Demo](#customizable-i2v-lora-demo)
  • [🧩 Community Contributions](#-community-contributions)
  • [📑 Open-source Plan](#-open-source-plan)
  • [Contents](#contents)
  • [HunyuanVideo-I2V Overall Architecture](#hunyuanvideo-i2v-overall-architecture)
  • [📜 Requirements](#-requirements)
  • [🛠️ Dependencies and Installation](#️-dependencies-and-installation)
  • [Installation Guide for Linux](#installation-guide-for-linux)
  • [🧱 Download Pretrained Models](#-download-pretrained-models)
  • [🔑 Single-gpu Inference](#-single-gpu-inference)
  • [Tips for Using Image-to-Video Models](#tips-for-using-image-to-video-models)
  • [Using Command Line](#using-command-line)
  • [More Configurations](#more-configurations)
  • [🎉 Customizable I2V LoRA effects training](#-customizable-i2v-lora-effects-training)
  • [Requirements](#requirements)
  • [Environment](#environment)
  • [Training data construction](#training-data-construction)
  • [Training](#training)
  • [Inference](#inference)
  • [🚀 Parallel Inference on Multiple GPUs by xDiT](#-parallel-inference-on-multiple-gpus-by-xdit)
  • [Using Command Line](#using-command-line-1)
  • [🔗 BibTeX](#-bibtex)
  • [Acknowledgements](#acknowledgements)

---

HunyuanVideo-I2V Overall Architecture

Leveraging the advanced video generation capabilities of HunyuanVideo, we have extended its application to image-to-video generation tasks. To achieve this, we employ a token replace technique to effectively reconstruct and incorporate reference image information into the video generation process.
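The README does not spell out the mechanics of token replacement. As a minimal sketch of one plausible reading, assume that the latent tokens of the first video frame are overwritten with latent tokens derived from the reference image, so that the image is carried into the generated sequence as its first frame. The function name, the list-of-lists token layout, and the use of plain Python instead of PyTorch tensors are all assumptions made for illustration, not the repository's actual code.

```python
from typing import List

Token = List[float]   # one latent token (hypothetical layout)
Frame = List[Token]   # all tokens belonging to one latent frame


def replace_first_frame_tokens(video: List[Frame], image: Frame) -> List[Frame]:
    """Return a copy of `video` whose first frame holds the image tokens.

    Assumption: the reference image is encoded into the same number of
    tokens as a single latent video frame.
    """
    if not video:
        raise ValueError("video must contain at least one frame")
    if len(image) != len(video[0]):
        raise ValueError("image and frame must have the same token count")
    # Copy everything so the caller's inputs are left untouched.
    replaced = [[list(tok) for tok in image]]
    replaced += [[list(tok) for tok in frame] for frame in video[1:]]
    return replaced


# Toy data: 3 latent frames, 2 tokens per frame, 2 channels per token.
noisy_video = [
    [[0.3, -0.1], [0.7, 0.2]],
    [[-0.5, 0.4], [0.1, 0.9]],
    [[0.6, 0.6], [-0.2, 0.0]],
]
image_latent = [[1.0, 1.0], [2.0, 2.0]]

conditioned = replace_first_frame_tokens(noisy_video, image_latent)
```

In a diffusion sampler, a step like this would presumably be reapplied at each denoising iteration so the first frame does not drift from the reference image. That is an inference from the first-frame consistency claims, not something the README states.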

Because we use a pre-trained Multimodal Large Language Model (MLLM) with a decoder-only architecture as the text encoder, the model can better comprehend the semantic content of the input image and integrate information from both the image and its associated caption. Specifically, the MLLM processes the input image to generate semantic image tokens. These tokens are then concatenated with the video latent tokens, so that full attention is computed across the combined sequence.
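The concatenate-then-attend step described above can be illustrated with a small sketch. It uses plain Python lists rather than PyTorch tensors, a single attention head, and identity query/key/value projections, all of which are simplifying assumptions. The token values and the `full_attention` helper are hypothetical and are not taken from the repository.

```python
import math
from typing import List

Token = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def full_attention(tokens: List[Token]) -> List[Token]:
    """Single-head self-attention in which every token attends to every token.

    Assumption: identity Q/K/V projections, chosen only to keep the sketch short.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for query in tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        weights = softmax(scores)
        outputs.append(
            [sum(w * value[d] for w, value in zip(weights, tokens)) for d in range(dim)]
        )
    return outputs


# Hypothetical inputs: 2 semantic image tokens and 3 video latent tokens.
image_tokens = [[1.0, 0.0], [0.0, 1.0]]
video_tokens = [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]

# Concatenate along the sequence axis, then attend over the combined sequence.
combined = image_tokens + video_tokens
attended = full_attention(combined)

# Only the video positions would be carried forward as video latents.
video_out = attended[len(image_tokens):]
```

Because no mask is applied, each video token's output mixes information from the image tokens and from all other video tokens, which is the "full attention" property the paragraph refers to.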

The architecture is designed to exploit the synergy between image and text modalities so that videos generated from static images are robust and coherent. This integration improves the fidelity of the generated videos and enhances the model's ability to interpret and use complex multimodal inputs. The overall architecture is as follows.

📜 Requirements

The following table shows the requirements for running the HunyuanVideo-I2V model (batch size = 1) to generate videos:

| Model            | Resolution | GPU Peak Memory |
|:----------------:|:----------:|:---------------:|
| HunyuanVideo-I2V | 720p       | 60GB            |

  • An NVIDIA GPU with CUDA support is required.
  • The model is tested on a single 80GB GPU.
  • Minimum: The minimum GPU memory required is 60GB for 720p.
  • Recommended: We recommend using a GPU with 80GB of memory for better generation quality.
  • Tested operating system: Linux
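The memory figures above can be turned into a quick pre-flight check. The sketch below is not part of the repository: the helper names and the three labels are hypothetical, and the thresholds are the 60GB and 80GB values from the list. GPU detection uses PyTorch's `torch.cuda.get_device_properties` if PyTorch is installed and silently returns `None` otherwise.

```python
from typing import Optional

MIN_GB = 60          # minimum for 720p, per the requirements above
RECOMMENDED_GB = 80  # recommended capacity, per the requirements above


def classify_gpu_memory(total_gb: float) -> str:
    """Map a GPU memory size (in GB) to a hypothetical readiness label."""
    if total_gb < MIN_GB:
        return "insufficient"
    if total_gb < RECOMMENDED_GB:
        return "minimum"
    return "recommended"


def detect_total_gpu_memory_gb(device_index: int = 0) -> Optional[float]:
    """Return total memory of a CUDA device in GiB, or None if unavailable."""
    try:
        import torch  # optional dependency; absent on machines without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(device_index)
    # Drivers usually report slightly less than the nominal capacity,
    # so treat the thresholds above as approximate.
    return props.total_memory / (1024 ** 3)


detected = detect_total_gpu_memory_gb()
if detected is None:
    print("No CUDA GPU detected (or PyTorch is not installed).")
else:
    print(f"{detected:.1f} GiB available: {classify_gpu_memory(detected)}")
```

For example, `classify_gpu_memory(48)` returns `"insufficient"`, while a nominal 80GB card passed in as `80` returns `"recommended"`.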

🛠️ Dependencies and Installation

Begin by cloning the repository:...

Notability

notability 7.0/10

Notable image-to-video model release with strong stars.