Repo · Tencent Hunyuan · published Aug 27, 2025

Tencent-Hunyuan/HunyuanWorld-Voyager

Python


Captured source


Description: Voyager is an interactive RGBD video generation model conditioned on camera input, and supports real-time 3D reconstruction.

Language: Python

License: NOASSERTION

Stars: 1566

Forks: 165

Open issues: 29

Created: 2025-08-27T09:34:10Z

Pushed: 2026-04-15T17:30:08Z

Default branch: main

Fork: no

Archived: no

README: [Read in Chinese](README_zh.md)

HunyuanWorld-Voyager

-----

We introduce HunyuanWorld-Voyager, a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image with a user-defined camera path. Voyager can generate 3D-consistent scene videos for world exploration along custom camera trajectories. It can also generate aligned depth and RGB video for efficient and direct 3D reconstruction.

🔥🔥🔥 News!!

  • April 16, 2026: 🤗 We release HY-World-2.0, a state-of-the-art 3D world model!
  • December 18, 2025: 👋 We release HunyuanWorld-1.5 (WorldPlay), enabling real-time world creation and play!
  • October 22, 2025: 👋 We release HunyuanWorld-1.1 (WorldMirror), supporting 3D world creation from videos or multi-view images!
  • October 16, 2025: 👋 We recently proposed FlashWorld, enabling 3DGS world generation in 5~10 seconds on a single GPU!
  • Sep 2, 2025: 👋 We release the code and model weights of HunyuanWorld-Voyager. [Download](ckpts/README.md).

> Join our [WeChat](#) and [Discord](https://discord.gg/dNBrdrGGMa) groups to discuss and get help from us.

[Community links: WeChat Group, Xiaohongshu, X, Discord]

🎥 Demo

Demo Video

Camera-Controllable Video Generation

[Demo table: Input | Generated Video]

Multiple Applications

  • Video Reconstruction

[Demo table: Generated Video | Reconstructed Point Cloud]

  • Image-to-3D Generation


  • Video Depth Estimation


☯️ HunyuanWorld-Voyager Introduction

Architecture

Voyager consists of two key components:

(1) World-Consistent Video Diffusion: A unified architecture that jointly generates aligned RGB and depth video sequences, conditioned on existing world observations to ensure global coherence.

(2) Long-Range World Exploration: An efficient world cache with point culling, combined with auto-regressive inference and smooth video sampling, for iterative scene extension with context-aware consistency.
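
Because the RGB and depth frames are aligned, each frame can in principle be lifted into 3D points. The sketch below shows the standard pinhole back-projection of a depth map, assuming NumPy and known intrinsics. It is an illustration only, not Voyager's world cache or point-culling code; `unproject_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names and values chosen for this example.

```python
import numpy as np


def unproject_depth(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth map to an (H*W, 3) array of camera-space points.

    Hypothetical helper: standard pinhole back-projection, not taken from
    the Voyager codebase.
    """
    h, w = depth.shape
    # Pixel grid: u indexes columns (x axis), v indexes rows (y axis).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


# Toy input: a flat 2x3 depth map, every pixel 2 units from the camera.
depth = np.full((2, 3), 2.0)
points = unproject_depth(depth, fx=1.0, fy=1.0, cx=1.0, cy=0.5)
print(points.shape)
```

Colours can be attached by reshaping the matching RGB frame to `(H*W, 3)` in the same pixel order.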

To train Voyager, we propose a scalable data engine, i.e., a video reconstruction pipeline that automates camera pose estimation and metric depth prediction for arbitrary videos, enabling large-scale, diverse training data curation without manual 3D annotations. Using this pipeline, we compile a dataset of over 100,000 video clips, combining real-world captures and synthetic Unreal Engine renders.

Performance

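| Method | WorldScore Average | Camera Control | Object Control | Content Alignment | 3D Consistency | Photometric Consistency | Style Consistency | Subjective Quality |
|:------:|:------------------:|:--------------:|:--------------:|:-----------------:|:--------------:|:-----------------------:|:-----------------:|:------------------:|
| WonderJourney | 🟡63.75 | 🟡84.6 | 37.1 | 35.54 | 80.6 | 79.03 | 62.82 | 🟢66.56 |
| WonderWorld | 🟢72.69 | 🔴92.98 | 51.76 | 🔴71.25 | 🔴86.87 | 85.56 | 70.57 | 49.81 |
| EasyAnimate | 52.85 | 26.72 | 54.5 | 50.76 | 67.29 | 47.35 | 🟡73.05 | 50.31 |
| Allegro | 55.31 | 24.84 | 🟡57.47 | 🟡51.48 | 70.5 | 69.89 | 65.6 | 47.41 |
| Gen-3 | 60.71 | 29.47 | 🟢62.92 | 50.49 | 68.31 | 🟢87.09 | 62.82 | 🟡63.85 |
| CogVideoX-I2V | 62.15 | 38.27 | 40.07 | 36.73 | 🟢86.21 | 🔴88.12 | 🟢83.22 | 62.44 |
| Voyager | 🔴77.62 | 🟢85.95 | 🔴66.92 | 🟢68.92 | 🟡81.56 | 🟡85.99 | 🔴84.89 | 🔴71.09 |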

Quantitative comparison on WorldScore Benchmark. 🔴 indicates the 1st, 🟢 indicates the 2nd, 🟡 indicates the 3rd.
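
The 🔴/🟢/🟡 markers can be reproduced by sorting each column in descending order. A small sketch using two columns copied from the table above; the `top3` helper and the dictionary layout are assumptions made for this illustration.

```python
MEDALS = ["🔴", "🟢", "🟡"]  # 1st, 2nd, 3rd

# Two columns copied from the WorldScore table above.
SCORES = {
    "WonderJourney": {"WorldScore Average": 63.75, "Camera Control": 84.60},
    "WonderWorld": {"WorldScore Average": 72.69, "Camera Control": 92.98},
    "EasyAnimate": {"WorldScore Average": 52.85, "Camera Control": 26.72},
    "Allegro": {"WorldScore Average": 55.31, "Camera Control": 24.84},
    "Gen-3": {"WorldScore Average": 60.71, "Camera Control": 29.47},
    "CogVideoX-I2V": {"WorldScore Average": 62.15, "Camera Control": 38.27},
    "Voyager": {"WorldScore Average": 77.62, "Camera Control": 85.95},
}


def top3(scores, metric):
    """Return (medal, method) pairs for the three highest scores on a metric."""
    ranked = sorted(scores, key=lambda name: scores[name][metric], reverse=True)
    return list(zip(MEDALS, ranked[:3]))


for metric in ("WorldScore Average", "Camera Control"):
    print(metric, top3(SCORES, metric))
```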

📜 Requirements

The following table shows the requirements for running Voyager (batch size = 1) to generate videos:

| Model | Resolution | GPU Peak Memory |
|:----------------:|:-----------:|:----------------:|
| HunyuanWorld-Voyager | 540p | 60GB |

  • An NVIDIA GPU with CUDA support is required.
  • The model is tested on a single 80GB GPU.
  • Minimum: The minimum GPU memory required is 60GB for 540p.
  • Recommended: We recommend using a GPU with 80GB of memory for better generation quality.
  • Tested operating system: Linux
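
Before launching a run, it can help to compare the local GPU against these figures. A minimal sketch, assuming PyTorch is installed for detection. The function names are hypothetical, and total device memory is only a rough proxy for the 60GB peak-usage figure above.

```python
MIN_GPU_GB = 60          # minimum stated above for 540p
RECOMMENDED_GPU_GB = 80  # recommended above


def classify_gpu_memory(total_gb):
    """Map total GPU memory (decimal GB) onto the tiers listed above."""
    if total_gb >= RECOMMENDED_GPU_GB:
        return "recommended"
    if total_gb >= MIN_GPU_GB:
        return "minimum"
    return "insufficient"


def detect_total_gpu_gb(device=0):
    """Return total memory of a CUDA device in decimal GB, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # Decimal GB (1e9 bytes), so a nominal "80GB" card is not rounded below 80.
    return torch.cuda.get_device_properties(device).total_memory / 1e9


total = detect_total_gpu_gb()
if total is None:
    print("No CUDA GPU detected (or PyTorch is not installed).")
else:
    print(f"{total:.1f} GB total: {classify_gpu_memory(total)}")
```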

🛠️ Dependencies and Installation

Begin by cloning the repository:

git clone https://github.com/Tencent-Hunyuan/HunyuanWorld-Voyager
cd HunyuanWorld-Voyager

Installation Guide for Linux

We recommend CUDA versions 12.4 or 11.8 for the manual installation.

# 1. Create conda environment
conda create -n voyager python==3.11.9

# 2. Activate the environment
conda activate voyager

# 3. Install PyTorch and other dependencies using conda
# For CUDA 12.4
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia

# 4. Install pip dependencies
python -m pip install -r requirements.txt
python -m pip install transformers==4.39.3

# 5. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
python -m pip install flash-attn

# 6. Install xDiT for parallel inference (It is recommended to use torch 2.4.0 and flash-attn 2.6.3)
python -m pip install xfuser==0.4.2
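
After installation, the pinned versions can be checked from Python. A minimal sketch using only the standard library; `PINS` mirrors the versions in the commands above, while `installed_version` and `check_pins` are hypothetical helper names. It assumes each package exposes distribution metadata in the active environment.

```python
from importlib import metadata

# Versions pinned in the installation commands above.
PINS = {
    "torch": "2.4.0",
    "torchvision": "0.19.0",
    "torchaudio": "2.4.0",
    "transformers": "4.39.3",
    "xfuser": "0.4.2",
}


def installed_version(name):
    """Return the installed version string for a distribution, or None."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {name: (expected, found, matches)} for every pinned package."""
    report = {}
    for name, expected in pins.items():
        found = lookup(name)
        # Drop a local build suffix such as "+cu124" before comparing.
        base = found.split("+")[0] if found else None
        report[name] = (expected, found, base == expected)
    return report


for name, (expected, found, ok) in check_pins(PINS).items():
    status = "ok" if ok else "MISMATCH"
    print(f"{name}: expected {expected}, found {found} [{status}]")
```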

If you run into a floating point exception (core dump) on specific GPU types, you may try the following solution:

# Make sure you have installed CUDA 12.4, CUBLAS>=12.4.5.8, and CUDNN>=9.00 (or simply use our CUDA 12 Docker image).
pip install nvidia-cublas-cu12==12.4.5.8
# Adjust the path below to match your environment's Python version and site-packages location.
export LD_LIBRARY_PATH=/opt/conda/lib/python3.8/site-packages/nvidia/cublas/lib/
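
The `LD_LIBRARY_PATH` above hard-codes a `python3.8` site-packages directory, while the conda environment created earlier uses Python 3.11.9. A small sketch for printing the matching path for the active interpreter. It assumes the `nvidia-cublas-cu12` wheel places its libraries under `nvidia/cublas/lib` in site-packages, and `cublas_lib_dir` is a hypothetical helper name.

```python
import sysconfig
from pathlib import Path


def cublas_lib_dir(site_packages=None):
    """Return the expected cuBLAS library directory inside site-packages.

    Assumes the pip wheel layout <site-packages>/nvidia/cublas/lib.
    """
    root = Path(site_packages or sysconfig.get_paths()["platlib"])
    return root / "nvidia" / "cublas" / "lib"


# Print an export line that can be pasted into the shell.
print(f"export LD_LIBRARY_PATH={cublas_lib_dir()}")
```

Run it inside the activated `voyager` environment so that the path reflects that environment's interpreter.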

To create your own input conditions, you also need to install the following dependencies:

pip install --no-deps git+https://github.com/microsoft/MoGe.git
pip install scipy==1.11.4
pip install git+https://github.com/EasternJournalist/utils3d.git@c5daf6f6c244d251f252102d09e9b7bcef791a38

🧱 Download Pretrained Models

Detailed guidance for downloading the pretrained models is available [here](ckpts/README.md). In brief:

huggingface-cli download tencent/HunyuanWorld-Voyager --local-dir ./ckpts
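
The same download can be scripted from Python. A minimal sketch, assuming the `huggingface_hub` package is installed; `download_checkpoints` and its `fetch` parameter are hypothetical names introduced for this example, and the repository ID and target directory are taken from the command above.

```python
from pathlib import Path

REPO_ID = "tencent/HunyuanWorld-Voyager"  # same repository as the CLI command


def download_checkpoints(local_dir="./ckpts", fetch=None):
    """Download the model repository into local_dir and return the path.

    `fetch` defaults to huggingface_hub.snapshot_download; it is injectable
    so the function can be exercised without network access.
    """
    if fetch is None:
        from huggingface_hub import snapshot_download
        fetch = snapshot_download
    Path(local_dir).mkdir(parents=True, exist_ok=True)
    return fetch(repo_id=REPO_ID, local_dir=str(local_dir))


# Uncomment to download the full checkpoint set:
# download_checkpoints()
```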

🔑 Inference…

Excerpt shown — open the source for the full document.

Notability

notability 6.0/10

New repo from Tencent, moderate traction.