Tencent-Hunyuan/HunyuanWorld-Voyager
Description: Voyager is an interactive RGBD video generation model conditioned on camera input, and supports real-time 3D reconstruction.
Language: Python
License: NOASSERTION
Stars: 1566
Forks: 165
Open issues: 29
Created: 2025-08-27T09:34:10Z
Pushed: 2026-04-15T17:30:08Z
Default branch: main
Fork: no
Archived: no
README: [Read in Chinese](README_zh.md)
HunyuanWorld-Voyager
-----
We introduce HunyuanWorld-Voyager, a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image with a user-defined camera path. Voyager can generate 3D-consistent scene videos for world exploration following custom camera trajectories. It can also generate aligned depth and RGB video for efficient and direct 3D reconstruction.
🔥🔥🔥 News!!
- April 16, 2026: 🤗 We release HY-World-2.0, a state-of-the-art 3D world model!
- December 18, 2025: 👋 We release HunyuanWorld-1.5 (WorldPlay), enabling real-time world creation and play!
- October 22, 2025: 👋 We release HunyuanWorld-1.1 (WorldMirror), supporting 3D world creation from videos or multi-view images!
- October 16, 2025: 👋 We recently proposed FlashWorld, enabling 3DGS world generation in 5~10 seconds on a single GPU!
- September 2, 2025: 👋 We release the code and model weights of HunyuanWorld-Voyager. [Download](ckpts/README.md).
> Join our [WeChat](#) and [Discord](https://discord.gg/dNBrdrGGMa) groups to discuss and get help from us.
Community channels: WeChat Group, Xiaohongshu, X, Discord.
🎥 Demo
Demo Video
Camera-Controllable Video Generation
(Media grid: Input and Generated Video pairs.)
Multiple Applications
- Video Reconstruction
(Media grid: Generated Video and Reconstructed Point Cloud pairs.)
- Image-to-3D Generation
- Video Depth Estimation
☯️ HunyuanWorld-Voyager Introduction
Architecture
Voyager consists of two key components:
(1) World-Consistent Video Diffusion: A unified architecture that jointly generates aligned RGB and depth video sequences, conditioned on existing world observation to ensure global coherence.
(2) Long-Range World Exploration: An efficient world cache with point culling, combined with auto-regressive inference and smooth video sampling, for iterative scene extension with context-aware consistency.
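To make the world cache idea concrete, here is a minimal NumPy sketch, not Voyager's actual implementation. It assumes a pinhole camera with intrinsics `K` and a 4x4 camera-to-world pose; the helper names `unproject_depth` and `cull_to_view` are hypothetical. The first function lifts an aligned depth frame into world-space points, and the second keeps only the cached points that project inside a new view, which is one simple form of point culling.

```python
import numpy as np


def unproject_depth(depth, K, cam_to_world):
    """Back-project an (H, W) metric depth map into world-space points (H*W, 3)."""
    h, w = depth.shape
    # Pixel grid: u indexes columns (x), v indexes rows (y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    # Camera-space points: X_cam = depth * K^-1 [u, v, 1]^T
    cam_points = (pixels @ np.linalg.inv(K).T) * depth.reshape(-1, 1)
    # Rigid transform into world space: X_world = R X_cam + t
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return cam_points @ R.T + t


def cull_to_view(points, K, cam_to_world, width, height, near=1e-3):
    """Keep only world points that project inside the image of the given camera."""
    world_to_cam = np.linalg.inv(cam_to_world)
    cam_points = points @ world_to_cam[:3, :3].T + world_to_cam[:3, 3]
    z = cam_points[:, 2]
    in_front = z > near
    # Perspective projection; guard the division for points behind the camera.
    proj = cam_points @ K.T
    safe_z = np.where(in_front, z, 1.0)
    u = proj[:, 0] / safe_z
    v = proj[:, 1] / safe_z
    visible = in_front & (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return points[visible]


# Toy usage: a flat wall 2 m in front of a 64x48 camera (all values are made up).
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
depth = np.full((48, 64), 2.0)
cache = unproject_depth(depth, K, np.eye(4))

# A second camera shifted 0.5 m to the right sees only part of the cached wall.
shifted = np.eye(4)
shifted[0, 3] = 0.5
visible = cull_to_view(cache, K, shifted, width=64, height=48)
print(len(cache), len(visible))
```

In an auto-regressive loop, the culled points would be rendered into the next view as the conditioning signal, and newly generated RGB-D frames would be unprojected and appended to the cache.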
To train Voyager, we propose a scalable data engine, i.e., a video reconstruction pipeline that automates camera pose estimation and metric depth prediction for arbitrary videos, enabling large-scale, diverse training data curation without manual 3D annotations. Using this pipeline, we compile a dataset of over 100,000 video clips, combining real-world captures and synthetic Unreal Engine renders.
Performance
| Method | WorldScore Average | Camera Control | Object Control | Content Alignment | 3D Consistency | Photometric Consistency | Style Consistency | Subjective Quality |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| WonderJourney | 🟡63.75 | 🟡84.6 | 37.1 | 35.54 | 80.6 | 79.03 | 62.82 | 🟢66.56 |
| WonderWorld | 🟢72.69 | 🔴92.98 | 51.76 | 🔴71.25 | 🔴86.87 | 85.56 | 70.57 | 49.81 |
| EasyAnimate | 52.85 | 26.72 | 54.5 | 50.76 | 67.29 | 47.35 | 🟡73.05 | 50.31 |
| Allegro | 55.31 | 24.84 | 🟡57.47 | 🟡51.48 | 70.5 | 69.89 | 65.6 | 47.41 |
| Gen-3 | 60.71 | 29.47 | 🟢62.92 | 50.49 | 68.31 | 🟢87.09 | 62.82 | 🟡63.85 |
| CogVideoX-I2V | 62.15 | 38.27 | 40.07 | 36.73 | 🟢86.21 | 🔴88.12 | 🟢83.22 | 62.44 |
| Voyager | 🔴77.62 | 🟢85.95 | 🔴66.92 | 🟢68.92 | 🟡81.56 | 🟡85.99 | 🔴84.89 | 🔴71.09 |
Quantitative comparison on WorldScore Benchmark. 🔴 indicates the 1st, 🟢 indicates the 2nd, 🟡 indicates the 3rd.
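As a small worked example of how the markers follow from the numbers, the snippet below ranks the methods by the WorldScore Average column. The values are copied from the comparison table; the helper name `top_k` is only illustrative.

```python
# WorldScore Average values copied from the comparison table.
worldscore_average = {
    "WonderJourney": 63.75,
    "WonderWorld": 72.69,
    "EasyAnimate": 52.85,
    "Allegro": 55.31,
    "Gen-3": 60.71,
    "CogVideoX-I2V": 62.15,
    "Voyager": 77.62,
}


def top_k(scores, k=3):
    """Return the k method names with the highest score, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


podium = top_k(worldscore_average)
print(podium)  # ['Voyager', 'WonderWorld', 'WonderJourney']

# Margin between 1st and 2nd place on the average score.
margin = round(worldscore_average[podium[0]] - worldscore_average[podium[1]], 2)
print(margin)  # 4.93
```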
📜 Requirements
The following table shows the requirements for running Voyager (batch size = 1) to generate videos:
| Model | Resolution | GPU Peak Memory |
|:----------------:|:-----------:|:----------------:|
| HunyuanWorld-Voyager | 540p | 60GB |
- An NVIDIA GPU with CUDA support is required.
- The model is tested on a single 80GB GPU.
- Minimum: 60GB of GPU memory for 540p.
- Recommended: a GPU with 80GB of memory for better generation quality.
- Tested operating system: Linux
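Before launching inference, it can help to confirm that the local GPU meets the memory requirement. The following is a minimal sketch, assuming PyTorch is already installed and that the 60GB figure can be read as 60 GiB (that interpretation is an assumption); the function names are hypothetical. It checks total device memory, not memory currently free.

```python
REQUIRED_GIB = 60  # peak memory listed for 540p; GiB interpretation is an assumption


def has_enough_memory(total_bytes, required_gib=REQUIRED_GIB):
    """Return True if a device with total_bytes of memory meets the requirement."""
    return total_bytes >= required_gib * 1024**3


def check_local_gpu(device_index=0):
    """Check the local CUDA device; returns None if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    total = torch.cuda.get_device_properties(device_index).total_memory
    return has_enough_memory(total)


print(check_local_gpu())
```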
🛠️ Dependencies and Installation
Begin by cloning the repository:
    git clone https://github.com/Tencent-Hunyuan/HunyuanWorld-Voyager
    cd HunyuanWorld-Voyager
Installation Guide for Linux
We recommend CUDA versions 12.4 or 11.8 for the manual installation.
    # 1. Create conda environment
    conda create -n voyager python==3.11.9

    # 2. Activate the environment
    conda activate voyager

    # 3. Install PyTorch and other dependencies using conda
    # For CUDA 12.4
    conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia

    # 4. Install pip dependencies
    python -m pip install -r requirements.txt
    python -m pip install transformers==4.39.3

    # 5. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
    python -m pip install flash-attn

    # 6. Install xDiT for parallel inference (It is recommended to use torch 2.4.0 and flash-attn 2.6.3)
    python -m pip install xfuser==0.4.2
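After installation, a quick sanity check can confirm that the pinned versions from the steps above are the ones actually present in the environment. This is a minimal sketch using only the standard library; the `PINNED` mapping and `check_pins` helper are illustrative names, and the distribution names are assumed to match what pip or conda registered.

```python
from importlib import metadata

# Versions pinned in the installation steps; distribution names are assumed.
PINNED = {
    "torch": "2.4.0",
    "torchvision": "0.19.0",
    "torchaudio": "2.4.0",
    "transformers": "4.39.3",
    "xfuser": "0.4.2",
}


def check_pins(pins):
    """Map each package name to 'ok', 'missing', or a mismatch message."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Compare the public version only, dropping local tags such as "+cu124".
        public = installed.split("+")[0]
        report[name] = "ok" if public == wanted else f"mismatch (installed {installed})"
    return report


for package, status in check_pins(PINNED).items():
    print(f"{package}: {status}")
```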
If you run into a floating point exception (core dump) on specific GPU types, you may try the following solutions:
    # Make sure you have installed CUDA 12.4, cuBLAS >= 12.4.5.8, and cuDNN >= 9.00 (or simply use our CUDA 12 docker image).
    pip install nvidia-cublas-cu12==12.4.5.8
    # Adjust the Python version in this path to match your environment's site-packages.
    export LD_LIBRARY_PATH=/opt/conda/lib/python3.8/site-packages/nvidia/cublas/lib/
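The `LD_LIBRARY_PATH` value depends on where the active environment keeps its site-packages. As a hedged sketch, the snippet below derives the cuBLAS library directory for the current interpreter, assuming the `nvidia-cublas-cu12` wheel installs under `nvidia/cublas/lib` inside site-packages; the helper name `cublas_lib_dir` is hypothetical.

```python
import os
import sysconfig


def cublas_lib_dir():
    """Return the expected cuBLAS library directory for the active interpreter."""
    site_packages = sysconfig.get_paths()["platlib"]
    return os.path.join(site_packages, "nvidia", "cublas", "lib")


path = cublas_lib_dir()
if os.path.isdir(path):
    print(f"export LD_LIBRARY_PATH={path}")
else:
    print(f"cuBLAS directory not found at {path}; is nvidia-cublas-cu12 installed?")
```

Run it with the Python interpreter of the `voyager` environment and paste the printed `export` line into your shell.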
To create your own input conditions, you also need to install the following dependencies:
    pip install --no-deps git+https://github.com/microsoft/MoGe.git
    pip install scipy==1.11.4
    pip install git+https://github.com/EasternJournalist/utils3d.git@c5daf6f6c244d251f252102d09e9b7bcef791a38
🧱 Download Pretrained Models
Detailed guidance for downloading the pretrained models is available [here](ckpts/README.md). In brief:
huggingface-cli download tencent/HunyuanWorld-Voyager --local-dir ./ckpts
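The same download can be scripted from Python. This is a minimal sketch, assuming the `huggingface_hub` package is installed; it uses `snapshot_download` with the repository ID and target directory from the command above, and the wrapper name `download_voyager_weights` is hypothetical.

```python
REPO_ID = "tencent/HunyuanWorld-Voyager"
LOCAL_DIR = "./ckpts"


def download_voyager_weights(repo_id=REPO_ID, local_dir=LOCAL_DIR):
    """Download the model repository snapshot and return the local path."""
    # Imported lazily so the module can be loaded without huggingface_hub present.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_voyager_weights()` starts the transfer; the checkpoint is large, so make sure enough disk space is available first.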
🔑 Inference…
Notability
notability 6.0/10. New repo from Tencent, moderate traction.