Repo · Tencent Hunyuan · published Aug 13, 2025

Tencent-Hunyuan/Hunyuan-GameCraft-1.0

Python

Captured source

Description: Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition

Language: Python

License: NOASSERTION

Stars: 723

Forks: 75

Open issues: 18

Created: 2025-08-13T07:56:51Z

Pushed: 2025-11-28T07:05:10Z

Default branch: main

Fork: no

Archived: no

README:

Hunyuan-GameCraft 🎮

![image](asset/teaser.png)

> **Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition**

🔥🔥🔥 News!!

  • Aug 21, 2025: 📣 We release the code of the [Gradio demo](scripts/launch_app_sp.sh). Welcome and have a try!! 🎮
  • Aug 14, 2025: 👋 We release the inference code and model weights of Hunyuan-GameCraft. [Download](weights/README.md).

📑 Open-source Plan

  • Hunyuan-GameCraft
  • [x] Inference
  • [x] Checkpoints
  • [x] [Gradio](#️-gradio-launching)
  • [ ] HuggingFace Demo

Contents

  • [Hunyuan-GameCraft 🎮](#hunyuan-gamecraft-)
  • [🔥🔥🔥 News!!](#-news)
  • [📑 Open-source Plan](#-open-source-plan)
  • [Contents](#contents)
  • [Abstract](#abstract)
  • [Overall Architecture](#overall-architecture)
  • [📜 Requirements](#-requirements)
  • [🛠️ Dependencies and Installation](#️-dependencies-and-installation)
  • [Installation Guide for Linux](#installation-guide-for-linux)
  • [🧱 Download Pretrained Models](#-download-pretrained-models)
  • [🚀 Parallel Inference on Multiple GPUs](#-parallel-inference-on-multiple-gpus)
  • [🔑 Single-gpu with Low-VRAM Inference](#-single-gpu-with-low-vram-inference)
  • [🖥️ Gradio Launching](#️-gradio-launching)
  • [🔗 BibTeX](#-bibtex)
  • [Acknowledgements](#acknowledgements)

---

Abstract

Recent advances in diffusion-based and controllable video generation have enabled high-quality and temporally coherent video synthesis, laying the groundwork for immersive interactive gaming experiences. However, current methods face limitations in dynamics, physical realism, long-term consistency, and efficiency, which restrict their ability to create diverse gameplay videos. To address these gaps, we introduce Hunyuan-GameCraft, a novel framework for high-dynamic interactive video generation in game environments. To achieve fine-grained action control, we unify standard keyboard and mouse inputs into a shared camera representation space, facilitating smooth interpolation between various camera and movement operations. We then propose a hybrid history-conditioned training strategy that extends video sequences autoregressively while preserving game scene information. Additionally, to enhance inference efficiency and playability, we apply model distillation to reduce computational overhead while maintaining consistency across long temporal sequences, making the model suitable for real-time deployment in complex interactive environments. The model is trained on a large-scale dataset comprising over one million gameplay recordings across over 100 AAA games, ensuring broad coverage and diversity, and is then fine-tuned on a carefully annotated synthetic dataset to enhance precision and control. The curated game scene data significantly improves visual fidelity, realism, and action controllability. Extensive experiments demonstrate that Hunyuan-GameCraft significantly outperforms existing models, advancing the realism and playability of interactive game video generation.

Overall Architecture

![image](asset/method.png)

Given a reference image, the corresponding prompt, and a keyboard or mouse signal, we transform the input signal into a continuous camera space. We then design a lightweight action encoder to encode the input camera trajectory. The action and image features are added after patchification. For long video extension, we design a variable mask indicator, where 1 and 0 indicate history frames and predicted frames, respectively.
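The mask indicator described above can be illustrated with a small sketch. This is not the repository's implementation: `build_mask_indicator` and `extension_masks` are hypothetical names, and the chunk length, the cap on history frames, and the choice to build one flag per frame are all assumptions made for illustration.

```python
def build_mask_indicator(num_history, num_predicted):
    """Return a per-frame mask: 1 marks history frames, 0 marks frames to predict."""
    if num_history < 0 or num_predicted < 0:
        raise ValueError("frame counts must be non-negative")
    return [1] * num_history + [0] * num_predicted


def extension_masks(num_chunks, chunk_len, max_history):
    """Build one mask per autoregressive step.

    Assumption: each step predicts `chunk_len` new frames and conditions on
    at most `max_history` previously generated frames.
    """
    masks = []
    generated = 0
    for _ in range(num_chunks):
        history = min(generated, max_history)
        masks.append(build_mask_indicator(history, chunk_len))
        generated += chunk_len
    return masks


for step, mask in enumerate(extension_masks(num_chunks=3, chunk_len=2, max_history=3)):
    print(step, mask)
```

The number of leading 1s changes from step to step, which is one plausible reading of "variable" in the description above.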

📜 Requirements

  • An NVIDIA GPU with CUDA support is required.
  • The model is tested on a machine with 8 × H20/H800 GPUs.
  • Minimum: 24GB of GPU memory is required, but generation is very slow at this size.
  • Recommended: We recommend using a GPU with 80GB of memory for better generation quality.
  • Tested operating system: Linux

🛠️ Dependencies and Installation

Begin by cloning the repository:

git clone https://github.com/Tencent-Hunyuan/Hunyuan-GameCraft-1.0.git
cd Hunyuan-GameCraft-1.0

Installation Guide for Linux

We recommend CUDA version 12.4 for the manual installation.

Conda's installation instructions are available here.

# 1. Create conda environment
conda create -n HYGameCraft python==3.10

# 2. Activate the environment
conda activate HYGameCraft

# 3. Install PyTorch and other dependencies using conda
conda install pytorch==2.5.1 torchvision==0.20.0 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia

# 4. Install pip dependencies
python -m pip install -r requirements.txt

# 5. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
python -m pip install ninja
python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3

Alternatively, you can use the HunyuanVideo Docker image. Use the following commands to pull and run it.

# For CUDA 12.4 (updated to avoid floating point exception)
docker pull hunyuanvideo/hunyuanvideo:cuda_12
docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_12
pip install diffusers==0.34.0 transformers==4.54.1

🧱 Download Pretrained Models

Details on downloading the pretrained models are given [here](weights/README.md).

🚀 Parallel Inference on Multiple GPUs

For example, to generate a video using 8 GPUs, you can use the following command, where --action-list w s d a simulates keyboard input signals that steer the content of the generated video. --action-speed-list 0.2 0.2 0.2 0.2 sets the displacement distance for each action, and each entry can be replaced with any value between 0 and 3.

You can try any combination and any length of the action list (one action per 33 frames, 25FPS) to generate a long video; make sure that --action-speed-list has the same length as --action-list. Note that the inference time grows linearly with the length of the action list:

#!/bin/bash…
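The constraints stated above (one action per 33 frames at 25 FPS, speeds between 0 and 3, matching list lengths) can be checked before launching a run. The following is a minimal sketch and not part of the repository: `plan_actions` and the constants are hypothetical names, the set of valid keys is assumed to be only the four shown in the example, and the 0 to 3 range is assumed to be inclusive.

```python
FRAMES_PER_ACTION = 33  # stated in the README: one action per 33 frames
FPS = 25                # stated in the README
VALID_ACTIONS = {"w", "s", "a", "d"}  # assumption: only the keys shown in the example
MIN_SPEED, MAX_SPEED = 0.0, 3.0       # assumption: the range is inclusive


def plan_actions(actions, speeds):
    """Validate an action/speed pair and return (total_frames, duration_seconds)."""
    if len(actions) != len(speeds):
        raise ValueError(
            f"got {len(actions)} actions but {len(speeds)} speeds; lengths must match"
        )
    for key in actions:
        if key not in VALID_ACTIONS:
            raise ValueError(f"unsupported action: {key!r}")
    for speed in speeds:
        if not MIN_SPEED <= speed <= MAX_SPEED:
            raise ValueError(f"speed {speed} is outside [{MIN_SPEED}, {MAX_SPEED}]")
    total_frames = len(actions) * FRAMES_PER_ACTION
    return total_frames, total_frames / FPS


frames, seconds = plan_actions(["w", "s", "d", "a"], [0.2, 0.2, 0.2, 0.2])
print(f"{frames} frames, {seconds:.2f} s")  # 132 frames, 5.28 s
```

Because each action adds a fixed 33 frames, doubling the action list doubles the video length, which is consistent with the linear growth in inference time noted above.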

Excerpt shown — open the source for the full document.

Notability

notability 6.0/10

New game model from Tencent, moderate traction.