Tencent-Hunyuan/R-DMesh
Description: [SIGGRAPH2026] Official code for SIGGRAPH2026 paper: R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow
Language: Python
Stars: 37
Forks: 5
Open issues: 0
Created: 2026-04-29T07:36:06Z
Pushed: 2026-05-14T07:36:46Z
Default branch: main
Fork: no
Archived: no
README:
📖 Overview
We present R-DMesh: a unified video-guided 4D mesh generation framework that tackles the long-overlooked pose misalignment dilemma. Given a static mesh and a reference video with arbitrary initial poses, our method automatically rectifies the mesh to the video's starting state and generates high-fidelity, temporally consistent animations. Beyond video-driven animation, R-DMesh naturally supports a wide range of downstream applications, including pose retargeting, motion retargeting, and holistic 4D generation.
🔥 Latest News
- May 13, 2026: 👋 The checkpoint of R-DMesh has been released! Please give it a try!
- May 12, 2026: 👋 The training & inference code of R-DMesh has been released! The checkpoint will be released in a few days.
- Mar 28, 2026: 👋 R-DMesh has been accepted by SIGGRAPH2026! We will release the code asap. Please stay tuned for updates!
🔧 Preparation
1. Environment Setup
    # Create conda environment
    conda create -n rdmesh python=3.11
    conda activate rdmesh

    # Install torch
    pip install torch==2.8.0 torchvision==0.23.0

    # Install dependencies
    pip install -r requirements.txt
2. Download Pretrained Models
Download the pretrained checkpoints from 🤗 HuggingFace and place them under ./ckpts/:
    # Option 1: Use huggingface-cli
    huggingface-cli download JarrentWu/R-DMesh --local-dir ./ckpts

    # Option 2: Manually download and organize
You also need to download Wan2.2-TI2V-5B for video conditioning:
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir ./ckpts/Wan2.2-TI2V-5B
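The same two downloads can also be scripted from Python. The sketch below assumes the `huggingface_hub` package is installed and relies on its `snapshot_download` function; the `DOWNLOADS` mapping and the `download_all` helper are illustrative names and are not part of the R-DMesh repository.

```python
# Repository IDs and target folders are taken from the commands above.
DOWNLOADS = {
    "JarrentWu/R-DMesh": "./ckpts",
    "Wan-AI/Wan2.2-TI2V-5B": "./ckpts/Wan2.2-TI2V-5B",
}


def download_all(downloads=DOWNLOADS):
    """Fetch every repository in `downloads` into its local directory."""
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in downloads.items():
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_all()` once from the project root should leave both repositories under `./ckpts/`, as with the CLI commands.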
3. Prepare Test Data
Place your input meshes and reference videos under ./test_data/.
📂 Expected Directory Structure
After the steps above, your project should look like this:
    R-DMesh/
    ├── test_data/
    │   ├── meshes/              # Input meshes (.glb or .fbx)
    │   │   ├── your_mesh.glb
    │   │   └── your_mesh2.fbx
    │   └── videos/              # Reference videos (.mp4)
    │       └── your_video.mp4
    └── ckpts/
        ├── dvae/                # VAE checkpoints
        ├── rf_model/            # Rectified Flow (DiT) checkpoints
        ├── dvae_factor/         # VAE normalization factors
        └── Wan2.2-TI2V-5B/      # Wan video model
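Before running inference, it can help to confirm that this layout is in place. The following is a minimal sketch using only the standard library; `EXPECTED_DIRS` and `missing_dirs` are hypothetical helper names, and the list of folders is simply copied from the tree above.

```python
from pathlib import Path

# Sub-paths taken from the directory tree above, relative to the project root.
EXPECTED_DIRS = [
    "test_data/meshes",
    "test_data/videos",
    "ckpts/dvae",
    "ckpts/rf_model",
    "ckpts/dvae_factor",
    "ckpts/Wan2.2-TI2V-5B",
]


def missing_dirs(root="."):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

An empty list from `missing_dirs(".")` means every expected folder exists; it does not check that the checkpoint files inside them are complete.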
📖 Inference
🎬 Animate a Mesh with Reference Video
    python test_drive.py \
        --mesh_list your_mesh.glb \
        --video_list your_video.mp4 \
        --rf_exp rdmeshdit --rf_epoch f \
        --num_hops 5 --alpha_hops 0.7 \
        --num_traj 4096 --guidance_scale 1.5 \
        --export
> 💡 The command above assumes the [default directory structure](#-expected-directory-structure) from the Preparation section.
> If your files are placed elsewhere, specify the paths explicitly:
    --data_dir /your/path/to/meshes \
    --video_data_dir /your/path/to/videos \
    --vae_dir /your/path/to/dvae \
    --rf_model_dir /your/path/to/rf_model \
    --json_dir /your/path/to/dvae_factor \
    --wan_model_dir /your/path/to/Wan2.2-TI2V-5B
For example, run:

    python test_drive.py \
        --mesh_list warrok_w_kurniawan.fbx \
        --video_list dance7.mp4 \
        --rf_exp rdmeshdit --rf_epoch f \
        --num_hops 5 --alpha_hops 0.7 \
        --num_traj 4096 --guidance_scale 1.5 \
        --export

This produces the dynamic mesh as an .fbx file and a frontal rendered video of the generated 4D asset.
> ⚠️ Note on custom driving videos:
> If you want to use your own video to drive the mesh, please first remove the background and replace it with pure black using tools such as SAM 3 (or other video matting / segmentation tools) before running inference. Videos with cluttered or non-black backgrounds may lead to degraded motion extraction and poor animation quality.
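When animating several mesh/video pairs, the inference command can be assembled from Python rather than typed by hand. The sketch below only reuses flags shown in this section; `build_drive_command` and `run_drive` are hypothetical wrapper names, and the hyperparameter values are the ones from the example command, not tuned recommendations.

```python
import subprocess
import sys


def build_drive_command(mesh, video, path_overrides=None):
    """Return the argv list for one test_drive.py run."""
    cmd = [
        sys.executable, "test_drive.py",  # current Python interpreter
        "--mesh_list", mesh,
        "--video_list", video,
        "--rf_exp", "rdmeshdit", "--rf_epoch", "f",
        "--num_hops", "5", "--alpha_hops", "0.7",
        "--num_traj", "4096", "--guidance_scale", "1.5",
        "--export",
    ]
    # Optional path flags, e.g. {"data_dir": "/your/path/to/meshes"}.
    for flag, value in (path_overrides or {}).items():
        cmd += [f"--{flag}", str(value)]
    return cmd


def run_drive(mesh, video, path_overrides=None):
    """Run inference; raises CalledProcessError if the script fails."""
    subprocess.run(build_drive_command(mesh, video, path_overrides), check=True)
```

For instance, `run_drive("warrok_w_kurniawan.fbx", "dance7.mp4")` would issue the same command as the example above when called from the repository root.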
🏋️‍♂️ Training
The complete training pipeline consists of the following 6 stages, which must be executed sequentially:
① Data Preparation → ② Train R-DMesh VAE → ③ Extract Video Latents → ④ Extract DMesh Latents → ⑤ Compute DMesh Feature Statistics → ⑥ Train R-DMesh DiT
| Stage | Step | Script | Output |
| :---: | :--- | :--- | :--- |
| ① | Data Preparation | data_construction/ | Mesh / Video dataset |
| ② | Train R-DMesh VAE | train_dvae.py | VAE checkpoints |
| ③ | Extract Video Latents | Wan2_2/save_vid_latents.py | Video latents |
| ④ | Extract DMesh Latents | save_dmesh_latents.py | DMesh latents |
| ⑤ | Compute DMesh Feature Statistics | test_vae_factor_misalign.py | Mean / std JSON factors |
| ⑥ | Train R-DMesh DiT | train_dit.py | DiT checkpoints |
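Because the stages must run in order, a small lookup can make the dependencies explicit when scripting the pipeline. This is an illustrative sketch only: the `STAGES` list mirrors the table above, and `prerequisites` is a hypothetical helper, not something shipped with the repository.

```python
# Stage order and script names are copied from the table above.
STAGES = [
    ("Data Preparation", "data_construction/"),
    ("Train R-DMesh VAE", "train_dvae.py"),
    ("Extract Video Latents", "Wan2_2/save_vid_latents.py"),
    ("Extract DMesh Latents", "save_dmesh_latents.py"),
    ("Compute DMesh Feature Statistics", "test_vae_factor_misalign.py"),
    ("Train R-DMesh DiT", "train_dit.py"),
]


def prerequisites(step):
    """Return the names of all stages that must finish before `step`."""
    names = [name for name, _ in STAGES]
    return names[: names.index(step)]
```

For example, `prerequisites("Train R-DMesh DiT")` lists the five earlier stages, which is a reminder that DiT training needs the VAE, both sets of latents, and the feature statistics first.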
---
① Data Preparation
Please refer to the scripts and README in the data_construction folder to build your training / validation data. This part of the code will be released soon.
---
② Train R-DMesh VAE
Train the R-DMesh VAE, which compresses dynamic meshes into a latent space. Note that we adopt PLTA attention from AnimateAnyMesh++ for better performance.
    torchrun --nproc_per_node=8 train_dvae.py \
        --data_dir /path/to/training/data \
        --val_data_dir /path/to/validation/data \
        --ckpts_dir /path/to/checkpoints \
        --log_dir ./logs/test \
        --exp test \
        --train_epoch 1000 --batch_size 32 --lr 1e-4 \
        --enc_depth 8 --dim 256 --max_length 4096 \
        --latent_dim 64 --latent_dim_x1 16 --num_t 64 \
        --num_hops 4 --hop_mode band --n_layers 2 \
        --sep_rec_loss --per_instance_loss \
        --validate --is_training
(Optional) After training, you can evaluate the reconstruction quality of the VAE using the [Test R-DMesh VAE](#test-r-dmesh-vae) script in the Evaluation section.
---
③ Extract Video Latents
Extract latent features from reference videos using the pretrained video model (Wan2.2-TI2V-5B). These latents will serve as the conditioning signal for DiT training.
    torchrun --nproc_per_node=8 save_vid_latents.py \
        --data_dir /path/to/mesh/data \
        --video_data_dir…