Tencent-Hunyuan/R-DMesh

R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow (SIGGRAPH 2026)

Zijie Wu1,2, Lixin Xu2, Puhua Jiang2, Sicong Liu2, Chunchao Guo2, Xiang Bai1
1Huazhong University of Science and Technology (HUST), 2Tencent Hunyuan

Project | Paper PDF | Video | Hugging Face Weights | Download from Google Drive

Demo GIF

📖 Overview

We present R-DMesh: a unified video-guided 4D mesh generation framework that tackles the long-overlooked pose misalignment dilemma. Given a static mesh and a reference video with arbitrary initial poses, our method automatically rectifies the mesh to the video's starting state and generates high-fidelity, temporally consistent animations. Beyond video-driven animation, R-DMesh naturally supports a wide range of downstream applications, including pose retargeting, motion retargeting, and holistic 4D generation.

🔥 Latest News

  • May 13, 2026: 👋 The checkpoint of R-DMesh has been released! Please give it a try!
  • May 12, 2026: 👋 The training & inference code of R-DMesh has been released! The checkpoint will be released in a few days.
  • Mar 28, 2026: 👋 R-DMesh has been accepted by SIGGRAPH 2026! We will release the code as soon as possible. Please stay tuned for updates!

🔧 Preparation

1. Environment Setup

# Create conda environment
conda create -n rdmesh python=3.11
conda activate rdmesh

# Install torch
pip install torch==2.8.0 torchvision==0.23.0

# Install dependencies
pip install -r requirements.txt
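
To confirm that the environment matches the versions pinned above, a small check along these lines may help. This is only a sketch: the helper `check_versions` is hypothetical and not part of the repository, and it reads installed package metadata through the standard library without importing torch.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the setup commands above.
EXPECTED = {"torch": "2.8.0", "torchvision": "0.23.0"}


def check_versions(expected=EXPECTED):
    """Return {package: problem} for every package that is missing or mismatched."""
    problems = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Ignore local build suffixes such as "+cu128".
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_versions()
    print("environment OK" if not issues else issues)
```

An empty result means both packages are installed at the pinned versions; anything else lists what to reinstall.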

2. Download Pretrained Models

Download the pretrained checkpoints from 🤗 Hugging Face and place them under ./ckpts/:

# Option 1: Use huggingface-cli
huggingface-cli download JarrentWu/R-DMesh --local-dir ./ckpts

# Option 2: Manually download and organize

You also need to download Wan2.2-TI2V-5B for video conditioning:

huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir ./ckpts/Wan2.2-TI2V-5B

3. Prepare Test Data

Place your input meshes and reference videos under ./test_data/.

📂 Expected Directory Structure

After the steps above, your project should look like this:

R-DMesh/
├── test_data/
│   ├── meshes/                  # Input meshes (.glb or .fbx)
│   │   ├── your_mesh.glb
│   │   └── your_mesh2.fbx
│   └── videos/                  # Reference videos (.mp4)
│       └── your_video.mp4
└── ckpts/
    ├── dvae/                    # VAE checkpoints
    ├── rf_model/                # Rectified Flow (DiT) checkpoints
    ├── dvae_factor/             # VAE normalization factors
    └── Wan2.2-TI2V-5B/          # Wan video model
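
Before running inference, it can be useful to verify that this layout is in place. The sketch below checks only the directory names listed in the tree above; the helper `missing_dirs` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Directories named in the expected structure above.
REQUIRED_DIRS = [
    "test_data/meshes",
    "test_data/videos",
    "ckpts/dvae",
    "ckpts/rf_model",
    "ckpts/dvae_factor",
    "ckpts/Wan2.2-TI2V-5B",
]


def missing_dirs(root="."):
    """Return the required directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in REQUIRED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Directory layout looks complete.")
```

Run it from the repository root. It does not inspect the contents of the checkpoint folders, only that they exist.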

📖 Inference

🎬 Animate a Mesh with Reference Video

python test_drive.py \
    --mesh_list your_mesh.glb \
    --video_list your_video.mp4 \
    --rf_exp rdmeshdit --rf_epoch f \
    --num_hops 5 --alpha_hops 0.7 \
    --num_traj 4096 --guidance_scale 1.5 \
    --export

💡 The command above assumes the default directory structure from the Preparation section.
If your files are placed elsewhere, specify the paths explicitly:

    --data_dir /your/path/to/meshes \
    --video_data_dir /your/path/to/videos \
    --vae_dir /your/path/to/dvae \
    --rf_model_dir /your/path/to/rf_model \
    --json_dir /your/path/to/dvae_factor \
    --wan_model_dir /your/path/to/Wan2.2-TI2V-5B

For example, run:

python test_drive.py \
    --mesh_list warrok_w_kurniawan.fbx \
    --video_list dance7.mp4 \
    --rf_exp rdmeshdit --rf_epoch f \
    --num_hops 5 --alpha_hops 0.7 \
    --num_traj 4096 --guidance_scale 1.5 \
    --export

You will then get the dynamic mesh as an FBX file together with a frontal rendered video. The generated 4D asset should look like this:

Demo GIF
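
To animate several mesh/video pairs with the same settings, the command above can be assembled programmatically. The following is a minimal sketch that reuses only the flags shown in this section; `build_command` is a hypothetical helper, and it issues one invocation per pair because this README does not say whether `--mesh_list` and `--video_list` accept multiple values.

```python
import shlex
import sys


def build_command(mesh, video, num_hops=5, alpha_hops=0.7,
                  num_traj=4096, guidance_scale=1.5):
    """Assemble the test_drive.py invocation shown above for one mesh/video pair."""
    return [
        sys.executable, "test_drive.py",
        "--mesh_list", mesh,
        "--video_list", video,
        "--rf_exp", "rdmeshdit", "--rf_epoch", "f",
        "--num_hops", str(num_hops), "--alpha_hops", str(alpha_hops),
        "--num_traj", str(num_traj), "--guidance_scale", str(guidance_scale),
        "--export",
    ]


if __name__ == "__main__":
    # Example pair taken from the command above; add more tuples as needed.
    pairs = [("warrok_w_kurniawan.fbx", "dance7.mp4")]
    for mesh, video in pairs:
        # Print instead of executing so the sketch is safe to run anywhere.
        print(shlex.join(build_command(mesh, video)))
```

To actually launch each run from the repository root, pass the returned list to `subprocess.run(cmd, check=True)`.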

⚠️ Note on custom driving videos:
If you want to use your own video to drive the mesh, please first remove the background and replace it with pure black using tools such as SAM 3 (or other video matting / segmentation tools) before running inference. Videos with cluttered or non-black backgrounds may lead to degraded motion extraction and poor animation quality.

πŸ‹οΈβ€β™‚οΈ Training

The complete training pipeline consists of the following 6 stages, which must be executed sequentially:

① Data Preparation → ② Train R-DMesh VAE → ③ Extract Video Latents → ④ Extract DMesh Latents → ⑤ Compute DMesh Feature Statistics → ⑥ Train R-DMesh DiT

| Stage | Step | Script | Output |
|-------|------|--------|--------|
| ① | Data Preparation | data_construction/ | Mesh / Video dataset |
| ② | Train R-DMesh VAE | train_dvae.py | VAE checkpoints |
| ③ | Extract Video Latents | Wan2_2/save_vid_latents.py | Video latents |
| ④ | Extract DMesh Latents | save_dmesh_latents.py | DMesh latents |
| ⑤ | Compute DMesh Feature Statistics | test_vae_factor_misalign.py | Mean / std JSON factors |
| ⑥ | Train R-DMesh DiT | train_dit.py | DiT checkpoints |

① Data Preparation

Please refer to the scripts and README in the data_construction folder to build your training / validation data. This part of the code will be released soon.


② Train R-DMesh VAE

Train the R-DMesh VAE, which compresses dynamic meshes into a latent space. Note that we adopt PLTA attention from AnimateAnyMesh++ for better performance.

torchrun --nproc_per_node=8 train_dvae.py \
    --data_dir /path/to/training/data \
    --val_data_dir /path/to/validation/data \
    --ckpts_dir /path/to/checkpoints \
    --log_dir ./logs/test \
    --exp test \
    --train_epoch 1000 --batch_size 32 --lr 1e-4 \
    --enc_depth 8 --dim 256 --max_length 4096 \
    --latent_dim 64 --latent_dim_x1 16 --num_t 64 \
    --num_hops 4 --hop_mode band --n_layers 2 \
    --sep_rec_loss --per_instance_loss \
    --validate --is_training

(Optional) After training, you can evaluate the reconstruction quality of the VAE using the Test R-DMesh VAE script in the Evaluation section.


③ Extract Video Latents

Extract latent features from reference videos using the pretrained video model (Wan2.2-TI2V-5B). These latents will serve as the conditioning signal for DiT training.

torchrun --nproc_per_node=8 save_vid_latents.py \
    --data_dir /path/to/mesh/data \
    --video_data_dir /path/to/video/data \
    --checkpoint_dir /path/to/pretrained/models \
    --output_dir /path/to/output/latents \
    --batch_size 1 \
    --video_width 256 \
    --samp_layer 10

④ Extract DMesh Latents

Encode meshes into latent variables using the VAE trained in Stage ②. These latents will be used as the prediction target for DiT training.

torchrun --nproc_per_node=8 save_dmesh_latents.py \
    --dataset_dir /path/to/dataset \
    --ckpt_dir /path/to/checkpoints \
    --output_dir /path/to/output/latents \
    --exp your_experiment_name \
    --epoch which_epoch \
    --max_length 8192 --batch_size 16 \
    --num_t 64 --num_hops 4 --num_traj 512

⑤ Compute DMesh Feature Statistics

Compute the per-channel mean and standard deviation of the DMesh latents produced by the VAE. These statistics are used to normalize (rescale) the latents so that their distribution is well-conditioned for DiT training. The resulting factors are saved as JSON files and later consumed by train_dit.py via the --json_dir argument.

torchrun --nproc_per_node=8 test_vae_factor_misalign.py \
    --data_dir /path/to/training/data \
    --vae_dir /path/to/dvae/checkpoints \
    --vae_exp your_vae_experiment \
    --vae_epoch which_epoch \
    --max_length max_vertex_count --batch_size 16 \
    --num_hops 4 --num_t 64
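
The per-channel normalization described in this stage can be illustrated on toy data. This is a sketch under stated assumptions, not the logic of test_vae_factor_misalign.py: the helpers `channel_stats` and `normalize`, the `{"mean": ..., "std": ...}` layout, and the `eps` term are all hypothetical, and the real JSON schema may differ.

```python
import json
import math


def channel_stats(latents):
    """Per-channel mean and population std over a list of equal-length vectors."""
    n, dim = len(latents), len(latents[0])
    mean = [sum(v[c] for v in latents) / n for c in range(dim)]
    std = [
        math.sqrt(sum((v[c] - mean[c]) ** 2 for v in latents) / n)
        for c in range(dim)
    ]
    return {"mean": mean, "std": std}


def normalize(vec, stats, eps=1e-6):
    """Shift and scale one latent vector channel by channel (eps avoids division by zero)."""
    return [(x - m) / (s + eps) for x, m, s in zip(vec, stats["mean"], stats["std"])]


if __name__ == "__main__":
    # Toy latents: two samples with two channels each (assumed shape).
    toy_latents = [[1.0, 10.0], [3.0, 30.0]]
    stats = channel_stats(toy_latents)
    print(json.dumps(stats))  # the kind of payload one might store as factors
    print(normalize(toy_latents[0], stats))
```

After normalization, channels with very different scales end up with comparable ranges, which is the stated purpose of rescaling the latents before DiT training.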

⑥ Train R-DMesh DiT

Train the conditional Diffusion Transformer using the video latents from Stage ③, the DMesh latents from Stage ④, and the normalization factors from Stage ⑤.

torchrun --nproc_per_node=8 train_dit.py \
    --data_dir /path/to/latent/data \
    --latent_data_dir /path/to/video/latents \
    --save_dir /path/to/checkpoints \
    --log_dir ./logs/test \
    --json_dir ./dvae_factors \
    --dvae_dir /path/to/dvae/checkpoints \
    --vae_exp your_vae_experiment \
    --vae_epoch which_epoch \
    --exp test \
    --batch_size 16 --max_length max_vertex_count --train_epoch 500 --lr 1e-4 \
    --mode vc --dit_layers 10 --cond_drop_prob 0.1 \
    --rescale
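
The `--cond_drop_prob` flag here and the `--guidance_scale` flag used at inference suggest classifier-free guidance on top of a rectified-flow sampler. The sketch below illustrates those generic techniques on toy vectors. It is an assumption about how the flags are used, not the repository's implementation: all function names, the use of `None` as the dropped condition, and the time direction (t from 0 to 1) are hypothetical.

```python
import random


def cfg_velocity(v_cond, v_uncond, guidance_scale):
    """Classifier-free guidance: move the unconditional prediction toward the conditional one."""
    return [u + guidance_scale * (c - u) for c, u in zip(v_cond, v_uncond)]


def maybe_drop_condition(cond, drop_prob, rng=random):
    """During training, replace the condition with None with probability `drop_prob`."""
    return None if rng.random() < drop_prob else cond


def euler_sample(x0, velocity_fn, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x, dt = list(x0), 1.0 / num_steps
    for i in range(num_steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    noise, target = [0.0, 0.0], [1.0, -2.0]

    def straight(x, t):
        # Toy "model": the exact constant velocity of a straight-line flow.
        return [b - a for a, b in zip(noise, target)]

    print(euler_sample(noise, straight, num_steps=4))
```

With a guidance scale of 1.0 the guided velocity equals the conditional prediction; larger values extrapolate further away from the unconditional one.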

🔍 Evaluation

Test R-DMesh VAE

Evaluate the reconstruction quality of a trained R-DMesh VAE.

python test_dvae.py \
    --dataset_dir /path/to/test/data \
    --ckpt_dir /path/to/checkpoints \
    --exp your_experiment_name \
    --epoch which_epoch \
    --max_length 4096 \
    --num_hops 4 --alpha_hops 0.5 \
    --render

📚 Citation

If you find our work interesting or helpful for your research, please consider citing:

@inproceedings{wu2026rdmesh,
  title={R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow},
  author={Wu, Zijie and Xu, Lixin and Jiang, Puhua and Liu, Sicong and Guo, Chunchao and Bai, Xiang},
  booktitle={Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers},
  pages={1--12},
  year={2026}
}

Please also consider citing AnimateAnyMesh and AnimateAnyMesh++, which inspired this work and provided techniques adopted in R-DMesh.

@inproceedings{wu2025animateanymesh,
  title={Animateanymesh: A feed-forward 4d foundation model for text-driven universal mesh animation},
  author={Wu, Zijie and Yu, Chaohui and Wang, Fan and Bai, Xiang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={13557--13568},
  year={2025}
}
@article{wu2026animateanymesh++,
  title={AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation},
  author={Wu, Zijie and Yu, Chaohui and Wang, Fan and Bai, Xiang},
  journal={arXiv preprint arXiv:2604.26917},
  year={2026}
}

🙏 Acknowledgments

Our code builds on several great repos: AnimateAnyMesh, AnimateAnyMesh++, and Wan2_2. We thank the authors for their excellent work!
