Zijie Wu<sup>1,2</sup>, Lixin Xu<sup>2</sup>, Puhua Jiang<sup>2</sup>, Sicong Liu<sup>2</sup>, Chunchao Guo<sup>2</sup>, Xiang Bai<sup>1</sup>
<sup>1</sup>Huazhong University of Science and Technology (HUST), <sup>2</sup>Tencent Hunyuan
We present R-DMesh: a unified video-guided 4D mesh generation framework that tackles the long-overlooked pose misalignment dilemma. Given a static mesh and a reference video with arbitrary initial poses, our method automatically rectifies the mesh to the video's starting state and generates high-fidelity, temporally consistent animations. Beyond video-driven animation, R-DMesh naturally supports a wide range of downstream applications, including pose retargeting, motion retargeting, and holistic 4D generation.
- May 13, 2026: The R-DMesh checkpoint has been released. Please give it a try!
- May 12, 2026: The training and inference code of R-DMesh has been released. The checkpoint will be released in a few days.
- Mar 28, 2026: R-DMesh has been accepted to SIGGRAPH 2026! We will release the code as soon as possible. Please stay tuned for updates!
# Create conda environment
conda create -n rdmesh python=3.11
conda activate rdmesh
# Install torch
pip install torch==2.8.0 torchvision==0.23.0
# Install dependencies
pip install -r requirements.txt
Download the pretrained checkpoints from 🤗 Hugging Face and place them under ./ckpts/:
# Option 1: Use huggingface-cli
huggingface-cli download JarrentWu/R-DMesh --local-dir ./ckpts
# Option 2: Manually download and organize
You also need to download Wan2.2-TI2V-5B for video conditioning:
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir ./ckpts/Wan2.2-TI2V-5B
Place your input meshes and reference videos under ./test_data/.
After the steps above, your project should look like this:
R-DMesh/
├── test_data/
│   ├── meshes/            # Input meshes (.glb or .fbx)
│   │   ├── your_mesh.glb
│   │   └── your_mesh2.fbx
│   └── videos/            # Reference videos (.mp4)
│       └── your_video.mp4
└── ckpts/
    ├── dvae/              # VAE checkpoints
    ├── rf_model/          # Rectified Flow (DiT) checkpoints
    ├── dvae_factor/       # VAE normalization factors
    └── Wan2.2-TI2V-5B/    # Wan video model
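
Before running inference, it can help to confirm that the layout above is in place. The following is a minimal sketch and not part of the repository: the helper name `missing_paths` and the idea of a pre-flight check are assumptions made for illustration, while the directory names are taken from the tree above.

```python
from pathlib import Path

# Directories named in the layout above, relative to the repository root.
EXPECTED_DIRS = (
    "test_data/meshes",
    "test_data/videos",
    "ckpts/dvae",
    "ckpts/rf_model",
    "ckpts/dvae_factor",
    "ckpts/Wan2.2-TI2V-5B",
)


def missing_paths(root="."):
    """Return the expected directories that do not exist under `root`.

    Hypothetical helper, not part of the R-DMesh code base.
    """
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing directories:")
        for d in missing:
            print(f"  {d}")
    else:
        print("Directory layout looks complete.")
```

If you keep checkpoints or data elsewhere, this check does not apply; pass the explicit path flags to the inference script instead.
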
python test_drive.py \
--mesh_list your_mesh.glb \
--video_list your_video.mp4 \
--rf_exp rdmeshdit --rf_epoch f \
--num_hops 5 --alpha_hops 0.7 \
--num_traj 4096 --guidance_scale 1.5 \
--export
💡 The command above assumes the default directory structure from the Preparation section.
If your files are placed elsewhere, specify the paths explicitly:
--data_dir /your/path/to/meshes \
--video_data_dir /your/path/to/videos \
--vae_dir /your/path/to/dvae \
--rf_model_dir /your/path/to/rf_model \
--json_dir /your/path/to/dvae_factor \
--wan_model_dir /your/path/to/Wan2.2-TI2V-5B
For example, run:
python test_drive.py \
--mesh_list warrok_w_kurniawan.fbx \
--video_list dance7.mp4 \
--rf_exp rdmeshdit --rf_epoch f \
--num_hops 5 --alpha_hops 0.7 \
--num_traj 4096 --guidance_scale 1.5 \
--export
This produces the animated mesh as an FBX file, along with a front-view rendered video of the generated 4D asset.
⚠️ Note on custom driving videos:
If you want to use your own video to drive the mesh, please first remove the background and replace it with pure black using tools such as SAM 3 (or other video matting / segmentation tools) before running inference. Videos with cluttered or non-black backgrounds may lead to degraded motion extraction and poor animation quality.
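
As an illustration of the preprocessing described above, here is a minimal sketch of replacing the background of a single frame with pure black. It assumes you already have a per-frame boolean foreground mask from a segmentation or matting tool; the function name `blacken_background` is hypothetical and is not part of the repository.

```python
import numpy as np


def blacken_background(frame, mask):
    """Set every pixel outside the foreground mask to pure black.

    frame: uint8 array of shape (H, W, 3).
    mask:  boolean array of shape (H, W), True where the subject is.
           Assumed to come from a separate segmentation or matting tool.
    """
    if frame.shape[:2] != mask.shape:
        raise ValueError("frame and mask must share height and width")
    # Broadcast the (H, W) mask across the three colour channels.
    return np.where(mask[..., None], frame, 0).astype(frame.dtype)


# Tiny synthetic example: a 2x2 grey frame with one foreground pixel.
frame = np.full((2, 2, 3), 128, dtype=np.uint8)
mask = np.array([[True, False], [False, False]])
out = blacken_background(frame, mask)
print(out[0, 0], out[1, 1])
```

Applying this to every frame and re-encoding the result as an .mp4 yields a video in the form the note above asks for. Reading and writing the video itself is left to whichever tool you prefer.
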
The complete training pipeline consists of the following 6 stages, which must be executed sequentially:
① Data Preparation → ② Train R-DMesh VAE → ③ Extract Video Latents → ④ Extract DMesh Latents → ⑤ Compute DMesh Feature Statistics → ⑥ Train R-DMesh DiT
| Stage | Step | Script | Output |
|---|---|---|---|
| ① | Data Preparation | data_construction/ | Mesh / Video dataset |
| ② | Train R-DMesh VAE | train_dvae.py | VAE checkpoints |
| ③ | Extract Video Latents | Wan2_2/save_vid_latents.py | Video latents |
| ④ | Extract DMesh Latents | save_dmesh_latents.py | DMesh latents |
| ⑤ | Compute DMesh Feature Statistics | test_vae_factor_misalign.py | Mean / std JSON factors |
| ⑥ | Train R-DMesh DiT | train_dit.py | DiT checkpoints |
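
Because each stage consumes the output of the previous ones, it can be useful to track which stage to run next. The sketch below is illustrative bookkeeping only and is not part of the repository: the stage names, scripts, and outputs are copied from the table above, while the `next_stage` helper is a hypothetical name.

```python
# Stage names, scripts, and outputs are copied from the table above.
STAGES = [
    ("Data Preparation", "data_construction/", "Mesh / Video dataset"),
    ("Train R-DMesh VAE", "train_dvae.py", "VAE checkpoints"),
    ("Extract Video Latents", "Wan2_2/save_vid_latents.py", "Video latents"),
    ("Extract DMesh Latents", "save_dmesh_latents.py", "DMesh latents"),
    ("Compute DMesh Feature Statistics", "test_vae_factor_misalign.py",
     "Mean / std JSON factors"),
    ("Train R-DMesh DiT", "train_dit.py", "DiT checkpoints"),
]


def next_stage(completed):
    """Return (number, name, script) for the first unfinished stage, or None.

    `completed` is a set of 1-based stage numbers. Because the stages must
    run sequentially, a later stage is never offered while an earlier one
    is still missing.
    """
    for number, (name, script, _output) in enumerate(STAGES, start=1):
        if number not in completed:
            return number, name, script
    return None


print(next_stage({1, 2}))
print(next_stage(set(range(1, 7))))
```
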
Please refer to the scripts and README in the data_construction folder to build your training / validation data. This part of the code will be released soon.
Train the R-DMesh VAE, which compresses dynamic meshes into a latent space. Note that we adopt PLTA attention from AnimateAnyMesh++ for better performance.
torchrun --nproc_per_node=8 train_dvae.py \
--data_dir /path/to/training/data \
--val_data_dir /path/to/validation/data \
--ckpts_dir /path/to/checkpoints \
--log_dir ./logs/test \
--exp test \
--train_epoch 1000 --batch_size 32 --lr 1e-4 \
--enc_depth 8 --dim 256 --max_length 4096 \
--latent_dim 64 --latent_dim_x1 16 --num_t 64 \
--num_hops 4 --hop_mode band --n_layers 2 \
--sep_rec_loss --per_instance_loss \
--validate --is_training
(Optional) After training, you can evaluate the reconstruction quality of the VAE using the Test R-DMesh VAE script in the Evaluation section.
Extract latent features from reference videos using the pretrained video model (Wan2.2-TI2V-5B). These latents will serve as the conditioning signal for DiT training.
torchrun --nproc_per_node=8 save_vid_latents.py \
--data_dir /path/to/mesh/data \
--video_data_dir /path/to/video/data \
--checkpoint_dir /path/to/pretrained/models \
--output_dir /path/to/output/latents \
--batch_size 1 \
--video_width 256 \
--samp_layer 10
Encode meshes into latent variables using the VAE trained in Stage ②. These latents will be used as the prediction target for DiT training.
torchrun --nproc_per_node=8 save_dmesh_latents.py \
--dataset_dir /path/to/dataset \
--ckpt_dir /path/to/checkpoints \
--output_dir /path/to/output/latents \
--exp your_experiment_name \
--epoch which_epoch \
--max_length 8192 --batch_size 16 \
--num_t 64 --num_hops 4 --num_traj 512
Compute the per-channel mean and standard deviation of the DMesh latents produced by the VAE. These statistics are used to normalize (rescale) the latents so that their distribution is well-conditioned for DiT training. The resulting factors are saved as JSON files and later consumed by train_dit.py via the --json_dir argument.
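
To make the normalization step concrete, here is a minimal NumPy sketch of computing per-channel statistics and rescaling latents with them. It runs on synthetic data and is not the repository's implementation: the function names, the array shape, the `eps` term, and the JSON key names are all assumptions made for illustration.

```python
import json

import numpy as np


def channel_stats(latents):
    """Per-channel mean and std over all leading axes.

    latents: float array whose last axis indexes latent channels,
             e.g. shape (num_samples, num_tokens, channels).
    """
    flat = latents.reshape(-1, latents.shape[-1])
    return flat.mean(axis=0), flat.std(axis=0)


def normalize(latents, mean, std, eps=1e-6):
    """Rescale latents to roughly zero mean and unit variance per channel."""
    return (latents - mean) / (std + eps)


# Synthetic stand-in for encoded latents; the shape is an arbitrary choice.
rng = np.random.default_rng(0)
latents = rng.normal(loc=3.0, scale=2.0, size=(100, 32, 64))

mean, std = channel_stats(latents)
# The key names "mean" and "std" are illustrative, not the repository's schema.
factors_json = json.dumps({"mean": mean.tolist(), "std": std.tolist()})

normalized = normalize(latents, mean, std)
check_mean, check_std = channel_stats(normalized)
print(np.abs(check_mean).max(), np.abs(check_std - 1.0).max())
```

After rescaling, every channel has a mean close to 0 and a standard deviation close to 1, which is the property the paragraph above refers to as well-conditioned. The repository's own script for this stage is launched as follows:
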
torchrun --nproc_per_node=8 test_vae_factor_misalign.py \
--data_dir /path/to/training/data \
--vae_dir /path/to/dvae/checkpoints \
--vae_exp your_vae_experiment \
--vae_epoch which_epoch \
--max_length max_vertex_count --batch_size 16 \
--num_hops 4 --num_t 64
Train the conditional Diffusion Transformer using the video latents from Stage ③, the DMesh latents from Stage ④, and the normalization factors from Stage ⑤.
torchrun --nproc_per_node=8 train_dit.py \
--data_dir /path/to/latent/data \
--latent_data_dir /path/to/video/latents \
--save_dir /path/to/checkpoints \
--log_dir ./logs/test \
--json_dir ./dvae_factors \
--dvae_dir /path/to/dvae/checkpoints \
--vae_exp your_vae_experiment \
--vae_epoch which_epoch \
--exp test \
--batch_size 16 --max_length max_vertex_count --train_epoch 500 --lr 1e-4 \
--mode vc --dit_layers 10 --cond_drop_prob 0.1 \
--rescale
Evaluate the reconstruction quality of a trained R-DMesh VAE.
python test_dvae.py \
--dataset_dir /path/to/test/data \
--ckpt_dir /path/to/checkpoints \
--exp your_experiment_name \
--epoch which_epoch \
--max_length 4096 \
--num_hops 4 --alpha_hops 0.5 \
--render
If you find our work interesting or helpful for your research, please consider citing:
@inproceedings{wu2026rdmesh,
title={R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow},
author={Wu, Zijie and Xu, Lixin and Jiang, Puhua and Liu, Sicong and Guo, Chunchao and Bai, Xiang},
booktitle={Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers},
pages={1--12},
year={2026}
}
Please also consider citing AnimateAnyMesh and AnimateAnyMesh++, which inspired this work and provided techniques adopted in R-DMesh.
@inproceedings{wu2025animateanymesh,
title={AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation},
author={Wu, Zijie and Yu, Chaohui and Wang, Fan and Bai, Xiang},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={13557--13568},
year={2025}
}
@article{wu2026animateanymesh++,
title={AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation},
author={Wu, Zijie and Yu, Chaohui and Wang, Fan and Bai, Xiang},
journal={arXiv preprint arXiv:2604.26917},
year={2026}
}
Our code builds on several great repositories: AnimateAnyMesh, AnimateAnyMesh++, and Wan2_2. We thank the authors for their excellent work!


