R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow (SIGGRAPH 2026)

Zijie Wu¹,², Lixin Xu², Puhua Jiang², Sicong Liu², Chunchao Guo², Xiang Bai¹
¹Huazhong University of Science and Technology (HUST), ²Tencent Hunyuan

📖 Overview

We present R-DMesh: a unified video-guided 4D mesh generation framework that tackles the long-overlooked pose misalignment dilemma. Given a static mesh and a reference video with arbitrary initial poses, our method automatically rectifies the mesh to the video's starting state and generates high-fidelity, temporally consistent animations. Beyond video-driven animation, R-DMesh naturally supports a wide range of downstream applications, including pose retargeting, motion retargeting, and holistic 4D generation.

🔥 Latest News

May 13, 2026: 👋 The checkpoint of R-DMesh has been released! Please give it a try!
May 12, 2026: 👋 The training & inference code of R-DMesh has been released! The checkpoint will be released in a few days.
Mar 28, 2026: 👋 R-DMesh has been accepted to SIGGRAPH 2026! We will release the code as soon as possible. Please stay tuned for updates!

🔧 Preparation

1. Environment Setup

# Create conda environment
conda create -n rdmesh python=3.11
conda activate rdmesh

# Install torch
pip install torch==2.8.0 torchvision==0.23.0

# Install dependencies
pip install -r requirements.txt
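
After installation, a quick sanity check can confirm that the interpreter and PyTorch match the versions pinned above. The following is a minimal sketch and not part of the repository; the helper name `check_environment` is hypothetical.

```python
import importlib.util
import sys


def check_environment(expected_python=(3, 11)):
    """Return a small report on the Python and PyTorch setup (hypothetical helper)."""
    report = {
        "python_ok": sys.version_info[:2] == expected_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_available": None,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check still runs without torch

        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `python_ok` or `torch_installed` is false, re-check that the `rdmesh` conda environment is active before continuing.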

2. Download Pretrained Models

Download the pretrained checkpoints from 🤗 HuggingFace and place them under ./ckpts/:

# Option 1: Use huggingface-cli
huggingface-cli download JarrentWu/R-DMesh --local-dir ./ckpts

# Option 2: Download the files manually from the HuggingFace model page and arrange them as shown in the Expected Directory Structure below

You also need to download Wan2.2-TI2V-5B for video conditioning:

huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir ./ckpts/Wan2.2-TI2V-5B
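
The same downloads can also be scripted from Python with the `huggingface_hub` package, which provides `snapshot_download`. This is a sketch under the assumption that `huggingface_hub` is installed; the helper name `download_checkpoints` is hypothetical, and the repository IDs and target folders are copied from the CLI commands above.

```python
# Repositories and target folders taken from the CLI commands above.
CHECKPOINTS = {
    "JarrentWu/R-DMesh": "./ckpts",
    "Wan-AI/Wan2.2-TI2V-5B": "./ckpts/Wan2.2-TI2V-5B",
}


def download_checkpoints(checkpoints=CHECKPOINTS):
    """Download each repository into its local folder (hypothetical helper)."""
    # Imported lazily so this module can be inspected without huggingface_hub.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in checkpoints.items():
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_checkpoints()` from the project root fetches both repositories; nothing is downloaded until the function is called.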

3. Prepare Test Data

Place your input meshes and reference videos under ./test_data/.

📂 Expected Directory Structure

After the steps above, your project should look like this:

R-DMesh/
├── test_data/
│   ├── meshes/                  # Input meshes (.glb or .fbx)
│   │   ├── your_mesh.glb
│   │   └── your_mesh2.fbx
│   └── videos/                  # Reference videos (.mp4)
│       └── your_video.mp4
└── ckpts/
    ├── dvae/                    # VAE checkpoints
    ├── rf_model/                # Rectified Flow (DiT) checkpoints
    ├── dvae_factor/             # VAE normalization factors
    └── Wan2.2-TI2V-5B/          # Wan video model
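
Before running inference, it can help to verify that the layout above is in place. The sketch below only checks that the listed sub-directories exist; it does not validate their contents, and the helper name `missing_dirs` is hypothetical.

```python
from pathlib import Path

# Sub-directories listed in the expected structure above.
EXPECTED_DIRS = [
    "test_data/meshes",
    "test_data/videos",
    "ckpts/dvae",
    "ckpts/rf_model",
    "ckpts/dvae_factor",
    "ckpts/Wan2.2-TI2V-5B",
]


def missing_dirs(root="."):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories:")
        for d in missing:
            print(f"  {d}")
    else:
        print("Directory layout looks complete.")
```

Run it from the `R-DMesh/` project root, or pass another root path to `missing_dirs`.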

📖 Inference

🎬 Animate a Mesh with Reference Video

python test_drive.py \
    --mesh_list your_mesh.glb \
    --video_list your_video.mp4 \
    --rf_exp rdmeshdit --rf_epoch f \
    --num_hops 5 --alpha_hops 0.7 \
    --num_traj 4096 --guidance_scale 1.5 \
    --export

💡 The command above assumes the default directory structure from the Preparation section.
If your files are placed elsewhere, specify the paths explicitly:

    --data_dir /your/path/to/meshes \
    --video_data_dir /your/path/to/videos \
    --vae_dir /your/path/to/dvae \
    --rf_model_dir /your/path/to/rf_model \
    --json_dir /your/path/to/dvae_factor \
    --wan_model_dir /your/path/to/Wan2.2-TI2V-5B
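
When driving several mesh/video pairs with the same settings, the command can be assembled programmatically. The sketch below launches one `test_drive.py` run per pair using only the flags shown above; the helper names are hypothetical, and whether `--mesh_list` and `--video_list` accept several files in a single call is not assumed here.

```python
import subprocess
import sys


def build_drive_command(mesh, video, extra_args=()):
    """Build the argv for one test_drive.py run, using the flags shown above."""
    return [
        sys.executable, "test_drive.py",
        "--mesh_list", mesh,
        "--video_list", video,
        "--rf_exp", "rdmeshdit", "--rf_epoch", "f",
        "--num_hops", "5", "--alpha_hops", "0.7",
        "--num_traj", "4096", "--guidance_scale", "1.5",
        "--export",
        *extra_args,
    ]


def run_pairs(pairs, extra_args=(), dry_run=True):
    """Run (or, with dry_run=True, only print) one command per (mesh, video) pair."""
    for mesh, video in pairs:
        cmd = build_drive_command(mesh, video, extra_args)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pairs([("your_mesh.glb", "your_video.mp4")])
```

Path overrides such as `--data_dir` can be passed through `extra_args`, for example `extra_args=["--data_dir", "/your/path/to/meshes"]`.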

For example, run:

python test_drive.py \
    --mesh_list warrok_w_kurniawan.fbx \
    --video_list dance7.mp4 \
    --rf_exp rdmeshdit --rf_epoch f \
    --num_hops 5 --alpha_hops 0.7 \
    --num_traj 4096 --guidance_scale 1.5 \
    --export

This produces the animated mesh as an FBX file together with a frontally rendered video of the result.

⚠️ Note on custom driving videos:
If you want to use your own video to drive the mesh, please first remove the background and replace it with pure black using tools such as SAM 3 (or other video matting / segmentation tools) before running inference. Videos with cluttered or non-black backgrounds may lead to degraded motion extraction and poor animation quality.
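
The background replacement described above amounts to a per-frame masking step. The sketch below assumes a foreground mask for each frame is already available from a segmentation or matting tool, and it uses NumPy; reading and writing the video itself is not shown, and the helper name `composite_on_black` is hypothetical.

```python
import numpy as np


def composite_on_black(frame, mask):
    """Keep foreground pixels and set everything else to pure black.

    frame: uint8 array of shape (H, W, 3).
    mask:  array of shape (H, W); nonzero marks the foreground subject.
    """
    frame = np.asarray(frame)
    foreground = np.asarray(mask).astype(bool)
    # Broadcast the (H, W) mask over the colour channels.
    return np.where(foreground[..., None], frame, 0).astype(frame.dtype)
```

Applying this to every frame yields a video whose background is exactly zero in all channels, which matches the pure black background requested above.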

🏋️‍♂️ Training

The complete training pipeline consists of the following 6 stages, which must be executed sequentially:

① Data Preparation  →  ② Train R-DMesh VAE  →  ③ Extract Video Latents  →  ④ Extract DMesh Latents  →  ⑤ Compute DMesh Feature Statistics  →  ⑥ Train R-DMesh DiT

Stage	Step	Script	Output
①	Data Preparation	`data_construction/`	Mesh / Video dataset
②	Train R-DMesh VAE	`train_dvae.py`	VAE checkpoints
③	Extract Video Latents	`Wan2_2/save_vid_latents.py`	Video latents
④	Extract DMesh Latents	`save_dmesh_latents.py`	DMesh latents
⑤	Compute DMesh Feature Statistics	`test_vae_factor_misalign.py`	Mean / std JSON factors
⑥	Train R-DMesh DiT	`train_dit.py`	DiT checkpoints
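
Because each stage consumes the output of the previous one, a small driver that enforces the order and stops at the first failure can be convenient. This is a sketch only: the stage names and entry points come from the table above, while `run_in_order` and the shape of its `commands` argument are hypothetical, and the full argument list for each script must still be supplied by the user.

```python
import subprocess

# Stage order and entry points as listed in the table above.
STAGES = [
    ("Data Preparation", "data_construction/"),
    ("Train R-DMesh VAE", "train_dvae.py"),
    ("Extract Video Latents", "Wan2_2/save_vid_latents.py"),
    ("Extract DMesh Latents", "save_dmesh_latents.py"),
    ("Compute DMesh Feature Statistics", "test_vae_factor_misalign.py"),
    ("Train R-DMesh DiT", "train_dit.py"),
]


def run_in_order(commands, run=subprocess.run):
    """Run one command per stage, stopping at the first failure.

    commands: dict mapping a stage name from STAGES to its full argv list.
    Stages without an entry are skipped (e.g. data that is already prepared).
    Returns the names of the stages that completed.
    """
    completed = []
    for name, _entry_point in STAGES:
        argv = commands.get(name)
        if argv is None:
            continue
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later stages never start from incomplete outputs.
        run(argv, check=True)
        completed.append(name)
    return completed
```

The `run` parameter exists so the ordering logic can be exercised with a stand-in function before launching long training jobs.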

① Data Preparation

Please refer to the scripts and README in the data_construction folder to build your training / validation data. This part of the code will be released soon.

② Train R-DMesh VAE

Train the R-DMesh VAE, which compresses dynamic meshes into a latent space. Note that we adopt PLTA attention from AnimateAnyMesh++ for better performance.

torchrun --nproc_per_node=8 train_dvae.py \
    --data_dir /path/to/training/data \
    --val_data_dir /path/to/validation/data \
    --ckpts_dir /path/to/checkpoints \
    --log_dir ./logs/test \
    --exp test \
    --train_epoch 1000 --batch_size 32 --lr 1e-4 \
    --enc_depth 8 --dim 256 --max_length 4096 \
    --latent_dim 64 --latent_dim_x1 16 --num_t 64 \
    --num_hops 4 --hop_mode band --n_layers 2 \
    --sep_rec_loss --per_instance_loss \
    --validate --is_training

(Optional) After training, you can evaluate the reconstruction quality of the VAE using the Test R-DMesh VAE script in the Evaluation section.

③ Extract Video Latents

Extract latent features from reference videos using the pretrained video model (Wan2.2-TI2V-5B). These latents will serve as the conditioning signal for DiT training.

torchrun --nproc_per_node=8 Wan2_2/save_vid_latents.py \
    --data_dir /path/to/mesh/data \
    --video_data_dir /path/to/video/data \
    --checkpoint_dir /path/to/pretrained/models \
    --output_dir /path/to/output/latents \
    --batch_size 1 \
    --video_width 256 \
    --samp_layer 10

④ Extract DMesh Latents

Encode meshes into latent variables using the VAE trained in Stage ②. These latents will be used as the prediction target for DiT training.

torchrun --nproc_per_node=8 save_dmesh_latents.py \
    --dataset_dir /path/to/dataset \
    --ckpt_dir /path/to/checkpoints \
    --output_dir /path/to/output/latents \
    --exp your_experiment_name \
    --epoch which_epoch \
    --max_length 8192 --batch_size 16 \
    --num_t 64 --num_hops 4 --num_traj 512

⑤ Compute DMesh Feature Statistics

Compute the per-channel mean and standard deviation of the DMesh latents produced by the VAE. These statistics are used to normalize (rescale) the latents so that their distribution is well-conditioned for DiT training. The resulting factors are saved as JSON files and later consumed by train_dit.py via the --json_dir argument.

torchrun --nproc_per_node=8 test_vae_factor_misalign.py \
    --data_dir /path/to/training/data \
    --vae_dir /path/to/dvae/checkpoints \
    --vae_exp your_vae_experiment \
    --vae_epoch which_epoch \
    --max_length max_vertex_count --batch_size 16 \
    --num_hops 4 --num_t 64
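
The normalization described above can be illustrated with a few lines of NumPy. This is a minimal sketch on synthetic data, not the logic of `test_vae_factor_misalign.py`: the latent shape, the JSON keys `mean` and `std`, and the helper names are all assumptions made for illustration.

```python
import json

import numpy as np


def compute_factors(latents):
    """Per-channel mean and std over all samples and tokens.

    latents: array of shape (num_samples, num_tokens, num_channels).
    """
    arr = np.asarray(latents, dtype=np.float64)
    flat = arr.reshape(-1, arr.shape[-1])
    return {"mean": flat.mean(axis=0).tolist(), "std": flat.std(axis=0).tolist()}


def rescale(latents, factors, eps=1e-6):
    """Normalize latents to roughly zero mean and unit variance per channel."""
    mean = np.asarray(factors["mean"])
    std = np.asarray(factors["std"])
    return (np.asarray(latents) - mean) / (std + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for VAE latents with a deliberate offset and scale.
    latents = rng.normal(loc=3.0, scale=2.0, size=(8, 16, 4))
    factors = compute_factors(latents)
    print(json.dumps(factors, indent=2))
    normalized = rescale(latents, factors)
    print(normalized.reshape(-1, 4).mean(axis=0))
```

Storing the factors as plain lists keeps them JSON-serializable, so the same values can be reloaded at training and inference time.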

⑥ Train R-DMesh DiT

Train the conditional Diffusion Transformer using the video latents from Stage ③, the DMesh latents from Stage ④, and the normalization factors from Stage ⑤.

torchrun --nproc_per_node=8 train_dit.py \
    --data_dir /path/to/latent/data \
    --latent_data_dir /path/to/video/latents \
    --save_dir /path/to/checkpoints \
    --log_dir ./logs/test \
    --json_dir ./dvae_factors \
    --dvae_dir /path/to/dvae/checkpoints \
    --vae_exp your_vae_experiment \
    --vae_epoch which_epoch \
    --exp test \
    --batch_size 16 --max_length max_vertex_count --train_epoch 500 --lr 1e-4 \
    --mode vc --dit_layers 10 --cond_drop_prob 0.1 \
    --rescale
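
The `--cond_drop_prob 0.1` flag here and the `--guidance_scale 1.5` flag used at inference suggest classifier-free guidance, although the README does not spell out the formulation. The sketch below assumes the standard form (condition dropout during training, velocity extrapolation during sampling) and Euler integration from t=0 to t=1. The toy velocity field stands in for the DiT, and all function names are hypothetical.

```python
import numpy as np


def drop_condition(cond, null_cond, drop_prob, rng):
    """Training: replace the condition with a null one with probability drop_prob."""
    return null_cond if rng.random() < drop_prob else cond


def guided_velocity(v_cond, v_uncond, guidance_scale):
    """Sampling: extrapolate from the unconditional towards the conditional velocity."""
    return v_uncond + guidance_scale * (v_cond - v_uncond)


def sample(velocity_fn, x, cond, null_cond, guidance_scale=1.5, num_steps=10):
    """Euler integration of dx/dt = v from t=0 to t=1 with guided velocities."""
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = guided_velocity(
            velocity_fn(x, t, cond), velocity_fn(x, t, null_cond), guidance_scale
        )
        x = x + dt * v
    return x


def toy_velocity(x, t, cond):
    """Toy stand-in for the DiT: a straight-line flow that reaches `cond` at t=1."""
    return (cond - x) / (1.0 - t)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.normal(size=3)
    print(sample(toy_velocity, noise, cond=np.ones(3), null_cond=np.zeros(3)))
```

With a guidance scale of 1, the sampler follows the conditional velocity alone; larger values push the result further away from the unconditional prediction.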

🔍 Evaluation

Test R-DMesh VAE

Evaluate the reconstruction quality of a trained R-DMesh VAE.

python test_dvae.py \
    --dataset_dir /path/to/test/data \
    --ckpt_dir /path/to/checkpoints \
    --exp your_experiment_name \
    --epoch which_epoch \
    --max_length 4096 \
    --num_hops 4 --alpha_hops 0.5 \
    --render

📚 Citation

If you find our work interesting or helpful for your research, please consider citing:

@inproceedings{wu2026rdmesh,
  title={R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow},
  author={Wu, Zijie and Xu, Lixin and Jiang, Puhua and Liu, Sicong and Guo, Chunchao and Bai, Xiang},
  booktitle={Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers},
  pages={1--12},
  year={2026}
}

Please also consider citing AnimateAnyMesh and AnimateAnyMesh++, which inspired this work and provided techniques adopted in R-DMesh.

@inproceedings{wu2025animateanymesh,
  title={AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation},
  author={Wu, Zijie and Yu, Chaohui and Wang, Fan and Bai, Xiang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={13557--13568},
  year={2025}
}
@article{wu2026animateanymesh++,
  title={AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation},
  author={Wu, Zijie and Yu, Chaohui and Wang, Fan and Bai, Xiang},
  journal={arXiv preprint arXiv:2604.26917},
  year={2026}
}

🙏 Acknowledgments

Our code builds on several great repositories: AnimateAnyMesh, AnimateAnyMesh++, and Wan2_2. We thank the authors for their excellent work!
