Deployment Guide: Running LandmarkDiff on your own GPU #231

dreamlessx · 2026-03-14T22:39:59Z

dreamlessx
Mar 14, 2026
Maintainer

A quick guide for deploying LandmarkDiff locally with GPU support. It covers Docker, pip install, Apptainer on HPC clusters, GPU requirements, and common issues.

Option 1: Docker (recommended)

The easiest path if you have the NVIDIA Container Toolkit installed.

git clone https://github.com/dreamlessx/LandmarkDiff-public.git
cd LandmarkDiff-public

# Start the Gradio demo with GPU passthrough
docker compose up landmarkdiff
# Open http://localhost:7860

For training:

docker compose --profile training up train

Requires Docker 24+ and NVIDIA Container Toolkit. Minimum 6 GB VRAM for inference, 25 GB for training.

Option 2: pip install

# Create a fresh environment
conda create -n landmarkdiff python=3.11
conda activate landmarkdiff

# Install PyTorch with CUDA (adjust the CUDA version to match your driver)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Install LandmarkDiff with all extras
git clone https://github.com/dreamlessx/LandmarkDiff-public.git
cd LandmarkDiff-public
pip install -e ".[train,eval,app,dev]"

# Launch the demo
python scripts/app.py
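
Before launching the demo, it can help to confirm that the PyTorch build installed above actually sees the GPU. The following is a minimal sketch using standard PyTorch calls only; the function name `cuda_report` is illustrative and not part of LandmarkDiff.

```python
import importlib.util


def cuda_report():
    """Summarize whether PyTorch is installed and can see a CUDA device."""
    report = {"torch_installed": False, "cuda_available": False, "devices": []}
    if importlib.util.find_spec("torch") is None:
        return report  # PyTorch is not installed in this environment
    import torch

    report["torch_installed"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    if report["cuda_available"]:
        for i in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(i)
            # total_memory is reported in bytes; convert to GiB
            report["devices"].append((props.name, round(props.total_memory / 1024**3, 1)))
    return report


if __name__ == "__main__":
    print(cuda_report())
```

If `cuda_available` comes back `False` on a machine with an NVIDIA GPU, the likely cause is a mismatch between the installed wheel's CUDA version and the driver.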

Option 3: HPC with Apptainer/Singularity

For clusters that do not allow Docker (most HPC environments):

# Build the Apptainer image
apptainer build landmarkdiff.sif containers/landmarkdiff.def

# Run inference
apptainer exec --nv landmarkdiff.sif python scripts/run_inference.py photo.jpg \
    --procedure rhinoplasty --intensity 60 --mode controlnet

See GPU_TRAINING_GUIDE.md for SLURM job scripts and multi-node configurations.

GPU requirements

Use case	Minimum VRAM	Recommended VRAM
Inference (ControlNet)	6 GB	8 GB
Inference (ControlNet+IP)	8 GB	12 GB
Training (Phase A)	25 GB	40 GB (A100)
Training (Phase B)	40 GB	80 GB (A100)

TPS mode runs on CPU only and needs no GPU at all.
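
To see what a given card can handle, the minimums from the table can be turned into a small lookup. This is only a sketch: the dictionary copies the "Minimum VRAM" column above, and `supported_use_cases` is a hypothetical helper, not a LandmarkDiff function.

```python
# Minimum VRAM in GB per use case, copied from the table above.
MIN_VRAM_GB = {
    "Inference (ControlNet)": 6,
    "Inference (ControlNet+IP)": 8,
    "Training (Phase A)": 25,
    "Training (Phase B)": 40,
}


def supported_use_cases(vram_gb):
    """Return the use cases whose minimum VRAM fits in vram_gb."""
    return [name for name, need in MIN_VRAM_GB.items() if vram_gb >= need]


if __name__ == "__main__":
    for size in (0, 8, 24, 80):
        # With no match, only the CPU-only TPS mode remains.
        print(size, "GB:", supported_use_cases(size) or ["TPS mode (CPU only)"])
```

Note that a 24 GB card falls just short of the 25 GB minimum listed for Phase A training, so by these numbers it is an inference-only card.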

Common issues

CUDA out of memory: Lower num_inference_steps to 20, or switch to --mode tps, which runs on the CPU only. You can also enable CPU offloading via pipeline.enable_model_cpu_offload().
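
As a rough sketch of the out-of-memory tips, the helper below turns on CPU offloading when the pipeline object supports it and returns the reduced step count as call arguments. `enable_model_cpu_offload()` is a diffusers pipeline method (it needs the accelerate package); how LandmarkDiff exposes its pipeline object is an assumption here, and `apply_memory_savers` is a hypothetical name.

```python
def apply_memory_savers(pipeline, steps=20):
    """Enable CPU offload on a diffusers-style pipeline and return call kwargs."""
    # Assumption: `pipeline` is a diffusers-style object. Objects without the
    # method are left untouched rather than raising an error.
    offload = getattr(pipeline, "enable_model_cpu_offload", None)
    if callable(offload):
        offload()
    # Pass the returned dict to the pipeline call, e.g. pipeline(..., **kwargs).
    return {"num_inference_steps": steps}
```

Offloading trades speed for memory: submodules are moved to the GPU only while they run, so inference gets slower but peak VRAM drops.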

MediaPipe fails to detect face: Ensure the input image has a clearly visible face at reasonable resolution (512x512 minimum). Extreme angles or heavy occlusion can cause detection failure.

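Models not downloading: On first run, LandmarkDiff downloads ~6 GB of model weights from Hugging Face. If you are behind a firewall, pre-download the models with huggingface-cli download from a machine that can reach Hugging Face, then set HF_HUB_OFFLINE=1 so the weights are loaded from the local cache instead of the network.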
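
The pre-download step can also be scripted from Python. The sketch below assumes the goal is to run from a pre-populated cache, in which case `HF_HUB_OFFLINE=1` tells the hub client to skip network calls. `snapshot_download` and the `HF_HUB_CACHE` / `HF_HUB_OFFLINE` variables come from huggingface_hub; the helper names are hypothetical, and the repository IDs LandmarkDiff needs are not listed in this post, so `repo_id` has to be filled in by the reader.

```python
import os


def prefetch(repo_id, cache_dir):
    """Download a full model repository into cache_dir (needs network access)."""
    # Imported lazily so the offline helper below works without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, cache_dir=cache_dir)


def offline_env(cache_dir, base=None):
    """Return environment variables that point the hub client at a local cache."""
    env = dict(os.environ if base is None else base)
    env["HF_HUB_CACHE"] = str(cache_dir)  # where prefetch() stored the files
    env["HF_HUB_OFFLINE"] = "1"           # use cached files, make no network calls
    return env
```

One way to use it is to run `prefetch` on a machine with internet access, copy the cache directory to the restricted machine, and start the demo with `subprocess.run(["python", "scripts/app.py"], env=offline_env(cache_dir))`.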

Share your deployment setup or ask questions below.
