Christina Ourania Tze · Daniel Dauner · Yiyi Liao · Dzmitry Tsishkou · Andreas Geiger
Paper | Project Page
We introduce PrITTI, a latent diffusion-based framework that leverages primitives as the main foundational elements for generating compositional, controllable, and editable 3D semantic scene layouts. Our approach enables applications such as scene editing, inpainting, outpainting, and photo-realistic street view synthesis.
This is the official repository for PrITTI.
- [May 2026] Pre-trained models, training, inference & evaluation code released.
- [Feb 2026] PrITTI has been accepted to CVPR 2026. See you in Denver!
Add the following to your ~/.bashrc:
export PRITTI_WORKSPACE="$HOME/pritti_workspace"
export KITTI360_DATASET="$HOME/pritti_workspace/dataset"
export PRITTI_EXP_ROOT="$HOME/pritti_workspace/exp"
export PRITTI_CACHE_ROOT="$HOME/pritti_workspace/cache"
export PRITTI_DEVKIT_ROOT="$HOME/pritti_workspace/pritti"
Then reload your shell:
source ~/.bashrc
mkdir $PRITTI_WORKSPACE
cd $PRITTI_WORKSPACE
git clone https://github.com/autonomousvision/pritti.git && cd pritti
conda env create --name pritti -f environment.yaml
conda activate pritti
pip install -r requirements.txt
pip install git+https://github.com/raniatze/kitti360Scripts.git
wget https://anaconda.org/pytorch3d/pytorch3d/0.7.8/download/linux-64/pytorch3d-0.7.8-py39_cu121_pyt241.tar.bz2 -O pytorch3d.tar.bz2
conda install pytorch3d.tar.bz2
rm pytorch3d.tar.bz2
pip install -e .
mkdir $KITTI360_DATASET && cd $KITTI360_DATASET
gdown https://drive.google.com/uc?id=1_yIKHQZj1E1V2jogsDHAEsolKOylpA8U
unzip dataset.zip && mv dataset/* . && rmdir dataset && rm dataset.zip
Run the following in order:
cd $PRITTI_DEVKIT_ROOT
bash scripts/preprocessing/preprocessing.sh
bash scripts/preprocessing/samples_labeling.sh
bash scripts/preprocessing/dataset_statistics.sh
To visually verify the preprocessed dataset, you can render a random subset of ground truth samples:
bash scripts/preprocessing/samples_visualization.sh
Screenshots will be saved under $PRITTI_EXP_ROOT/exp/preprocessing/samples_visualization/<TIMESTAMP>/visualizations/, where <TIMESTAMP> is the script's launch time, formatted as YYYY.MM.DD.HH.MM.SS.
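The preprocessing scripts above rely on the five environment variables exported during setup. The following is a minimal sketch of a sanity check for them. `check_pritti_env` is a hypothetical helper, not part of the repository, and it only tests that each variable is non-empty, not that the paths exist.

```shell
# Hypothetical helper (not part of the PrITTI repository):
# report every required variable that is unset or empty.
check_pritti_env() {
    missing=0
    for name in PRITTI_WORKSPACE KITTI360_DATASET PRITTI_EXP_ROOT \
                PRITTI_CACHE_ROOT PRITTI_DEVKIT_ROOT; do
        # Look up the variable whose name is stored in $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    # Exit status 0 if everything is set, 1 otherwise.
    return "$missing"
}

# Usage: check_pritti_env && bash scripts/preprocessing/preprocessing.sh
```

Running the check before each script makes a forgotten `source ~/.bashrc` visible immediately.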
We release the pre-trained LVAE and LDM (DiT-B) checkpoints on Hugging Face: raniatze/pritti-checkpoints. If you want to skip training and jump straight to Inference, follow the steps below.
Add the following to your ~/.bashrc (these match the released checkpoints) and reload your shell:
export LVAE_TIMESTAMP="2025.06.03.17.23.30"
export LVAE_EPOCH="299"
export LVAE_STEP="580200"
source ~/.bashrc
# LVAE checkpoint
LVAE_DIR=$PRITTI_EXP_ROOT/exp/training_lvae_model/training_lvae_model/$LVAE_TIMESTAMP/checkpoints
mkdir -p $LVAE_DIR
huggingface-cli download raniatze/pritti-checkpoints lvae.ckpt --local-dir $LVAE_DIR
mv $LVAE_DIR/lvae.ckpt $LVAE_DIR/epoch=$LVAE_EPOCH-step=$LVAE_STEP.ckpt
# LDM (DiT-B) checkpoint
LDM_DIR=$PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/training_dit_b_model/$LVAE_TIMESTAMP
mkdir -p $LDM_DIR
huggingface-cli download raniatze/pritti-checkpoints --include "ldm_b/*" --local-dir $LDM_DIR
mv $LDM_DIR/ldm_b $LDM_DIR/checkpoint
You can now skip directly to the Inference section.
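Before moving on, it may help to confirm that the downloaded files ended up where the commands above place them. The sketch below rebuilds both paths from the same variables. The helper names are hypothetical, and the check only looks for the renamed LVAE file and the LDM `checkpoint` directory; it does not inspect their contents.

```shell
# Hypothetical helpers (not part of the repository). They rebuild the
# paths produced by the download and rename commands above.
lvae_ckpt_path() {
    echo "$PRITTI_EXP_ROOT/exp/training_lvae_model/training_lvae_model/$LVAE_TIMESTAMP/checkpoints/epoch=$LVAE_EPOCH-step=$LVAE_STEP.ckpt"
}

ldm_ckpt_dir() {
    echo "$PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/training_dit_b_model/$LVAE_TIMESTAMP/checkpoint"
}

# Return 0 only if both the LVAE file and the LDM directory exist.
check_checkpoints() {
    status=0
    [ -f "$(lvae_ckpt_path)" ] || { echo "missing file: $(lvae_ckpt_path)" >&2; status=1; }
    [ -d "$(ldm_ckpt_dir)" ] || { echo "missing directory: $(ldm_ckpt_dir)" >&2; status=1; }
    return "$status"
}

# Usage: check_checkpoints && echo "checkpoints in place"
```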
PrITTI is trained in two stages: a Layout Variational Autoencoder followed by a Latent Diffusion Model.
Train the LVAE
Run:
bash scripts/training/autoencoder/training_autoencoder.sh
The trained checkpoint will be saved under:
$PRITTI_EXP_ROOT/exp/training_lvae_model/training_lvae_model/<TIMESTAMP>/checkpoints/
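The three values needed in the next step can be read off the checkpoint path itself: the run folder is the timestamp, and the filename has the form `epoch=<EPOCH>-step=<STEP>.ckpt`. The sketch below extracts them with plain shell parameter expansion. `lvae_exports_from_ckpt` is a hypothetical helper, and it assumes the checkpoint sits in `<TIMESTAMP>/checkpoints/` as shown above.

```shell
# Hypothetical helper (not part of the repository): print the export lines
# for a checkpoint located at .../<TIMESTAMP>/checkpoints/epoch=<E>-step=<S>.ckpt
lvae_exports_from_ckpt() {
    ckpt=$1
    name=$(basename "$ckpt" .ckpt)               # e.g. epoch=299-step=580200
    case "$name" in
        epoch=*-step=*) ;;                       # expected filename pattern
        *) echo "unexpected checkpoint name: $name" >&2; return 1 ;;
    esac
    run_dir=$(dirname "$(dirname "$ckpt")")      # strip /checkpoints/<file>
    timestamp=$(basename "$run_dir")
    epoch=${name#epoch=}                         # drop the leading "epoch="
    epoch=${epoch%%-step=*}                      # drop everything from "-step="
    step=${name##*-step=}                        # keep what follows "-step="
    printf 'export LVAE_TIMESTAMP="%s"\n' "$timestamp"
    printf 'export LVAE_EPOCH="%s"\n' "$epoch"
    printf 'export LVAE_STEP="%s"\n' "$step"
}

# Usage: lvae_exports_from_ckpt /path/to/<TIMESTAMP>/checkpoints/epoch=<EPOCH>-step=<STEP>.ckpt
```

The printed lines can be pasted into `~/.bashrc` as they are.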
Once training is complete, update your ~/.bashrc to match your trained checkpoint and reload:
export LVAE_TIMESTAMP="<TIMESTAMP>" # name of the training run folder (format: YYYY.MM.DD.HH.MM.SS)
export LVAE_EPOCH="<EPOCH>" # from the checkpoint filename: epoch=<EPOCH>-step=<STEP>.ckpt
export LVAE_STEP="<STEP>" # from the checkpoint filename: epoch=<EPOCH>-step=<STEP>.ckpt
source ~/.bashrc
Cache LVAE Latents
Once LVAE_TIMESTAMP, LVAE_EPOCH, and LVAE_STEP are set in your environment, run the following command to cache the latent representations used for training the diffusion model:
bash scripts/training/autoencoder/latent_caching.sh
2.1 Train the Diffusion Model
Run:
bash scripts/training/diffusion/training_diffusion.sh
By default this trains a DiT-B model. To select a different model size, set DIFFUSION_MODEL to one of dit_s_model, dit_b_model, dit_l_model, or dit_xl_model.
The checkpoint will be saved under:
$PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/training_dit_b_model/$LVAE_TIMESTAMP/
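Since DIFFUSION_MODEL accepts only the four names listed above, a typo is easy to catch before a long training run starts. The following is a small sketch of such a guard. `validate_diffusion_model` is a hypothetical helper, and how the training script actually reads DIFFUSION_MODEL is not shown here.

```shell
# Hypothetical helper (not part of the repository): accept only the four
# model names listed in this README.
validate_diffusion_model() {
    case "${1:-}" in
        dit_s_model|dit_b_model|dit_l_model|dit_xl_model)
            return 0 ;;
        *)
            echo "unknown DIFFUSION_MODEL: '${1:-}'" >&2
            return 1 ;;
    esac
}

# Usage:
#   validate_diffusion_model "$DIFFUSION_MODEL" \
#       && bash scripts/training/diffusion/training_diffusion.sh
```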
Reconstruct Samples
Run:
bash scripts/training/autoencoder/samples_caching.sh
Reconstructed samples will be saved under:
$PRITTI_EXP_ROOT/exp/training_lvae_model/samples_caching/$LVAE_TIMESTAMP/samples_cache/
(Optional) Visualize Reconstructed Samples
Run:
bash scripts/training/autoencoder/samples_visualization.sh
Visualizations will be saved under:
$PRITTI_EXP_ROOT/exp/training_lvae_model/samples_visualization/$LVAE_TIMESTAMP/visualizations/
Generate Samples
Run:
bash scripts/training/diffusion/samples_caching.sh
Generated samples will be saved under:
$PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/samples_caching/$LVAE_TIMESTAMP/samples_cache/
(Optional) Visualize Generated Samples
Run:
bash scripts/training/diffusion/samples_visualization.sh
Visualizations will be saved under:
$PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/samples_visualization/$LVAE_TIMESTAMP/visualizations/
We report two families of metrics: reconstruction metrics for the LVAE, and generation metrics for the LDM.
Compute LVAE reconstruction metrics
Run:
bash scripts/metrics/reconstruction/pritti_metrics.sh
This loads the LVAE checkpoint, reconstructs the validation split, and reports separate metrics for the ground raster map and the primitives (the latter using Omni3D).
Outputs:
$PRITTI_EXP_ROOT/exp/training_lvae_model/reconstruction_metrics/$LVAE_TIMESTAMP/
├── raster_results.csv # ground raster map reconstruction (per-sample + mean row)
└── vector_results.csv # primitives reconstruction (per-sample + mean row)
PrITTI's generation metrics are computed on top-down 2D semantic maps rendered from both reference and generated 3D scenes. The pipeline runs in six steps:
1. Render reference semantic maps
Renders the reference scenes into top-down semantic maps:
bash scripts/preprocessing/semantic_maps_rendering.sh
The rendered reference maps are written under $PRITTI_CACHE_ROOT/semantic_cache/.
Note: Open3D's offscreen renderer occasionally yields all-black or all-white images for otherwise valid scenes; these are detected and skipped without writing the .gz file. You may therefore need to re-run this script several times to render the full reference set. Already-rendered samples are detected and skipped automatically.
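The re-runs mentioned in the note can be automated. The sketch below repeats a command until the number of .gz files in a directory stops growing or an attempt limit is reached. `rerun_until_stable` is a hypothetical helper, and "no new files after a run" is only a heuristic for completion: it cannot tell a finished set apart from samples that failed again.

```shell
# Hypothetical helper (not part of the repository).
# Usage: rerun_until_stable <output_dir> <max_attempts> <command...>
rerun_until_stable() {
    dir=$1
    max_attempts=$2
    shift 2
    previous=-1
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        "$@" || return 1                     # stop if the command itself fails
        # Count rendered files (recursively) after this attempt.
        current=$(find "$dir" -type f -name '*.gz' | wc -l | tr -d ' ')
        echo "attempt $attempt: $current rendered files"
        # No new files since the previous attempt: assume nothing is left to render.
        [ "$current" -eq "$previous" ] && break
        previous=$current
        attempt=$((attempt + 1))
    done
}

# Usage (assumes the .gz files are written below semantic_cache/):
#   rerun_until_stable "$PRITTI_CACHE_ROOT/semantic_cache" 5 \
#       bash scripts/preprocessing/semantic_maps_rendering.sh
```

The same helper applies to any later rendering step that writes .gz files and skips already-rendered samples.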
2. Select reference scenes via farthest-point sampling
Farthest-point sampling selects NUM_SAMPLES reference scenes (default 1000) that are maximally spaced apart, used to build the evaluation batch reported in the main paper:
bash scripts/preprocessing/reference_sampling_fps.sh
This writes the selected subset to $PRITTI_CACHE_ROOT/semantic_cache/reference_batch_fps_<NUM_SAMPLES>.txt.
Supplementary: we additionally report an ablation using distance-based reference sampling, where reference scenes are filtered by a minimum spatial distance threshold DISTANCE_THRESHOLD (default 10m). Run bash scripts/preprocessing/reference_sampling_distance.sh to reproduce this alternative; it writes reference_batch_distance_<DISTANCE_THRESHOLD>m.txt to the same directory.
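For readers unfamiliar with farthest-point sampling, the idea can be sketched in a few lines of awk: start from one point, then repeatedly pick the point whose distance to the already selected set is largest. This is an illustration only, not the repository's implementation. The input format (`<id> <x> <y>` per line), the use of 2D positions, and seeding with the first point are all assumptions made for the example.

```shell
# Illustrative greedy farthest-point sampling (not the repository's code).
# Reads lines of "<id> <x> <y>" on stdin and prints k selected ids.
fps_select() {
    awk -v k="$1" '
        { id[NR] = $1; x[NR] = $2; y[NR] = $3; mind[NR] = -1 }
        END {
            if (NR == 0) exit
            if (k > NR) k = NR
            cur = 1                          # seed: first input point (arbitrary)
            for (s = 1; s <= k; s++) {
                print id[cur]
                chosen[cur] = 1
                best = -1; far = 0
                for (i = 1; i <= NR; i++) {
                    if (i in chosen) continue
                    dx = x[i] - x[cur]; dy = y[i] - y[cur]
                    d = dx * dx + dy * dy    # squared distance to newest pick
                    # mind[i]: squared distance from i to the nearest selected point
                    if (mind[i] < 0 || d < mind[i]) mind[i] = d
                    if (mind[i] > best) { best = mind[i]; far = i }
                }
                cur = far                    # farthest remaining point
            }
        }'
}

# Usage: printf 'a 0 0\nb 1 0\nc 10 0\nd 5 0\n' | fps_select 3
```

With the four points in the usage line, the selection is a, then c (farthest from a), then d (farthest from both a and c).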
3. Generate 50K samples without classifier-free guidance
For generation metrics, we generate 50K samples with classifier-free guidance disabled (guidance_scale=1.0). Edit scripts/training/diffusion/samples_caching.sh to reflect this:
# In scripts/training/diffusion/samples_caching.sh
GENERATED_SAMPLES_CACHE_SIZE=50000
GUIDANCE_SCALE=1.0
Then run:
bash scripts/training/diffusion/samples_caching.sh
Generated samples are saved under $PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/samples_caching/$LVAE_TIMESTAMP/samples_cache/.
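The two settings from this step can also be changed without opening an editor. The sketch below assumes that each variable is assigned at the start of its own line as `NAME=value`, which has not been verified against the actual script. `set_script_var` is a hypothetical helper, and it suits simple values such as numbers only.

```shell
# Hypothetical helper (not part of the repository).
# Usage: set_script_var <file> <NAME> <value>
# Assumes a line of the form NAME=... exists; values must not contain "|" or "&".
set_script_var() {
    file=$1
    name=$2
    value=$3
    # Refuse to continue if the assignment is not found, rather than silently doing nothing.
    grep -q "^$name=" "$file" || { echo "$name not found in $file" >&2; return 1; }
    # -i.bak edits in place and keeps a backup copy next to the file.
    sed -i.bak "s|^$name=.*|$name=$value|" "$file"
}

# Usage:
#   set_script_var scripts/training/diffusion/samples_caching.sh GENERATED_SAMPLES_CACHE_SIZE 50000
#   set_script_var scripts/training/diffusion/samples_caching.sh GUIDANCE_SCALE 1.0
```

The `.bak` copy makes it easy to restore the previous settings after the metrics run.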
4. Render generated semantic maps
Renders the 50K generated scenes from step 3 into top-down semantic maps:
bash scripts/training/diffusion/semantic_maps_rendering.sh
The rendered maps are saved under $PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/semantic_maps_rendering/$LVAE_TIMESTAMP/semantic_cache/.
Note: Open3D's offscreen renderer occasionally yields all-black or all-white images for otherwise valid scenes; these are detected and skipped without writing the .gz file. You may therefore need to re-run this script several times to render the full set of generated samples. Already-rendered samples are detected and skipped automatically.
5. Build paired evaluation batches
Pairs the FPS-selected reference subset (from step 2) with an equal number of randomly sampled generated maps, and saves them as .npz arrays for downstream metric computation:
bash scripts/metrics/generation/generation_metrics_fps.sh
Outputs:
$PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/generation_metrics/$LVAE_TIMESTAMP/
├── ref_batch_fps_<NUM_SAMPLES>.npz
└── samples_batch_fps_<NUM_SAMPLES>.npz
Supplementary: to reproduce the distance-based ablation, run bash scripts/metrics/generation/generation_metrics_distance.sh. It adds ref_batch_distance_<DISTANCE_THRESHOLD>m.npz and samples_batch_distance_<DISTANCE_THRESHOLD>m.npz to the same output directory.
6. Compute generation metrics
The evaluator is adapted from OpenAI's guided-diffusion and runs in a separate conda environment to avoid TensorFlow/PyTorch dependency conflicts with the main pritti env.
Set up the metrics environment (one-time):
conda create -n metrics python=3.9
conda activate metrics
pip install 'tensorflow[and-cuda]'
pip install tqdm scipy requests
Run the evaluator:
conda activate metrics
bash scripts/metrics/generation/evaluator_fps.sh
Inception Score, FID, sFID, Precision, and Recall are printed to stdout.
Supplementary: to evaluate the distance-based ablation, run:
bash scripts/metrics/generation/evaluator_distance.sh
If you find PrITTI useful, please consider giving us a star and citing our paper:
@inproceedings{Tze2026PrITTI,
author = {Tze, Christina Ourania and Dauner, Daniel and Liao, Yiyi and Tsishkou, Dzmitry and Geiger, Andreas},
title = {PrITTI: Primitive-based Generation of Controllable and Editable 3D Semantic Scenes},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026},
}