Envisioning Beyond the Few: Disentangled Semantics and Primitives for Few-Shot Atypical Layout-to-Image Generation

ICML 2026

Authors: Nan Bao, Yifan Zhao, Wenzhuang Wang, Jia Li

Environment Setup

We use two separate environments:

Main environment for core training and inference.

conda create -n dsp python=3.10.20
conda activate dsp
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
pip install datasets==4.8.5 pillow==12.2.0 accelerate==1.13.0 transformers==5.8.1 diffusers==0.38.0 safetensors==0.8.0rc0 tensorboard==2.20.0 opencv-python==4.13.0.92 einops==0.8.2 imagesize==2.0.0 peft==0.19.1 ttach==0.0.3 ftfy==6.3.1 albumentations==2.0.8

Evaluation environment. MMDetection/MMEngine impose strict version constraints, so evaluation runs in its own environment, which also supports YOLO-based evaluation.

conda create -n dsp-eval python=3.10.20
conda activate dsp-eval
conda install mkl==2023.1.0 numpy==1.26.4
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install mmengine==0.10.7 tqdm==4.67.3 shapely==2.1.2 scipy==1.15.3 terminaltables==3.1.10 ultralytics==8.4.50 pycocotools==2.0.11 https://download.openmmlab.com/mmcv/dist/cu121/torch2.1.0/mmcv-2.1.0-cp310-cp310-manylinux1_x86_64.whl "numpy<2.0.0" "setuptools<70.0.0"
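After installing, it can help to confirm that the active environment actually matches the pins. The following is a minimal sketch, not part of the repository: `MAIN_ENV_PINS` and `version_mismatches` are hypothetical names, and only a handful of pins from the main-environment command are copied in.

```python
from importlib import metadata

# A few pins copied from the main-environment install command (not exhaustive).
MAIN_ENV_PINS = {
    "torch": "2.6.0",
    "diffusers": "0.38.0",
    "transformers": "5.8.1",
    "accelerate": "1.13.0",
    "peft": "0.19.1",
}


def version_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not satisfied.

    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Local build tags such as "+cu126" are ignored in the comparison.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling `version_mismatches(MAIN_ENV_PINS)` inside the `dsp` environment should return an empty dict when the listed packages match; the same helper can be pointed at a dict of the `dsp-eval` pins.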

Set Environment Variables

Set the root path of this project:

export DSP_PROJECT_DIR=/path/to/DSP # replace with the actual path

It is recommended to add this line to ~/.bashrc or ~/.zshrc for persistence.
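For helper scripts that depend on this variable, a small guard avoids confusing path errors later. This is an illustrative sketch only; `project_dir` is a hypothetical helper, not a function from the repository.

```python
import os
from pathlib import Path


def project_dir(env=None):
    """Return DSP_PROJECT_DIR as a Path, or raise a readable error if it is unset."""
    env = os.environ if env is None else env
    value = env.get("DSP_PROJECT_DIR", "").strip()
    if not value:
        raise RuntimeError(
            "DSP_PROJECT_DIR is not set; run `export DSP_PROJECT_DIR=/path/to/DSP` first"
        )
    return Path(value).expanduser()
```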

Pretrained Models Preparation

We use several pretrained models as external dependencies: stable-diffusion-v1-5, clip-vit-large-patch14, dinov2_vitl14_pretrain.pth, and ViT-B-16.pt. Please download them manually.

After downloading, organize the pretrained weights under ./pretrained as follows:

pretrained
├── stable-diffusion-v1-5
│   └── ...
├── clip-vit-large-patch14
│   └── ...
├── dinov2_vitl14_pretrain.pth
└── ViT-B-16.pt

You may either copy or symlink the files. We recommend using symbolic links:

ln -s /path/to/stable-diffusion-v1-5 ./pretrained/stable-diffusion-v1-5
ln -s /path/to/clip-vit-large-patch14 ./pretrained/clip-vit-large-patch14
ln -s /path/to/dinov2_vitl14_pretrain.pth ./pretrained/dinov2_vitl14_pretrain.pth
ln -s /path/to/ViT-B-16.pt ./pretrained/ViT-B-16.pt
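Since a mistyped symlink target only surfaces when a model fails to load, the layout can be checked up front. The sketch below assumes the four entry names from the tree above; `EXPECTED_PRETRAINED` and `missing_pretrained` are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Entry names taken from the directory tree above.
EXPECTED_PRETRAINED = (
    "stable-diffusion-v1-5",
    "clip-vit-large-patch14",
    "dinov2_vitl14_pretrain.pth",
    "ViT-B-16.pt",
)


def missing_pretrained(root="./pretrained", expected=EXPECTED_PRETRAINED):
    """List expected entries that are absent or are dangling symlinks."""
    root = Path(root)
    # Path.exists() follows symlinks, so a link to a deleted target counts as missing.
    return [name for name in expected if not (root / name).exists()]
```

An empty list from `missing_pretrained()` means every expected entry resolves to an existing file or directory.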

Data Preparation

We use several public datasets. Please download them manually from the following sources:
- DIOR
- RUOD
- ExDark

Unzip the downloaded datasets and organize the external dataset directories as follows:

DIOR-VOC
├── Annotations
│   ├── Horizontal_Bounding_Boxes
│   └── Oriented_Bounding_Boxes
└── VOC2007
    ├── ImageSets
    │   ├── Layout
    │   ├── Main
    │   └── Segmentation
    └── JPEGImages

RUOD
├── Environment_pic
│   ├── blur
│   ├── color
│   └── light
├── Environmet_ANN
├── RUOD_ANN
└── RUOD_pic
    ├── test
    └── train

ExDark
├── annos
├── imageclasslist.txt
└── images

Run the data preprocessing scripts in ./scripts/data_process in order. Before running them, update all hard-coded paths in the scripts (e.g., /path/to/DIOR_VOC, /path/to/RUOD, /path/to/ExDark) to match the local setup.

The preprocessing outputs will be generated under ./data with the following structure:

data
├── DIOR
│   ├── dior_emb.pt
│   ├── images -> /path/to/DIOR-VOC/VOC2007/JPEGImages
│   ├── metadatas
│   └── patches
├── EXDARK
│   ├── exdark_emb.pt
│   ├── images
│   ├── metadatas
│   └── patches
└── RUOD
    ├── images -> /path/to/RUOD/RUOD_pic
    ├── metadatas
    ├── patches
    └── ruod_emb.pt
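In the tree above, the DIOR and RUOD `images` entries are symlinks into the external dataset directories, so they break if a dataset is moved. The following sketch lists such broken links; `dangling_links` is a hypothetical helper, not part of the preprocessing scripts.

```python
import os
from pathlib import Path


def dangling_links(root="./data"):
    """Return symlinks under `root` whose targets do not exist, as sorted paths."""
    dangling = []
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk does not follow directory symlinks by default,
        # so each link is inspected once and never descended into.
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if path.is_symlink() and not path.exists():
                dangling.append(path)
    return sorted(dangling)
```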

Training and Inference

We provide three example configurations in ./configs: dsp-dior.yaml, dsp-ruod.yaml, and dsp-exdark.yaml.

Argument Description:
- config: configuration file for model and dataset setup.
- metaseed: seed generator identifier for deterministic sampling.
- num_seed: number of sampling seeds for few-shot evaluation.
- k_shot: number of samples per category in few-shot setting.
- run_id: identifier for different runs.
- gpu_ids: GPU device indices for execution.
- iter: number of bootstrap iterations for FID.
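To make the roles of metaseed, num_seed, and k_shot concrete, here is one possible scheme for turning a string identifier into reproducible few-shot subsets. This is an assumption for illustration only: the repository's actual seed derivation and sampling logic may differ, and `derive_seeds` and `sample_k_shot` are hypothetical names.

```python
import hashlib
import random


def derive_seeds(metaseed, num_seed):
    """Map a string identifier to `num_seed` reproducible 32-bit integer seeds."""
    seeds = []
    for index in range(num_seed):
        digest = hashlib.sha256(f"{metaseed}:{index}".encode("utf-8")).digest()
        seeds.append(int.from_bytes(digest[:4], "big"))
    return seeds


def sample_k_shot(items_by_category, k_shot, seed):
    """Pick up to `k_shot` items per category, reproducibly for a given seed."""
    rng = random.Random(seed)
    subset = {}
    # Sorting the categories keeps the draw order independent of dict insertion order.
    for category in sorted(items_by_category):
        items = sorted(items_by_category[category])
        subset[category] = rng.sample(items, min(k_shot, len(items)))
    return subset
```

Under this scheme, the same metaseed always yields the same list of seeds, and each seed selects one fixed k-shot subset, so repeated runs see identical few-shot data.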

Base Phase Training

bash train_base.sh --config "dsp-dior"
bash train_base.sh --config "dsp-ruod"
bash train_base.sh --config "dsp-exdark"

Novel Phase Training

bash train_novel.sh --config "dsp-dior" --metaseed "aaa" --num_seed 50 --k_shot "5" --run_id "1" --gpu_ids "0,1,2,3"
bash train_novel.sh --config "dsp-ruod" --metaseed "aaa" --num_seed 50 --k_shot "5" --run_id "1" --gpu_ids "0,1,2,3"
bash train_novel.sh --config "dsp-exdark" --metaseed "aaa" --num_seed 50 --k_shot "5" --run_id "1" --gpu_ids "0,1,2,3"

Inference

bash infer.sh --config "dsp-dior" --metaseed "aaa" --num_seed 50 --k_shot "5" --run_id "1" --ckpt "100" --gpu_ids "0,1,2,3" --max_infer_size 50
bash infer.sh --config "dsp-ruod" --metaseed "aaa" --num_seed 50 --k_shot "5" --run_id "1" --ckpt "100" --gpu_ids "0,1,2,3" --max_infer_size 50
bash infer.sh --config "dsp-exdark" --metaseed "aaa" --num_seed 50 --k_shot "5" --run_id "1" --ckpt "100" --gpu_ids "0,1,2,3" --max_infer_size 50
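The three inference calls differ only in the configuration name, so they can be generated rather than typed out. The sketch below uses only the flags and default values shown in the commands above; `infer_command` and `all_infer_commands` are hypothetical helpers, not part of the repository.

```python
import shlex

CONFIGS = ("dsp-dior", "dsp-ruod", "dsp-exdark")


def infer_command(config, metaseed="aaa", num_seed=50, k_shot=5, run_id=1,
                  ckpt=100, gpu_ids="0,1,2,3", max_infer_size=50):
    """Build the argv list for one infer.sh call, using only flags shown above."""
    return [
        "bash", "infer.sh",
        "--config", config,
        "--metaseed", metaseed,
        "--num_seed", str(num_seed),
        "--k_shot", str(k_shot),
        "--run_id", str(run_id),
        "--ckpt", str(ckpt),
        "--gpu_ids", gpu_ids,
        "--max_infer_size", str(max_infer_size),
    ]


def all_infer_commands(**overrides):
    """Return one shell-quoted command line per dataset configuration."""
    return [shlex.join(infer_command(config, **overrides)) for config in CONFIGS]
```

Each argv list can be passed to `subprocess.run(argv, check=True)` from the project root, which stops at the first failing configuration.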

Evaluation

Preparation

Download the YOLO and Faster R-CNN weights from this link. Place them under ./pretrained. The expected directory structure is as follows:

pretrained
├── evaluation
│   ├── mmdet
│   │   ├── faster_rcnn_r50_fpn_1x-dior
│   │   │   └── epoch_12.pth
│   │   ├── faster_rcnn_r50_fpn_1x-exdark
│   │   │   └── epoch_12.pth
│   │   └── faster_rcnn_r50_fpn_1x-ruod
│   │       └── epoch_12.pth
│   └── yolo
│       └── best.pt
└── ... (pretrained models for training)

YOLO (mAP / AP50 / AP75)

Note: In yolo-wrapper-dior.sh, the --xml_folder path should be set to the DIOR annotation directory (/path/to/DIOR-VOC/Annotations/Horizontal_Bounding_Boxes).

cd $DSP_PROJECT_DIR/scripts/evaluation/yoloscore-dior
bash yolo-wrapper-dior.sh --config "dsp-dior" --metaseed "aaa" --num_seed 50 --ckpt "100" --k_shot "5" --run_id "1" --gpu_ids 0
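To confirm that --xml_folder points at readable annotations, a single file can be parsed by hand. The sketch assumes the annotations follow the Pascal VOC XML schema (`object`, `name`, `bndbox` with `xmin`/`ymin`/`xmax`/`ymax`); `read_voc_boxes` is a hypothetical helper and is not how the evaluation script itself reads the files.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text):
    """Parse Pascal VOC-style XML and return [(label, xmin, ymin, xmax, ymax), ...]."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        if label is None or bndbox is None:
            continue  # skip objects that carry no horizontal box
        coords = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, *coords))
    return boxes
```

Reading one file with `Path(xml_path).read_text()` and passing the result to `read_voc_boxes` should return a non-empty list if the folder is the right one.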

Faster R-CNN (mAP / AP50 / AP75)

cd $DSP_PROJECT_DIR/scripts/evaluation/FasterRCNN_score-mmdet
bash test-wrapper-dior.sh --config "dsp-dior" --metaseed "aaa" --num_seed 50 --ckpt "100" --k_shot "5" --run_id "1" --gpu_ids 0
bash test-wrapper-ruod.sh --config "dsp-ruod" --metaseed "aaa" --num_seed 50 --ckpt "100" --k_shot "5" --run_id "1" --gpu_ids 0
bash test-wrapper-exdark.sh --config "dsp-exdark" --metaseed "aaa" --num_seed 50 --ckpt "100" --k_shot "5" --run_id "1" --gpu_ids 0

Bootstrap FID

cd $DSP_PROJECT_DIR/scripts/evaluation/bootstrap_fid
python boot_fid-dior.py --config dsp-dior -run_id 1 -num_seeds 50 --iter 50 --k_shot 5
python boot_fid-ruod.py --config dsp-ruod -run_id 1 -num_seeds 50 --iter 50 --k_shot 5
python boot_fid-exdark.py --config dsp-exdark -run_id 1 -num_seeds 50 --iter 50 --k_shot 5

Bootstrap FID results will be saved under ./metrics/BootstrapFID.
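The idea behind the --iter argument can be illustrated with a toy bootstrap loop. This sketch is not the repository's implementation: it replaces FID with a 1-D Fréchet distance on scalar features (real FID uses multivariate Inception features), it assumes that only the generated set is resampled, and `frechet_1d` and `bootstrap` are hypothetical names.

```python
import random
import statistics


def frechet_1d(real, fake):
    """Squared Fréchet distance between 1-D Gaussians fitted to two samples."""
    mean_gap = statistics.fmean(real) - statistics.fmean(fake)
    std_gap = statistics.pstdev(real) - statistics.pstdev(fake)
    return mean_gap ** 2 + std_gap ** 2


def bootstrap(metric, real, fake, iters=50, seed=0):
    """Resample `fake` with replacement `iters` times; return (mean, std) of the metric."""
    rng = random.Random(seed)
    scores = []
    for _ in range(iters):
        resampled = rng.choices(fake, k=len(fake))
        scores.append(metric(real, resampled))
    return statistics.fmean(scores), statistics.pstdev(scores)
```

Reporting the mean together with the standard deviation over the resamples gives a sense of how much the score depends on the particular set of generated images.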

Detection Metric Summarization

cd $DSP_PROJECT_DIR/scripts/evaluation/summarize
bash summarize-wrapper.sh --config "dsp-dior" --k_shot "5" --run_id "1" --ckpt "100" --metaseed "aaa" --num_seed 50
bash summarize-wrapper.sh --config "dsp-ruod" --k_shot "5" --run_id "1" --ckpt "100" --metaseed "aaa" --num_seed 50
bash summarize-wrapper.sh --config "dsp-exdark" --k_shot "5" --run_id "1" --ckpt "100" --metaseed "aaa" --num_seed 50

Detection evaluation results (mAP / AP50 / AP75, YOLO and Faster R-CNN) will be summarized in ./metrics.

Acknowledgement

Our work is based on Stable Diffusion, Diffusers, CLIP, DINOv2, CC-Diff, MIGC, GradCAM, and kmeans_pytorch. We thank the authors of these great projects!

Citation

If you find our work useful for your research, please cite the following paper.

@inproceedings{
    bao2026envisioning,
    title={Envisioning Beyond the Few: Disentangled Semantics and Primitives for Few-Shot Atypical Layout-to-Image Generation},
    author={Bao, Nan and Zhao, Yifan and Wang, Wenzhuang and Li, Jia},
    booktitle={Forty-third International Conference on Machine Learning},
    year={2026},
    url={https://openreview.net/forum?id=Jva4wVEySO}
}
