Smaller But Better: Unifying Layout Generation with Smaller Large Language Models
International Journal of Computer Vision (IJCV), 2025
⭐Official code of the LGGPT model.
LGGPT is an LLM-based model that unifies arbitrary tasks and multiple domains of layout generation. It is built on GPT2-XL (1.5B parameters), demonstrating that an LLM of such a small scale can achieve competitive layout generation performance across various tasks and multiple layout domains.
```shell
git clone https://github.com/NiceRingNode/LGGPT.git
cd LGGPT
conda create -n lggpt python=3.8.16
conda activate lggpt
pip install -r requirements.txt
```

Download five layout datasets: PubLayNet, Rico, Magazine, WiSe, and SPaSe. Create a new folder named `data-raw` and unzip these datasets into `data-raw` with the following structure:
```
data-raw
├── publaynet-image
│   ├── train.json
│   ├── val.json
│   └── ...
├── rico_dataset_v0.1_semantic_annotations
│   └── semantic_annotations
├── MagLayout/layoutdata
│   ├── annotations
│   └── ...
├── WiSe
│   ├── labels
│   └── ...
└── SPaSe
    ├── labels
    └── ...
```

Run `preprocess.sh` to preprocess the datasets:
```shell
bash preprocess.sh
```

The training data are stored in the `data/unified` folder, and the testing data for PubLayNet, Rico, and Magazine are stored in the `data/publaynet`, `data/rico`, and `data/magazine` folders, respectively. The directories are organized as follows:
```
data
├── unified
│   └── train.txt
├── publaynet
│   ├── val_prompt.txt
│   ├── val.txt
│   └── train.txt
├── rico
│   ├── test_prompt.txt
│   ├── test.txt
│   └── train.txt
├── magazine
│   ├── test_prompt.txt
│   ├── test.txt
│   └── train.txt
└── slide
    ├── test.txt
    └── train.txt
```
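Before launching training, it may help to verify that preprocessing produced the expected files. The following is a small hypothetical helper (not part of the repo) whose expected layout mirrors the tree above:

```python
from pathlib import Path

# Expected processed-data layout, mirroring the directory tree above.
EXPECTED = {
    "unified": ["train.txt"],
    "publaynet": ["val_prompt.txt", "val.txt", "train.txt"],
    "rico": ["test_prompt.txt", "test.txt", "train.txt"],
    "magazine": ["test_prompt.txt", "test.txt", "train.txt"],
    "slide": ["test.txt", "train.txt"],
}

def check_data_root(root="data"):
    """Return a list of missing files under the processed-data root."""
    root = Path(root)
    missing = []
    for folder, files in EXPECTED.items():
        for name in files:
            path = root / folder / name
            if not path.is_file():
                missing.append(str(path))
    return missing

if __name__ == "__main__":
    missing = check_data_root()
    if missing:
        print("Missing files:")
        for m in missing:
            print(" -", m)
    else:
        print("All processed data files found.")
```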
We also provide the processed training data used in the paper, which can be downloaded from these links (Baidu Cloud/Google Drive) and used directly for training. However, the testing data must be explicitly processed and prepared.
Before training, please complete the following two preparation steps:
First, download the model's pretrained weights from 🤗HuggingFace (e.g., GPT2-XL) and place them in a local folder (e.g., `gpt2-xl`). Specify the path to this folder in the config file (e.g., `config/gpt2.yaml`).
Second, modify the `GPT2LMHeadModel` implementation in the `transformers` library. Locate the source file at `/your/path/to/miniconda3/envs/lggpt/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py` and change

```python
shift_logits = lm_logits[..., :-1, :].contiguous()
shift_labels = labels[..., 1:].contiguous()
```

to

```python
shift_logits = lm_logits[..., :, :].contiguous()
shift_labels = labels[..., :].contiguous()
```

Then you are free to run the training code:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 train.py \
    --dataset 'unified' \
    --notes 'training for unified data' \
    --name 'LGGPT' \
    --config 'config/gpt2.yaml'
```

One can specify the running devices using `CUDA_VISIBLE_DEVICES` and adjust the number of GPU devices with `--nproc_per_node` accordingly.
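To illustrate the effect of the `modeling_gpt2.py` change from the preparation steps: the stock code shifts logits and labels by one position (position t predicts the token at position t + 1), while the patched code compares them one-to-one, which presumes the training pipeline has already aligned each label with its logit. A toy, model-free sketch of the two alignments:

```python
# Toy illustration of the two label/logit alignments (positions only,
# no real model). logits[t] is the prediction made at position t.

def stock_alignment(logits, labels):
    """Default HF behaviour: logits[..., :-1, :] vs labels[..., 1:].
    The prediction at position t is scored against the token at t + 1."""
    return list(zip(logits[:-1], labels[1:]))

def patched_alignment(logits, labels):
    """Patched behaviour: logits[..., :, :] vs labels[..., :].
    The prediction at position t is scored against the label at t."""
    return list(zip(logits, labels))

positions = [0, 1, 2, 3]
print(stock_alignment(positions, positions))    # [(0, 1), (1, 2), (2, 3)]
print(patched_alignment(positions, positions))  # [(0, 0), (1, 1), (2, 2), (3, 3)]
```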
Testing can be run as:

```shell
python test.py \
    --weights weights/gpt2unify-20250218-121251/checkpoint-23000/pytorch_model.bin \
    --dataset publaynet \
    --cond C \
    --temp 1.0 \
    --gpu 1
```

The model is trained with unified data across various tasks and multiple domains, while it is evaluated independently for each task and dataset. The parameters for testing are as follows:

- `weights`: the checkpoint's path.
- `dataset`: the dataset used for testing, including `publaynet`, `rico`, and `magazine`.
- `cond`: the task type for testing. Available task types include: `C` for Completion, `T` for Gen-T, `T-S` for Gen-TS, `T-L-R` for Relation, `R` for Refinement, `U` for Gen-U, `U-P` for Gen-UP, `C-R-A` for Completion-Refinement, `T-S-P` for Gen-TPS, `P-S-R` for Gen-PS-Refinement, `B-R` for Gen-Arb-Refinement. See the paper for detailed task definitions.
- `temp`: the temperature used in inference. `temp > 0` performs sampling during layout generation, while `temp = 0` disables sampling.
- `gpu`: the GPU used for testing.
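The `temp` parameter can be understood as standard temperature sampling. The sketch below is a generic illustration (not the repo's actual inference code): logits are divided by the temperature before the softmax, and `temp = 0` reduces to greedy argmax decoding, i.e. no sampling:

```python
import math
import random

def sample_token(logits, temp=1.0, rng=random):
    """Pick a token index from raw logits.

    temp > 0: softmax sampling at the given temperature.
    temp == 0: greedy decoding (argmax), i.e. sampling disabled.
    """
    if temp == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature-scaled softmax (subtract max for numerical stability).
    scaled = [l / temp for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(logits) - 1

logits = [2.0, 1.0, 0.5]
print(sample_token(logits, temp=0))  # greedy: always index 0
```

Lower temperatures sharpen the distribution toward the argmax; higher temperatures flatten it and increase layout diversity.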
If the testing was interrupted, one can resume it through:
```shell
python test.py \
    --weights weights/LGGPT-20250218-121251/checkpoint-23000/pytorch_model.bin \
    --dataset publaynet \
    --cond C \
    --temp 1.0 \
    --gpu 1 \
    --rmp weights/LGGPT-20250218-121251/checkpoint-23000/C-sample59-metrics.pth
```

The `rmp` parameter specifies the path to the previous evaluation results, which are stored automatically.
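The resumption mechanism can be sketched roughly as follows. This is a hypothetical illustration, not the repo's implementation (the actual `.pth` metrics files are presumably written with `torch.save`; plain `pickle` is used here to stay self-contained): per-sample metrics are persisted after each step, so a restarted run can skip samples that were already evaluated:

```python
import pickle
from pathlib import Path

def evaluate_with_resume(samples, metric_fn, results_path):
    """Evaluate samples one by one, persisting partial results so an
    interrupted run can be resumed from the saved file (cf. --rmp)."""
    results_path = Path(results_path)
    results = {}
    if results_path.is_file():
        # Resume: load previously stored per-sample metrics.
        results = pickle.loads(results_path.read_bytes())
    for idx, sample in enumerate(samples):
        if idx in results:  # already evaluated before the interruption
            continue
        results[idx] = metric_fn(sample)
        results_path.write_bytes(pickle.dumps(results))  # persist progress
    return results
```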
```bibtex
@article{lggpt2025zhang,
  title={{Smaller But Better: Unifying Layout Generation with Smaller Large Language Models}},
  author={Zhang, Peirong and Zhang, Jiaxin and Cao, Jiahuan and Li, Hongliang and Jin, Lianwen},
  journal={International Journal of Computer Vision (IJCV)},
  volume={133},
  pages={3891--3917},
  year={2025}
}
```

Peirong Zhang: eeprzhang@mail.scut.edu.cn
Copyright 2025, Deep Learning and Vision Computing (DLVC) Lab, South China University of Technology. http://www.dlvc-lab.net.
