Smaller But Better: Unifying Layout Generation with Smaller Large Language Models
International Journal of Computer Vision (IJCV), 2025
⭐Official code of the LGGPT model.
LGGPT is an LLM-based model that unifies arbitrary tasks and multiple domains of layout generation. It is built on GPT2-XL (1.5B parameters), demonstrating that an LLM of such a small scale can achieve competitive layout generation performance across various tasks and multiple layout domains.
```shell
git clone https://github.com/NiceRingNode/LGGPT.git
cd LGGPT
conda create -n lggpt python=3.8.16
conda activate lggpt
pip install -r requirements.txt
```

Download five layout datasets: PubLayNet, Rico, Magazine, WiSe, and SPaSe. Create a new folder named `data-raw` and unzip these datasets into `data-raw` with the following structure:
```
data-raw
├── publaynet-image
│   ├── train.json
│   ├── val.json
│   └── ...
├── rico_dataset_v0.1_semantic_annotations
│   └── semantic_annotations
├── MagLayout/layoutdata
│   ├── annotations
│   └── ...
├── WiSe
│   ├── labels
│   └── ...
└── SPaSe
    ├── labels
    └── ...
```

Run `preprocess.sh` to preprocess the datasets:
```shell
bash preprocess.sh
```

The training data are stored in the `data/unified` folder, and the testing data for PubLayNet, Rico, and Magazine are stored in the `data/publaynet`, `data/rico`, and `data/magazine` folders, respectively. The directories are organized as follows:
```
data
├── unified
│   └── train.txt
├── publaynet
│   ├── val_prompt.txt
│   ├── val.txt
│   └── train.txt
├── rico
│   ├── test_prompt.txt
│   ├── test.txt
│   └── train.txt
├── magazine
│   ├── test_prompt.txt
│   ├── test.txt
│   └── train.txt
└── slide
    ├── test.txt
    └── train.txt
```
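Before launching training, it may help to verify that preprocessing produced the expected files. The following is a small hypothetical helper (not part of the repo) whose expected layout mirrors the tree above:

```python
from pathlib import Path

# Expected processed-data layout, mirroring the directory tree above.
EXPECTED = {
    "unified": ["train.txt"],
    "publaynet": ["val_prompt.txt", "val.txt", "train.txt"],
    "rico": ["test_prompt.txt", "test.txt", "train.txt"],
    "magazine": ["test_prompt.txt", "test.txt", "train.txt"],
    "slide": ["test.txt", "train.txt"],
}

def check_data_root(root="data"):
    """Return a list of missing files under the processed-data root."""
    root = Path(root)
    missing = []
    for folder, files in EXPECTED.items():
        for name in files:
            path = root / folder / name
            if not path.is_file():
                missing.append(str(path))
    return missing

if __name__ == "__main__":
    missing = check_data_root()
    if missing:
        print("Missing files:")
        for m in missing:
            print(" -", m)
    else:
        print("All processed data files found.")
```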
We also provide the processed training data used in the paper, which can be downloaded from these links (Baidu Cloud/Google Drive) and used directly for training. However, the testing data must be explicitly processed and prepared.
Before training, please complete the following two preparation steps:
First, download the model's pretrained weights from 🤗HuggingFace (e.g., GPT2-XL) and place them in a local folder (e.g., `gpt2-xl`). Specify the path to this folder in the config file (e.g., `config/gpt2.yaml`).
Second, modify the `GPT2LMHeadModel` implementation in the `transformers` library. Locate the source file at `/your/path/to/miniconda3/envs/lggpt/lib/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py` and change

```python
shift_logits = lm_logits[..., :-1, :].contiguous()
shift_labels = labels[..., 1:].contiguous()
```

to

```python
shift_logits = lm_logits[..., :, :].contiguous()
shift_labels = labels[..., :].contiguous()
```

Then you are free to run the training code:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 train.py \
    --dataset 'unified' \
    --notes 'training for unified data' \
    --name 'LGGPT' \
    --config 'config/gpt2.yaml'
```

One can specify the running devices using `CUDA_VISIBLE_DEVICES` and adjust the number of GPU devices with `--nproc_per_node` accordingly.
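To illustrate the effect of the `modeling_gpt2.py` change from the preparation steps: the stock code shifts logits and labels by one position (position t predicts the token at position t + 1), while the patched code compares them one-to-one, which presumes the training pipeline has already aligned each label with its logit. A toy, model-free sketch of the two alignments:

```python
# Toy illustration of the two label/logit alignments (positions only,
# no real model). logits[t] is the prediction made at position t.

def stock_alignment(logits, labels):
    """Default HF behaviour: logits[..., :-1, :] vs labels[..., 1:].
    The prediction at position t is scored against the token at t + 1."""
    return list(zip(logits[:-1], labels[1:]))

def patched_alignment(logits, labels):
    """Patched behaviour: logits[..., :, :] vs labels[..., :].
    The prediction at position t is scored against the label at t."""
    return list(zip(logits, labels))

positions = [0, 1, 2, 3]
print(stock_alignment(positions, positions))    # [(0, 1), (1, 2), (2, 3)]
print(patched_alignment(positions, positions))  # [(0, 0), (1, 1), (2, 2), (3, 3)]
```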
Testing can be run as:

```shell
python test.py \
    --weights weights/gpt2unify-20250218-121251/checkpoint-23000/pytorch_model.bin \
    --dataset publaynet \
    --cond C \
    --temp 1.0 \
    --gpu 1
```

The model is trained with unified data across various tasks and multiple domains, while it is evaluated independently for each task and dataset. The parameters for testing are as follows:

- `weights`: the checkpoint's path.
- `dataset`: the dataset used for testing, including `publaynet`, `rico`, and `magazine`.
- `cond`: the task type for testing. Available task types include: `C` for Completion, `T` for Gen-T, `T-S` for Gen-TS, `T-L-R` for Relation, `R` for Refinement, `U` for Gen-U, `U-P` for Gen-UP, `C-R-A` for Completion-Refinement, `T-S-P` for Gen-TPS, `P-S-R` for Gen-PS-Refinement, `B-R` for Gen-Arb-Refinement. See the paper for detailed task definitions.
- `temp`: the temperature used in inference. `temp > 0` performs sampling during layout generation, while `temp = 0` disables sampling.
- `gpu`: the GPU used for testing.
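The `temp` parameter can be understood as standard temperature sampling. The sketch below is a generic illustration (not the repo's actual inference code): logits are divided by the temperature before the softmax, and `temp = 0` reduces to greedy argmax decoding, i.e. no sampling:

```python
import math
import random

def sample_token(logits, temp=1.0, rng=random):
    """Pick a token index from raw logits.

    temp > 0: softmax sampling at the given temperature.
    temp == 0: greedy decoding (argmax), i.e. sampling disabled.
    """
    if temp == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature-scaled softmax (subtract max for numerical stability).
    scaled = [l / temp for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(logits) - 1

logits = [2.0, 1.0, 0.5]
print(sample_token(logits, temp=0))  # greedy: always index 0
```

Lower temperatures sharpen the distribution toward the argmax; higher temperatures flatten it and increase layout diversity.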
If the testing was interrupted, one can resume it through:
```shell
python test.py \
    --weights weights/LGGPT-20250218-121251/checkpoint-23000/pytorch_model.bin \
    --dataset publaynet \
    --cond C \
    --temp 1.0 \
    --gpu 1 \
    --rmp weights/LGGPT-20250218-121251/checkpoint-23000/C-sample59-metrics.pth
```

The `rmp` parameter specifies the path to the previous evaluation results, which are stored automatically.
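The resumption mechanism can be sketched roughly as follows. This is a hypothetical illustration, not the repo's implementation (the actual `.pth` metrics files are presumably written with `torch.save`; plain `pickle` is used here to stay self-contained): per-sample metrics are persisted after each step, so a restarted run can skip samples that were already evaluated:

```python
import pickle
from pathlib import Path

def evaluate_with_resume(samples, metric_fn, results_path):
    """Evaluate samples one by one, persisting partial results so an
    interrupted run can be resumed from the saved file (cf. --rmp)."""
    results_path = Path(results_path)
    results = {}
    if results_path.is_file():
        # Resume: load previously stored per-sample metrics.
        results = pickle.loads(results_path.read_bytes())
    for idx, sample in enumerate(samples):
        if idx in results:  # already evaluated before the interruption
            continue
        results[idx] = metric_fn(sample)
        results_path.write_bytes(pickle.dumps(results))  # persist progress
    return results
```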
```bibtex
@article{lggpt2025zhang,
  title={{Smaller But Better: Unifying Layout Generation with Smaller Large Language Models}},
  author={Zhang, Peirong and Zhang, Jiaxin and Cao, Jiahuan and Li, Hongliang and Jin, Lianwen},
  journal={International Journal of Computer Vision (IJCV)},
  volume={133},
  pages={3891--3917},
  year={2025}
}
```

Peirong Zhang: eeprzhang@mail.scut.edu.cn
Copyright 2025, Deep Learning and Vision Computing (DLVC) Lab, South China University of Technology. http://www.dlvc-lab.net.
