1 School of Software, Shandong University
2 School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)
3 School of Data Science, City University of Hong Kong
4 Department of Computer Science and Engineering, Southern University of Science and Technology
✉ Corresponding author
Official Implementation: A novel network designed to tackle the phenomenon of visually similar but attribute-unrelated samples in Composed Image Retrieval (CIR) by learning attribute-based neighbor relations.
Welcome to the official repository for COMBINER (Composed Image Retrieval Guided by Attribute-based Neighbor Relations).
Existing CIR approaches often overlook cases where images appear visually alike yet differ in attributes, potentially undermining both multimodal feature fusion and similarity modeling. COMBINER tackles these obstacles by introducing a unified representation of cross-modal features based on attribute prototypes. By utilizing adaptive semantic disentanglement, unified prototype-based composition, and dual relations modeling, COMBINER accurately understands the semantic relations among samples and achieves State-of-the-Art (SOTA) performance across multiple benchmark datasets.
Figure 1. Examples of (a) Pairwise Relations, (b) Neighbor Relations, and (c) Visually Similar Images in Relations Modeling.
Figure 2. Schematic of our proposed similarity measure method based on attribute prototypes.
- [2026-05-06] We have officially released the main code and framework of COMBINER!
- [2026-04-30] COMBINER has been accepted by TIP 2026!
Our framework introduces three core modules to overcome attribute-level semantic entanglement and cross-modal inconsistency:
- Adaptive Semantic Disentanglement (ASD): Adaptively disentangles attribute features from multimodal primitive features, addressing the entanglement in attribute-level semantics.
- Unified Prototype-based Composition (UPC): Constructs Cross-modal Unified Prototypes (CUP) that serve as a shared dictionary to eliminate modal heterogeneity and facilitate multimodal feature composition.
- Dual Relations Modeling (DRM): Mines both supervised pairwise relations and unsupervised neighbor relations based on attribute similarity, pulling together samples that are visually similar and attribute-related while pushing away attribute-unrelated distractors.
- SOTA Performance: Improves retrieval accuracy on both fashion-domain (FashionIQ, Shoes) and open-domain (CIRR) datasets.
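To make the idea of attribute-based neighbor relations concrete, here is a minimal, illustrative sketch. It is not the COMBINER implementation: the function names, the softmax assignment over prototypes, and the toy data are all assumptions chosen for clarity. Each sample is described by a soft assignment over a shared set of prototypes, and its neighbors are the samples whose assignments are most similar.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_assignment(feature, prototypes):
    """Describe a feature by its soft assignment over shared prototypes (hypothetical helper)."""
    scores = [sum(f * p for f, p in zip(feature, proto)) for proto in prototypes]
    return softmax(scores)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_neighbors(assignments, k):
    """For each sample, return the indices of its k most attribute-similar peers."""
    neighbors = []
    for i, anchor in enumerate(assignments):
        ranked = sorted(
            (j for j in range(len(assignments)) if j != i),
            key=lambda j: cosine(anchor, assignments[j]),
            reverse=True,
        )
        neighbors.append(ranked[:k])
    return neighbors


# Hypothetical toy data: 2 prototypes and 4 samples with 2-D features.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
features = [[2.0, 0.1], [1.8, 0.0], [0.1, 2.0], [0.0, 1.9]]
assignments = [prototype_assignment(f, prototypes) for f in features]
print(mine_neighbors(assignments, k=1))  # [[1], [0], [3], [2]]
```

In this toy setting, samples 0 and 1 load on the first prototype and samples 2 and 3 on the second, so each sample's nearest neighbor is the one that shares its dominant attribute. No labels are needed for this step, which is what makes the neighbor relation unsupervised.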
COMBINER consistently outperforms existing baselines (e.g., DQU-CIR, SPRC, SADN) across all standard metrics on three major benchmarks.
(Evaluated using Recall@K)
(Evaluated using R@K and R_subset@K)
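For readers unfamiliar with these metrics, the sketch below shows how Recall@K and a subset-restricted variant are commonly computed. It is an illustration under assumptions, not the code in `utils.py`: the function names and toy rankings are hypothetical, and R_subset@K is modeled here as Recall@K after each ranking is filtered to a per-query candidate subset, which is how the CIRR benchmark describes it.

```python
def recall_at_k(ranked_lists, targets, k):
    """Percentage of queries whose target appears among the top-k retrieved items."""
    hits = sum(
        1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k]
    )
    return 100.0 * hits / len(targets)


def recall_subset_at_k(ranked_lists, targets, subsets, k):
    """Recall@K after restricting each ranking to that query's candidate subset."""
    filtered = [
        [item for item in ranked if item in subset]
        for ranked, subset in zip(ranked_lists, subsets)
    ]
    return recall_at_k(filtered, targets, k)


# Hypothetical toy rankings for two queries over a gallery of four images.
ranked = [["a", "b", "c", "d"], ["c", "a", "d", "b"]]
targets = ["b", "b"]
subsets = [{"b", "c"}, {"d", "b"}]

print(recall_at_k(ranked, targets, k=2))                  # 50.0
print(recall_subset_at_k(ranked, targets, subsets, k=1))  # 50.0
```

The subset variant is the stricter test of fine-grained discrimination, because the candidates that remain after filtering are the ones most easily confused with the target.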
- Introduction
- News
- Key Features
- Architecture
- Experiment Results
- Repository Structure
- Installation
- Data Preparation
- Quick Start
- Acknowledgements
- Contact
- Citation
- Related Projects
- Support & Contributing
Our codebase is highly modular. Here is a brief overview of the core files:
COMBINER/
├── cirr_test_submission.py   # CIRR submission file generator
├── datasets_openclip.py      # Dataset loader and preprocessing
├── model.py                  # COMBINER model architecture and forward pass
├── test.py                   # Evaluation/test entry point
├── train.py                  # Training entry point
├── utils.py                  # Utility functions (metrics, helper methods)
└── README.md                 # Documentation
1. Clone the repository
git clone https://github.com/iLearn-Lab/TIP26-COMBINER.git
cd TIP26-COMBINER
2. Setup Environment
We recommend using Conda to manage your environment:
conda create -n combiner_env python=3.8.10
conda activate combiner_env
# Install PyTorch (Ensure it matches your CUDA version. Tested on PyTorch 2.0.0, NVIDIA A40 48G)
pip install torch==2.0.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

COMBINER is evaluated on FashionIQ, Shoes, and CIRR. Please download the datasets from their official sources and arrange them as follows.
Download the Shoes dataset following the instructions in the official repository.
After downloading the dataset, ensure that the folder structure matches the following:
├── Shoes
│   ├── captions_shoes.json
│   ├── eval_im_names.txt
│   ├── relative_captions_shoes.json
│   ├── train_im_names.txt
│   ├── [womens_athletic_shoes | womens_boots | ...]
│   │   ├── [0 | 1]
│   │   ├── [img_womens_athletic_shoes_375.jpg | descr_womens_athletic_shoes_734.txt | ...]
Download the FashionIQ dataset following the instructions in the official repository.
After downloading the dataset, ensure that the folder structure matches the following:
├── FashionIQ
│   ├── captions
│   │   ├── cap.dress.[train | val | test].json
│   │   ├── cap.toptee.[train | val | test].json
│   │   ├── cap.shirt.[train | val | test].json
│   ├── image_splits
│   │   ├── split.dress.[train | val | test].json
│   │   ├── split.toptee.[train | val | test].json
│   │   ├── split.shirt.[train | val | test].json
│   ├── dress
│   │   ├── [B000ALGQSY.jpg | B000AY2892.jpg | B000AYI3L4.jpg | ...]
│   ├── shirt
│   │   ├── [B00006M009.jpg | B00006M00B.jpg | B00006M6IH.jpg | ...]
│   ├── toptee
│   │   ├── [B0000DZQD6.jpg | B000A33FTU.jpg | B000AS2OVA.jpg | ...]
Download the CIRR dataset following the instructions in the official repository.
After downloading the dataset, ensure that the folder structure matches the following:
├── CIRR
│   ├── train
│   │   ├── [0 | 1 | 2 | ...]
│   │   │   ├── [train-10108-0-img0.png | train-10108-0-img1.png | ...]
│   ├── dev
│   │   ├── [dev-0-0-img0.png | dev-0-0-img1.png | ...]
│   ├── test1
│   │   ├── [test1-0-0-img0.png | test1-0-0-img1.png | ...]
│   ├── cirr
│   │   ├── captions
│   │   │   ├── cap.rc2.[train | val | test1].json
│   │   ├── image_splits
│   │   │   ├── split.rc2.[train | val | test1].json
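Before training, it can help to confirm that a downloaded dataset matches the expected layout. The following is a small sketch, not part of the repository: the helper names `expected_entries` and `missing_entries` are hypothetical, and the checked paths are simply the fixed file and folder names from the three trees above, with the bracketed split names expanded. Image files and per-category image folders are not checked.

```python
import tempfile
from pathlib import Path

# Split names as written in the FashionIQ and CIRR trees.
SPLITS = {"FashionIQ": ("train", "val", "test"), "CIRR": ("train", "val", "test1")}


def expected_entries(dataset):
    """List paths, relative to the dataset root, that the trees name explicitly."""
    if dataset == "Shoes":
        return ["captions_shoes.json", "eval_im_names.txt",
                "relative_captions_shoes.json", "train_im_names.txt"]
    if dataset == "FashionIQ":
        entries = ["dress", "shirt", "toptee"]
        for category in ("dress", "toptee", "shirt"):
            for split in SPLITS["FashionIQ"]:
                entries.append(f"captions/cap.{category}.{split}.json")
                entries.append(f"image_splits/split.{category}.{split}.json")
        return entries
    if dataset == "CIRR":
        entries = ["train", "dev", "test1"]
        for split in SPLITS["CIRR"]:
            entries.append(f"cirr/captions/cap.rc2.{split}.json")
            entries.append(f"cirr/image_splits/split.rc2.{split}.json")
        return entries
    raise ValueError(f"unknown dataset: {dataset}")


def missing_entries(root, dataset):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected_entries(dataset) if not (root / rel).exists()]


# Demo on a throwaway directory that lacks one of the Shoes files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("captions_shoes.json", "eval_im_names.txt", "train_im_names.txt"):
        (Path(tmp) / name).touch()
    print(missing_entries(tmp, "Shoes"))  # ['relative_captions_shoes.json']
```

In practice you would call, for example, `missing_entries("path/to/CIRR", "CIRR")` and expect an empty list before launching training.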
Train COMBINER on Shoes, FashionIQ, or CIRR using the train.py script.
General Training Command:
python3 train.py \
--model_dir ./checkpoints/ \
--dataset {shoes, fashioniq, cirr} \
--cirr_path "path/to/CIRR" \
--fashioniq_path "path/to/FashionIQ" \
--shoes_path "path/to/Shoes"

To generate the prediction file for upload to the CIRR Evaluation Server, run the following command:

python src/cirr_test_submission.py model_path

Here, model_path is the path to the COMBINER model checkpoint trained on CIRR, e.g., "checkpoints/COMBINER_CIRR.pt".
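In the general training command, `{shoes, fashioniq, cirr}` is a placeholder for a single choice and cannot be passed literally. As a hedged convenience sketch, the helper below assembles the argument list for one dataset using only the flags shown in that command. `build_train_command` is a hypothetical name and is not part of the repository, and it is an assumption that `train.py` accepts all three path flags together, as the README command suggests.

```python
import shlex

# The three dataset names listed in the README's --dataset placeholder.
DATASETS = ("shoes", "fashioniq", "cirr")


def build_train_command(dataset, paths, model_dir="./checkpoints/"):
    """Return the argument list for one training run on a single dataset."""
    if dataset not in DATASETS:
        raise ValueError(f"dataset must be one of {DATASETS}, got {dataset!r}")
    return [
        "python3", "train.py",
        "--model_dir", model_dir,
        "--dataset", dataset,
        "--cirr_path", paths["cirr"],
        "--fashioniq_path", paths["fashioniq"],
        "--shoes_path", paths["shoes"],
    ]


# Placeholder paths copied from the README; replace them with real locations.
paths = {
    "cirr": "path/to/CIRR",
    "fashioniq": "path/to/FashionIQ",
    "shoes": "path/to/Shoes",
}
command = build_train_command("cirr", paths)
print(shlex.join(command))
# To launch it from the repository root: subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when dataset paths contain spaces.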
This project builds upon recent advancements in Composed Image Retrieval and Vision-Language pre-training. We express our sincere gratitude to the open-source community for their contributions.
If you have any questions, feel free to open an issue or reach out to us at:
lizixu.cs@gmail.com
If you find our work or this code useful in your research, please consider leaving a star or citing our paper. Your support is our greatest motivation!
@article{li2025combiner,
title={COMBINER: Composed Image Retrieval Guided by Attribute-based Neighbor Relations},
author={Li, Zixu and Hu, Yupeng and Chen, Zhiwei and Wen, Haokun and Song, Xuemeng and Nie, Liqiang},
journal={IEEE TIP},
year={2025}
}
Ecosystem & Other Works from our Team
- TEMA (ACL'26)
- ConeSep (CVPR'26)
- Air-Know (CVPR'26)
- INTENT (AAAI'26)
- HABIT (AAAI'26)
- ReTrack (AAAI'26)
- HUD (ACM MM'25)
- OFFSET (ACM MM'25)
- ENCODER (AAAI'25)
We welcome all forms of contributions! If you have any questions, ideas, or find a bug, please feel free to:
- Open an Issue for discussions or bug reports.
- Submit a Pull Request to improve the codebase.















