[TIP 2026] COMBINER: Composed Image Retrieval Guided by Attribute-based Neighbor Relations

Zixu Li¹, Yupeng Hu¹✉, Zhiwei Chen¹, Haokun Wen²˒³, Xuemeng Song⁴, Liqiang Nie²

¹School of Software, Shandong University
²School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)
³School of Data Science, City University of Hong Kong
⁴Department of Computer Science and Engineering, Southern University of Science and Technology
✉ Corresponding author

Official Implementation: A novel network designed to tackle the phenomenon of visually similar but attribute-unrelated samples in Composed Image Retrieval (CIR) by learning attribute-based neighbor relations.

📌 Introduction

Welcome to the official repository for COMBINER (Composed Image Retrieval Guided by Attribute-based Neighbor Relations).

Existing CIR approaches often overlook cases where images appear visually alike yet differ in attributes, potentially undermining both multimodal feature fusion and similarity modeling. COMBINER tackles these obstacles by introducing a unified representation of cross-modal features based on attribute prototypes. By utilizing adaptive semantic disentanglement, unified prototype-based composition, and dual relations modeling, COMBINER accurately understands the semantic relations among samples and achieves State-of-the-Art (SOTA) performance across multiple benchmark datasets.

Figure 1. Example of (a) Pairwise Relations, (b) Neighbor Relations, and (c) Visually Similar Images in Relations Modeling. In this figure, $Q$ denotes the multimodal query, $T$ denotes the target image, and $C$ denotes the candidate image. Fig. 1(c) illustrates that traditional neighbor relations modeling brings both candidate images $C_1$ and $C_2$ close to $Q_1$. However, $C_2$ is visually similar to $Q_1$ but attribute-unrelated (“carpet” does not match the query “bedding”), so $C_2$ should not be brought close to $Q_1$.

Figure 2. Schematic of our proposed similarity measure method based on attribute prototypes.

⬆ Back to top

📢 News

[2026-05-06] 🚀 We officially released the main code and framework of COMBINER!
[2026-04-30] 🎉 COMBINER has been accepted to TIP 2026!

⬆ Back to top

✨ Key Features

Our framework introduces three core modules to overcome attribute-level semantic entanglement and cross-modal inconsistency:

🔍 Adaptive Semantic Disentanglement (ASD): Adaptively disentangles attribute features from multimodal primitive features, addressing the entanglement in attribute-level semantics.
🔗 Unified Prototype-based Composition (UPC): Constructs Cross-modal Unified Prototypes (CUP), which serve as a shared dictionary to eliminate modal heterogeneity and facilitate multimodal feature composition.
🧩 Dual Relations Modeling (DRM): Mines both supervised pairwise relations and unsupervised neighbor relations based on attribute similarity, effectively gathering visually similar and attribute-related samples while pushing away attribute-unrelated distractors.
🏆 SOTA Performance: Demonstrates superior retrieval accuracy and achieves remarkable improvements across both fashion-domain (FashionIQ, Shoes) and open-domain (CIRR) datasets.
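
To make the idea behind attribute-based neighbor relations more concrete, here is a minimal pure-Python sketch. It is not the COMBINER implementation: the function names, the softmax temperature, and the threshold are all hypothetical, and the "prototypes" are simply fixed vectors standing in for learned attribute prototypes. It only illustrates how comparing samples through their profile over a shared set of prototypes can separate a visually similar but attribute-unrelated candidate from a true neighbor.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def prototype_profile(feature, prototypes, temperature=0.1):
    """Softmax over the cosine similarities between a feature and each prototype.

    The result is a distribution over prototypes (assumed here to stand for
    attributes). The temperature value is an illustrative assumption.
    """
    logits = [cosine(feature, p) / temperature for p in prototypes]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attribute_neighbors(query, candidates, prototypes, threshold=0.8):
    """Return indices of candidates whose prototype profile matches the query's."""
    q_profile = prototype_profile(query, prototypes)
    return [
        i
        for i, cand in enumerate(candidates)
        if cosine(q_profile, prototype_profile(cand, prototypes)) >= threshold
    ]
```

With two toy prototypes (say "bedding" and "carpet") and a third feature dimension that carries shared visual appearance, a candidate can have a high raw cosine similarity to the query while its prototype profile points at the other attribute. Filtering on the profile keeps the first kind of neighbor and drops the second.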

⬆ Back to top

🏗️ Architecture

Figure 3. The overall framework of COMBINER. It consists of (a) Adaptive Semantic Disentanglement, (b) Unified Prototype-based Composition, and (c) Dual Relations Modeling.

⬆ Back to top

📊 Experiment Results

COMBINER consistently outperforms existing baselines (e.g., DQU-CIR, SPRC, SADN) across all standard metrics on three major benchmarks.

1. FashionIQ & Shoes Datasets

(Evaluated using Recall@K)
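
For readers unfamiliar with the metric, Recall@K is the percentage of queries whose ground-truth target appears among the top-K retrieved candidates. The helper below is a small illustrative sketch with a hypothetical name and signature; it is not taken from utils.py.

```python
def recall_at_k(ranked_lists, targets, k):
    """Percentage of queries whose target is within the top-k ranked candidates.

    ranked_lists: one list of candidate names per query, best match first.
    targets: the ground-truth target name for each query.
    """
    if not ranked_lists:
        return 0.0
    hits = sum(
        1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k]
    )
    return 100.0 * hits / len(ranked_lists)
```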

2. CIRR Dataset

(Evaluated using R@K and R_subset@K)
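
R_subset@K differs from R@K in that the ranking is restricted to a small subset of candidates associated with each query, so it measures how well the model discriminates among hard, similar images. The sketch below assumes each query comes with a list of subset member names; the function name and argument layout are hypothetical and not taken from this repository.

```python
def recall_subset_at_k(ranked_lists, targets, subsets, k):
    """Recall@k computed after restricting each ranking to the query's subset.

    ranked_lists: one global ranking (best first) per query.
    targets: the ground-truth target name for each query.
    subsets: for each query, the candidate names that belong to its subset.
    """
    if not ranked_lists:
        return 0.0
    hits = 0
    for ranked, target, subset in zip(ranked_lists, targets, subsets):
        members = set(subset)
        # Keep the global order, but only for candidates inside the subset.
        restricted = [name for name in ranked if name in members]
        if target in restricted[:k]:
            hits += 1
    return 100.0 * hits / len(ranked_lists)
```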

⬆ Back to top

📑 Table of Contents

📌 Introduction
📢 News
✨ Key Features
🏗️ Architecture
📊 Experiment Results
📂 Repository Structure
🚀 Installation
📂 Data Preparation
- Shoes
- FashionIQ
- CIRR
🏃‍♂️ Quick Start
- 1. Training the Model
- 2. Test for CIRR
🤝 Acknowledgements
✉️ Contact
📝 Citation
🔗 Related Projects
🫡 Support & Contributing

📂 Repository Structure

Our codebase is highly modular. Here is a brief overview of the core files:

COMBINER/
├── cirr_test_submission.py # 📄 CIRR submission file generator
├── datasets_openclip.py    # 📚 Dataset loader and preprocessing
├── model.py                # 🧠 COMBINER model architecture and forward pass
├── test.py                 # 🧪 Evaluation/Test entry point
├── train.py                # 🚀 Training entry point
├── utils.py                # 🛠️ Utility functions (metrics, helper methods)
└── README.md               # 📝 Documentation

🚀 Installation

1. Clone the repository

git clone https://github.com/iLearn-Lab/TIP26-COMBINER.git
cd TIP26-COMBINER

2. Set up the environment

We recommend using Conda to manage your environment:

conda create -n combiner_env python=3.8.10
conda activate combiner_env

# Install PyTorch (Ensure it matches your CUDA version. Tested on PyTorch 2.0.0, NVIDIA A40 48G)
pip install torch==2.0.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
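
After installation, a quick sanity check can confirm that the interpreter and PyTorch build match what you expect. The script below is an optional sketch and not part of the repository; the function name is hypothetical. It only relies on `torch.__version__` and `torch.cuda.is_available()`, and it reports `None` for both if PyTorch is not installed.

```python
import importlib.util
import sys


def environment_report():
    """Collect the Python version and, if installed, the PyTorch version and CUDA status."""
    report = {
        "python": "%d.%d.%d" % sys.version_info[:3],
        "torch": None,
        "cuda_available": None,
    }
    # Only import torch when it is actually installed.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```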

📂 Data Preparation

COMBINER is evaluated on FashionIQ, Shoes, and CIRR. Please download the datasets from their official sources and arrange them as follows.

Shoes

Download the Shoes dataset following the instructions in the official repository.

After downloading the dataset, ensure that the folder structure matches the following:

├── Shoes
│   ├── captions_shoes.json
│   ├── eval_im_names.txt
│   ├── relative_captions_shoes.json
│   ├── train_im_names.txt
│   ├── [womens_athletic_shoes | womens_boots | ...]
│   │   ├── [0 | 1]
│   │   ├── [img_womens_athletic_shoes_375.jpg | descr_womens_athletic_shoes_734.txt | ...]

FashionIQ

Download the FashionIQ dataset following the instructions in the official repository.

After downloading the dataset, ensure that the folder structure matches the following:

├── FashionIQ
│   ├── captions
│   │   ├── cap.dress.[train | val | test].json
│   │   ├── cap.toptee.[train | val | test].json
│   │   ├── cap.shirt.[train | val | test].json
│   ├── image_splits
│   │   ├── split.dress.[train | val | test].json
│   │   ├── split.toptee.[train | val | test].json
│   │   ├── split.shirt.[train | val | test].json
│   ├── dress
│   │   ├── [B000ALGQSY.jpg | B000AY2892.jpg | B000AYI3L4.jpg | ...]
│   ├── shirt
│   │   ├── [B00006M009.jpg | B00006M00B.jpg | B00006M6IH.jpg | ...]
│   ├── toptee
│   │   ├── [B0000DZQD6.jpg | B000A33FTU.jpg | B000AS2OVA.jpg | ...]

CIRR

Download the CIRR dataset following the instructions in the official repository.

After downloading the dataset, ensure that the folder structure matches the following:

├── CIRR
│   ├── train
│   │   ├── [0 | 1 | 2 | ...]
│   │   │   ├── [train-10108-0-img0.png | train-10108-0-img1.png | ...]
│   ├── dev
│   │   ├── [dev-0-0-img0.png | dev-0-0-img1.png | ...]
│   ├── test1
│   │   ├── [test1-0-0-img0.png | test1-0-0-img1.png | ...]
│   ├── cirr
│   │   ├── captions
│   │   │   ├── cap.rc2.[train | val | test1].json
│   │   ├── image_splits
│   │   │   ├── split.rc2.[train | val | test1].json
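
Before training, it can help to verify that the folder layouts listed in this section are in place. The following sketch is not part of the repository: the `EXPECTED` table and `missing_entries` helper are hypothetical, and the entries are transcribed from the directory trees above. It checks only the fixed file and folder names, not the individual images.

```python
from pathlib import Path

CATEGORIES = ("dress", "toptee", "shirt")

# Fixed entries transcribed from the directory trees above (relative to each root).
EXPECTED = {
    "shoes": [
        "captions_shoes.json",
        "eval_im_names.txt",
        "relative_captions_shoes.json",
        "train_im_names.txt",
    ],
    "fashioniq": (
        [f"captions/cap.{c}.{s}.json" for c in CATEGORIES for s in ("train", "val", "test")]
        + [f"image_splits/split.{c}.{s}.json" for c in CATEGORIES for s in ("train", "val", "test")]
        + list(CATEGORIES)
    ),
    "cirr": (
        ["train", "dev", "test1"]
        + [f"cirr/captions/cap.rc2.{s}.json" for s in ("train", "val", "test1")]
        + [f"cirr/image_splits/split.rc2.{s}.json" for s in ("train", "val", "test1")]
    ),
}


def missing_entries(dataset, root):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED[dataset] if not (root / rel).exists()]


if __name__ == "__main__":
    # Replace the placeholder paths with your own dataset locations.
    for name, path in [("shoes", "path/to/Shoes"),
                       ("fashioniq", "path/to/FashionIQ"),
                       ("cirr", "path/to/CIRR")]:
        missing = missing_entries(name, path)
        print(f"{name}: {'OK' if not missing else 'missing ' + ', '.join(missing)}")
```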

🏃‍♂️ Quick Start

1. Training the Model

Train COMBINER on Shoes, FashionIQ, or CIRR using the train.py script.

General training command (pass exactly one of shoes, fashioniq, or cirr to --dataset):

python3 train.py \
    --model_dir ./checkpoints/ \
    --dataset {shoes, fashioniq, cirr} \
    --cirr_path "path/to/CIRR" \
    --fashioniq_path "path/to/FashionIQ" \
    --shoes_path "path/to/Shoes"
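
If you want to wrap or script the command above, the flags can be mirrored in a small argparse parser. This is a hypothetical sketch for illustration only: it reuses the flag names shown above, but the defaults, the `choices` restriction, and the `dataset_root` helper are assumptions and may not match how train.py actually parses its arguments.

```python
import argparse


def build_parser():
    """Parser mirroring the flags in the training command (defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="Sketch of the COMBINER training flags")
    parser.add_argument("--model_dir", default="./checkpoints/")
    parser.add_argument("--dataset", choices=["shoes", "fashioniq", "cirr"], required=True)
    parser.add_argument("--cirr_path", default=None)
    parser.add_argument("--fashioniq_path", default=None)
    parser.add_argument("--shoes_path", default=None)
    return parser


def dataset_root(args):
    """Pick the path flag that corresponds to the selected dataset."""
    root = {
        "shoes": args.shoes_path,
        "fashioniq": args.fashioniq_path,
        "cirr": args.cirr_path,
    }[args.dataset]
    if root is None:
        raise ValueError(f"--{args.dataset}_path is required when --dataset {args.dataset}")
    return root
```

For example, `dataset_root(build_parser().parse_args(["--dataset", "cirr", "--cirr_path", "data/CIRR"]))` returns `"data/CIRR"`.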

2. Test for CIRR

To generate the predictions file for uploading to the CIRR Evaluation Server using our model, please execute the following command:

python3 cirr_test_submission.py model_path

(Where model_path is the path to the COMBINER model checkpoint on CIRR, e.g., "checkpoints/COMBINER_CIRR.pt")
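
For orientation, the sketch below shows one plausible shape for such a predictions file: a JSON object with a version tag, a metric name, and one ranked list of candidate image names per query pair ID. This layout, the `"rc2"` and `"recall"` values, and the top-50 cutoff are assumptions based on our understanding of the CIRR evaluation server; please check the server's own instructions and the output of cirr_test_submission.py before relying on it. The function name is hypothetical.

```python
import json


def build_submission(predictions, metric="recall", top_k=50):
    """Build a submission-style dictionary from per-query rankings.

    predictions: mapping from pair ID to a list of candidate names, best first.
    The "version"/"metric" keys and the top_k cutoff are assumed, not confirmed.
    """
    payload = {"version": "rc2", "metric": metric}
    for pair_id, ranked in predictions.items():
        payload[str(pair_id)] = list(ranked[:top_k])
    return payload


if __name__ == "__main__":
    example = build_submission({12: ["dev-1-0-img0", "dev-1-0-img1"]})
    print(json.dumps(example, indent=2))
```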

🤝 Acknowledgements

This project builds upon recent advancements in Composed Image Retrieval and Vision-Language pre-training. We express our sincere gratitude to the open-source community for their contributions.

✉️ Contact

If you have any questions, feel free to open an issue or reach out to us at: lizixu.cs@gmail.com ☺️

📝⭐️ Citation

If you find our work or this code useful in your research, please consider leaving a Star⭐️ or Citing📝 our paper 🥰. Your support is our greatest motivation!

@article{li2025combiner,
  title={COMBINER: Composed Image Retrieval Guided by Attribute-based Neighbor Relations},
  author={Li, Zixu and Hu, Yupeng and Chen, Zhiwei and Wen, Haokun and Song, Xuemeng and Nie, Liqiang},
  journal={IEEE Transactions on Image Processing},
  year={2026}
}

🔗 Related Projects

Ecosystem & Other Works from our Team

TEMA (ACL'26): Paper | Project | Code
ConeSep (CVPR'26): Paper | Project | Code | Blog Post (Chinese)
Air-Know (CVPR'26): Paper | Project | Code | Blog Post (Chinese)
INTENT (AAAI'26): Paper | Project | Code
HABIT (AAAI'26): Paper | Project | Code
ReTrack (AAAI'26): Paper | Project | Code
HUD (ACM MM'25): Paper | Project | Code
OFFSET (ACM MM'25): Paper | Project | Code
ENCODER (AAAI'25): Paper | Project | Code

🫡 Support & Contributing

We welcome all forms of contributions! If you have any questions, ideas, or find a bug, please feel free to:

Open an Issue for discussions or bug reports.
Submit a Pull Request to improve the codebase.

⬆ Back to top
