iLearn-Lab/TIP26-COMBINER

[TIP 2026] COMBINER: Composed Image Retrieval Guided by Attribute-based Neighbor Relations

1 School of Software, Shandong University
2 School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)
3 School of Data Science, City University of Hong Kong
4 Department of Computer Science and Engineering, Southern University of Science and Technology
✉ Corresponding author

Official Implementation: A novel network designed to tackle the phenomenon of visually similar but attribute-unrelated samples in Composed Image Retrieval (CIR) by learning attribute-based neighbor relations.

📌 Introduction

Welcome to the official repository for COMBINER (Composed Image Retrieval Guided by Attribute-based Neighbor Relations).

Existing CIR approaches often overlook cases where images appear visually alike yet differ in attributes, potentially undermining both multimodal feature fusion and similarity modeling. COMBINER tackles these obstacles by introducing a unified representation of cross-modal features based on attribute prototypes. By utilizing adaptive semantic disentanglement, unified prototype-based composition, and dual relations modeling, COMBINER accurately understands the semantic relations among samples and achieves State-of-the-Art (SOTA) performance across multiple benchmark datasets.

Figure 1. Example of (a) Pairwise Relations, (b) Neighbor Relations, and (c) Visually Similar Images in Relations Modeling. In this figure, $Q$ denotes the multimodal query, $T$ denotes the target image, and $C$ denotes the candidate image. Fig. 1(c) illustrates that traditional neighbor relations modeling brings both candidate images $C_1$ and $C_2$ close to $Q_1$. However, $C_2$ is visually similar to $Q_1$ but attribute-unrelated (“carpet” does not match the query “bedding”). Therefore, $C_2$ should not be brought close to $Q_1$.

Figure 2. Schematic of our proposed similarity measure method based on attribute prototypes.

📒 News

  • [2026-05-06] 🚀 We officially release the main code and framework of COMBINER!
  • [2026-04-30] 🎉 COMBINER has been accepted by TIP 2026!

✨ Key Features

Our framework introduces three core modules to overcome attribute-level semantic entanglement and cross-modal inconsistency:

  • 🔍 Adaptive Semantic Disentanglement (ASD): Adaptively disentangles attribute features from multimodal primitive features, addressing the entanglement of attribute-level semantics.
  • 🔗 Unified Prototype-based Composition (UPC): Constructs Cross-modal Unified Prototypes (CUP) that serve as a shared dictionary to eliminate modal heterogeneity and facilitate multimodal feature composition.
  • 🧩 Dual Relations Modeling (DRM): Mines both supervised pairwise relations and unsupervised neighbor relations based on attribute similarity, pulling together samples that are visually similar and attribute-related while pushing away attribute-unrelated distractors.
  • 🏆 SOTA Performance: Outperforms prior methods on both fashion-domain (FashionIQ, Shoes) and open-domain (CIRR) datasets.
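The intuition behind attribute-based neighbor relations can be illustrated with a small, self-contained sketch. This is not the code in model.py: the helper names (`attribute_profile`, `attribute_neighbors`), the softmax projection onto prototypes, the temperature value, and the toy vectors are all assumptions made for illustration. The sketch shows how comparing samples through their similarity to a shared set of attribute prototypes can prefer an attribute-related candidate over one that is merely visually close.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute_profile(feature, prototypes, temperature: float = 0.1) -> List[float]:
    """Hypothetical helper: softmax over the feature's cosine similarity to each prototype."""
    logits = [cosine(feature, p) / temperature for p in prototypes]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attribute_neighbors(query, candidates, prototypes, k: int = 1) -> List[int]:
    """Indices of the k candidates whose attribute profiles are closest to the query's."""
    q = attribute_profile(query, prototypes)
    scores = [cosine(q, attribute_profile(c, prototypes)) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)[:k]


# Toy 3-D features: [shared visual appearance, "bedding" cue, "carpet" cue].
prototypes = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # assumed attribute prototypes
query = [1.0, 0.3, 0.0]
c1 = [0.3, 0.5, 0.0]  # attribute-related, visually less similar
c2 = [1.0, 0.0, 0.3]  # visually similar, attribute-unrelated

raw_best = max(range(2), key=lambda i: cosine(query, [c1, c2][i]))
print("nearest by raw feature similarity:", raw_best)  # index 1 (c2)
print("nearest by attribute profile:", attribute_neighbors(query, [c1, c2], prototypes))  # [0]
```

In this toy setup, raw feature similarity picks the "carpet" candidate, while the prototype-based comparison picks the "bedding" one, which mirrors the situation described in Figure 1(c).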

πŸ—οΈ Architecture

Figure 3. The overall framework of COMBINER. It consists of (a) Adaptive Semantic Disentanglement, (b) Unified Prototype-based Composition, and (c) Dual Relations Modeling.

📊 Experiment Results

COMBINER consistently outperforms existing baselines (e.g., DQU-CIR, SPRC, SADN) across all standard metrics on three major benchmarks.

1. FashionIQ & Shoes Datasets

(Evaluated using Recall@K)

[Results table: FashionIQ and Shoes]
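For readers unfamiliar with the metric, Recall@K is the percentage of queries whose target image appears among the top K retrieved candidates. The following is a minimal sketch of that definition. The function name `recall_at_k` and the toy rankings are assumptions for illustration, and this is not necessarily how utils.py computes the metric.

```python
from typing import Dict, Sequence


def recall_at_k(rankings: Sequence[Sequence[str]],
                targets: Sequence[str],
                ks: Sequence[int] = (1, 10, 50)) -> Dict[int, float]:
    """Percentage of queries whose target is within the first k ranked candidates."""
    if not targets or len(rankings) != len(targets):
        raise ValueError("rankings and targets must be non-empty and the same length")
    results = {}
    for k in ks:
        hits = sum(1 for ranked, target in zip(rankings, targets) if target in ranked[:k])
        results[k] = 100.0 * hits / len(targets)
    return results


# Three toy queries, each with its own ranked candidate list; the target is always "a".
rankings = [["a", "b", "c"], ["c", "a", "b"], ["b", "c", "a"]]
targets = ["a", "a", "a"]
scores = recall_at_k(rankings, targets, ks=(1, 2, 3))
print({k: round(v, 2) for k, v in scores.items()})
```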

2. CIRR Dataset

(Evaluated using R@K and R_subset@K)

[Results table: CIRR]
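On CIRR, R_subset@K differs from R@K in that the ranking is restricted to the images belonging to the same subset as the query's reference image before the top K are taken. A minimal sketch of that restriction, assuming the subset membership is available as a list of image names (the function name `hit_in_subset_at_k` and the sample names are hypothetical):

```python
from typing import Iterable, Sequence


def hit_in_subset_at_k(ranking: Sequence[str],
                       target: str,
                       subset_members: Iterable[str],
                       k: int) -> bool:
    """True if the target is in the top k once the ranking is filtered to the subset."""
    allowed = set(subset_members)
    restricted = [name for name in ranking if name in allowed]  # keeps the original order
    return target in restricted[:k]


# Global ranking over the whole gallery; "s1" is the target, "s*" share the query's subset.
ranking = ["x1", "s2", "x2", "s1", "s3"]
subset = ["s1", "s2", "s3"]

print(hit_in_subset_at_k(ranking, "s1", subset, k=1))  # False: "s2" ranks first within the subset
print(hit_in_subset_at_k(ranking, "s1", subset, k=2))  # True
```

Averaging this indicator over all queries (and multiplying by 100) gives the reported percentage.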

📂 Repository Structure

Our codebase is highly modular. Here is a brief overview of the core files:

COMBINER/
├── cirr_test_submission.py  # 📄 CIRR submission file generator
├── datasets_openclip.py     # 📚 Dataset loader and preprocessing
├── model.py                 # 🧠 COMBINER model architecture and forward pass
├── test.py                  # 🧪 Evaluation/Test entry point
├── train.py                 # 🚀 Training entry point
├── utils.py                 # 🛠️ Utility functions (metrics, helper methods)
└── README.md                # 📝 Documentation

🚀 Installation

1. Clone the repository

git clone https://github.com/iLearn-Lab/TIP26-COMBINER.git
cd TIP26-COMBINER

2. Set up the environment

We recommend using Conda to manage your environment:

conda create -n combiner_env python=3.8.10
conda activate combiner_env

# Install PyTorch (Ensure it matches your CUDA version. Tested on PyTorch 2.0.0, NVIDIA A40 48G)
pip install torch==2.0.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
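After installation, a quick check can confirm which Python and PyTorch versions are active and whether CUDA is visible. The helper below is a hypothetical convenience script, not part of the repository. It only uses `torch.__version__` and `torch.cuda.is_available()`, and it reports `None` for the PyTorch fields if PyTorch is not installed in the current environment.

```python
import importlib.util
import sys
from typing import Dict, Optional


def describe_environment() -> Dict[str, Optional[object]]:
    """Collect the interpreter version and, if PyTorch is installed, its version and CUDA status."""
    info = {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "torch": None,
        "cuda_available": None,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch  # imported lazily so the script still runs without PyTorch

        info["torch"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
    return info


env_info = describe_environment()
for key, value in env_info.items():
    print(f"{key}: {value}")
```

With the setup described above, the expected values would be Python 3.8.10 and a PyTorch version starting with 2.0.0.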

📂 Data Preparation

COMBINER is evaluated on FashionIQ, Shoes, and CIRR. Please download the datasets from their official sources and arrange them as follows.

Shoes

Download the Shoes dataset following the instructions in the official repository.

After downloading the dataset, ensure that the folder structure matches the following:

├── Shoes
│   ├── captions_shoes.json
│   ├── eval_im_names.txt
│   ├── relative_captions_shoes.json
│   ├── train_im_names.txt
│   ├── [womens_athletic_shoes | womens_boots | ...]
│   │   ├── [0 | 1]
│   │   ├── [img_womens_athletic_shoes_375.jpg | descr_womens_athletic_shoes_734.txt | ...]

FashionIQ

Download the FashionIQ dataset following the instructions in the official repository.

After downloading the dataset, ensure that the folder structure matches the following:

├── FashionIQ
│   ├── captions
│   │   ├── cap.dress.[train | val | test].json
│   │   ├── cap.toptee.[train | val | test].json
│   │   ├── cap.shirt.[train | val | test].json
│   ├── image_splits
│   │   ├── split.dress.[train | val | test].json
│   │   ├── split.toptee.[train | val | test].json
│   │   ├── split.shirt.[train | val | test].json
│   ├── dress
│   │   ├── [B000ALGQSY.jpg | B000AY2892.jpg | B000AYI3L4.jpg | ...]
│   ├── shirt
│   │   ├── [B00006M009.jpg | B00006M00B.jpg | B00006M6IH.jpg | ...]
│   ├── toptee
│   │   ├── [B0000DZQD6.jpg | B000A33FTU.jpg | B000AS2OVA.jpg | ...]
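Before training, it can help to verify that the FashionIQ folder matches the layout above. The following sketch enumerates the caption files, split files, and image folders listed in the tree and reports any that are missing. The function names and the demo directory are hypothetical, the script is not part of the repository, and it checks only that the paths exist, not their contents.

```python
import tempfile
from pathlib import Path
from typing import List

CATEGORIES = ("dress", "toptee", "shirt")
SPLITS = ("train", "val", "test")


def expected_fashioniq_paths(root) -> List[Path]:
    """All files and folders named in the FashionIQ layout, relative to root."""
    root = Path(root)
    paths = []
    for category in CATEGORIES:
        for split in SPLITS:
            paths.append(root / "captions" / f"cap.{category}.{split}.json")
            paths.append(root / "image_splits" / f"split.{category}.{split}.json")
        paths.append(root / category)  # image folder for this category
    return paths


def missing_fashioniq_paths(root) -> List[Path]:
    """Subset of the expected paths that do not exist on disk."""
    return [p for p in expected_fashioniq_paths(root) if not p.exists()]


# Demo on an empty temporary directory: every expected path is reported as missing.
with tempfile.TemporaryDirectory() as demo_root:
    missing = missing_fashioniq_paths(demo_root)
    print(f"{len(missing)} of {len(expected_fashioniq_paths(demo_root))} expected paths are missing")
```

Calling `missing_fashioniq_paths("path/to/FashionIQ")` on a correctly prepared dataset should return an empty list.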

CIRR

Download the CIRR dataset following the instructions in the official repository.

After downloading the dataset, ensure that the folder structure matches the following:

├── CIRR
│   ├── train
│   │   ├── [0 | 1 | 2 | ...]
│   │   │   ├── [train-10108-0-img0.png | train-10108-0-img1.png | ...]
│   ├── dev
│   │   ├── [dev-0-0-img0.png | dev-0-0-img1.png | ...]
│   ├── test1
│   │   ├── [test1-0-0-img0.png | test1-0-0-img1.png | ...]
│   ├── cirr
│   │   ├── captions
│   │   │   ├── cap.rc2.[train | val | test1].json
│   │   ├── image_splits
│   │   │   ├── split.rc2.[train | val | test1].json

πŸƒβ€β™‚οΈ Quick Start

1. Training the Model

Train COMBINER on Shoes, FashionIQ, or CIRR using the train.py script.

General training command (set --dataset to one of shoes, fashioniq, or cirr):

python3 train.py \
    --model_dir ./checkpoints/ \
    --dataset {shoes, fashioniq, cirr} \
    --cirr_path "path/to/CIRR" \
    --fashioniq_path "path/to/FashionIQ" \
    --shoes_path "path/to/Shoes"
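When launching several runs, it may be convenient to assemble the command programmatically. The sketch below builds the argument list from the flags shown above. The helper `build_train_command` is hypothetical, and it assumes that passing only the path flag for the selected dataset is sufficient, which is not confirmed here; if train.py requires all three path flags, add them accordingly.

```python
import shlex
from typing import List

# Maps each dataset name accepted by --dataset to its path flag (flags taken from the command above).
DATASET_PATH_FLAGS = {
    "shoes": "--shoes_path",
    "fashioniq": "--fashioniq_path",
    "cirr": "--cirr_path",
}


def build_train_command(dataset: str, data_root: str,
                        model_dir: str = "./checkpoints/") -> List[str]:
    """Return the argv list for one training run on the chosen dataset."""
    if dataset not in DATASET_PATH_FLAGS:
        raise ValueError(f"dataset must be one of {sorted(DATASET_PATH_FLAGS)}")
    return [
        "python3", "train.py",
        "--model_dir", model_dir,
        "--dataset", dataset,
        DATASET_PATH_FLAGS[dataset], data_root,
    ]


cmd = build_train_command("cirr", "path/to/CIRR")
print(" ".join(shlex.quote(part) for part in cmd))
# prints: python3 train.py --model_dir ./checkpoints/ --dataset cirr --cirr_path path/to/CIRR
# To launch the run: import subprocess; subprocess.run(cmd, check=True)
```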

2. Testing on CIRR

To generate the predictions file for uploading to the CIRR Evaluation Server using our model, please execute the following command:

python3 cirr_test_submission.py model_path

(Where model_path is the path to the COMBINER model checkpoint on CIRR, e.g., "checkpoints/COMBINER_CIRR.pt")

🤝 Acknowledgements

This project builds upon recent advancements in Composed Image Retrieval and Vision-Language pre-training. We express our sincere gratitude to the open-source community for their contributions.

✉️ Contact

If you have any questions, feel free to open an issue or reach out to us at: lizixu.cs@gmail.com ☺️

📝⭐️ Citation

If you find our work or this code useful in your research, please consider leaving a Star ⭐️ or citing 📝 our paper 🥰. Your support is our greatest motivation!

@article{li2025combiner,
  title={COMBINER: Composed Image Retrieval Guided by Attribute-based Neighbor Relations},
  author={Li, Zixu and Hu, Yupeng and Chen, Zhiwei and Wen, Haokun and Song, Xuemeng and Nie, Liqiang},
  journal={IEEE TIP},
  year={2025}
}

🔗 Related Projects

Ecosystem & Other Works from our Team

  • TEMA (ACL'26): Paper | Project | Code
  • ConeSep (CVPR'26): Paper | Project | Code | Blog Post (Chinese)
  • Air-Know (CVPR'26): Paper | Project | Code | Blog Post (Chinese)
  • INTENT (AAAI'26): Paper | Project | Code
  • HABIT (AAAI'26): Paper | Project | Code
  • ReTrack (AAAI'26): Paper | Project | Code
  • HUD (ACM MM'25): Paper | Project | Code
  • OFFSET (ACM MM'25): Paper | Project | Code
  • ENCODER (AAAI'25): Paper | Project | Code

🫡 Support & Contributing

We welcome all forms of contributions! If you have any questions, ideas, or find a bug, please feel free to:

  • Open an Issue for discussions or bug reports.
  • Submit a Pull Request to improve the codebase.
