MoIR: Multi-modal Information Router

Paper: https://arxiv.org/abs/2604.16264

Our lab: https://alregib.ece.gatech.edu/

Reference implementation of MoIR: Multi-modal Information Router, an information-level fusion method for mitigating modality dominance in Vision-Language Models.

MoIR sits between the modality-specific encoders and the LLM decoder. It applies a truncated SVD to each modality's token sequence to identify that modality's less-informative channels, then routes complementary information from the other modality into those channels through learnable per-channel gates. Because the rebalancing happens before fusion, MoIR mitigates modality dominance by changing how much information each input makes available, rather than by adjusting attention alone.

                    +-------------------------------------+
                    |             LLM Decoder             |
                    +-------------------+-----------------+
                                        ^
                          [text tokens] | [vision tokens]
                                        |
                    +-------------------+-----------------+
                    |       MoIR (Information Router)     |
                    |   - per-channel SVD informativeness |
                    |   - bottom-k' channel selection     |
                    |   - learnable α gates per channel   |
                    +-------------------+-----------------+
                          ^                          ^
                          |                          |
                  text encoder                vision encoder
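
The routing idea described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code in `moir/router.py`: the scoring rule (a channel's energy in the rank-truncated reconstruction), the mean-pooled summary of the other modality, and the names `channel_scores` and `route` are all hypothetical choices made for this example. It also assumes both modalities already share the same hidden size, and it uses a fixed gate vector where the real module would learn one.

```python
import numpy as np


def channel_scores(tokens: np.ndarray, rank: int) -> np.ndarray:
    """Score each channel of a (n_tokens, d) matrix.

    Assumption for this sketch: a channel is informative if the
    rank-truncated SVD reconstruction places a lot of energy in it.
    """
    # Thin SVD: tokens = U @ diag(s) @ vt, with vt of shape (min(n, d), d).
    _, s, vt = np.linalg.svd(tokens, full_matrices=False)
    r = min(rank, s.shape[0])
    # Energy of channel j in the truncated reconstruction:
    # sum over the top-r components of (s_i * vt[i, j]) ** 2.
    return np.sum((s[:r, None] * vt[:r]) ** 2, axis=0)


def route(target, source, alpha, rank=8, ratio=0.10):
    """Blend pooled `source` information into the weakest channels of `target`.

    target: (n_target_tokens, d) tokens of the modality being refilled.
    source: (n_source_tokens, d) tokens of the other modality.
    alpha:  (d,) per-channel gates in [0, 1] (learnable in a real module).
    Returns the routed tokens and the indices of the refilled channels.
    """
    d = target.shape[1]
    k = max(1, int(ratio * d))                  # number of channels to refill
    low = np.argsort(channel_scores(target, rank))[:k]
    summary = source.mean(axis=0)               # assumed pooling: mean over tokens
    gate = alpha[low]
    routed = target.copy()
    routed[:, low] = (1.0 - gate) * target[:, low] + gate * summary[low]
    return routed, low


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(12, 20))
    text[:, 3] = 0.0                            # a channel that carries nothing
    vision = rng.normal(size=(30, 20))
    routed, low = route(text, vision, alpha=np.full(20, 0.5))
    print("refilled channels:", sorted(low.tolist()))
```

Channels outside the selected set pass through unchanged, so the gate values control how much of the other modality enters only where the target modality carried little information to begin with.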

Repository Layout

MoIR/
├── moir/
│   ├── __init__.py
│   ├── router.py        # MoIR module (Sections 3.2-3.3 of the paper)
│   ├── inject.py        # Injection into LLaVA-1.5 and Qwen2.5-VL
│   ├── model.py         # Backbone loaders + LoRA wiring
│   ├── dataloader.py    # ScienceQA / VizWiz / MMBench-Video datasets
│   ├── metrics.py       # MDI, AEI, Effective Rank (Δ_I, Δ_T)
│   ├── trainer.py       # Generic LoRA + MoIR training loop
│   └── utils.py         # HF cache, seeding, prompt builders
├── configs/
│   ├── scienceqa_llava7b.yaml
│   ├── scienceqa_llava13b.yaml
│   ├── vizwiz_llava7b.yaml
│   └── mmbench_video_qwen.yaml
├── scripts/
│   ├── train_scienceqa.sh
│   ├── train_vizwiz.sh
│   └── train_mmbench_video.sh
├── train.py             # train entry point
├── inference.py         # generate predictions
├── eval.py              # MDI / AEI / EffRank evaluation
└── main.py              # train + eval pipeline

Installation

pip install -r requirements.txt

Usage

Train MoIR on ScienceQA with LLaVA-1.5-7B (attention-layer LoRA placement):

python train.py --config configs/scienceqa_llava7b.yaml --placement llm_attn

Evaluate the trained checkpoint (MDI / AEI / Rank Δ):

python eval.py --config configs/scienceqa_llava7b.yaml \
               --checkpoint outputs/scienceqa_llava7b/llm_attn/final
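
For readers unfamiliar with the rank metric, the sketch below computes one common definition of effective rank: the exponential of the Shannon entropy of the normalized singular values. Whether `moir/metrics.py` uses exactly this definition is an assumption, and the function name `effective_rank` is hypothetical. MDI and AEI are defined in the paper and are not reproduced here.

```python
import numpy as np


def effective_rank(tokens: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank of a (n_tokens, d) matrix."""
    s = np.linalg.svd(tokens, compute_uv=False)   # singular values only
    p = s / (s.sum() + eps)                       # normalize to a distribution
    p = p[p > eps]                                # drop numerically zero entries
    entropy = -np.sum(p * np.log(p))
    return float(np.exp(entropy))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(64, 32))
    print("effective rank:", effective_rank(tokens))
```

Under this definition a matrix with equal singular values scores its full rank, and a rank-one matrix scores 1, so a rank delta would presumably compare this value for a modality's tokens with and without routing.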

Run inference only:

python inference.py --config configs/scienceqa_llava7b.yaml \
                    --checkpoint outputs/scienceqa_llava7b/llm_attn/final \
                    --output predictions.jsonl

End-to-end (train then evaluate):

python main.py --config configs/scienceqa_llava7b.yaml --placement llm_attn

Hyperparameters

The defaults follow the paper:

Parameter	Value
LoRA rank `r`	16
LoRA `α`	32
LoRA dropout	0.05
Optimizer	AdamW
Learning rate	2e-4
Weight decay	0
Batch size	8 (effective, via gradient accumulation)
Epochs	10
Exchange ratio `k'`	0.10
Routing gate init `α`	0.5
Max sequence length	2048
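
To make a few of these values concrete, here is a small sketch that collects the defaults in a dataclass and derives quantities that follow from them. The class `MoIRDefaults`, its field names, and its helper methods are hypothetical and are not the schema of the YAML files in `configs/`. Flooring the exchange ratio to a whole number of channels is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoIRDefaults:
    # Values copied from the table above; the names are illustrative only.
    lora_rank: int = 16
    lora_alpha: int = 32
    lora_dropout: float = 0.05
    learning_rate: float = 2e-4
    weight_decay: float = 0.0
    effective_batch_size: int = 8
    epochs: int = 10
    exchange_ratio: float = 0.10
    gate_init: float = 0.5
    max_seq_len: int = 2048

    def lora_scale(self) -> float:
        """Standard LoRA scaling factor, alpha / r."""
        return self.lora_alpha / self.lora_rank

    def channels_exchanged(self, hidden_size: int) -> int:
        """Channels refilled per modality (assumes flooring)."""
        return int(self.exchange_ratio * hidden_size)

    def grad_accum_steps(self, micro_batch_size: int, num_devices: int = 1) -> int:
        """Accumulation steps needed to reach the effective batch size."""
        per_step = micro_batch_size * num_devices
        if per_step <= 0 or self.effective_batch_size % per_step != 0:
            raise ValueError("effective batch size must be a multiple of the per-step batch")
        return self.effective_batch_size // per_step


if __name__ == "__main__":
    cfg = MoIRDefaults()
    print(cfg.lora_scale())              # LoRA update scale
    print(cfg.channels_exchanged(4096))  # 4096: hidden size of a 7B LLaMA-family decoder
    print(cfg.grad_accum_steps(2))       # micro-batch of 2 on one device
```

With a hidden size of 4096, an exchange ratio of 0.10 corresponds to roughly 400 channels per modality, and a micro-batch of 2 on a single device needs 4 accumulation steps to reach the effective batch size of 8.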

Citation

@article{kim2026moir,
  title  = {Information Router for Mitigating Modality Dominance in Vision-Language Models},
  author = {Kim, Seulgi and Prabhushankar, Mohit and AlRegib, Ghassan},
  year   = {2026},
  url    = {https://arxiv.org/abs/2604.16264}
}
