PRAISELab-PicusLab/_HI_Giuseppe-Mastellone

Multimodal Medical Fact-Checking with Information Bottleneck

This repository contains the code for the thesis:

"A Vision-Language Approach to Multimodal Fact-Checking in Healthcare"
Giuseppe Mastellone — University of Naples Federico II, 2026

The work introduces a three-head classifier for medical fact-checking on MC-MMH, a new dataset that combines MultiCare and MM-Health through a data generation and validation pipeline.

Repository Structure

├── dataset_generation/
│   ├── Swapper.py                         # Stochastic greedy image swap algorithm
│   ├── swap_quality_check.py              # Visual quality check for swapped pairs
│   ├── MM_Health_Claim_Generation.ipynb   # Claim generation: MM-Health
│   ├── MultiCare_Claim_Generation.ipynb   # Claim generation: MultiCare
│   ├── Claim_Validation.ipynb             # NLI-based claim validation pipeline
│   └── NEW_DATA_UNIFICATION.py            # Merges MultiCare + MM-Health into train/val/test splits
│
├── training/
│   ├── embedding-extractor.ipynb                        # Token-level embedding extraction 
│   ├── Multi_Task_and_Modal_classifier.ipynb            # Classifier — final configuration
│   ├── Fakeddit-Classifier.ipynb                        # Training on Fakeddit 
│   └── vlm-zero-shot-evaluation.ipynb                   # VLM zero-shot baseline evaluation 
│
└── README.md

Dataset: MC-MMH

MC-MMH (MultiCare–MM-Health) is a multimodal medical fact-checking benchmark with four classes:

| Class | Description |
| --- | --- |
| TRUE AND CONCORDANT | True claim, consistent image |
| FALSE AND CONCORDANT | False claim, internally consistent |
| FALSE SWAPPED | Claim from a different clinical context (image swapped) |
| FALSE TEXT | Correct image, false textual report |

Statistics: ~3,200 samples — Train 2,226 / Val 478 / Test 477 — 50% TRUE / 50% FALSE.


Dataset Generation Pipeline

1. Image swap (Swapper.py)

Generates FALSE_SWAPPED samples by pairing MultiCare images with semantically distant counterparts via a stochastic greedy algorithm based on BiomedCLIP cosine similarity (τ_min=0.10, τ_max=0.50, top-k=20).

# Visually inspect swap quality after running Swapper.py
python swap_quality_check.py \
    --json   /path/to/dataset.json \
    --report /path/to/swap_report.json \
    --images /path/to/images/ \
    --n 20 --seed 42
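The swap step described above can be sketched in a few lines. This is a minimal illustration, not the code in Swapper.py: the function name `greedy_swap`, the random visiting order, the reading of "top-k" as the k most similar candidates inside the similarity window, and the rule that a partner is never reused are all assumptions. Embeddings are plain lists of floats standing in for BiomedCLIP image embeddings.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_swap(embeddings, tau_min=0.10, tau_max=0.50, top_k=20, seed=42):
    """Assign each item a swap partner with similarity in [tau_min, tau_max].

    Hypothetical sketch: items are visited in random order, each draws one
    partner uniformly from its top_k in-window candidates, and a partner is
    never reused. Items with no valid candidate are left out of the result.
    """
    rng = random.Random(seed)
    order = list(range(len(embeddings)))
    rng.shuffle(order)
    used = set()
    swaps = {}
    for i in order:
        candidates = []
        for j in range(len(embeddings)):
            if j == i or j in used:
                continue
            sim = cosine(embeddings[i], embeddings[j])
            if tau_min <= sim <= tau_max:
                candidates.append((sim, j))
        if not candidates:
            continue
        # Assumption: "top-k" keeps the k most similar in-window candidates.
        candidates.sort(reverse=True)
        sim, j = rng.choice(candidates[:top_k])
        swaps[i] = (j, sim)
        used.add(j)
    return swaps
```

The result maps each source index to a `(partner_index, similarity)` pair, which is the kind of record a swap report could store for later visual inspection.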

2. Claim generation: MultiCare (MultiCare_Claim_Generation.ipynb)

Generates claims for TRUE AND CONCORDANT, FALSE TEXT, and FALSE SWAPPED using MedGemma-4B with class-specific prompts.

3. Claim generation: MM-Health (MM_Health_Claim_Generation.ipynb)

Generates claims for TRUE AND CONCORDANT and FALSE AND CONCORDANT from MM-Health articles using MedGemma-4B.

4. NLI validation (Claim_Validation.ipynb)

Filters generated claims via Retrieve-and-Classify NLI:

  • Embedding model: pritamdeka/BioBERT-mnli-snli-scinli-scitail-mednli-stsb
  • NLI model: lighteternal/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-mnli
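A rough outline of a retrieve-and-classify filter is given below. It is a sketch under stated assumptions, not the notebook's code: the embedding vectors and the `nli` callable are supplied by the caller (in practice they would come from the two models listed above), and the acceptance rule (keep a claim if at least one retrieved sentence yields the expected NLI label) is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(claim_vec, evidence_vecs, k=3):
    """Return indices of the k evidence sentences most similar to the claim."""
    ranked = sorted(
        range(len(evidence_vecs)),
        key=lambda i: cosine(claim_vec, evidence_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


def validate_claim(claim, claim_vec, evidence, evidence_vecs, nli, expected, k=3):
    """Keep a claim when some retrieved sentence yields the expected NLI label.

    `nli(premise, hypothesis)` is a caller-supplied function returning a label
    such as "entailment" or "contradiction". Assumed rule: TRUE claims are
    expected to be entailed, FALSE claims to be contradicted.
    """
    for i in retrieve(claim_vec, evidence_vecs, k):
        if nli(evidence[i], claim) == expected:
            return True
    return False
```

Passing the NLI model in as a function keeps the filter independent of any particular checkpoint, so the same logic can be exercised with a stub before loading the real models.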

5. Dataset unification (NEW_DATA_UNIFICATION.py)

Merges MultiCare and MM-Health validated claims into stratified train/val/test splits (70%/15%/15%).
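As an illustration of the 70%/15%/15% stratified split, the sketch below groups samples by label, shuffles each group with a fixed seed, and slices it by ratio. The function name `stratified_split`, the `label_of` accessor, and the seed are assumptions for the example; the actual script may well rely on a library routine instead.

```python
import random
from collections import defaultdict


def stratified_split(samples, label_of, ratios=(0.70, 0.15, 0.15), seed=42):
    """Split samples into train/val/test while preserving label proportions."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample in samples:
        by_label[label_of(sample)].append(sample)

    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(len(group) * ratios[0])
        n_val = round(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to the test split
    return train, val, test
```

Because rounding happens per class, split sizes can differ by a sample or two from the nominal ratios, which is consistent with the near-equal Val and Test counts reported in the statistics.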

Model

Architecture

  • Visual encoder: MedSigLIP-448 (T_v=1024, D_v=1152, no CLS token)
  • Textual encoder: MedEmbed-base (T_t=256, D_t=768)
  • Fusion: CrossAttentionFusion (per head, 4 attention heads)
  • IB gate: per-head per-modality soft mask, sigmoid clamped to [0.05, 0.95]
  • Heads: FACT (TRUE/FALSE), ALIGN (CONCORDANT/DISCORDANT), TYPE (NONE/TXT_ERR/IMG_SWAP)
  • Loss: Focal + KL-IB (β=0.05) + InfoNCE (γ=0.10) + Border penalty (δ=0.1)
  • Decision routing: TYPE head override at threshold θ*=0.75 (val-calibrated)
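The gate clamp, the loss weighting, and the routing threshold listed above can be made concrete with a small sketch. Everything beyond the listed constants is an assumption: the loss is read as a plain weighted sum of already-computed scalar terms, and the routing rule (a confident non-NONE TYPE prediction forces the FACT output to FALSE) is one plausible interpretation of "TYPE head override", not a confirmed description of the notebook.

```python
import math

BETA, GAMMA, DELTA = 0.05, 0.10, 0.1  # loss weights from the list above
THETA = 0.75                          # val-calibrated routing threshold


def ib_gate(logit, lo=0.05, hi=0.95):
    """Numerically stable sigmoid, clamped to [lo, hi]."""
    if logit >= 0:
        s = 1.0 / (1.0 + math.exp(-logit))
    else:
        e = math.exp(logit)
        s = e / (1.0 + e)
    return min(hi, max(lo, s))


def total_loss(focal, kl_ib, info_nce, border):
    """Assumed combination: focal term plus weighted auxiliary terms."""
    return focal + BETA * kl_ib + GAMMA * info_nce + DELTA * border


def route(fact_probs, type_probs, theta=THETA):
    """Assumed rule: a confident error type overrides the FACT head to FALSE."""
    type_label = max(type_probs, key=type_probs.get)
    if type_label != "NONE" and type_probs[type_label] >= theta:
        return "FALSE", type_label
    fact_label = max(fact_probs, key=fact_probs.get)
    return fact_label, type_label
```

Clamping the gate away from 0 and 1 keeps every token partially visible to the fusion layer, which is one common motivation for this kind of bound.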

Results (MC-MMH test set)

| Metric | Value |
| --- | --- |
| F1_fact | 0.709 |
| F1_align | 0.725 |
| F1_type | 0.672 |
| Composite C (val) | 1.021 |
| ECE (test) | 0.095 |
| R@1 Image→Text | 0.828 |
| Median Rank | 1.0 |
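For readers unfamiliar with the ECE row, the sketch below shows the standard expected calibration error with equal-width confidence bins. The bin count of 10 and the half-open bin edges are assumptions; the value reported above may have been computed with a different binning.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE over equal-width bins (lo, hi] of predicted confidence.

    confidences: predicted probability of the chosen class, one per sample.
    correct: 1 if the prediction was right, 0 otherwise.
    """
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not idx:
            continue
        accuracy = sum(correct[i] for i in idx) / len(idx)
        mean_conf = sum(confidences[i] for i in idx) / len(idx)
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += len(idx) / n * abs(accuracy - mean_conf)
    return ece
```

A perfectly calibrated model scores 0; larger values mean the predicted confidence drifts further from the observed accuracy.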

Baseline comparison

| Method | F1_fact |
| --- | --- |
| Majority class | 0.334 |
| BiomedCLIP zero-shot | 0.486 |
| Unimodal image-only | 0.627 |
| Unimodal text-only | 0.673 |
| Late fusion | 0.692 |
| Proposed | 0.709 |
| Early fusion | 0.740 |

Cross-domain transfer

| Direction | F1_fact | Transfer gap |
| --- | --- | --- |
| Fakeddit → MC-MMH | 0.514 | −0.196 |
| MC-MMH → Fakeddit | 0.304 | −0.405 |

Notebooks

All notebooks support Google Colab and Kaggle environments. Update the path configuration cell (marked with # ← update) before running.
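One way a path configuration cell could adapt to the runtime is sketched below. The helper `detect_environment` and the `DEFAULT_ROOTS` mapping are hypothetical and not taken from the notebooks; the check for the `google.colab` module and the `KAGGLE_KERNEL_RUN_TYPE` environment variable are common conventions rather than guaranteed interfaces, and the root directories are placeholders to be replaced with the real dataset locations.

```python
import os
import sys
from pathlib import Path


def detect_environment():
    """Return "colab", "kaggle", or "local" for the current runtime."""
    if "google.colab" in sys.modules:
        return "colab"
    if os.environ.get("KAGGLE_KERNEL_RUN_TYPE"):
        return "kaggle"
    return "local"


# Hypothetical default roots; replace with the real dataset locations.
DEFAULT_ROOTS = {
    "colab": Path("/content"),
    "kaggle": Path("/kaggle/working"),
    "local": Path.cwd(),
}

DATA_ROOT = DEFAULT_ROOTS[detect_environment()]  # ← update
```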

Requirements

pip install torch transformers open_clip_torch sentence-transformers \
            scikit-learn numpy pandas tqdm pillow huggingface_hub \
            accelerate bitsandbytes datasets

A HuggingFace token with access to gated models is required for MedGemma-4B and MedSigLIP-448:

from huggingface_hub import login
login(token="YOUR_HF_TOKEN")
