Our paper, "Align then Refine: Text-Guided 3D Prostate Lesion Segmentation", has been accepted for presentation at the 48th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE EMBC 2026).
This README provides instructions for using the customized training and inference pipeline in this repository, including multi-encoder design, text conditioning, and attention-based refinement.
Main scripts used in this repo:
- tools/train/run_text_single.sh: base text trainer run
- tools/train/run_text_attn_best.sh: attention run with tuned defaults + pretrained checkpoint
- tools/infer/run_text_test.sh: inference + optional evaluation
- tools/eval/compute_segmentation_metrics.py: extra metric report
Tools are organized by purpose:
- tools/train: training and finetuning launchers
- tools/infer: prediction/inference launchers
- tools/eval: postprocessing and metric utilities
cd <PROJECT_ROOT>
# 1) Create the base nnU-Net v2 environment (follow the official nnU-Net setup)
conda create -n nnunetv2_repro python=3.10 -y
conda activate nnunetv2_repro
# install this fork in editable mode
pip install -e .
# 2) Extra dependency used by this text-guided pipeline
# (install if not already present in your nnU-Net environment)
pip install open-clip-torch
Set these before training or inference:
export nnUNet_raw=<NNUNET_RAW>
export nnUNet_preprocessed=<NNUNET_PREPROCESSED>
export nnUNet_results=<NNUNET_RESULTS>
Examples below use:
- Dataset: Dataset2203_picai_split
- Config: 3d_fullres
- Plans: nnUNetPlans
- Fold: 1
Adjust as needed.
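Before launching any of the commands below, it can help to confirm that the three nnU-Net path variables are actually exported. The following is a minimal sketch, not part of this repository; the helper name `missing_nnunet_vars` is hypothetical.

```python
import os

# The three path variables this README asks you to export.
REQUIRED_VARS = ("nnUNet_raw", "nnUNet_preprocessed", "nnUNet_results")


def missing_nnunet_vars(env=None):
    """Return the required variable names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_nnunet_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All nnU-Net path variables are set.")
```

Passing a plain dict instead of `os.environ` makes the check easy to try without touching your shell.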
# QUICK=0 is closer to full training
QUICK=0 bash tools/train/run_text_single.sh 0 1 Dataset2203_picai_split 3d_fullres nnUNetPlans tversky
Args:
- <GPU_ID> [FOLD] [DATASET] [CONFIG] [PLANS] [LOSS]
- [LOSS]: dice | tversky | focal_tversky (mapped internally to *_topk)
Common env vars:
- QUICK (1 = debug/faster, 0 = fuller run)
- NNUNET_TRAINER, NNUNET_PRETRAINED_WEIGHTS
- NNUNET_RESULTS_TAG / NNUNET_RESULTS_DIR
- NNUNET_ITERS_PER_EPOCH, NNUNET_VAL_ITERS
- NNUNET_TEXT_PROMPTS, NNUNET_TEXT_MODEL, NNUNET_TEXT_EMBED_DIM
- NNUNET_TEXT_MODULATION (none|film|gate)
- NNUNET_USE_ALIGNMENT_HEAD, NNUNET_RETURN_HEATMAP
- NNUNET_LAMBDA_ALIGN, NNUNET_LAMBDA_HEAT
- NNUNET_AUX_WARMUP_EPOCHS, NNUNET_AUX_RAMP_EPOCHS
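To give a feel for what the three NNUNET_TEXT_MODULATION choices could mean, here is a toy sketch on plain Python lists. It assumes `film` refers to the standard feature-wise affine transform and that `gate` is a sigmoid gate; the function name `modulate`, its arguments, and the gate formula are illustrative assumptions, and the trainer in this repository may implement these modes differently.

```python
import math


def modulate(features, scale, shift, mode="film"):
    """Apply a text-derived modulation to one feature vector (plain lists).

    `scale` and `shift` stand in for values predicted from the text embedding.
    """
    if mode == "none":
        # No text conditioning: features pass through unchanged.
        return list(features)
    if mode == "film":
        # FiLM: per-channel affine transform, y = scale * x + shift.
        return [s * x + b for x, s, b in zip(features, scale, shift)]
    if mode == "gate":
        # Assumed gating form: y = sigmoid(scale) * x (shift is unused).
        return [x / (1.0 + math.exp(-s)) for x, s in zip(features, scale)]
    raise ValueError(f"Unknown modulation mode: {mode}")


if __name__ == "__main__":
    x = [1.0, 2.0]
    print(modulate(x, [2.0, 0.5], [0.0, 1.0], "film"))
    print(modulate(x, [0.0, 0.0], [0.0, 0.0], "gate"))
```

In a real network the same operation would be applied per channel over a 3D feature map rather than over a short list.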
Loss options for run_text_single.sh (6th argument):
- dice
- tversky
- focal_tversky
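For readers unfamiliar with these region losses, the sketch below shows one common formulation on flat lists of probabilities and binary labels. The weights `alpha` and `beta`, the exponent `gamma`, and the function names are illustrative assumptions rather than the values or code used by this repository; conventions for the focal exponent also vary between papers. The TopK cross-entropy term that the scripts add is not shown.

```python
def tversky_index(pred, target, alpha=0.5, beta=0.5, eps=1e-7):
    """Tversky index TP / (TP + alpha * FP + beta * FN) on flat sequences.

    With alpha = beta = 0.5 this equals the Dice coefficient.
    """
    tp = sum(p * t for p, t in zip(pred, target))
    fp = sum(p * (1 - t) for p, t in zip(pred, target))
    fn = sum((1 - p) * t for p, t in zip(pred, target))
    return (tp + eps) / (tp + alpha * fp + beta * fn + eps)


def focal_tversky_loss(pred, target, alpha=0.5, beta=0.5, gamma=1.0):
    """One common focal form: (1 - TI) ** gamma; gamma = 1 is plain Tversky loss."""
    return (1.0 - tversky_index(pred, target, alpha, beta)) ** gamma


if __name__ == "__main__":
    pred = [1.0, 1.0, 1.0, 0.0]    # three voxels predicted as lesion
    target = [1.0, 0.0, 0.0, 0.0]  # one true lesion voxel
    print(tversky_index(pred, target))                        # Dice-equivalent
    print(tversky_index(pred, target, alpha=0.3, beta=0.7))   # lighter FP penalty
```

Lowering `alpha` relative to `beta` penalizes false negatives more than false positives, which is a common motivation for choosing Tversky over Dice for small lesions.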
Examples:
# Dice + TopKCE
QUICK=0 bash tools/train/run_text_single.sh 0 1 Dataset2203_picai_split 3d_fullres nnUNetPlans dice
# Tversky + TopKCE
QUICK=0 bash tools/train/run_text_single.sh 0 1 Dataset2203_picai_split 3d_fullres nnUNetPlans tversky
# Focal-Tversky + TopKCE
QUICK=0 bash tools/train/run_text_single.sh 0 1 Dataset2203_picai_split 3d_fullres nnUNetPlans focal_tversky
Notes:
- tools/train/run_text_attn_best.sh launches with tversky.
- You can still override the internal loss selection via env, e.g. NNUNET_TEXT_LOSS=dice_topk before launch.
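The relationship between the sixth argument and the NNUNET_TEXT_LOSS override can be pictured as follows. This is a sketch under assumptions: the helper `resolve_loss` is hypothetical, the internal names other than `dice_topk` are inferred from the `*_topk` pattern, and the precedence (environment value wins over the argument) is assumed rather than read from the launcher.

```python
import os

# Values accepted as the 6th argument of run_text_single.sh.
LOSS_CHOICES = ("dice", "tversky", "focal_tversky")


def resolve_loss(loss_arg, env=None):
    """Return the internal loss name for a launcher argument.

    Assumption: a non-empty NNUNET_TEXT_LOSS takes precedence over the argument.
    """
    env = os.environ if env is None else env
    override = env.get("NNUNET_TEXT_LOSS")
    if override:
        return override
    if loss_arg not in LOSS_CHOICES:
        raise ValueError(f"Unknown loss '{loss_arg}', expected one of {LOSS_CHOICES}")
    # Assumed mapping: dice -> dice_topk, tversky -> tversky_topk, and so on.
    return loss_arg + "_topk"


if __name__ == "__main__":
    print(resolve_loss("tversky", env={}))
    print(resolve_loss("tversky", env={"NNUNET_TEXT_LOSS": "dice_topk"}))
```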
Expected output model folder:
<NNUNET_RESULTS>/
  Dataset2203_picai_split/
    nnUNetTrainerMultiEncoderUNetText__nnUNetPlans__3d_fullres/fold_1
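The folder name above follows the `<trainer>__<plans>__<config>` pattern. A small sketch for building the expected path in your own tooling is shown here; the helper `expected_model_dir` is hypothetical, and the result may differ if you set NNUNET_RESULTS_TAG or NNUNET_RESULTS_DIR.

```python
from pathlib import PurePosixPath


def expected_model_dir(results_root, dataset, trainer, plans, config, fold):
    """Build <results>/<dataset>/<trainer>__<plans>__<config>/fold_<fold>."""
    run_name = f"{trainer}__{plans}__{config}"
    return PurePosixPath(results_root) / dataset / run_name / f"fold_{fold}"


if __name__ == "__main__":
    # "/data/nnUNet_results" is a placeholder for your own results root.
    print(expected_model_dir(
        "/data/nnUNet_results",
        "Dataset2203_picai_split",
        "nnUNetTrainerMultiEncoderUNetText",
        "nnUNetPlans",
        "3d_fullres",
        1,
    ))
```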
# auto-loads fold checkpoint_best.pth from Stage A if present
QUICK=0 bash tools/train/run_text_attn_best.sh 0 1 Dataset2203_picai_split 3d_fullres nnUNetPlans
Args:
<GPU_ID> [FOLD] [DATASET] [CONFIG] [PLANS] [PRETRAINED_CKPT]
Key env vars:
- NNUNET_CROSS_GAMMA_INIT (default 0.10)
- NNUNET_CROSS_ALPHA (default 0.12)
- NNUNET_CROSS_TAU (default 0.44)
- ATTN_WARMUP_EPOCHS (default 0)
- BASE_LR_REFINER (default 5e-4)
- NNUNET_PRETRAINED_WEIGHTS (override preload checkpoint)
- NNUNET_RESULTS_TAG (output experiment tag)
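If you log or sweep these settings from Python, the documented defaults can be mirrored as below. This is only a sketch: the helper `attn_settings` is hypothetical and how the launcher itself parses these variables is not shown in this README.

```python
import os

# Defaults as documented in the list above.
ATTN_DEFAULTS = {
    "NNUNET_CROSS_GAMMA_INIT": 0.10,
    "NNUNET_CROSS_ALPHA": 0.12,
    "NNUNET_CROSS_TAU": 0.44,
    "ATTN_WARMUP_EPOCHS": 0,
    "BASE_LR_REFINER": 5e-4,
}


def attn_settings(env=None):
    """Return the effective settings: the env value if set, else the default."""
    env = os.environ if env is None else env
    settings = {}
    for name, default in ATTN_DEFAULTS.items():
        raw = env.get(name)
        # Cast with the default's type so epochs stay int and the rest float.
        settings[name] = type(default)(raw) if raw not in (None, "") else default
    return settings


if __name__ == "__main__":
    for key, value in attn_settings().items():
        print(f"{key}={value}")
```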
bash tools/infer/run_text_test.sh <INPUT_DIR> <OUTPUT_DIR> 0 <MODEL_PATH>
- INPUT_DIR: nnUNet-style input images directory
- OUTPUT_DIR: prediction output directory
- 0: GPU id
Args:
- <INPUT_DIR> <OUTPUT_DIR> [GPU_ID] [MODEL_PATH]
- [MODEL_PATH]: optional model path. This can be a checkpoint file, a fold_* directory, or a model/results directory.
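The three accepted forms of [MODEL_PATH] can be told apart with a few filesystem checks. The sketch below is illustrative only; `classify_model_path` and its return labels are hypothetical and do not describe how run_text_test.sh resolves the path internally.

```python
from pathlib import Path


def classify_model_path(model_path):
    """Label a path as 'checkpoint', 'fold_dir', or 'model_dir'."""
    path = Path(model_path)
    if path.is_file():
        # A single checkpoint file, e.g. checkpoint_best.pth.
        return "checkpoint"
    if path.is_dir() and path.name.startswith("fold_"):
        # A fold_* directory containing checkpoints for one fold.
        return "fold_dir"
    if path.is_dir():
        # Any other directory is treated as a model/results directory.
        return "model_dir"
    raise FileNotFoundError(f"No such model path: {model_path}")
```

A check like this can be run before inference to catch a mistyped path early instead of partway through a prediction job.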
Key env vars:
- NNUNET_DATASET, NNUNET_CONFIG, NNUNET_PLANS
- NNUNET_TEST_TRAINER, NNUNET_FOLD, NNUNET_CHECKPOINT_FILE
- NNUNET_SKIP_EVAL, NNUNET_GT_DIR
python tools/eval/compute_segmentation_metrics.py \
--pred-dir <OUTPUT_DIR> \
--gt-dir <GT_DIR> \
  --output-dir <METRIC_OUT_DIR>
This project is built on top of the excellent nnU-Net framework.
We gratefully acknowledge and thank the nnU-Net authors and contributors for open-sourcing and maintaining this powerful toolkit.
If you find this work useful, please cite:
@misc{sun2026alignrefinetextguided3d,
title={Align then Refine: Text-Guided 3D Prostate Lesion Segmentation},
author={Cuiling Sun and Linkai Peng and Adam Murphy and Elif Keles and Hiten D. Patel and Ashley Ross and Frank Miller and Baris Turkbey and Andrea Mia Bejar and Halil Ertugrul Aktas and Gorkem Durak and Ulas Bagci},
year={2026},
eprint={2604.18713},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2604.18713},
}
If you use our codebase, please also cite:
@article{isensee2021nnu,
title = {nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation},
author = {Isensee, Fabian and Jaeger, Paul F. and Kohl, Simon A. A. and Petersen, Jens and Maier-Hein, Klaus H.},
journal = {Nature Methods},
volume = {18},
number = {2},
pages = {203--211},
year = {2021},
doi = {10.1038/s41592-020-01008-z}
}