, and build xformers from source / install a pre-built wheel.

Manually modify `pad_nd` according to https://github.com/Project-MONAI/MONAI/issues/7842.
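
A sketch of both xformers routes, assuming a CUDA-enabled PyTorch is already installed (the exact commands are not given in this README; these follow common practice and may need version pins matching your setup):

```sh
# Option 1: install a pre-built wheel matching your PyTorch/CUDA build.
pip install xformers

# Option 2: build from source (requires a CUDA toolchain; ninja speeds up the build).
pip install ninja
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
```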
## Data Preparation
Download the datasets (MIMIC-CXR, CT-RATE, etc.) and extract them to `data/origin/<data type>/<dataset name>`, where `<data type>` is `local` for image datasets with localization annotations (bounding boxes, segmentation) and `vision-language` for VQA and radiology report datasets.
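
For concreteness, a hypothetical layout for one vision-language dataset (the archive name below is a placeholder; which `<data type>` a given dataset belongs to depends on its annotations):

```sh
# Illustrative only: the archive name is a placeholder.
mkdir -p data/origin/vision-language/MIMIC-CXR
unzip mimic-cxr.zip -d data/origin/vision-language/MIMIC-CXR
# Datasets with bounding-box/segmentation annotations go under data/origin/local/ instead:
mkdir -p "data/origin/local/<dataset name>"
```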
Then run the pre-processing script for each dataset. For instance, for MIMIC-CXR, run the script at `scripts/data/vl/MIMIC-CXR/MIMIC-CXR.py`. Once pre-processing finishes, the processed data is placed under `data/processed/vision-language/MIMIC-CXR`, where `<split>.json` lists the data items for each split.
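
As a sketch, the MIMIC-CXR step might look like this (whether the script takes CLI flags is not stated here, and the split names shown are assumptions):

```sh
# Pre-process MIMIC-CXR using the script path given above.
python scripts/data/vl/MIMIC-CXR/MIMIC-CXR.py
# The per-split JSON files should then exist, e.g. train.json / validate.json / test.json:
ls data/processed/vision-language/MIMIC-CXR
```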
## Training
[THUDM/cogvlm-chat-hf](https://huggingface.co/THUDM/cogvlm-chat-hf) is used as the base VLM.
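
If you want to fetch the base model ahead of time, one option (a convenience step, not required by this README) is the Hugging Face CLI:

```sh
# Pre-download the base VLM into the local Hugging Face cache.
huggingface-cli download THUDM/cogvlm-chat-hf
```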
The example commands for running the three-stage training of VividMed are as follows. Adapt the number of devices and the batch size to your setup. Note that we disable `torch.compile` due to a compatibility issue with our dependencies; enable it if you are ready to address it.
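
As a purely hypothetical sketch, a distributed three-stage launch could look like the following (the script name, config paths, and flags are all assumptions, not the project's actual interface):

```sh
# Hypothetical: run the three stages sequentially with torchrun.
# Adjust --nproc_per_node and the per-device batch size to your hardware.
torchrun --nproc_per_node=8 scripts/train.py --config conf/stage1.yaml
torchrun --nproc_per_node=8 scripts/train.py --config conf/stage2.yaml
torchrun --nproc_per_node=8 scripts/train.py --config conf/stage3.yaml
```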