update READMEs

ChristianHinge · ChristianHinge · commit 3e3aa3e59b19 · 2026-03-26T02:48:51.000+01:00
diff --git a/README.md b/README.md
@@ -121,9 +121,9 @@ A pre-built Docker image is available for download (see [website](https://bic-ma
 **Direct Python usage:**
 
 ```bash
-python src/baseline/model.py <input_dir> <output_ct.nii.gz>
+python src/baseline/predict.py <features_dir> <output_ct.nii.gz>
 # Example:
-python src/baseline/model.py data/sub-000/features/ results/sub-000/ct_pred.nii.gz
+python src/baseline/predict.py data/sub-000/features/ results/sub-000/ct_pred.nii.gz
 ```
 
 ---
@@ -241,20 +241,28 @@ Five metrics compare predicted PET and CT outputs against the ground truth:
 | Organ Bias (MARE) | `organ_bias` | Mean absolute relative error of mean SUV in 8 organs: brain, liver, spleen, heart, pancreas, muscle, adipose, extremities | TotalSegmentator organ labels |
 | CT MAE | `ct_mae` | Mean absolute error of attenuation coefficients (μ at 511 keV) between predicted and ground-truth CT after HU→μ conversion | Body mask, excluding ±4 cm around liver|
 
-**Run all metrics:**
+**Evaluate a single subject:**
 
 ```bash
-python src/evaluation/eval.py <subject_dir> <pred_pet.nii.gz> <pred_ct.nii.gz> -all
+python src/evaluation/eval_case.py \
+  --subject_path <subject_dir> \
+  --pred_pet <pred_pet.nii.gz> \
+  --pred_ct <pred_ct.nii.gz>
 ```
 
-**Run a single metric:**
+`--pred_pet` and `--pred_ct` are both optional; omit `--pred_pet` to skip the PET metrics or `--pred_ct` to skip the CT metric.
+Note: Brain Outlier Score is a dataset-level metric and requires multiple subjects (see below).
+
+**Evaluate a full dataset (matches challenge leaderboard):**
 
 ```bash
-python src/evaluation/eval.py <subject_dir> <pred_pet.nii.gz> <pred_ct.nii.gz> \
-  -specific_metric <metric>
-# <metric>: whole_body_mae | brain_outlier | organ_bias | ct_mae
+python src/evaluation/eval_dataset.py \
+  --dataset_path <dataset_dir> \
+  --pred_dir <predictions_dir>
 ```
 
+`<predictions_dir>` must contain one sub-folder per subject, each with `ct.nii.gz` and `pet.nii.gz`.
+
 ---
 
 ## 📬 Submission
diff --git a/src/baseline/README.md b/src/baseline/README.md
@@ -19,11 +19,11 @@ The baseline model has already been pretrained
 Run with Docker:
 
 ```bash
-docker run ghcr.io/bic-mac-challenge/baseline \
- --memory 120g \
+docker run --rm \
+  --memory 120g \
   -v /path/to/sub-XXX/features:/data/features:ro \
   -v /path/to/output:/data/output \
-  bic-mac-baseline
+  ghcr.io/bic-mac-challenge/baseline
 ```
 
 Or without Docker:
diff --git a/src/evaluation/README.md b/src/evaluation/README.md
@@ -38,32 +38,20 @@ derived from CT after HU→μ conversion.
 
 # Requirements
 
--   Python **3.10+**
+-   Python **3.12**
 -   [`uv`](https://github.com/astral-sh/uv) for environment and
     dependency management
 
 ------------------------------------------------------------------------
 
 # Installation
 
-Clone the repository:
+Clone the repository and install dependencies:
 
 ``` bash
 git clone <repository_url>
 cd <repository_folder>
-```
-
-Create and activate a virtual environment:
-
-``` bash
-uv venv
-source .venv/bin/activate
-```
-
-Install required dependencies:
-
-``` bash
-uv pip install numpy nibabel
+uv sync
 ```
 
 ------------------------------------------------------------------------
@@ -115,55 +103,72 @@ Both must:
 
 # Running the Evaluation
 
-Run the evaluation script with:
+There are two entry points: `eval_case.py` for a single subject and `eval_dataset.py` for a
+full dataset (this matches the challenge leaderboard computation, including the dataset-level
+Brain Outlier Score).
+
+## Single subject
 
 ``` bash
-python eval.py --subject_path <subject_path> --pred_pet <pred_pet> --pred_ct <pred_ct> [-all | -specific_metric <metric_name>]
+python eval_case.py \
+  --subject_path <subject_path> \
+  --pred_pet <pred_pet.nii.gz> \
+  --pred_ct <pred_ct.nii.gz>
 ```
 
-------------------------------------------------------------------------
+`--pred_pet` and `--pred_ct` are both optional; omit either one to skip the corresponding metrics.
 
-# Arguments
+Note: Brain Outlier Score is a dataset-level metric and is not computed by `eval_case.py`.
 
-  Argument             Description
-  -------------------- ---------------------------------
-  `--subject_path`     Path to the subject directory
-  `--pred_pet`         Path to the predicted PET NIfTI
-  `--pred_ct`          Path to the predicted CT NIfTI
-  `-all`               Run all evaluation metrics
-  `-specific_metric`   Run only a single metric
+## Full dataset
+
+``` bash
+python eval_dataset.py \
+  --dataset_path <dataset_path> \
+  --pred_dir <predictions_dir>
+```
+
+`<predictions_dir>` must contain one sub-folder per subject, each with `ct.nii.gz` and `pet.nii.gz`.
 
 ------------------------------------------------------------------------
 
-# Example
+# Arguments
 
-Run all metrics:
+## `eval_case.py`
 
-``` bash
-python eval.py --subject_path /data/sub-000 --pred_pet /results/pred_pet.nii.gz --pred_ct /results/pred_ct.nii.gz -all
-```
+  Argument             Description
+  -------------------- ---------------------------------
+  `--subject_path`     Path to the subject directory (must contain `ct-label/` and `pet-label/`)
+  `--pred_pet`         Path to the predicted PET NIfTI (optional)
+  `--pred_ct`          Path to the predicted CT NIfTI (optional)
 
-Run only CT μ-MAE:
+## `eval_dataset.py`
 
-``` bash
-python eval.py --subject_path /data/sub-000 --pred_pet /results/pred_pet.nii.gz --pred_ct /results/pred_ct.nii.gz -specific_metric ct_mae
-```
+  Argument             Description
+  -------------------- -------------------------------------------------------
+  `--dataset_path`     Root directory containing subject folders with ground-truth labels
+  `--pred_dir`         Directory with one sub-folder per subject (each containing `ct.nii.gz` and `pet.nii.gz`)
+  `--subjects`         Optional explicit list of subject IDs (default: all sub-folders in `--pred_dir`)
 
 ------------------------------------------------------------------------
 
-# Available Metrics
+# Example
 
-The following metrics can be executed individually:
+Evaluate a single subject:
 
-    whole_body_mae
-    brain_outlier
-    organ_bias
-    ct_mae
+``` bash
+python eval_case.py \
+  --subject_path /data/sub-000 \
+  --pred_pet /results/sub-000/pet.nii.gz \
+  --pred_ct /results/sub-000/ct.nii.gz
+```
 
-Example:
+Evaluate a full dataset:
 
 ``` bash
-python eval.py --subject_path <subject_path> --pred_pet <pred_pet> --pred_ct <pred_ct> -specific_metric whole_body_mae
+python eval_dataset.py \
+  --dataset_path /data/bic-mac/train \
+  --pred_dir /results/my_method
 ```
 
 ------------------------------------------------------------------------
@@ -174,9 +179,8 @@ python eval.py --subject_path <subject_path> --pred_pet <pred_pet> --pred_ct <pr
     Subject: sub-000
     ----------------------------------------------------
     Whole-body SUV MAE        : 0.124512
-    Brain Outlier Score       : 0.912341
     Organ Bias                : 6.382100%
-    CT μ-MAE                  : 0.000218
+    CT MAE                    : 0.000218
     ====================================================
 
 ------------------------------------------------------------------------
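
The updated READMEs require `<predictions_dir>` to contain one sub-folder per subject, each with `ct.nii.gz` and `pet.nii.gz`. A minimal sketch of a pre-flight check for that layout is shown here. The helper name `find_incomplete_subjects` is hypothetical and not part of the repository, and the sketch assumes that any directory directly under the predictions folder is a subject folder.

```python
from pathlib import Path

# Filenames the README requires in every per-subject prediction folder.
REQUIRED_FILES = ("ct.nii.gz", "pet.nii.gz")


def find_incomplete_subjects(pred_dir):
    """Map each subject sub-folder to the required files it is missing.

    Hypothetical helper for illustration; not part of the repository.
    Assumes every directory directly under pred_dir is a subject folder.
    """
    missing = {}
    for subject_dir in sorted(Path(pred_dir).iterdir()):
        if not subject_dir.is_dir():
            continue  # ignore stray files next to the subject folders
        absent = [
            name for name in REQUIRED_FILES
            if not (subject_dir / name).is_file()
        ]
        if absent:
            missing[subject_dir.name] = absent
    return missing
```

Calling `find_incomplete_subjects("/results/my_method")` before `eval_dataset.py` would return an empty dict when every subject folder is complete, or a mapping such as `{"sub-001": ["pet.nii.gz"]}` when a file is absent. How `eval_dataset.py` itself handles an incomplete folder is not stated in the READMEs, so this check is only a convenience.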