Commit cc21df6

Minor fix to ensure samples from all classes are present in spider_thorax validation set
1 parent 8fc258a commit cc21df6

2 files changed

Lines changed: 14 additions & 3 deletions

src/thunder/config/dataset/spider_thorax.yaml

Lines changed: 3 additions & 3 deletions
@@ -13,10 +13,10 @@ compatible_tasks:
     "adversarial_attack",
 ]
 
-nb_train_samples: 50413
-nb_val_samples: 12906
+nb_train_samples: 49562
+nb_val_samples: 13757
 nb_test_samples: 14988
-md5sum: "5d91551c7b0a4d82639411fcc2af847e"
+md5sum: "5d1aab1441a442b1cc4a6ed6c14d4718"
 image_sizes: [[224, 224]]
 mpp: 0.5
 cancer_type: skin
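
As a quick sanity check on the updated counts, the sketch below compares the old and new train/val sizes taken from the diff. The dictionary names are illustrative only and are not part of the repository; the check simply confirms that samples were moved between splits rather than added or dropped.

```python
# Counts copied from the diff above; variable names are hypothetical.
old_counts = {"train": 50413, "val": 12906}
new_counts = {"train": 49562, "val": 13757}

# The combined train + val total should be unchanged by a re-split.
old_total = sum(old_counts.values())
new_total = sum(new_counts.values())

# Number of samples that moved from train to val.
moved_to_val = old_counts["train"] - new_counts["train"]

print(f"total before: {old_total}, total after: {new_total}")
print(f"samples moved from train to val: {moved_to_val}")
```

The test count (14988) is untouched by the commit, so only the train/val boundary shifts.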

src/thunder/datasets/dataset/spider_thorax.py

Lines changed: 11 additions & 0 deletions
@@ -97,6 +97,17 @@ def create_splits_spider_thorax(base_folder: str, dataset_cfg: dict) -> None:
     val_slide_names = random.sample(
         unique_slide_names, int(0.2 * len(unique_slide_names))
     )
+
+    # Class 12 is only present on 4 slides in the spider_thorax dataset
+    # (significantly less represented than other classes)
+    # so we need to make sure to have one in val set and the 3 others in train set
+    for slide_name in ["slide_0021", "slide_0048", "slide_0123", "slide_0178"]:
+        # Looping over the only 4 slides containing class 12 samples
+        if slide_name != "slide_0048" and slide_name in val_slide_names:
+            val_slide_names.remove(slide_name)
+        elif slide_name == "slide_0048" and slide_name not in val_slide_names:
+            val_slide_names.append(slide_name)
+
     val_mask = np.array([slide_name in val_slide_names for slide_name in slide_names])
 
     (
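
The adjustment added in this hunk can be tried in isolation. The following is a minimal sketch, not the repository's code: the helper `pin_rare_class_slides`, the toy slide list, and the fixed seed are assumptions made for illustration, while the four slide names and the 20% sampling ratio come from the diff.

```python
import random

# Slide names copied from the diff; everything else here is hypothetical.
CLASS_12_SLIDES = ["slide_0021", "slide_0048", "slide_0123", "slide_0178"]
VAL_CLASS_12_SLIDE = "slide_0048"


def pin_rare_class_slides(val_slide_names, rare_slides, val_slide):
    """Return a copy of val_slide_names holding exactly one rare-class slide."""
    # Drop every rare-class slide except the one reserved for validation.
    adjusted = [
        name
        for name in val_slide_names
        if name not in rare_slides or name == val_slide
    ]
    # Make sure the reserved slide is present even if sampling missed it.
    if val_slide not in adjusted:
        adjusted.append(val_slide)
    return adjusted


# Toy stand-in for the real slide list (assumed: 200 slides, slide_0000..slide_0199).
unique_slide_names = [f"slide_{i:04d}" for i in range(200)]

random.seed(0)  # fixed seed only to make the sketch reproducible
sampled = random.sample(unique_slide_names, int(0.2 * len(unique_slide_names)))

val_slide_names = pin_rare_class_slides(sampled, CLASS_12_SLIDES, VAL_CLASS_12_SLIDE)
val_set = set(val_slide_names)
train_slide_names = [name for name in unique_slide_names if name not in val_set]
```

Because slides are removed from or appended to the sampled list after the fact, the validation split may end up slightly smaller or larger than 20% of the slides. That is consistent with the sample counts changing in `spider_thorax.yaml`.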