Commit 9b232fd

Updated readme and leaderboard
1 parent 92803e6 commit 9b232fd

4 files changed

Lines changed: 127 additions & 51 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@ We introduce **THUNDER**, a comprehensive benchmark designed to rigorously compa

 ## News

+* **2025-10-06**: As requested (https://github.com/MICS-Lab/thunder/issues/1), a new *zero-shot classification* task to evaluate VLMs has been added to THUNDER. Example command: `thunder benchmark keep spider_breast zero_shot_vlm`. See the dedicated zero-shot classification leaderboard [here](https://mics-lab.github.io/thunder/leaderboards/).
 * **2025-09-30**: Patch-level SPIDER datasets have been integrated into THUNDER. See the dedicated SPIDER leaderboard [here](https://mics-lab.github.io/thunder/leaderboards/).
 * **2025-09-18**: THUNDER was accepted to **NeurIPS 2025 Datasets & Benchmarks Track** as a **Spotlight** presentation!

docs/javascripts/rank-table.js

Lines changed: 9 additions & 3 deletions
@@ -2,16 +2,22 @@ document$.subscribe(() => {
   const tables = [

     {
-      id: '#rankTable',
+      id: '#ranksumTable',
       rankCol: 9,
       radix: 10
     },

     {
-      id: '#rankTable2',
+      id: '#spiderTable',
       rankCol: 12,
       radix: 10
-    }
+    },
+
+    {
+      id: '#zeroshotTable',
+      rankCol: 19,
+      radix: 10
+    },
   ];

   const range = (start, end) => Array.from({
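
Each entry in `tables` pairs a table `id` with a `rankCol` index and a `radix`. The rest of `rank-table.js` is not shown in this diff, so the following is only a sketch of how such a config could be used, assuming the rank is the parenthesised integer at the end of a cell such as `46.6 (3)`. The helpers `parseRank` and `sortByRank` are hypothetical names for illustration, not functions from the repository.

```javascript
// Hypothetical helper (not from the repository): extract the rank from a cell
// such as "46.6 (3)". If there is no trailing "(n)", parse the whole cell.
function parseRank(cellText, radix = 10) {
  const match = /\((\d+)\)\s*$/.exec(cellText);
  const raw = match ? match[1] : cellText.trim();
  const rank = Number.parseInt(raw, radix);
  return Number.isNaN(rank) ? null : rank;
}

// Hypothetical helper: return a copy of `rows` (arrays of cell strings)
// sorted by the rank found in column `rankCol`, best rank first.
function sortByRank(rows, { rankCol, radix }) {
  return [...rows].sort(
    (a, b) => parseRank(a[rankCol], radix) - parseRank(b[rankCol], radix)
  );
}

// "Avg" cells taken from the zero-shot leaderboard rows in this commit.
const rows = [
  ['CONCH', '46.6 (3)'],
  ['KEEP', '52.4 (1)'],
  ['MUSK', '47.4 (2)'],
];

console.log(sortByRank(rows, { rankCol: 1, radix: 10 }).map((r) => r[0]));
// prints [ 'KEEP', 'MUSK', 'CONCH' ]
```

In the real tables the rank column is the last one (index 19 for `#zeroshotTable`, which has 20 columns); the two-column rows here only keep the example short.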

docs/leaderboards.md

Lines changed: 54 additions & 12 deletions
@@ -2,15 +2,15 @@
 title: Leaderboards
 ---

-# 🏆 Rank-sum Leaderboard
+# Updates
+* **2025-09-01**: **[Rank-sum Leaderboard]** Segmentation results were updated (small changes based on improved performance for all models on the *ocelot* dataset only -> [Related commit](https://github.com/MICS-Lab/thunder/commit/5f6d6e7cdd6a1df5affed2dac47233f80ce5a205)). Segmentation and global rankings thus do not exactly match Table 4 from the current version of our [arXiv paper](https://arxiv.org/abs/2507.07860) (only a few small differences). The paper will be updated soon.
+* **2025-09-30**: **[SPIDER Leaderboard]** Four SPIDER datasets have been integrated into THUNDER. Results associated with them have not been integrated into the rank-sum leaderboard (only datasets presented in our [arXiv paper](https://arxiv.org/abs/2507.07860) are aggregated in the rank-sum leaderboard), but we have instead created a leaderboard dedicated to SPIDER datasets below.
+* **2025-10-06**: **[Zero-shot VLM Classification Leaderboard]** A new zero-shot classification task was integrated into THUNDER. Results are presented in a dedicated leaderboard below.

-### Updates
-* **2025-09-01**: Segmentation results were updated (small changes based on improved performance for all models on the *ocelot* dataset only -> [Related commit](https://github.com/MICS-Lab/thunder/commit/5f6d6e7cdd6a1df5affed2dac47233f80ce5a205)). Segmentation and global rankings do no thus match exactly (a few small differences only) Table 4 from the current version of our [arXiv paper](https://arxiv.org/abs/2507.07860). The paper will be updated soon.
-* **2025-09-30**: Four SPIDER datasets have been integrated into thunder. Results associated to them have not been integrated into this rank-sum leaderboard (only datasets presented in our [arXiv paper](https://arxiv.org/abs/2507.07860) are aggregated in the rank-sum learderboard), but we have instead created a leaderboard dedicated to SPIDER datasets below.
+# 🏆 Rank-sum Leaderboard

 <div class="table-responsive-sm">
-<table id="rankTable" class="table table-hover table-bordered table-sm nowrap">
-<caption>Lower Rank-sum = better overall performance</caption>
+<table id="ranksumTable" class="table table-hover table-bordered table-sm nowrap">
 <thead class="align-middle text-center">
 <tr>
 <th>Model</th>
@@ -55,18 +55,17 @@ title: Leaderboards

 ---

-# 🏆 Spider Leaderboard
+# 🏆 SPIDER Leaderboard

-The considered datasets are:
+F1-score on test sets of SPIDER datasets and average across datasets for the *knn* and *linear probing* tasks. The considered datasets are:

 * *Br*: [SPIDER-Breast](https://huggingface.co/datasets/histai/SPIDER-breast)
 * *Co*: [SPIDER-Colorectal](https://huggingface.co/datasets/histai/SPIDER-colorectal)
 * *Sk*: [SPIDER-skin](https://huggingface.co/datasets/histai/SPIDER-skin)
 * *Th*: [SPIDER-thorax](https://huggingface.co/datasets/histai/SPIDER-thorax)

 <div class="table-responsive-sm">
-<table id="rankTable2" class="table table-hover table-bordered table-sm nowrap">
-<caption>Performance of founcation models on the SPIDER datasets</caption>
+<table id="spiderTable" class="table table-hover table-bordered table-sm nowrap">
 <thead class="align-middle text-center">
 <tr>
 <th rowspan="2">Model</th>
@@ -95,7 +94,7 @@ The considered datasets are:
 <tr><td>CONCH</td><td>Histopathology</td><td>VLM</td><td>75.1 (15)</td><td>84.5 (14)</td><td>81.7 (15)</td><td>91.1 (14)</td><td>83.1 (15)</td><td>82.1 (13)</td><td>87.9 (13)</td><td>87.3 (13)</td><td>91.0 (15)</td><td>87.1 (14)</td></tr>
 <tr><td>CONCH&nbsp;1.5</td><td>Histopathology</td><td>VLM</td><td>75.9 (14)</td><td>84.2 (15)</td><td>83.3 (12)</td><td>91.4 (10)</td><td>83.7 (14)</td><td>81.6 (14)</td><td>87.4 (15)</td><td>87.0 (15)</td><td>92.1 (14)</td><td>87.0 (15)</td></tr>
 <tr><td>KEEP</td><td>Histopathology</td><td>VLM</td><td>79.8 (9)</td><td>87.2 (8)</td><td>87.2 (9)</td><td>93.1 (4)</td><td>86.9 (9)</td><td>85.6 (11)</td><td>89.7 (9)</td><td>89.3 (10)</td><td>93.8 (10)</td><td>89.6 (10)</td></tr>
-<tr><td>MSUK</td><td>Histopathology</td><td>VLM</td><td>77.2 (12)</td><td>85.7 (11)</td><td>82.5 (14)</td><td>91.1 (13)</td><td>84.1 (13)</td><td>80.6 (15)</td><td>87.9 (14)</td><td>87.6 (12)</td><td>93.3 (12)</td><td>87.4 (13)</td></tr>
+<tr><td>MUSK</td><td>Histopathology</td><td>VLM</td><td>77.2 (12)</td><td>85.7 (11)</td><td>82.5 (14)</td><td>91.1 (13)</td><td>84.1 (13)</td><td>80.6 (15)</td><td>87.9 (14)</td><td>87.6 (12)</td><td>93.3 (12)</td><td>87.4 (13)</td></tr>
 <tr><td>PLIP</td><td>Histopathology</td><td>VLM</td><td>69.4 (17)</td><td>79.9 (16)</td><td>74.4 (16)</td><td>86.4 (16)</td><td>77.5 (16)</td><td>77.1 (18)</td><td>84.7 (19)</td><td>82.1 (17)</td><td>88.6 (16)</td><td>83.1 (17)</td></tr>
 <tr><td>QUILTNET</td><td>Histopathology</td><td>VLM</td><td>69.9 (16)</td><td>77.7 (19)</td><td>73.4 (17)</td><td>85.3 (17)</td><td>76.6 (17)</td><td>77.0 (19)</td><td>82.9 (21)</td><td>81.2 (20)</td><td>88.5 (18)</td><td>82.4 (19)</td></tr>
 <tr><td>DINOv2-B</td><td>Natural-image</td><td>VM</td><td>64.0 (20)</td><td>77.5 (20)</td><td>70.4 (20)</td><td>78.1 (20)</td><td>72.5 (20)</td><td>76.0 (20)</td><td>83.9 (20)</td><td>80.1 (21)</td><td>87.6 (21)</td><td>81.9 (21)</td></tr>
@@ -107,4 +106,47 @@ The considered datasets are:
 </tbody>
 </table>
 </div>
-
+
+---
+
+# 🏆 Zero-shot VLM Classification Leaderboard
+
+F1-score on test sets of all supported datasets and average across datasets for the *zero-shot classification* task. Only VLM models with publicly released patch-level text encoders are included.
+
+<div class="table-responsive-sm">
+<table id="zeroshotTable" class="table table-hover table-bordered table-sm nowrap pivot w-100" style="width:100%">
+<thead class="align-middle text-center">
+<tr>
+<th>Model</th>
+<th>Domain</th>
+<th>Type</th>
+<th>bach</th>
+<th>bracs</th>
+<th>break_his</th>
+<th>ccrcc</th>
+<th>crc</th>
+<th>esca</th>
+<th>mhist</th>
+<th>patch_camelyon</th>
+<th>tcga_crc_msi</th>
+<th>tcga_tils</th>
+<th>tcga_uniform</th>
+<th>wilds</th>
+<th>spider_breast</th>
+<th>spider_colorectal</th>
+<th>spider_skin</th>
+<th>spider_thorax</th>
+<th>Avg</th>
+</tr>
+</thead>
+<tbody>
+<tr><td>CONCH</td><td>Histopathology</td><td>VLM</td><td>56.1 (3)</td><td>37.9 (2)</td><td>53.6 (1)</td><td>56.9 (2)</td><td>51.8 (4)</td><td>40.1 (1)</td><td>60.8 (2)</td><td>57.8 (3)</td><td>21.6 (4)</td><td>47.4 (5)</td><td>37.9 (2)</td><td>83.2 (2)</td><td>30.7 (3)</td><td>31.4 (3)</td><td>35.1 (3)</td><td>43.0 (3)</td><td>46.6 (3)</td></tr>
+<tr><td>KEEP</td><td>Histopathology</td><td>VLM</td><td>63.4 (1)</td><td>34.2 (3)</td><td>45.0 (3)</td><td>69.1 (1)</td><td>80.6 (1)</td><td>33.3 (2)</td><td>41.3 (7)</td><td>71.4 (1)</td><td>15.5 (6)</td><td>55.5 (2)</td><td>44.9 (1)</td><td>89.4 (1)</td><td>37.7 (1)</td><td>44.4 (1)</td><td>60.7 (1)</td><td>51.8 (2)</td><td>52.4 (1)</td></tr>
+<tr><td>MUSK</td><td>Histopathology</td><td>VLM</td><td>62.5 (2)</td><td>38.6 (1)</td><td>52.6 (2)</td><td>50.7 (3)</td><td>57.9 (3)</td><td>25.9 (4)</td><td>63.8 (1)</td><td>53.5 (5)</td><td>22.7 (3)</td><td>50.1 (3)</td><td>32.4 (3)</td><td>71.2 (3)</td><td>36.3 (2)</td><td>36.2 (2)</td><td>48.5 (2)</td><td>55.2 (1)</td><td>47.4 (2)</td></tr>
+<tr><td>PLIP</td><td>Histopathology</td><td>VLM</td><td>42.7 (4)</td><td>25.9 (5)</td><td>25.6 (5)</td><td>38.2 (5)</td><td>61.4 (2)</td><td>31.5 (3)</td><td>53.9 (5)</td><td>46.1 (7)</td><td>16.3 (5)</td><td>64.1 (1)</td><td>10.3 (5)</td><td>51.0 (6)</td><td>14.6 (4)</td><td>28.3 (4)</td><td>23.5 (4)</td><td>22.6 (4)</td><td>34.7 (4)</td></tr>
+<tr><td>QUILTNET</td><td>Histopathology</td><td>VLM</td><td>30.3 (5)</td><td>29.1 (4)</td><td>37.5 (4)</td><td>24.2 (6)</td><td>44.2 (5)</td><td>14.2 (5)</td><td>57.1 (4)</td><td>65.8 (2)</td><td>50.4 (1)</td><td>47.6 (4)</td><td>12.3 (4)</td><td>44.3 (7)</td><td>13.9 (5)</td><td>25.0 (5)</td><td>18.5 (5)</td><td>19.7 (5)</td><td>33.4 (5)</td></tr>
+<tr><td>CLIP-B/32</td><td>Natural-image</td><td>VLM</td><td>13.2 (7)</td><td>7.5 (7)</td><td>18.7 (6)</td><td>21.8 (7)</td><td>24.4 (7)</td><td>9.8 (7)</td><td>42.2 (6)</td><td>48.1 (6)</td><td>13.9 (7)</td><td>21.6 (7)</td><td>2.0 (7)</td><td>56.8 (5)</td><td>3.9 (7)</td><td>6.1 (7)</td><td>4.3 (7)</td><td>5.5 (6)</td><td>18.7 (7)</td></tr>
+<tr><td>CLIP-L/14</td><td>Natural-image</td><td>VLM</td><td>27.1 (6)</td><td>19.2 (6)</td><td>5.8 (7)</td><td>40.6 (4)</td><td>41.1 (6)</td><td>10.4 (6)</td><td>58.0 (3)</td><td>55.6 (4)</td><td>49.4 (2)</td><td>25.4 (6)</td><td>7.4 (6)</td><td>70.2 (4)</td><td>7.3 (6)</td><td>16.0 (6)</td><td>6.6 (6)</td><td>5.4 (7)</td><td>27.8 (6)</td></tr>
+</tbody>
+</table>
+</div>
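
The zero-shot table reports one F1-score per dataset with the model's per-dataset rank in parentheses, plus an `Avg` column. As a rough check of how these cells relate, the sketch below recomputes the CONCH average and the `bach` ranks from the values in the table. It assumes `Avg` is the unweighted mean of the 16 displayed scores rounded to one decimal, and that a higher F1-score gives a better (lower) rank; the published numbers may have been computed from unrounded scores. `average` and `rankModels` are hypothetical helper names.

```javascript
// F1-scores of the CONCH row, in the column order of the table.
const conch = [56.1, 37.9, 53.6, 56.9, 51.8, 40.1, 60.8, 57.8,
               21.6, 47.4, 37.9, 83.2, 30.7, 31.4, 35.1, 43.0];

// Unweighted mean, rounded to one decimal place (assumed convention).
function average(scores) {
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  return Math.round(mean * 10) / 10;
}

// Rank models on one dataset: highest score gets rank 1 (ties not handled).
function rankModels(scoresByModel) {
  const ordered = Object.entries(scoresByModel).sort((a, b) => b[1] - a[1]);
  return Object.fromEntries(ordered.map(([model], i) => [model, i + 1]));
}

// The "bach" column of the table.
const bach = {
  CONCH: 56.1, KEEP: 63.4, MUSK: 62.5, PLIP: 42.7,
  QUILTNET: 30.3, 'CLIP-B/32': 13.2, 'CLIP-L/14': 27.1,
};

console.log(average(conch)); // 46.6, matching the CONCH "Avg" cell
console.log(rankModels(bach)); // KEEP: 1, MUSK: 2, CONCH: 3, ...
```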

docs/stylesheets/extra.css

Lines changed: 63 additions & 36 deletions
@@ -13,27 +13,27 @@
   width: 2rem;
 }

-:is(#rankTable, #rankTable2) {
+:is(#ranksumTable, #spiderTable, #zeroshotTable) {
   border: 1px solid #ede9f7;
 }

-:is(#rankTable, #rankTable2).table {
+:is(#ranksumTable, #spiderTable, #zeroshotTable).table {
   width: 100%;
   font-size: 0.6rem;
 }

-:is(#rankTable, #rankTable2).table thead th {
+:is(#ranksumTable, #spiderTable, #zeroshotTable).table thead th {
   background: #5330a5;
   color: white;
   border: 1px solid #ede9f7;
   text-align: center;
   vertical-align: middle;
-  padding: 0.6rem 0.75rem;
+  padding: 0.55rem 0.75rem;
   font-weight: 600;
 }

-:is(#rankTable, #rankTable2).table tbody td,
-:is(#rankTable, #rankTable2).table tbody th {
+:is(#ranksumTable, #spiderTable, #zeroshotTable).table tbody td,
+:is(#ranksumTable, #spiderTable, #zeroshotTable).table tbody th {
   border: 1px solid #ede9f7;
   vertical-align: middle;
   padding: 0.55rem 0.75rem;
@@ -42,7 +42,7 @@
 }

 /* Captions */
-:is(#rankTable, #rankTable2) caption {
+:is(#ranksumTable, #spiderTable, #zeroshotTable) caption {
   caption-side: bottom;
   text-align: center;
   font-size: 0.6rem;
@@ -51,29 +51,29 @@
 }

 /* ─── DataTables wrappers (search box area) ────────────────────────── */
-:is(#rankTable_wrapper, #rankTable2_wrapper) {
+:is(#ranksumTable_wrapper, #spiderTable_wrapper, #zeroshotTable_wrapper) {
   display: flex;
   flex-direction: column;
   align-items: center;
 }

-:is(#rankTable_filter, #rankTable2_filter) {
+:is(#ranksumTable_filter, #spiderTable_filter, #zeroshotTable_filter) {
   width: 100%;
   float: none;
   text-align: center;
   padding: 0;
   margin-block: 0.75rem;
 }

-:is(#rankTable_filter, #rankTable2_filter) > label {
+:is(#ranksumTable_filter, #spiderTable_filter, #zeroshotTable_filter) > label {
   display: block;
   width: 100%;
   margin: 0;
   position: relative;
 }

 /* Search input */
-:is(#rankTable_filter, #rankTable2_filter) input[type="search"] {
+:is(#ranksumTable_filter, #spiderTable_filter, #zeroshotTable_filter) input[type="search"] {
   padding: 0.5rem 1rem 0.5rem 2.5rem;
   border: 2px solid #000000;
   border-radius: 9999px;
@@ -86,21 +86,21 @@
   margin: 0;
 }

-:is(#rankTable_filter, #rankTable2_filter) input[type="search"]:hover,
-:is(#rankTable_filter, #rankTable2_filter) input[type="search"]:focus {
+:is(#ranksumTable_filter, #spiderTable_filter, #zeroshotTable_filter) input[type="search"]:hover,
+:is(#ranksumTable_filter, #spiderTable_filter, #zeroshotTable_filter) input[type="search"]:focus {
   outline: none;
   border-color: #5433a6;
   background: #fff;
   box-shadow: 0 0 0 0.18rem rgba(43, 133, 255, 0.25);
 }

-:is(#rankTable_filter, #rankTable2_filter) input[type="search"]::placeholder {
+:is(#ranksumTable_filter, #spiderTable_filter, #zeroshotTable_filter) input[type="search"]::placeholder {
   color: #adb5bd;
   opacity: 1;
 }

 /* Magnifier icon inside the label */
-:is(#rankTable_filter, #rankTable2_filter) > label::before {
+:is(#ranksumTable_filter, #spiderTable_filter, #zeroshotTable_filter) > label::before {
   content: "";
   position: absolute;
   top: 50%;
@@ -114,10 +114,10 @@
   opacity: 0.7;
 }

-:is(#rankTable_filter, #rankTable2_filter)
+:is(#ranksumTable_filter, #spiderTable_filter, #zeroshotTable_filter)
   input[type="search"]:focus
   + *::before,
-:is(#rankTable_filter, #rankTable2_filter)
+:is(#ranksumTable_filter, #spiderTable_filter, #zeroshotTable_filter)
   input[type="search"]:hover
   + *::before {
   opacity: 1;
@@ -153,49 +153,76 @@
   font-weight: 600;
 }

-/* ─── Make rankTable2 more compact ─────────────────────────────────── */
-#rankTable2.table {
+div.dtsp-panesContainer {
+  margin: 0px !important;
+}
+
+:is(#ranksumTable, #spiderTable, #zeroshotTable).table {
+  width: auto !important;
+}
+
+/* ─── spiderTable specific ─────────────────────────────────── */
+#spiderTable.table {
   font-size: 0.60rem;
 }

-#rankTable2.table thead th {
+#spiderTable.table thead th {
   font-size: 0.60rem;
   padding: 0.60rem 0.3rem !important;
 }

-#rankTable2.table tbody td,
-#rankTable2.table tbody th {
+#spiderTable.table tbody td,
+#spiderTable.table tbody th {
   padding: 0.4rem 0.3rem !important;
   white-space: nowrap;
 }
-#rankTable2_filter input[type="search"] {
+#spiderTable_filter input[type="search"] {
   font-size: 0.8rem;
   padding: 0.4rem 0.8rem 0.4rem 2.2rem;
 }

-#rankTable2_wrapper .dataTables_info,
-#rankTable2_wrapper .dataTables_paginate,
-#rankTable2_wrapper .dtsp-panes-container,
-#rankTable2_wrapper .dtsp-titleRow {
+#spiderTable_wrapper .dataTables_info,
+#spiderTable_wrapper .dataTables_paginate,
+#spiderTable_wrapper .dtsp-panes-container,
+#spiderTable_wrapper .dtsp-titleRow {
   font-size: 0.8rem;
 }

-#rankTable2,
-#rankTable2.table thead th,
-#rankTable2.table tbody td,
-#rankTable2.table tbody th {
+#spiderTable,
+#spiderTable.table thead th,
+#spiderTable.table tbody td,
+#spiderTable.table tbody th {
   border-width: 1px;
 }

-#rankTable2.table thead tr:nth-child(2) > th {
+#spiderTable.table thead tr:nth-child(2) > th {
   background: #ccbeef !important;
   color: #000000 !important;
 }

-div.dtsp-panesContainer {
-  margin: 0px !important;
+/* ─── zeroshotTable specific ─────────────────────────────────── */
+#zeroshotTable th,
+#zeroshotTable td {
+  width: auto !important;
 }
+#zeroshotTable_wrapper > div.dtsp-panesContainer { width: 100% !important; }

-:is(#rankTable, #rankTable2).table {
-  width: auto !important;
+#zeroshotTable.table.dataTable.pivot {
+  display: grid;
+  grid-template-columns: auto auto;
+}
+
+#zeroshotTable.table.dataTable.pivot thead {
+  display: block;
+}
+
+#zeroshotTable.table.dataTable.pivot tbody {
+  display: grid;
+  grid-template-columns: repeat(7, auto);
+}
+
+#zeroshotTable.table.dataTable.pivot tr,
+#zeroshotTable.table.dataTable.pivot th,
+#zeroshotTable.table.dataTable.pivot td {
+  display: block;
 }
