Commit 3e92541
committed
feat: add scoring notes for all datasets
Show "How is this scored?" on dataset pages (after results table)
and run detail pages (under judge verdict). Each dataset explains
its grading method: BEAM uses per-rubric-item scoring from the paper,
LLM-judged datasets use pass/fail with gold references, MCQ datasets
use exact match.1 parent fb63307 commit 3e92541
8 files changed
Lines changed: 90 additions & 69 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2 | 2 | | |
3 | 3 | | |
4 | 4 | | |
| 5 | + | |
5 | 6 | | |
6 | 7 | | |
7 | 8 | | |
| |||
12 | 13 | | |
13 | 14 | | |
14 | 15 | | |
| 16 | + | |
15 | 17 | | |
16 | 18 | | |
17 | 19 | | |
18 | 20 | | |
19 | 21 | | |
20 | 22 | | |
21 | 23 | | |
| 24 | + | |
22 | 25 | | |
23 | 26 | | |
24 | 27 | | |
25 | 28 | | |
26 | 29 | | |
27 | 30 | | |
28 | 31 | | |
| 32 | + | |
29 | 33 | | |
30 | 34 | | |
31 | 35 | | |
32 | 36 | | |
33 | 37 | | |
34 | 38 | | |
35 | 39 | | |
| 40 | + | |
36 | 41 | | |
37 | 42 | | |
38 | 43 | | |
| |||
43 | 48 | | |
44 | 49 | | |
45 | 50 | | |
| 51 | + | |
46 | 52 | | |
47 | 53 | | |
48 | 54 | | |
| |||
55 | 61 | | |
56 | 62 | | |
57 | 63 | | |
| 64 | + | |
58 | 65 | | |
59 | 66 | | |
60 | 67 | | |
| |||
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
This file was deleted.
This file was deleted.
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
36 | 36 | | |
37 | 37 | | |
38 | 38 | | |
39 | | - | |
40 | | - | |
| 39 | + | |
| 40 | + | |
41 | 41 | | |
42 | 42 | | |
43 | 43 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
483 | 483 | | |
484 | 484 | | |
485 | 485 | | |
| 486 | + | |
| 487 | + | |
| 488 | + | |
| 489 | + | |
| 490 | + | |
| 491 | + | |
486 | 492 | | |
487 | 493 | | |
488 | 494 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
| 21 | + | |
21 | 22 | | |
22 | 23 | | |
23 | 24 | | |
| |||
34 | 35 | | |
35 | 36 | | |
36 | 37 | | |
37 | | - | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
38 | 42 | | |
39 | 43 | | |
40 | 44 | | |
| |||
341 | 345 | | |
342 | 346 | | |
343 | 347 | | |
| 348 | + | |
| 349 | + | |
| 350 | + | |
| 351 | + | |
344 | 352 | | |
345 | 353 | | |
346 | 354 | | |
| |||
0 commit comments