Commit 5f38b74

Exclude zero-MCP MCP-config tasks from official browser
1 parent 691c472 commit 5f38b74

7 files changed, +128 -882 lines changed

docs/official_results/README.md

Lines changed: 8 additions & 8 deletions
@@ -1,8 +1,8 @@
 # Official Results Browser
 
-This bundle is generated from `runs/analysis/` and includes only valid scored tasks (`passed`/`failed` with numeric reward).
+This bundle is generated from `runs/analysis/` and includes only valid scored tasks (`passed`/`failed` with numeric reward) that pass config-specific validity checks.
 
-Generated: `2026-03-05T21:30:17.060820+00:00`
+Generated: `2026-03-05T22:59:58.640809+00:00`
 
 ## Local Browse
 
@@ -17,16 +17,16 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 
 | Suite | Config | Valid Tasks | Min Required | Mean Reward | Pass Rate | Coverage |
 |---|---|---:|---:|---:|---:|---|
-| [csb_org_compliance](suites/csb_org_compliance.md) | `baseline-local-artifact` | 18 | 54 | 0.247 | 0.889 | FLAG: below minimum |
-| [csb_org_compliance](suites/csb_org_compliance.md) | `mcp-remote-artifact` | 54 | 54 | 0.295 | 0.889 | ok |
+| [csb_org_compliance](suites/csb_org_compliance.md) | `baseline-local-artifact` | 18 | 53 | 0.247 | 0.889 | FLAG: below minimum |
+| [csb_org_compliance](suites/csb_org_compliance.md) | `mcp-remote-artifact` | 53 | 53 | 0.298 | 0.887 | ok |
 | [csb_org_crossorg](suites/csb_org_crossorg.md) | `baseline-local-artifact` | 15 | 45 | 0.196 | 0.667 | FLAG: below minimum |
 | [csb_org_crossorg](suites/csb_org_crossorg.md) | `mcp-remote-artifact` | 45 | 45 | 0.200 | 0.667 | ok |
 | [csb_org_crossrepo](suites/csb_org_crossrepo.md) | `baseline-local-artifact` | 14 | 42 | 0.312 | 1.000 | FLAG: below minimum |
 | [csb_org_crossrepo](suites/csb_org_crossrepo.md) | `mcp-remote-artifact` | 42 | 42 | 0.285 | 0.976 | ok |
 | [csb_org_crossrepo_tracing](suites/csb_org_crossrepo_tracing.md) | `baseline-local-artifact` | 22 | 62 | 0.351 | 0.727 | FLAG: below minimum |
 | [csb_org_crossrepo_tracing](suites/csb_org_crossrepo_tracing.md) | `mcp-remote-artifact` | 62 | 62 | 0.356 | 0.758 | ok |
-| [csb_org_domain](suites/csb_org_domain.md) | `baseline-local-artifact` | 20 | 60 | 0.351 | 0.950 | FLAG: below minimum |
-| [csb_org_domain](suites/csb_org_domain.md) | `mcp-remote-artifact` | 60 | 60 | 0.338 | 0.900 | ok |
+| [csb_org_domain](suites/csb_org_domain.md) | `baseline-local-artifact` | 20 | 58 | 0.351 | 0.950 | FLAG: below minimum |
+| [csb_org_domain](suites/csb_org_domain.md) | `mcp-remote-artifact` | 58 | 58 | 0.331 | 0.897 | ok |
 | [csb_org_incident](suites/csb_org_incident.md) | `baseline-local-artifact` | 20 | 58 | 0.502 | 0.900 | FLAG: below minimum |
 | [csb_org_incident](suites/csb_org_incident.md) | `mcp-remote-artifact` | 58 | 58 | 0.569 | 0.948 | ok |
 | [csb_org_migration](suites/csb_org_migration.md) | `baseline-local-artifact` | 26 | 77 | 0.325 | 0.846 | FLAG: below minimum |
@@ -65,15 +65,15 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 | Run | Suite | Config | Valid Tasks | Mean Reward | Pass Rate |
 |---|---|---|---:|---:|---:|
 | [csb_org/csb_org_compliance](runs/csb_org-csb_org_compliance.md) | `csb_org_compliance` | `baseline-local-artifact` | 54 | 0.280 | 0.889 |
-| [csb_org/csb_org_compliance](runs/csb_org-csb_org_compliance.md) | `csb_org_compliance` | `mcp-remote-artifact` | 54 | 0.295 | 0.889 |
+| [csb_org/csb_org_compliance](runs/csb_org-csb_org_compliance.md) | `csb_org_compliance` | `mcp-remote-artifact` | 53 | 0.298 | 0.887 |
 | [csb_org/csb_org_crossorg](runs/csb_org-csb_org_crossorg.md) | `csb_org_crossorg` | `baseline-local-artifact` | 45 | 0.175 | 0.667 |
 | [csb_org/csb_org_crossorg](runs/csb_org-csb_org_crossorg.md) | `csb_org_crossorg` | `mcp-remote-artifact` | 45 | 0.200 | 0.667 |
 | [csb_org/csb_org_crossrepo](runs/csb_org-csb_org_crossrepo.md) | `csb_org_crossrepo` | `baseline-local-artifact` | 42 | 0.309 | 1.000 |
 | [csb_org/csb_org_crossrepo](runs/csb_org-csb_org_crossrepo.md) | `csb_org_crossrepo` | `mcp-remote-artifact` | 42 | 0.285 | 0.976 |
 | [csb_org/csb_org_crossrepo_tracing](runs/csb_org-csb_org_crossrepo_tracing.md) | `csb_org_crossrepo_tracing` | `baseline-local-artifact` | 63 | 0.324 | 0.683 |
 | [csb_org/csb_org_crossrepo_tracing](runs/csb_org-csb_org_crossrepo_tracing.md) | `csb_org_crossrepo_tracing` | `mcp-remote-artifact` | 62 | 0.356 | 0.758 |
 | [csb_org/csb_org_domain](runs/csb_org-csb_org_domain.md) | `csb_org_domain` | `baseline-local-artifact` | 60 | 0.355 | 0.933 |
-| [csb_org/csb_org_domain](runs/csb_org-csb_org_domain.md) | `csb_org_domain` | `mcp-remote-artifact` | 60 | 0.338 | 0.900 |
+| [csb_org/csb_org_domain](runs/csb_org-csb_org_domain.md) | `csb_org_domain` | `mcp-remote-artifact` | 58 | 0.331 | 0.897 |
 | [csb_org/csb_org_incident](runs/csb_org-csb_org_incident.md) | `csb_org_incident` | `baseline-local-artifact` | 58 | 0.487 | 0.862 |
 | [csb_org/csb_org_incident](runs/csb_org-csb_org_incident.md) | `csb_org_incident` | `mcp-remote-artifact` | 58 | 0.569 | 0.948 |
 | [csb_org/csb_org_migration](runs/csb_org-csb_org_migration.md) | `csb_org_migration` | `baseline-local-artifact` | 77 | 0.381 | 0.870 |
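
For readers of this diff, the rule named in the commit title (drop MCP-config tasks that made zero MCP calls) and the README's "valid scored tasks" wording can be sketched roughly as below. This is only an illustration, not the repository's generator: the `Task` fields, the `mcp_calls` counter, the `mcp-` prefix test, and the `summarize` helper are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    # All field names are assumptions made for this sketch.
    name: str
    config: str              # e.g. "mcp-remote-artifact"
    status: str              # "passed", "failed", or anything else
    reward: Optional[float]  # None when the task was not scored
    mcp_calls: int = 0       # hypothetical count of MCP tool calls


def is_valid(task: Task) -> bool:
    """Scored task that also passes a config-specific check."""
    scored = task.status in ("passed", "failed") and isinstance(
        task.reward, (int, float)
    )
    if not scored:
        return False
    # Assumed config-specific rule: an MCP-config task with zero MCP calls
    # did not exercise the MCP setup, so it is excluded.
    if task.config.startswith("mcp-") and task.mcp_calls == 0:
        return False
    return True


def summarize(tasks: list[Task], min_required: int) -> dict:
    """Aggregate one suite/config row in the shape of the README table."""
    valid = [t for t in tasks if is_valid(t)]
    n = len(valid)
    return {
        "valid_tasks": n,
        "mean_reward": sum(t.reward for t in valid) / n if n else 0.0,
        "pass_rate": sum(t.status == "passed" for t in valid) / n if n else 0.0,
        "coverage": "ok" if n >= min_required else "FLAG: below minimum",
    }
```

Under these assumptions, removing a zero-MCP task lowers `valid_tasks` and shifts `mean_reward` and `pass_rate`, which is the kind of change visible in the `mcp-remote-artifact` rows for `csb_org_compliance` and `csb_org_domain` above.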
