Commit e100ad4

sjarmak and claude committed
feat: promote 4 Daytona batch runs (debug/design/feature/refactor) + regenerate export
Promotes:
- debug_haiku_20260301_021540 (full, 11x2 configs)
- design_haiku_20260301_022406 (full, 20x2 configs)
- feature_haiku_20260301_023333 (partial, 8+6)
- refactor_haiku_20260301_023530 (partial, 10+10)

MANIFEST: 804 tasks across 78 suite/config combos.
Export: 203 runs, 969 scored tasks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent e4f81bd commit e100ad4
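
The run names promoted here appear to follow a `<suite>_<model>[_<tag>]_<YYYYMMDD>_<HHMMSS>` pattern, for example `debug_haiku_20260301_021540`, or `feature_haiku_vscode_rerun_20260301_023018` with an extra tag. A minimal sketch of splitting such a name into its parts, assuming that pattern holds and that the trailing digits are a date and a time of day. `RUN_ID_RE`, `RunId`, and `parse_run_id` are hypothetical names for illustration, not part of the repository's tooling:

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumed naming pattern, inferred only from the run names in this commit.
RUN_ID_RE = re.compile(
    r"^(?P<suite>[a-z]+)_(?P<model>[a-z0-9]+)"
    r"(?:_(?P<tag>.+))?"                      # optional free-form tag, e.g. "vscode_rerun"
    r"_(?P<date>\d{8})_(?P<time>\d{6})$"
)


class RunId(NamedTuple):
    suite: str
    model: str
    tag: Optional[str]
    started: datetime


def parse_run_id(run_id: str) -> RunId:
    """Split a run name into suite, model, optional tag, and timestamp."""
    m = RUN_ID_RE.match(run_id)
    if m is None:
        raise ValueError(f"unrecognized run id: {run_id!r}")
    # Date and time are concatenated so a single format string parses both.
    started = datetime.strptime(m["date"] + m["time"], "%Y%m%d%H%M%S")
    return RunId(m["suite"], m["model"], m["tag"], started)
```

Calling `parse_run_id("refactor_haiku_20260301_023530")` under these assumptions yields a `RunId` whose `tag` is `None`. Names that do not fit the assumed pattern raise `ValueError` rather than being guessed at.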

233 files changed: +280473 -14921 lines changed

docs/official_results/README.md

Lines changed: 17 additions & 9 deletions
@@ -2,7 +2,7 @@
 
 This bundle is generated from `runs/official/` and includes only valid scored tasks (`passed`/`failed` with numeric reward).
 
-Generated: `2026-03-01T02:35:22.323313+00:00`
+Generated: `2026-03-01T03:00:46.624386+00:00`
 
 ## Local Browse
 
@@ -19,14 +19,14 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 |---|---|---:|---:|---:|---:|---|
 | [ccb_build](suites/ccb_build.md) | `baseline-local-direct` | 23 | 23 | 0.580 | 0.783 | ok |
 | [ccb_build](suites/ccb_build.md) | `mcp-remote-direct` | 20 | 23 | 0.592 | 0.800 | FLAG: below minimum |
-| [ccb_debug](suites/ccb_debug.md) | `baseline-local-direct` | 16 | 20 | 0.746 | 1.000 | FLAG: below minimum |
-| [ccb_debug](suites/ccb_debug.md) | `mcp-remote-direct` | 16 | 20 | 0.565 | 0.688 | FLAG: below minimum |
-| [ccb_design](suites/ccb_design.md) | `baseline-local-direct` | 20 | 20 | 0.642 | 0.950 | ok |
-| [ccb_design](suites/ccb_design.md) | `mcp-remote-direct` | 33 | 20 | 0.731 | 1.000 | ok |
+| [ccb_debug](suites/ccb_debug.md) | `baseline-local-direct` | 16 | 20 | 0.739 | 1.000 | FLAG: below minimum |
+| [ccb_debug](suites/ccb_debug.md) | `mcp-remote-direct` | 16 | 20 | 0.559 | 0.688 | FLAG: below minimum |
+| [ccb_design](suites/ccb_design.md) | `baseline-local-direct` | 20 | 20 | 0.766 | 1.000 | ok |
+| [ccb_design](suites/ccb_design.md) | `mcp-remote-direct` | 33 | 20 | 0.741 | 1.000 | ok |
 | [ccb_document](suites/ccb_document.md) | `baseline-local-direct` | 20 | 20 | 0.890 | 1.000 | ok |
 | [ccb_document](suites/ccb_document.md) | `mcp-remote-direct` | 44 | 20 | 0.841 | 1.000 | ok |
-| [ccb_feature](suites/ccb_feature.md) | `baseline-local-direct` | 20 | 20 | 0.680 | 0.950 | ok |
-| [ccb_feature](suites/ccb_feature.md) | `mcp-remote-direct` | 20 | 20 | 0.617 | 0.850 | ok |
+| [ccb_feature](suites/ccb_feature.md) | `baseline-local-direct` | 20 | 20 | 0.667 | 0.950 | ok |
+| [ccb_feature](suites/ccb_feature.md) | `mcp-remote-direct` | 20 | 20 | 0.615 | 0.850 | ok |
 | [ccb_fix](suites/ccb_fix.md) | `baseline-local-direct` | 25 | 25 | 0.450 | 0.600 | ok |
 | [ccb_fix](suites/ccb_fix.md) | `mcp-remote-direct` | 70 | 25 | 0.572 | 0.714 | ok |
 | [ccb_mcp_compliance](suites/ccb_mcp_compliance.md) | `baseline-local-artifact` | 1 | 28 | 0.375 | 1.000 | FLAG: below minimum |
@@ -67,8 +67,8 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 | [ccb_mcp_security](suites/ccb_mcp_security.md) | `baseline-local-direct` | 10 | 25 | 0.524 | 0.800 | FLAG: below minimum |
 | [ccb_mcp_security](suites/ccb_mcp_security.md) | `mcp-remote-artifact` | 6 | 25 | 0.792 | 1.000 | FLAG: below minimum |
 | [ccb_mcp_security](suites/ccb_mcp_security.md) | `mcp-remote-direct` | 25 | 25 | 0.719 | 1.000 | ok |
-| [ccb_refactor](suites/ccb_refactor.md) | `baseline-local-direct` | 20 | 20 | 0.791 | 0.950 | ok |
-| [ccb_refactor](suites/ccb_refactor.md) | `mcp-remote-direct` | 20 | 20 | 0.737 | 0.950 | ok |
+| [ccb_refactor](suites/ccb_refactor.md) | `baseline-local-direct` | 20 | 20 | 0.808 | 0.950 | ok |
+| [ccb_refactor](suites/ccb_refactor.md) | `mcp-remote-direct` | 20 | 20 | 0.695 | 0.950 | ok |
 | [ccb_secure](suites/ccb_secure.md) | `baseline-local-direct` | 20 | 20 | 0.669 | 0.950 | ok |
 | [ccb_secure](suites/ccb_secure.md) | `mcp-remote-direct` | 24 | 20 | 0.637 | 0.917 | ok |
 | [ccb_test](suites/ccb_test.md) | `baseline-local-direct` | 20 | 20 | 0.480 | 0.750 | ok |
@@ -319,8 +319,12 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 | [debug_haiku_20260228_230648](runs/debug_haiku_20260228_230648.md) | `ccb_debug` | `mcp-remote-direct` | 2 | 1.000 | 1.000 |
 | [debug_haiku_20260228_231033](runs/debug_haiku_20260228_231033.md) | `ccb_debug` | `baseline-local-direct` | 11 | 0.857 | 1.000 |
 | [debug_haiku_20260228_231033](runs/debug_haiku_20260228_231033.md) | `ccb_debug` | `mcp-remote-direct` | 10 | 0.804 | 1.000 |
+| [debug_haiku_20260301_021540](runs/debug_haiku_20260301_021540.md) | `ccb_debug` | `baseline-local-direct` | 11 | 0.847 | 1.000 |
+| [debug_haiku_20260301_021540](runs/debug_haiku_20260301_021540.md) | `ccb_debug` | `mcp-remote-direct` | 11 | 0.813 | 1.000 |
 | [design_haiku_20260223_124652](runs/design_haiku_20260223_124652.md) | `ccb_design` | `baseline-local-direct` | 13 | 0.770 | 1.000 |
 | [design_haiku_20260223_124652](runs/design_haiku_20260223_124652.md) | `ccb_design` | `mcp-remote-direct` | 20 | 0.718 | 1.000 |
+| [design_haiku_20260301_022406](runs/design_haiku_20260301_022406.md) | `ccb_design` | `baseline-local-direct` | 20 | 0.766 | 1.000 |
+| [design_haiku_20260301_022406](runs/design_haiku_20260301_022406.md) | `ccb_design` | `mcp-remote-direct` | 20 | 0.734 | 1.000 |
 | [document_haiku_20260223_164240](runs/document_haiku_20260223_164240.md) | `ccb_document` | `baseline-local-direct` | 19 | 0.851 | 1.000 |
 | [document_haiku_20260223_164240](runs/document_haiku_20260223_164240.md) | `ccb_document` | `mcp-remote-direct` | 20 | 0.822 | 1.000 |
 | [document_haiku_20260226_013910](runs/document_haiku_20260226_013910.md) | `ccb_document` | `baseline-local-direct` | 1 | 1.000 | 1.000 |
@@ -336,6 +340,8 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 | [feature_haiku_20260228_231035](runs/feature_haiku_20260228_231035.md) | `ccb_feature` | `mcp-remote-direct` | 4 | 0.208 | 0.500 |
 | [feature_haiku_20260228_231041](runs/feature_haiku_20260228_231041.md) | `ccb_feature` | `baseline-local-direct` | 4 | 0.557 | 1.000 |
 | [feature_haiku_20260228_231043](runs/feature_haiku_20260228_231043.md) | `ccb_feature` | `baseline-local-direct` | 6 | 0.283 | 0.667 |
+| [feature_haiku_20260301_023333](runs/feature_haiku_20260301_023333.md) | `ccb_feature` | `baseline-local-direct` | 8 | 0.835 | 1.000 |
+| [feature_haiku_20260301_023333](runs/feature_haiku_20260301_023333.md) | `ccb_feature` | `mcp-remote-direct` | 6 | 0.867 | 1.000 |
 | [feature_haiku_vscode_rerun_20260301_023018](runs/feature_haiku_vscode_rerun_20260301_023018.md) | `ccb_feature` | `baseline-local-direct` | 1 | 0.500 | 1.000 |
 | [refactor_haiku_20260228_210652](runs/refactor_haiku_20260228_210652.md) | `ccb_refactor` | `baseline-local-direct` | 1 | 0.750 | 1.000 |
 | [refactor_haiku_20260228_210652](runs/refactor_haiku_20260228_210652.md) | `ccb_refactor` | `mcp-remote-direct` | 1 | 0.790 | 1.000 |
@@ -345,6 +351,8 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 | [refactor_haiku_20260228_231045](runs/refactor_haiku_20260228_231045.md) | `ccb_refactor` | `baseline-local-direct` | 4 | 0.463 | 1.000 |
 | [refactor_haiku_20260301_010758](runs/refactor_haiku_20260301_010758.md) | `ccb_refactor` | `baseline-local-direct` | 20 | 0.791 | 0.950 |
 | [refactor_haiku_20260301_010758](runs/refactor_haiku_20260301_010758.md) | `ccb_refactor` | `mcp-remote-direct` | 20 | 0.737 | 0.950 |
+| [refactor_haiku_20260301_023530](runs/refactor_haiku_20260301_023530.md) | `ccb_refactor` | `baseline-local-direct` | 10 | 0.950 | 1.000 |
+| [refactor_haiku_20260301_023530](runs/refactor_haiku_20260301_023530.md) | `ccb_refactor` | `mcp-remote-direct` | 10 | 0.717 | 0.900 |
 | [secure_haiku_20260223_232545](runs/secure_haiku_20260223_232545.md) | `ccb_secure` | `baseline-local-direct` | 20 | 0.669 | 0.950 |
 | [secure_haiku_20260223_232545](runs/secure_haiku_20260223_232545.md) | `ccb_secure` | `mcp-remote-direct` | 18 | 0.705 | 1.000 |
 | [secure_haiku_20260224_011825](runs/secure_haiku_20260224_011825.md) | `ccb_secure` | `mcp-remote-direct` | 2 | 0.500 | 0.500 |
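
The README text in this diff says the bundle includes only valid scored tasks (`passed`/`failed` with numeric reward), and the suite table marks some rows `FLAG: below minimum`. A minimal sketch of how one such row could be derived, under several assumptions: the two integer columns are the scored-task count and a required minimum, the two decimal columns are mean reward and pass rate, and a row is flagged when the count falls below the minimum. The table header is not shown in the diff, so these column meanings are inferred, and `TaskResult`, `is_valid_scored`, and `summarize` are hypothetical names rather than the repository's exporter:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class TaskResult:
    status: str                # e.g. "passed", "failed", or some other state
    reward: Optional[float]    # None when no numeric reward was recorded


def is_valid_scored(task: TaskResult) -> bool:
    """Keep only passed/failed tasks that carry a numeric reward."""
    numeric = isinstance(task.reward, (int, float)) and not isinstance(task.reward, bool)
    return task.status in {"passed", "failed"} and numeric


def summarize(tasks: List[TaskResult], minimum: int) -> Dict[str, Union[int, float, str]]:
    """Build one summary row; the flag rule is an assumption, not confirmed."""
    scored = [t for t in tasks if is_valid_scored(t)]
    n = len(scored)
    mean_reward = sum(t.reward for t in scored) / n if n else 0.0
    pass_rate = sum(t.status == "passed" for t in scored) / n if n else 0.0
    status = "ok" if n >= minimum else "FLAG: below minimum"
    return {
        "scored": n,
        "minimum": minimum,
        "mean_reward": mean_reward,
        "pass_rate": pass_rate,
        "status": status,
    }
```

Under these assumptions a task can pass with a reward below 1.0, which is consistent with rows that show a pass rate of 1.000 alongside a lower mean reward.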
