Commit 4ff6274
feat: add ccb_test variance run for 4 gap tasks, close 3/4 gaps
- numpy-array-sum-perf-001, pandas-groupby-perf-001, sklearn-kmeans-perf-001
now have >= 3 valid runs in both configs
- curl-security-review-001 MCP has systemic RewardFileNotFoundError
(verifier bug, not coverage issue — needs fix)
- 177/178 SDLC tasks now have >= 3 valid scored runs in both configs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 07735ed commit 4ff6274
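The coverage rule in the commit message (a task counts as covered once it has >= 3 valid scored runs in both configs) can be sketched roughly as follows. This is only an illustrative sketch, not the repository's actual audit code: the `(task, config, is_valid)` record shape and the `find_gap_tasks` helper are hypothetical, and only the two config names and the threshold of 3 come from the commit itself.

```python
from collections import Counter

# Config names as they appear in the file tree of this commit.
CONFIGS = ("baseline-local-direct", "mcp-remote-direct")
# Threshold stated in the commit message.
MIN_VALID_RUNS = 3


def find_gap_tasks(runs, configs=CONFIGS, min_valid=MIN_VALID_RUNS):
    """Return {task: {config: valid_run_count}} for tasks that fall short.

    `runs` is an iterable of (task, config, is_valid) tuples; this record
    shape is an assumption made for the sketch.
    """
    counts = Counter()
    tasks = set()
    for task, config, is_valid in runs:
        tasks.add(task)
        if is_valid:
            counts[(task, config)] += 1

    gaps = {}
    for task in sorted(tasks):
        per_config = {c: counts[(task, c)] for c in configs}
        # A task is a gap if any config has fewer valid runs than required.
        if any(n < min_valid for n in per_config.values()):
            gaps[task] = per_config
    return gaps
```

Under this rule, a task whose runs in one config all fail before a score is produced (as described for curl-security-review-001 under MCP) would stay in the returned gap set no matter how many extra runs are added, which is consistent with the commit treating it as a verifier bug rather than a coverage issue.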
File tree
176 files changed: +79909 -287 lines changed
- docs/official_results
- audits
- data
- runs
- suites
- tasks
- runs/official
- test_haiku_20260301_192246
- baseline-local-direct
- ccb_test_curl-security-review-001_baseline-local-direct
- curl-security-review-001__cvVb8oC
- agent
- command-0
- command-1
- setup
- verifier
- ccb_test_numpy-array-sum-perf-001_baseline-local-direct
- numpy-array-sum-perf-001__8qobdVm
- agent
- command-0
- command-1
- setup
- verifier
- ccb_test_pandas-groupby-perf-001_baseline-local-direct
- pandas-groupby-perf-001__KQ9nVyd
- agent
- command-0
- command-1
- setup
- verifier
- ccb_test_sklearn-kmeans-perf-001_baseline-local-direct
- sklearn-kmeans-perf-001__4VbnGbP
- agent
- command-0
- command-1
- setup
- verifier
- mcp-remote-direct
- ccb_test_curl-security-review-001_mcp-remote-direct
- sgonly_curl-security-review-001__aS3p2nG
- agent
- command-0
- command-1
- setup
- verifier
- ccb_test_numpy-array-sum-perf-001_mcp-remote-direct
- sgonly_numpy-array-sum-perf-001__CEsBDgF
- agent
- command-0
- command-1
- setup
- verifier
- ccb_test_pandas-groupby-perf-001_mcp-remote-direct
- sgonly_pandas-groupby-perf-001__HLa8NHS
- agent
- command-0
- command-1
- setup
- verifier
- ccb_test_sklearn-kmeans-perf-001_mcp-remote-direct
- sgonly_sklearn-kmeans-perf-001__NKMQxRx
- agent
- command-0
- command-1
- setup
- verifier
Large diffs are not rendered by default. Seven files (names not shown) fall into this category:
- 861 additions & 0 deletions
- 2483 additions & 0 deletions
- 4594 additions & 0 deletions
- 1007 additions & 0 deletions
- 2184 additions & 0 deletions
- 2779 additions & 0 deletions
- 1449 additions & 0 deletions