docs: update READMEs and configs for ccb_feature/ccb_refactor split
Update README.md and benchmarks/README.md with new suite tables
(9 SDLC suites, 199 tasks, 294 total). Update task counts and
suite references in 6 config wrappers, _common.sh, and
ground_truth_files.json.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- **Combined catalog total: 251 tasks** (170 SDLC + 81 MCP-unique). Of these, 212 are fully paired (baseline + MCP results) in official runs; the remaining 39 MCP-unique tasks have MCP results but are missing baselines.
+ **Combined catalog total: 294 tasks** (199 SDLC across 9 suites + 95 MCP-unique across 11 suites).
Both baseline and MCP-Full agents have access to **all repos** in each task's fixture. The only difference is the method: baseline reads code locally, MCP-Full uses Sourcegraph MCP tools (local code is truncated). This ensures we measure whether MCP tools help agents work better — not whether MCP can access repos the baseline can't.
@@ -110,7 +111,7 @@ See [docs/MCP_UNIQUE_TASKS.md](docs/MCP_UNIQUE_TASKS.md) for the full task syste
All benchmarks are evaluated across two paper-level configurations (Baseline vs MCP-Full). The concrete run config names differ by task type:
Legacy run directory names (`baseline`, `sourcegraph_full`, `artifact_full`) may still appear in historical outputs and are handled by analysis scripts.
@@ -130,7 +131,8 @@ See [docs/reference/CONFIGS.md](docs/reference/CONFIGS.md) for the canonical con
```
benchmarks/ # Task definitions organized by SDLC phase + MCP-unique
benchmarks/README.md (+47 −22: 47 additions, 22 deletions)
@@ -1,6 +1,6 @@
# CodeContextBench Benchmarks
- This directory contains SDLC-aligned suites plus MCP-unique org-scale retrieval suites. The canonical selected task catalog is in [`selected_benchmark_tasks.json`](../configs/selected_benchmark_tasks.json) (currently 251 selected tasks across 19 suites).
+ This directory contains SDLC-aligned suites plus MCP-unique org-scale retrieval suites. The canonical selected task catalog is in [`selected_benchmark_tasks.json`](../configs/selected_benchmark_tasks.json) (currently 294 selected tasks across 20 suites).
See [`docs/TASK_SELECTION.md`](../docs/TASK_SELECTION.md) for selection methodology.
@@ -13,12 +13,13 @@ See [`docs/TASK_SELECTION.md`](../docs/TASK_SELECTION.md) for selection methodol