Commit 37b71f7

docs: refresh benchmark catalog docs and add technical report
1 parent 87d48f2 commit 37b71f7

5 files changed: +1295 -3 lines changed


benchmarks/README.md

Lines changed: 39 additions & 3 deletions
@@ -1,12 +1,12 @@
 # CodeContextBench Benchmarks
 
-170 tasks organized into 8 suites aligned with the software development lifecycle (SDLC). Each suite targets a distinct phase of engineering work. The canonical task selection is in [`selected_benchmark_tasks.json`](../configs/selected_benchmark_tasks.json).
+This directory contains SDLC-aligned suites plus MCP-unique org-scale retrieval suites. The canonical selected task catalog is in [`selected_benchmark_tasks.json`](../configs/selected_benchmark_tasks.json) (currently 251 selected tasks across 19 suites).
 
 See [`docs/TASK_SELECTION.md`](../docs/TASK_SELECTION.md) for selection methodology.
 
 ---
 
-## Suite Overview
+## SDLC Suite Overview
 
 | Suite | SDLC Phase | Tasks | Description |
 |-------|-----------|------:|-------------|
@@ -22,6 +22,29 @@ See [`docs/TASK_SELECTION.md`](../docs/TASK_SELECTION.md) for selection methodol
 
 ---
 
+## MCP-Unique Suite Overview (Selected Catalog)
+
+These suites measure cross-repo discovery, tracing, and org-scale code intelligence use cases. Counts below reflect the current selected catalog in [`selected_benchmark_tasks.json`](../configs/selected_benchmark_tasks.json) (some suite directories may contain additional draft/deferred tasks that are not selected).
+
+| Suite | Tasks | Description |
+|-------|------:|-------------|
+| `ccb_mcp_compliance` | 7 | Compliance, audit, and provenance workflows |
+| `ccb_mcp_crossorg` | 5 | Cross-org discovery and authoritative repo identification |
+| `ccb_mcp_crossrepo` | 1 | Legacy cross-repo discovery/tracing task (compatibility) |
+| `ccb_mcp_crossrepo_tracing` | 9 | Cross-repo dependency tracing and symbol resolution |
+| `ccb_mcp_domain` | 10 | Domain-specific lineage and analysis workflows |
+| `ccb_mcp_incident` | 11 | Incident debugging across services and repos |
+| `ccb_mcp_migration` | 7 | Framework and platform migrations across repos |
+| `ccb_mcp_onboarding` | 11 | Onboarding, architecture comprehension, API discovery |
+| `ccb_mcp_org` | 5 | Org-wide coding correctness tasks requiring broad context |
+| `ccb_mcp_platform` | 5 | Platform/devtools and tribal-knowledge discovery |
+| `ccb_mcp_security` | 10 | Vulnerability remediation and security analysis at org scale |
+| **Total MCP-Unique (selected)** | **81** | |
+
+For suite taxonomy, authoring, and oracle evaluation details, see [`docs/MCP_UNIQUE_TASKS.md`](../docs/MCP_UNIQUE_TASKS.md).
+
+---
+
 ## ccb_understand (20 tasks) — Requirements & Discovery
 
 Codebase comprehension, natural-language Q&A, onboarding exercises, and knowledge recovery tasks.
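
Review note on the hunk above: the per-suite counts in the new MCP-Unique table can be checked against its bold total row. The sketch below is only an illustration, not part of the commit. It copies the table rows from the diff into a string and parses them with two regular expressions. The names `MCP_TABLE`, `ROW`, `TOTAL`, and `parse_suite_table` are hypothetical, and it reads the README text rather than `selected_benchmark_tasks.json`, whose schema is not shown in this diff.

```python
from __future__ import annotations

import re

# Table rows copied verbatim from the README hunk in this commit.
MCP_TABLE = """\
| Suite | Tasks | Description |
|-------|------:|-------------|
| `ccb_mcp_compliance` | 7 | Compliance, audit, and provenance workflows |
| `ccb_mcp_crossorg` | 5 | Cross-org discovery and authoritative repo identification |
| `ccb_mcp_crossrepo` | 1 | Legacy cross-repo discovery/tracing task (compatibility) |
| `ccb_mcp_crossrepo_tracing` | 9 | Cross-repo dependency tracing and symbol resolution |
| `ccb_mcp_domain` | 10 | Domain-specific lineage and analysis workflows |
| `ccb_mcp_incident` | 11 | Incident debugging across services and repos |
| `ccb_mcp_migration` | 7 | Framework and platform migrations across repos |
| `ccb_mcp_onboarding` | 11 | Onboarding, architecture comprehension, API discovery |
| `ccb_mcp_org` | 5 | Org-wide coding correctness tasks requiring broad context |
| `ccb_mcp_platform` | 5 | Platform/devtools and tribal-knowledge discovery |
| `ccb_mcp_security` | 10 | Vulnerability remediation and security analysis at org scale |
| **Total MCP-Unique (selected)** | **81** | |
"""

# A data row starts with a backticked suite name followed by an integer count.
ROW = re.compile(r"^\|\s*`(?P<suite>[^`]+)`\s*\|\s*(?P<count>\d+)\s*\|")
# The total row has a bold label starting with "Total" and a bold integer.
TOTAL = re.compile(r"^\|\s*\*\*Total[^|]*\*\*\s*\|\s*\*\*(?P<total>\d+)\*\*\s*\|")


def parse_suite_table(table: str) -> tuple[dict[str, int], int]:
    """Return (per-suite task counts, stated total) from a markdown table."""
    counts: dict[str, int] = {}
    stated_total = None
    for line in table.splitlines():
        row = ROW.match(line)
        if row:
            counts[row.group("suite")] = int(row.group("count"))
            continue
        total = TOTAL.match(line)
        if total:
            stated_total = int(total.group("total"))
    if stated_total is None:
        raise ValueError("no total row found in table")
    return counts, stated_total


counts, stated_total = parse_suite_table(MCP_TABLE)
# The stated total should equal the sum of the per-suite counts.
assert sum(counts.values()) == stated_total
print(len(counts), "suites,", sum(counts.values()), "tasks")
```

The table is internally consistent: 11 suites summing to 81 tasks. Added to the 8 suites and 170 tasks named in the removed README line, that gives the 19 suites and 251 tasks stated in the new intro, assuming the SDLC selection itself is unchanged by this commit.
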
@@ -166,6 +189,12 @@ Code review with injected defects, performance testing, and code search validati
 | `curl-security-review-001` | Code review: curl security |
 | `kafka-security-review-001` | Code review: Kafka security |
 | `sklearn-kmeans-perf-001` | Speed up K-means clustering |
+| `test-coverage-gap-001` | Analyze test coverage gaps: Envoy HTTP connection manager |
+| `test-coverage-gap-002` | Map test coverage gaps: Kafka consumer group coordinator |
+| `test-integration-001` | Write integration tests: Flipt evaluation API |
+| `test-integration-002` | Write integration tests: Navidrome media scanner |
+| `test-unitgen-go-001` | Generate unit tests: Kubernetes storage value package |
+| `test-unitgen-py-001` | Generate unit tests: Django cache middleware |
 | `terraform-code-review-001` | Code review: Terraform |
 | `vscode-code-review-001` | Code review: VS Code |

@@ -178,6 +207,13 @@ API reference generation, architecture documentation, and migration guide creati
 | Task | Focus |
 |------|-------|
 | `cilium-api-doc-gen-001` | Cilium API reference generation |
+| `docgen-changelog-001` | Generate Terraform changelog |
+| `docgen-changelog-002` | Generate Flipt release notes |
+| `docgen-inline-001` | Generate Python docstrings for Django cache middleware |
+| `docgen-inline-002` | Generate Javadoc for Kafka record batch serialization |
+| `docgen-onboard-001` | Generate onboarding guide for Istio control plane |
+| `docgen-runbook-001` | Generate operational runbook for Prometheus TSDB compaction |
+| `docgen-runbook-002` | Generate troubleshooting runbook for Envoy connection pools |
 | `envoy-arch-doc-gen-001` | Envoy architecture documentation |
 | `envoy-migration-doc-gen-001` | Envoy migration guide generation |
 | `istio-arch-doc-gen-001` | Istio architecture documentation |
@@ -269,7 +305,7 @@ Each task follows this layout:
 ## Running Benchmarks
 
 ```bash
-# Run all 170 tasks across 2 configs (Baseline + MCP-Full)
+# Run all selected tasks across 2 configs (currently 251 entries in selected_benchmark_tasks.json)
 bash configs/run_selected_tasks.sh
 
 # Run a single SDLC phase

docs/AGENTS.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ Use this file when editing documentation or improving agent navigation.
 - `docs/START_HERE_BY_TASK.md` - task-based routing (first stop for operations)
 - `docs/ops/` - runbooks, indexes, troubleshooting, handoff templates
 - `docs/reference/` - stable specs and policies (indexes/pointers first; migration can be gradual)
+- `docs/technical_reports/` - versioned white papers and technical report snapshots
 - `docs/explanations/` - design rationale and context
 - `docs/archive/` - historical artifacts and non-canonical docs

docs/CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ Use this file when editing documentation or improving agent navigation.
 - `docs/START_HERE_BY_TASK.md` - task-based routing (first stop for operations)
 - `docs/ops/` - runbooks, indexes, troubleshooting, handoff templates
 - `docs/reference/` - stable specs and policies (indexes/pointers first; migration can be gradual)
+- `docs/technical_reports/` - versioned white papers and technical report snapshots
 - `docs/explanations/` - design rationale and context
 - `docs/archive/` - historical artifacts and non-canonical docs

docs/ops/local_guides/docs.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ Use this file when editing documentation or improving agent navigation.
 - `docs/START_HERE_BY_TASK.md` - task-based routing (first stop for operations)
 - `docs/ops/` - runbooks, indexes, troubleshooting, handoff templates
 - `docs/reference/` - stable specs and policies (indexes/pointers first; migration can be gradual)
+- `docs/technical_reports/` - versioned white papers and technical report snapshots
 - `docs/explanations/` - design rationale and context
 - `docs/archive/` - historical artifacts and non-canonical docs
