Commit f9a3c2b

sjarmak and claude committed
refactor: remove repo-set fixtures layer — Dockerfile is source of truth
The fixtures/ directory was an indirection layer designed for the mcp_only access model (now removed). With all repos cloned in every Dockerfile, the fixture JSON just duplicated what the Dockerfile already encodes, creating a sync burden that caused 9 bugs.

Removed:
- fixtures/repo_sets/ (5 JSON files + README)
- schemas/repo_set_fixture.schema.json

Updated:
- docs/MCP_UNIQUE_TASKS.md: replaced fixture section with repo-set table, updated authoring guide to use copy-existing-task workflow
- docs/EXTENSIBILITY.md: s/fixtures/Dockerfiles/
- README.md: removed fixtures/ from repo structure

Scripts (generate_mcp_unique_tasks.py, curate_oracle.py) already handle missing fixtures gracefully: they log a warning and proceed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent b1bc59c commit f9a3c2b
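
The commit message says the scripts tolerate a missing fixture by logging a warning and continuing. A minimal Python sketch of that pattern is shown here; the function name `load_optional_fixture` and its signature are hypothetical and are not taken from `generate_mcp_unique_tasks.py` or `curate_oracle.py`.

```python
import json
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def load_optional_fixture(path: Path) -> Optional[dict]:
    """Return the parsed fixture, or None (after a warning) if the file is absent.

    Hypothetical helper: it only illustrates the "warn and proceed" behavior.
    """
    if not path.is_file():
        logger.warning("fixture %s not found; continuing without it", path)
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

A caller would treat `None` as "no fixture data" and fall back to whatever the task's Dockerfile already encodes.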

10 files changed

Lines changed: 27 additions & 498 deletions

README.md

Lines changed: 0 additions & 2 deletions
@@ -77,8 +77,6 @@ benchmarks/ # Task definitions organized by SDLC phase + MCP-unique
 ccb_mcp_onboarding/ # MCP-unique: onboarding & comprehension (3 tasks)
 ccb_mcp_crossorg/ # MCP-unique: cross-org discovery (2 tasks)
 ccb_mcp_platform/ # MCP-unique: platform knowledge (1 task)
-fixtures/ # Repo-set fixtures for MCP-unique tasks
-repo_sets/ # Polyrepo definitions (local vs MCP-only access)
 configs/ # Run configs and task selection
 _common.sh # Shared infra: token refresh, parallel execution, multi-account
 sdlc_suite_2config.sh # Generic SDLC runner (used by phase wrappers below)

docs/EXTENSIBILITY.md

Lines changed: 1 addition & 1 deletion
@@ -104,7 +104,7 @@ python3 scripts/validate_tasks_preflight.py --suite ccb_mcp_<suite>
 - `task.toml` verification type must be `"test"` (Harbor standard)
 - `tests/eval.sh` must be executable (`chmod +x`)
 - Use `/tests/` paths inside eval.sh (Harbor uploads `tests/` to `/tests/`)
-- All repos in fixtures must be indexed in Sourcegraph
+- All repos cloned in Dockerfiles must be indexed in Sourcegraph
 - `scripts/ccb_metrics/oracle_checks.py` must be stdlib-only Python
 
 **Directory structure:**

docs/MCP_UNIQUE_TASKS.md

Lines changed: 26 additions & 36 deletions
@@ -39,9 +39,6 @@ cross-repo tasks — not whether MCP can access information the baseline can't.
 │ configs/use_case_registry.json (100 use cases) │
 │ │ │
 │ ▼ │
-│ fixtures/repo_sets/*.json (polyrepo fixtures) │
-│ │ │
-│ ▼ │
 │ scripts/generate_mcp_unique_tasks.py (task generator) │
 │ │ │
 │ ▼ │
@@ -84,21 +81,22 @@ Ten suites map to use case categories A-J:
 **Current starter pack: 14 tasks across 6 active suites.** See
 `configs/selected_mcp_unique_tasks.json` for the canonical list.
 
-## Repo-Set Fixtures
+## Repo Sets
 
-Each task uses a **repo-set fixture** defining which repos are local vs MCP-only:
+Each task's Dockerfile defines its repo set — all repos are cloned for
+baseline, truncated for MCP-Full. Common repo groupings across tasks:
 
-| Fixture | Local Repo | MCP-Only Repos | Cross-Org |
-|---------|-----------|----------------|-----------|
-| `kubernetes-ecosystem` | kubernetes/kubernetes | kubernetes-client-go, kubernetes-api, etcd-io/etcd | Yes |
-| `nodejs-web-stack` | nodejs/node | expressjs-express, lodash, prisma-prisma | Yes |
-| `python-ml-stack` | scikit-learn/scikit-learn | numpy, pandas-dev/pandas, scipy | Yes |
-| `grafana-observability` | grafana/grafana | grafana-loki, grafana-mimir | No |
-| `multi-org-go` | kubernetes/kubernetes | etcd-io/etcd, grafana/grafana | Yes |
+| Repo Set | Repos | Cross-Org | Language |
+|----------|-------|-----------|----------|
+| Kubernetes ecosystem | kubernetes, client-go, api, etcd | Yes (k8s + etcd-io) | Go |
+| Node.js web stack | node, express, lodash, prisma | Yes (4 orgs) | JS/TS |
+| Python ML stack | scikit-learn, numpy, pandas, scipy | Yes (4 orgs) | Python |
+| Grafana observability | grafana, loki, mimir | No (all grafana) | Go/TS |
+| Multi-org Go | kubernetes, etcd, grafana | Yes (3 orgs) | Go |
 
-Fixtures are in `fixtures/repo_sets/*.json` and validate against
-`schemas/repo_set_fixture.schema.json`. SG mirror repos (`sg-benchmarks/*`)
-are tracked in `configs/sg_mirror_revisions.json`.
+Repos not natively indexed in Sourcegraph use `sg-benchmarks` mirrors
+(e.g., `sg-benchmarks/kubernetes-client-go`). The Dockerfile is the
+source of truth for which repos a task uses and at what version.
 
 ## Task Authoring
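
Since the hunk above makes the Dockerfile the source of truth for a task's repos and versions, a repo set can in principle be read back out of it. The sketch below assumes the Dockerfiles clone from GitHub with `git clone` and pin versions with `--branch` or `-b`; the commit does not show the Dockerfiles, so that layout and the helper name `cloned_repos` are assumptions.

```python
import re
from typing import List, Optional, Tuple

# Arguments of one `git clone`, up to the next shell separator or newline.
CLONE_RE = re.compile(r"git\s+clone\b(?P<args>[^&;|\n]*)")
# GitHub HTTPS URL, with an optional trailing `.git`.
URL_RE = re.compile(
    r"https://github\.com/(?P<org>[\w.-]+)/(?P<repo>[\w.-]+?)(?:\.git)?(?=\s|$)"
)
# `--branch <ref>`, `--branch=<ref>`, or `-b <ref>` as a standalone option.
BRANCH_RE = re.compile(r"(?<!\S)(?:--branch|-b)[=\s]+(?P<ref>\S+)")


def cloned_repos(dockerfile_text: str) -> List[Tuple[str, Optional[str]]]:
    """List (org/repo, ref) pairs for each GitHub `git clone` in the text."""
    # Join backslash-continued lines so a multi-line RUN parses as one command.
    text = dockerfile_text.replace("\\\n", " ")
    repos = []
    for clone in CLONE_RE.finditer(text):
        args = clone.group("args")
        url = URL_RE.search(args)
        if url is None:
            continue  # not a GitHub HTTPS clone; ignored by this sketch
        branch = BRANCH_RE.search(args)
        ref = branch["ref"] if branch else None
        repos.append((f"{url['org']}/{url['repo']}", ref))
    return repos
```

A `ref` of `None` flags a clone without an explicit branch or tag, which may be worth reviewing given that the authoring steps ask for pinned versions. Pins done through a later `git checkout` are not detected here.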

@@ -118,8 +116,8 @@ python3 scripts/generate_mcp_unique_tasks.py --category A
 python3 scripts/generate_mcp_unique_tasks.py --use-case-ids 1 --validate
 ```
 
-The generator reads `configs/use_case_registry.json` and `fixtures/repo_sets/`
-to fill `templates/mcp_unique_task/*.j2` templates.
+The generator reads `configs/use_case_registry.json` to fill
+`templates/mcp_unique_task/*.j2` templates.
 
 ### Worked Example: CCX-dep-trace-001

@@ -369,21 +367,18 @@ Hybrid score = 0.6 × verifier_reward + 0.4 × rubric_score.
 ### Add a Task to an Existing Category
 
 1. Add the use case to `configs/use_case_registry.json` if not present
-2. Ensure a repo-set fixture exists in `fixtures/repo_sets/`
-3. Run the generator:
-```bash
-python3 scripts/generate_mcp_unique_tasks.py --use-case-ids <N> --curate-oracle --validate
-```
-4. Verify with the validity gate
-5. Add to `configs/selected_mcp_unique_tasks.json`
+2. Copy an existing task directory as a template (e.g., `ccx-dep-trace-001/`)
+3. Update the Dockerfile to clone all required repos at pinned versions
+4. Update `instruction.md`, `task_spec.json`, and `oracle_answer.json`
+5. Verify with the validity gate
+6. Add to `configs/selected_mcp_unique_tasks.json`
 
 ### Add a New Category (C, F, G, H, I, J)
 
 1. Create the use case entries in `configs/use_case_registry.json`
 (set `oracle_type` from `"tbd"` to a real type)
-2. Create or reuse a repo-set fixture
-3. The suite directory `benchmarks/ccb_mcp_<suite>/` is created automatically
-by the generator
+2. Copy an existing task as a template, update Dockerfile with the required repos
+3. The suite directory `benchmarks/ccb_mcp_<suite>/` must be created manually
 4. Add the suite prefix to `DIR_PREFIX_TO_SUITE` in:
 - `scripts/aggregate_status.py`
 - `scripts/generate_manifest.py`
@@ -409,22 +404,17 @@ Wait for SG indexing (~hours), then verify:
 mcp__sourcegraph__keyword_search("repo:^github.com/sg-benchmarks/org-repo$")
 ```
 
-Record the SHA in `configs/sg_mirror_revisions.json`.
-
 ### Cross-Host (GitHub + GitLab) — Deferred
 
 Cross-host support requires a multi-host Sourcegraph instance. The current
-design uses `cross_org` (different GitHub orgs) instead. To add cross-host:
+design uses cross-org (different GitHub orgs) instead. To add cross-host:
 
-1. Add `host` field to repo objects in fixtures (currently only `github.com`)
-2. Update fixture schema `schemas/repo_set_fixture.schema.json`
-3. Add cross_host suite `ccb_mcp_crosshost` to `suiteMapping` in the PRD
-4. Ensure SG instance indexes the new host
+1. Create tasks with repos from multiple code hosts
+2. Ensure SG instance indexes all hosts
+3. Add `ccb_mcp_crosshost` suite to `DIR_PREFIX_TO_SUITE` mappings
 
 ## Design Decisions
 
-These decisions are recorded in `ralph-mcp-unique/prd.json` under `designDecisions`:
-
 - **Q1**: Use sg-benchmarks mirrors for 7 repos not natively indexed
 - **Q2**: Focus on org-scale quantity (3-20 repos), structured oracle, customer-framed prompts
 - **Q3**: Cross-org instead of cross-host (cross-host deferred until multi-host SG available)

fixtures/repo_sets/README.md

Lines changed: 0 additions & 60 deletions
This file was deleted.

fixtures/repo_sets/grafana-observability.json

Lines changed: 0 additions & 45 deletions
This file was deleted.

fixtures/repo_sets/kubernetes-ecosystem.json

Lines changed: 0 additions & 61 deletions
This file was deleted.

fixtures/repo_sets/multi-org-go.json

Lines changed: 0 additions & 43 deletions
This file was deleted.
