@@ -39,9 +39,6 @@ cross-repo tasks — not whether MCP can access information the baseline can't.
3939│ configs/use_case_registry.json (100 use cases) │
4040│ │ │
4141│ ▼ │
42- │ fixtures/repo_sets/*.json (polyrepo fixtures) │
43- │ │ │
44- │ ▼ │
4542│ scripts/generate_mcp_unique_tasks.py (task generator) │
4643│ │ │
4744│ ▼ │
@@ -84,21 +81,22 @@ Ten suites map to use case categories A-J:
8481**Current starter pack: 14 tasks across 6 active suites.** See
8582`configs/selected_mcp_unique_tasks.json` for the canonical list.
8683
87- ## Repo-Set Fixtures
84+ ## Repo Sets
8885
89- Each task uses a ** repo-set fixture** defining which repos are local vs MCP-only:
86+ Each task's Dockerfile defines its repo set: all repos are cloned for
87+ baseline and truncated for MCP-Full. Common repo groupings across tasks:
9088
91- | Fixture | Local Repo | MCP-Only Repos | Cross-Org |
92- |---------|-----------|----------------|-----------|
93- | `kubernetes-ecosystem` | kubernetes/kubernetes | kubernetes-client-go, kubernetes-api, etcd-io/etcd | Yes |
94- | `nodejs-web-stack` | nodejs/node | expressjs-express, lodash, prisma-prisma | Yes |
95- | `python-ml-stack` | scikit-learn/scikit-learn | numpy, pandas-dev/pandas, scipy | Yes |
96- | `grafana-observability` | grafana/grafana | grafana-loki, grafana-mimir | No |
97- | `multi-org-go` | kubernetes/kubernetes | etcd-io/etcd, grafana/grafana | Yes |
89+ | Repo Set | Repos | Cross-Org | Language |
90+ |----------|-------|-----------|----------|
91+ | Kubernetes ecosystem | kubernetes, client-go, api, etcd | Yes (k8s + etcd-io) | Go |
92+ | Node.js web stack | node, express, lodash, prisma | Yes (4 orgs) | JS/TS |
93+ | Python ML stack | scikit-learn, numpy, pandas, scipy | Yes (4 orgs) | Python |
94+ | Grafana observability | grafana, loki, mimir | No (all grafana) | Go/TS |
95+ | Multi-org Go | kubernetes, etcd, grafana | Yes (3 orgs) | Go |
9896
99- Fixtures are in `fixtures/repo_sets/*.json` and validate against
100- `schemas/repo_set_fixture.schema.json`. SG mirror repos (`sg-benchmarks/*`)
101- are tracked in `configs/sg_mirror_revisions.json`.
97+ Repos not natively indexed in Sourcegraph use `sg-benchmarks` mirrors
98+ (e.g., `sg-benchmarks/kubernetes-client-go`). The Dockerfile is the
99+ source of truth for which repos a task uses and at what version.
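
Because the Dockerfile is the source of truth for a task's repos, a small script can list them for review. The following is a minimal sketch, assuming repos are cloned with `git clone` and pinned with `--branch` (or `-b`) on a single line. The function name `extract_cloned_repos` and that Dockerfile convention are hypothetical; the actual task Dockerfiles may pin versions differently (for example, with a later `git checkout`).

```python
import re
from typing import Optional

# Assumption: each clone is written on one line as
#   git clone [options] [--branch <ref>] <url> [dest]
# Clones split across lines with "\" continuations are not handled.
CLONE_RE = re.compile(r"git\s+clone\b([^\n&;|]*)")


def extract_cloned_repos(dockerfile_text: str) -> list[tuple[str, Optional[str]]]:
    """Return (repo_url, pinned_ref) pairs for each `git clone` in the text.

    pinned_ref is None when no --branch/-b option is present.
    """
    repos = []
    for match in CLONE_RE.finditer(dockerfile_text):
        tokens = match.group(1).split()
        url = None
        ref = None
        i = 0
        while i < len(tokens):
            tok = tokens[i]
            if tok in ("--branch", "-b") and i + 1 < len(tokens):
                # The token after --branch/-b is the pinned tag or branch.
                ref = tokens[i + 1]
                i += 2
                continue
            if tok.startswith("--branch="):
                ref = tok.split("=", 1)[1]
            elif url is None and ("://" in tok or tok.startswith("git@")):
                # First token that looks like a remote URL.
                url = tok
            i += 1
        if url is not None:
            repos.append((url, ref))
    return repos
```

Passing a Dockerfile's text to `extract_cloned_repos` yields one `(url, ref)` pair per clone, which makes unpinned repos (those with `ref` equal to `None`) easy to spot.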
102100
103101## Task Authoring
104102
@@ -118,8 +116,8 @@ python3 scripts/generate_mcp_unique_tasks.py --category A
118116python3 scripts/generate_mcp_unique_tasks.py --use-case-ids 1 --validate
119117```
120118
121- The generator reads ` configs/use_case_registry.json ` and ` fixtures/repo_sets/ `
122- to fill ` templates/mcp_unique_task/*.j2 ` templates.
119+ The generator reads `configs/use_case_registry.json` to fill
120+ `templates/mcp_unique_task/*.j2` templates.
123121
124122### Worked Example: CCX-dep-trace-001
125123
@@ -369,21 +367,18 @@ Hybrid score = 0.6 × verifier_reward + 0.4 × rubric_score.
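
The hybrid score formula can be sketched in a few lines. This is an illustration only: the function name `hybrid_score` is hypothetical, and the sketch assumes both inputs are already normalized to the same scale, which the formula itself does not state.

```python
def hybrid_score(verifier_reward: float, rubric_score: float) -> float:
    """Weighted blend: 60% verifier reward, 40% rubric score."""
    return 0.6 * verifier_reward + 0.4 * rubric_score
```

With these weights, a task that passes the verifier fully but earns half of the rubric points scores 0.6 × 1.0 + 0.4 × 0.5 = 0.8.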
369367### Add a Task to an Existing Category
370368
3713691. Add the use case to `configs/use_case_registry.json` if not present
372- 2. Ensure a repo-set fixture exists in `fixtures/repo_sets/`
373- 3. Run the generator:
374- ```bash
375- python3 scripts/generate_mcp_unique_tasks.py --use-case-ids <N> --curate-oracle --validate
376- ```
377- 4. Verify with the validity gate
378- 5. Add to `configs/selected_mcp_unique_tasks.json`
370+ 2. Copy an existing task directory as a template (e.g., `ccx-dep-trace-001/`)
371+ 3. Update the Dockerfile to clone all required repos at pinned versions
372+ 4. Update `instruction.md`, `task_spec.json`, and `oracle_answer.json`
373+ 5. Verify with the validity gate
374+ 6. Add to `configs/selected_mcp_unique_tasks.json`
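
The copy step above can be sketched as a small helper. This is a minimal sketch under stated assumptions: the function name `scaffold_task` and the idea that tasks are sibling directories under one suite directory are hypothetical, and the helper only copies files. Editing the Dockerfile, `instruction.md`, `task_spec.json`, and `oracle_answer.json` remains a manual step.

```python
import shutil
from pathlib import Path


def scaffold_task(suite_dir: Path, template_id: str, new_id: str) -> Path:
    """Copy an existing task directory so it can serve as a starting point.

    Refuses to overwrite an existing task, so a typo in new_id cannot
    clobber finished work.
    """
    src = suite_dir / template_id
    dst = suite_dir / new_id
    if not src.is_dir():
        raise FileNotFoundError(f"template task not found: {src}")
    if dst.exists():
        raise FileExistsError(f"task already exists: {dst}")
    shutil.copytree(src, dst)
    return dst
```

For example, `scaffold_task(Path("benchmarks/ccb_mcp_<suite>"), "ccx-dep-trace-001", "<new-task-id>")` would create the new directory, with the suite path and new task ID filled in by the author.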
379375
380376### Add a New Category (C, F, G, H, I, J)
381377
3823781. Create the use case entries in `configs/use_case_registry.json`
383379 (set `oracle_type` from `"tbd"` to a real type)
384- 2 . Create or reuse a repo-set fixture
385- 3 . The suite directory ` benchmarks/ccb_mcp_<suite>/ ` is created automatically
386- by the generator
380+ 2. Copy an existing task as a template and update its Dockerfile with the required repos
381+ 3. The suite directory `benchmarks/ccb_mcp_<suite>/` must be created manually
3873824. Add the suite prefix to `DIR_PREFIX_TO_SUITE` in:
388383 - `scripts/aggregate_status.py`
389384 - `scripts/generate_manifest.py`
@@ -409,22 +404,17 @@ Wait for SG indexing (~hours), then verify:
409404mcp__sourcegraph__keyword_search("repo:^github.com/sg-benchmarks/org-repo$")
410405```
411406
412- Record the SHA in ` configs/sg_mirror_revisions.json ` .
413-
414407### Cross-Host (GitHub + GitLab) — Deferred
415408
416409Cross-host support requires a multi-host Sourcegraph instance. The current
417- design uses ` cross_org ` (different GitHub orgs) instead. To add cross-host:
410+ design uses cross-org (different GitHub orgs) instead. To add cross-host:
418411
419- 1. Add `host` field to repo objects in fixtures (currently only `github.com`)
420- 2. Update fixture schema `schemas/repo_set_fixture.schema.json`
421- 3. Add cross_host suite `ccb_mcp_crosshost` to `suiteMapping` in the PRD
422- 4. Ensure SG instance indexes the new host
412+ 1. Create tasks with repos from multiple code hosts
413+ 2. Ensure the SG instance indexes all hosts
414+ 3. Add the `ccb_mcp_crosshost` suite to the `DIR_PREFIX_TO_SUITE` mappings
423415
424416## Design Decisions
425417
426- These decisions are recorded in ` ralph-mcp-unique/prd.json ` under ` designDecisions ` :
427-
428418- **Q1**: Use sg-benchmarks mirrors for 7 repos not natively indexed
429419- **Q2**: Focus on org-scale quantity (3-20 repos), structured oracle, customer-framed prompts
430420- **Q3**: Cross-org instead of cross-host (cross-host deferred until multi-host SG available)