Commit 5e5567f

sjarmak and claude committed
Rename mcp-named scripts and add org-scale support to scaffold-task skill
- Rename generate_mcp_unique_tasks.py → generate_csb_org_tasks.py
- Rename register_new_mcp_tasks.py → register_new_org_tasks.py
- Rename validate_mcp_task_instance.py → validate_org_task_instance.py
- Rename remirror_mcp_unique_repos.sh → remirror_org_repos.sh
- Update all references in docs, registry, and scripts
- Add org-scale task support to /scaffold-task skill:
  - New "Add org-scale task" and "Create new org-scale suite" modes
  - Org-specific templates: task.toml (with org_scale=true), multi-repo Dockerfile, artifact-based instruction.md, eval.sh, oracle_checks.py, task_spec.json
  - Org-specific registration entry format
- Regenerate script registry and index

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 272128f commit 5e5567f
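
The four renames in the commit message amount to a fixed old-to-new mapping that has to be applied wherever the scripts are referenced. A minimal sketch of that substitution, assuming plain string replacement is enough: the `RENAMES` table is copied from the commit message, while `rewrite_references` and `stale_references` are hypothetical helpers and not scripts in this repository.

```python
# Old -> new script names, copied from the commit message.
RENAMES = {
    "generate_mcp_unique_tasks.py": "generate_csb_org_tasks.py",
    "register_new_mcp_tasks.py": "register_new_org_tasks.py",
    "validate_mcp_task_instance.py": "validate_org_task_instance.py",
    "remirror_mcp_unique_repos.sh": "remirror_org_repos.sh",
}


def rewrite_references(text: str) -> str:
    """Replace every old script name in `text` with its new name (hypothetical helper)."""
    for old, new in RENAMES.items():
        text = text.replace(old, new)
    return text


def stale_references(text: str) -> list:
    """Return the old names that still appear in `text`, in mapping order."""
    return [old for old in RENAMES if old in text]
```

Running `stale_references` over each doc and script after the rewrite is one way to confirm that no old name was missed.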

13 files changed, +407 -80 lines changed

docs/EXTENSIBILITY.md

Lines changed: 2 additions & 2 deletions
@@ -90,13 +90,13 @@ cannot do. See `docs/ORG_TASKS.md` for the full authoring guide.

 ```bash
 # 1. Generate from use case registry
-python3 scripts/generate_mcp_unique_tasks.py --use-case-ids <N> --curate-oracle --validate
+python3 scripts/generate_csb_org_tasks.py --use-case-ids <N> --curate-oracle --validate

 # 2. Register in selection file
 # configs/selected_benchmark_tasks.json

 # 3. Validate
-python3 scripts/validate_mcp_task_instance.py --task-dir benchmarks/csb_org_<suite>/<task>
+python3 scripts/validate_org_task_instance.py --task-dir benchmarks/csb_org_<suite>/<task>
 python3 scripts/validate_tasks_preflight.py --suite csb_org_<suite>
 ```
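
The generate and validate steps in the hunk above can also be assembled programmatically. The sketch below builds the three command lines for one task; script paths and flags are copied from the documented commands, while the function name and its parameters are illustrative assumptions. Step 2 (registering the task in `configs/selected_benchmark_tasks.json`) is not modeled here.

```python
import sys


def org_task_commands(use_case_id: int, suite: str, task: str) -> list:
    """Build the generate and validate command lines for one org-scale task.

    Script paths and flags come from the documented workflow; this helper
    itself is hypothetical and only assembles argument lists.
    """
    task_dir = f"benchmarks/csb_org_{suite}/{task}"
    return [
        [sys.executable, "scripts/generate_csb_org_tasks.py",
         "--use-case-ids", str(use_case_id), "--curate-oracle", "--validate"],
        [sys.executable, "scripts/validate_org_task_instance.py",
         "--task-dir", task_dir],
        [sys.executable, "scripts/validate_tasks_preflight.py",
         "--suite", f"csb_org_{suite}"],
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.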

docs/ORG_TASKS.md

Lines changed: 9 additions & 9 deletions
@@ -39,7 +39,7 @@ cross-repo tasks — not whether MCP can access information the baseline can't.
 │ configs/use_case_registry.json (100 use cases) │
 │ │ │
 │ ▼ │
-│ scripts/generate_mcp_unique_tasks.py (task generator) │
+│ scripts/generate_csb_org_tasks.py (task generator) │
 │ │ │
 │ ▼ │
 │ benchmarks/csb_org_<suite>/<task>/ │
@@ -57,7 +57,7 @@ cross-repo tasks — not whether MCP can access information the baseline can't.
 │ │
 │ scripts/csb_metrics/retrieval.py (KPI extractor) │
 │ scripts/curate_oracle.py (oracle auto-curator)│
-│ scripts/validate_mcp_task_instance.py (validity gate) │
+│ scripts/validate_org_task_instance.py (validity gate) │
 └────────────────────────────────────────────────────────────┘
 ```

@@ -106,16 +106,16 @@ source of truth for which repos a task uses and at what version.

 ```bash
 # Generate a task for use case ID 1 (dry run to preview)
-python3 scripts/generate_mcp_unique_tasks.py --use-case-ids 1 --dry-run
+python3 scripts/generate_csb_org_tasks.py --use-case-ids 1 --dry-run

 # Generate with oracle curation
-python3 scripts/generate_mcp_unique_tasks.py --use-case-ids 1 --curate-oracle
+python3 scripts/generate_csb_org_tasks.py --use-case-ids 1 --curate-oracle

 # Generate all category A tasks
-python3 scripts/generate_mcp_unique_tasks.py --category A
+python3 scripts/generate_csb_org_tasks.py --category A

 # Generate and validate
-python3 scripts/generate_mcp_unique_tasks.py --use-case-ids 1 --validate
+python3 scripts/generate_csb_org_tasks.py --use-case-ids 1 --validate
 ```

 The generator reads `configs/use_case_registry.json` to fill
@@ -138,7 +138,7 @@ print(uc['customer_prompt'])

 **Step 2: Generate the task skeleton**
 ```bash
-python3 scripts/generate_mcp_unique_tasks.py \
+python3 scripts/generate_csb_org_tasks.py \
     --use-case-ids 1 \
     --out benchmarks/ \
     --verbose
@@ -159,7 +159,7 @@ to discover all files matching the task's `seed_prompt`. It writes:

 **Step 4: Validate the oracle (fail2pass gate)**
 ```bash
-python3 scripts/validate_mcp_task_instance.py \
+python3 scripts/validate_org_task_instance.py \
     --task-dir benchmarks/csb_org_crossrepo_tracing/ccx-dep-trace-001 \
     --verbose
 ```
@@ -471,7 +471,7 @@ See `configs/use_case_registry.json` — entries with

 1. Set `"verification_modes": ["artifact", "direct"]` in the registry entry.
 2. Ensure the fixture has `local_checkout_repos` (direct mode needs full repo).
-3. Run `generate_mcp_unique_tasks.py` — creates `direct_verifier.sh` placeholder.
+3. Run `generate_csb_org_tasks.py` — creates `direct_verifier.sh` placeholder.
 4. Run `customize_mcp_skeletons.py` — generates dual-mode `test.sh` and copies
    parent verifier if the task has SDLC lineage.
 5. Manually curate `direct_verifier.sh` for task-specific verification logic.
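
Steps 1 and 2 in the list above are preconditions that can be checked before running the generator. A minimal sketch, assuming the registry entry and the fixture are already loaded as dictionaries: the field names come from the steps, but the function name and the placement of `local_checkout_repos` at the top level of the fixture are assumptions.

```python
def dual_mode_problems(registry_entry: dict, fixture: dict) -> list:
    """Return reasons a use case is not ready for dual-mode verification.

    Field names come from the documented steps; where `local_checkout_repos`
    sits inside the fixture (top level here) is an assumption.
    """
    problems = []
    modes = registry_entry.get("verification_modes", [])
    for mode in ("artifact", "direct"):
        if mode not in modes:
            problems.append(f"verification_modes is missing {mode!r}")
    if not fixture.get("local_checkout_repos"):
        problems.append("fixture has no local_checkout_repos")
    return problems
```

An empty list means both preconditions hold and the generator step can proceed.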

docs/assets/blog/medium/post_codescalebench_engineering_diary.md

Lines changed: 1 addition & 1 deletion
@@ -69,7 +69,7 @@ Oracle quality was gated before task inclusion with the fail2pass check.

 Implementation anchors:
 - `configs/use_case_registry.json`
-- `scripts/generate_mcp_unique_tasks.py`
+- `scripts/generate_csb_org_tasks.py`
 - `scripts/customize_mcp_skeletons.py`
 - `docs/technical_reports/TECHNICAL_REPORT.md` (Section 5)

docs/ops/SCRIPT_INDEX.md

Lines changed: 4 additions & 4 deletions
@@ -67,8 +67,8 @@ Generated from `scripts/registry.json` by `scripts/generate_script_index.py`.
 - `scripts/official_runs.py` - QA/validation script for official runs.
 - `scripts/quarantine_invalid_tasks.py` - QA/validation script for quarantine invalid tasks.
 - `scripts/validate_artifact_golden.py` - QA/validation script for validate artifact golden.
-- `scripts/validate_mcp_task_instance.py` - QA/validation script for validate mcp task instance.
 - `scripts/validate_official_integrity.py` - QA/validation script for validate official integrity.
+- `scripts/validate_org_task_instance.py` - QA/validation script for validate org task instance.

 ## Data Management

@@ -97,13 +97,13 @@ Generated from `scripts/registry.json` by `scripts/generate_script_index.py`.

 - `scripts/curate_oracle.py` - Task creation/selection script for curate oracle.
 - `scripts/customize_mcp_skeletons.py` - Task creation/selection script for customize mcp skeletons.
+- `scripts/generate_csb_org_tasks.py` - Task creation/selection script for generate csb org tasks.
 - `scripts/generate_dependeval_tasks.py` - Task creation/selection script for generate dependeval tasks.
-- `scripts/generate_mcp_unique_tasks.py` - Task creation/selection script for generate mcp unique tasks.
 - `scripts/generate_pytorch_expected_diffs.py` - Task creation/selection script for generate pytorch expected diffs.
 - `scripts/materialize_dependeval_repos.py` - Task creation/selection script for materialize dependeval repos.
 - `scripts/materialize_sdlc_suites.py` - Task creation/selection script for materialize sdlc suites.
 - `scripts/mine_bug_tasks.py` - Task creation/selection script for mine bug tasks.
-- `scripts/register_new_mcp_tasks.py` - Task creation/selection script for register new mcp tasks.
+- `scripts/register_new_org_tasks.py` - Task creation/selection script for register new org tasks.
 - `scripts/rename_tasks.py` - Task creation/selection script for rename tasks.
 - `scripts/select_benchmark_tasks.py` - Task creation/selection script for select benchmark tasks.
 - `scripts/select_dependeval_tasks.py` - Task creation/selection script for select dependeval tasks.
@@ -228,7 +228,7 @@ Generated from `scripts/registry.json` by `scripts/generate_script_index.py`.
 - `scripts/push_base_images_ghcr.sh` - Utility script for push base images ghcr.
 - `scripts/regenerate_artifact_dockerfiles.py` - Utility script for regenerate artifact dockerfiles.
 - `scripts/rehost_sweap_images.py` - Utility script for rehost sweap images.
-- `scripts/remirror_mcp_unique_repos.sh` - Utility script for remirror mcp unique repos.
+- `scripts/remirror_org_repos.sh` - Utility script for remirror org repos.
 - `scripts/rename_project.py` - Utility script for rename project.
 - `scripts/repair_h3_trajectories.py` [one_off] - Historical one-off script: repair h3 trajectories.
 - `scripts/rerun_crossrepo_2tasks.sh` [one_off] - Historical one-off script: rerun crossrepo 2tasks.
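
Because the index is generated from `scripts/registry.json`, a rename changes both an entry's fields and its position in the sorted list. The sketch below shows how an entry and its index bullet could be derived from a file name. It assumes the summary is the category prefix plus the file stem with underscores turned into spaces, which matches the entries touched by this commit. `SUMMARY_PREFIX`, `make_entry`, `index_line`, and `index_lines` are hypothetical names; the real generator scripts are not shown in this diff.

```python
from pathlib import Path

# Summary prefixes observed in this commit's entries; the mapping itself is an assumption.
SUMMARY_PREFIX = {
    "task_creation_selection": "Task creation/selection script for",
    "qa_quality": "QA/validation script for",
    "misc": "Utility script for",
}
LANGUAGE_BY_SUFFIX = {".py": "python", ".sh": "shell"}


def make_entry(filename: str, category: str, status: str = "maintained") -> dict:
    """Build a registry-style entry for a script under scripts/ (hypothetical helper)."""
    path = Path(filename)
    words = path.stem.replace("_", " ")
    return {
        "name": filename,
        "path": f"scripts/{filename}",
        "category": category,
        "status": status,
        "language": LANGUAGE_BY_SUFFIX[path.suffix],
        "summary": f"{SUMMARY_PREFIX[category]} {words}.",
    }


def index_line(entry: dict) -> str:
    """Render one entry as an index bullet; non-maintained statuses get a [tag]."""
    tag = "" if entry["status"] == "maintained" else f" [{entry['status']}]"
    return f"- `{entry['path']}`{tag} - {entry['summary']}"


def index_lines(entries: list) -> list:
    """Render entries sorted by path, matching the alphabetical order in the diff."""
    return [index_line(e) for e in sorted(entries, key=lambda e: e["path"])]
```

Sorting by path is consistent with `generate_csb_org_tasks.py` moving ahead of `generate_dependeval_tasks.py` in the regenerated index.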

scripts/curate_oracle.py

Lines changed: 1 addition & 1 deletion
@@ -1099,7 +1099,7 @@ def main() -> int:
         json.dump(log_data, f, indent=2)

     if args.verify:
-        validator = project_root / "scripts" / "validate_mcp_task_instance.py"
+        validator = project_root / "scripts" / "validate_org_task_instance.py"
         if validator.exists():
             result = subprocess.run(
                 [sys.executable, str(validator), "--task-dir", str(task_dir), "--verbose"],
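
The hunk above is cut off mid-call, so here is a self-contained sketch of the same pattern: resolve the validator path, skip if the script is absent, otherwise run it against the task directory. The path components and flags are taken from the hunk; the function name, the return convention, and `check=False` are assumptions.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional


def run_validator(project_root: Path, task_dir: Path) -> Optional[int]:
    """Run the org-task validator on `task_dir` if the script is present.

    Returns the validator's exit code, or None when the script is missing.
    What the real code does with the result is not shown in the diff.
    """
    validator = project_root / "scripts" / "validate_org_task_instance.py"
    if not validator.exists():
        return None
    result = subprocess.run(
        [sys.executable, str(validator), "--task-dir", str(task_dir), "--verbose"],
        check=False,
    )
    return result.returncode
```

The `exists()` guard is why this reference had to change with the rename: a stale file name would not raise an error, it would skip validation without any message.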
scripts/generate_mcp_unique_tasks.py → scripts/generate_csb_org_tasks.py

Lines changed: 6 additions & 6 deletions
@@ -3,14 +3,14 @@

 Reads configs/use_case_registry.json and fixtures/repo_sets/*.json, fills
 templates from templates/csb-org/, and writes task directories under
-benchmarks/<mcp_suite>/<task_slug>/.
+benchmarks/<org_suite>/<task_slug>/.

 Usage:
-python3 scripts/generate_mcp_unique_tasks.py --use-case-ids 1 4 10
-python3 scripts/generate_mcp_unique_tasks.py --category A
-python3 scripts/generate_mcp_unique_tasks.py --family cross-repo-dep-trace
-python3 scripts/generate_mcp_unique_tasks.py --all --dry-run
-python3 scripts/generate_mcp_unique_tasks.py --category A --curate-oracle --verbose
+python3 scripts/generate_csb_org_tasks.py --use-case-ids 1 4 10
+python3 scripts/generate_csb_org_tasks.py --category A
+python3 scripts/generate_csb_org_tasks.py --family cross-repo-dep-trace
+python3 scripts/generate_csb_org_tasks.py --all --dry-run
+python3 scripts/generate_csb_org_tasks.py --category A --curate-oracle --verbose
 """

 import argparse

scripts/generate_script_registry.py

Lines changed: 3 additions & 3 deletions
@@ -47,7 +47,7 @@
         "abc_criteria.py",
         "validate_official_integrity.py",
         "quarantine_invalid_tasks.py",
-        "validate_mcp_task_instance.py",
+        "validate_org_task_instance.py",
         "validate_artifact_golden.py",
     },
     "data_management": {
@@ -77,8 +77,8 @@
         "generate_pytorch_expected_diffs.py",
         "select_dependeval_tasks.py",
         "generate_dependeval_tasks.py",
-        "generate_mcp_unique_tasks.py",
-        "register_new_mcp_tasks.py",
+        "generate_csb_org_tasks.py",
+        "register_new_org_tasks.py",
         "materialize_dependeval_repos.py",
         "materialize_sdlc_suites.py",
         "curate_oracle.py",
scripts/register_new_mcp_tasks.py → scripts/register_new_org_tasks.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-"""Register 20 new MCP-unique tasks in both selection files."""
+"""Register new org-scale tasks in both selection files."""

 import json
 import os

scripts/registry.json

Lines changed: 22 additions & 22 deletions
@@ -650,6 +650,14 @@
       "language": "python",
       "summary": "Generation script for generate coverage gap configs."
     },
+    {
+      "name": "generate_csb_org_tasks.py",
+      "path": "scripts/generate_csb_org_tasks.py",
+      "category": "task_creation_selection",
+      "status": "maintained",
+      "language": "python",
+      "summary": "Task creation/selection script for generate csb org tasks."
+    },
     {
       "name": "generate_dependeval_tasks.py",
       "path": "scripts/generate_dependeval_tasks.py",
@@ -698,14 +706,6 @@
       "language": "python",
       "summary": "Rebuilds `MANIFEST.json` from on-disk run results."
     },
-    {
-      "name": "generate_mcp_unique_tasks.py",
-      "path": "scripts/generate_mcp_unique_tasks.py",
-      "category": "task_creation_selection",
-      "status": "maintained",
-      "language": "python",
-      "summary": "Task creation/selection script for generate mcp unique tasks."
-    },
     {
       "name": "generate_pytorch_expected_diffs.py",
       "path": "scripts/generate_pytorch_expected_diffs.py",
@@ -1171,12 +1171,12 @@
       "summary": "Utility script for regenerate artifact dockerfiles."
     },
     {
-      "name": "register_new_mcp_tasks.py",
-      "path": "scripts/register_new_mcp_tasks.py",
+      "name": "register_new_org_tasks.py",
+      "path": "scripts/register_new_org_tasks.py",
       "category": "task_creation_selection",
       "status": "maintained",
       "language": "python",
-      "summary": "Task creation/selection script for register new mcp tasks."
+      "summary": "Task creation/selection script for register new org tasks."
     },
     {
       "name": "rehost_sweap_images.py",
@@ -1195,12 +1195,12 @@
       "summary": "Analysis/comparison script for reliability analysis."
     },
     {
-      "name": "remirror_mcp_unique_repos.sh",
-      "path": "scripts/remirror_mcp_unique_repos.sh",
+      "name": "remirror_org_repos.sh",
+      "path": "scripts/remirror_org_repos.sh",
       "category": "misc",
       "status": "maintained",
       "language": "shell",
-      "summary": "Utility script for remirror mcp unique repos."
+      "summary": "Utility script for remirror org repos."
     },
     {
       "name": "rename_project.py",
@@ -1546,14 +1546,6 @@
       "language": "python",
       "summary": "Validation script for validate enterprise readiness."
     },
-    {
-      "name": "validate_mcp_task_instance.py",
-      "path": "scripts/validate_mcp_task_instance.py",
-      "category": "qa_quality",
-      "status": "maintained",
-      "language": "python",
-      "summary": "QA/validation script for validate mcp task instance."
-    },
     {
       "name": "validate_official_integrity.py",
       "path": "scripts/validate_official_integrity.py",
@@ -1570,6 +1562,14 @@
       "language": "python",
       "summary": "Validation script for validate on contextbench."
     },
+    {
+      "name": "validate_org_task_instance.py",
+      "path": "scripts/validate_org_task_instance.py",
+      "category": "qa_quality",
+      "status": "maintained",
+      "language": "python",
+      "summary": "QA/validation script for validate org task instance."
+    },
     {
       "name": "validate_submission.py",
       "path": "scripts/validate_submission.py",
scripts/remirror_mcp_unique_repos.sh → scripts/remirror_org_repos.sh

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 # These repos already exist on GitHub — we shallow-clone at the correct tag,
 # create an orphan commit, and force-push to overwrite.
 #
-# Usage: bash scripts/remirror_mcp_unique_repos.sh
+# Usage: bash scripts/remirror_org_repos.sh
 set -euo pipefail

 SG_ORG="sg-evals"