Commit 4417858

sjarmak and claude committed
feat: US-011 - Task generator CLI
Implements scripts/generate_mcp_unique_tasks.py, a CLI tool that reads the use case registry and repo-set fixtures, fills templates, and writes task directories to benchmarks/<mcp_suite>/<task_slug>/.

Key features:
- Filter by --use-case-ids, --category, --family, or --all
- --dry-run shows what would be generated without writing files
- --include-stubs generates oracle_type='tbd' entries (skipped by default)
- --curate-oracle auto-populates oracle_answer.json via curate_oracle.py
- --validate runs syntactic checks (bash -n, json.load, py_compile) post-generation
- Copies oracle_checks.py to tests/ for Harbor /tests/oracle_checks.py access
- Registry fallback: searches both configs/ and ralph-mcp-unique/configs/

Tested: --help, --dry-run, actual generation + validation for use_case_id=1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 9f82070 commit 4417858
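
The flag layout described in the commit message could be wired up with argparse roughly as follows. This is a minimal sketch, not the script's actual code: the flag names come from the commit message and progress notes, while the `nargs`, `type`, and default values (including the `benchmarks` default for `--out`) are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser mirroring the flags named in the commit."""
    parser = argparse.ArgumentParser(
        prog="generate_mcp_unique_tasks.py",
        description="Generate task directories from the use case registry (sketch).",
    )

    # The four selection flags form one mutually exclusive group, so argparse
    # rejects combinations such as `--all --category foo`.
    # Whether the real script requires one of them is not stated, so the
    # group is left optional here.
    filters = parser.add_mutually_exclusive_group()
    filters.add_argument("--use-case-ids", nargs="+", type=int, metavar="ID")  # assumed: space-separated ints
    filters.add_argument("--category")
    filters.add_argument("--family")
    filters.add_argument("--all", action="store_true")

    # Remaining flags; the defaults below are assumptions.
    parser.add_argument("--out", default="benchmarks")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--include-stubs", action="store_true")
    parser.add_argument("--curate-oracle", action="store_true")
    parser.add_argument("--validate", action="store_true")
    parser.add_argument("--verbose", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--use-case-ids", "1", "4", "10", "--dry-run"])
    print(args.use_case_ids, args.dry_run)
```

Using a mutually exclusive group lets argparse report conflicting filters itself, so the generator does not need hand-written checks for them.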

3 files changed: +624 -1 lines changed

ralph-mcp-unique/prd.json

Lines changed: 1 addition & 1 deletion
@@ -257,7 +257,7 @@
  "python3 scripts/generate_mcp_unique_tasks.py --help and --dry-run succeed"
  ],
  "priority": 11,
- "passes": false,
+ "passes": true,
  "notes": "The --curate-oracle flag is the main workflow: generate task skeleton, then auto-populate oracle. This is the zero-human-involvement path for task authoring."
  },
  {

ralph-mcp-unique/progress.txt

Lines changed: 18 additions & 0 deletions
@@ -160,6 +160,24 @@
 [2026-02-20 20:21:26 UTC] Iteration 1 complete
 [2026-02-20 20:21:28 UTC] Iteration 2 started

+## 2026-02-20 - US-011: Task generator CLI
+- Created `scripts/generate_mcp_unique_tasks.py` with argparse CLI
+- Flags: --use-case-ids, --category, --family, --all (mutually exclusive filter group)
+- Additional flags: --out, --dry-run, --include-stubs, --curate-oracle, --validate, --verbose
+- Resolves mcp_suite to benchmarks/<ccb_mcp_*>/ directory; derives task_id (CCX-<short_family>-<NNN>) and task_slug (ccx-dep-trace-001)
+- Registry search: configs/use_case_registry.json + fallback to ralph-mcp-unique/configs/
+- build_template_vars() maps registry+fixture fields to all template variables
+- Generates: task.toml, instruction.md, environment/Dockerfile, environment/Dockerfile.sg_only, tests/eval.sh, tests/task_spec.json, tests/oracle_checks.py (copied)
+- validate_generated_task() checks task.toml fields, task_spec.json JSON, eval.sh bash -n, oracle_checks.py py_compile
+- Tested: --dry-run shows 3 tasks for IDs 1/4/10; actual generation+validation passes for ID=1
+- Files changed: `scripts/generate_mcp_unique_tasks.py` (new)
+- **Learnings for future iterations:**
+  - Registry path: always search ralph-mcp-unique/configs/ as fallback
+  - FAMILY_SHORT_NAME dict maps 10 task families to 2-3 word short codes
+  - clone_commands: look up each local_checkout_repo in fixture repos[] to get revision for git clone --branch
+  - oracle_checks.py is copied to tests/ so eval.sh can call it at /tests/oracle_checks.py in Harbor
+---
+
 ## 2026-02-20 - US-010: Task generation templates
 - Created `templates/mcp_unique_task/` with 7 files:
   - `task.toml.j2`: task metadata with mcp_suite, use_case_id, repo_set_id, mcp_unique=true
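
The post-generation checks listed in the progress notes (JSON parse of task_spec.json, `bash -n` on eval.sh, `py_compile` on oracle_checks.py) might look something like the sketch below. The function name `validate_task_dir` and its return shape are hypothetical and do not reproduce `validate_generated_task()`. The task.toml field check is omitted because the notes do not say which fields are required.

```python
import json
import py_compile
import subprocess
from pathlib import Path


def validate_task_dir(task_dir: Path) -> list[str]:
    """Run syntax-only checks on a generated task; return a list of problems."""
    errors: list[str] = []
    tests = task_dir / "tests"

    # 1. task_spec.json must parse as JSON.
    spec = tests / "task_spec.json"
    try:
        json.loads(spec.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        errors.append(f"{spec.name}: {exc}")

    # 2. eval.sh must pass `bash -n`, which parses the script without running it.
    eval_sh = tests / "eval.sh"
    try:
        result = subprocess.run(
            ["bash", "-n", str(eval_sh)], capture_output=True, text=True
        )
        if result.returncode != 0:
            errors.append(f"{eval_sh.name}: {result.stderr.strip()}")
    except FileNotFoundError:
        errors.append(f"{eval_sh.name}: bash is not available on this machine")

    # 3. oracle_checks.py must byte-compile. Note that py_compile writes a
    #    .pyc file into a __pycache__ directory next to the source.
    checks = tests / "oracle_checks.py"
    try:
        py_compile.compile(str(checks), doraise=True)
    except (OSError, py_compile.PyCompileError) as exc:
        errors.append(f"{checks.name}: {exc}")

    return errors
```

Returning a list of messages instead of raising on the first failure lets a caller report every problem in a task directory in one pass.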
