| 1 | +# MCP-Unique Task Templates |
| 2 | + |
| 3 | +Templates for generating Harbor-compatible MCP-unique benchmark task directories. |
| 4 | +Despite the `.j2` extension, all templates use Python `string.Template` syntax (`$variable` or `${variable}`), not Jinja2. |
| 5 | +A literal `$`, such as in bash code, is escaped as `$$`. |
| 6 | + |
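As a rough illustration of the substitution and escaping rules described above, the following sketch renders a small bash fragment with `string.Template`. The fragment is hypothetical and written only in the style of `eval.sh.j2`; it is not the real template content.

```python
from string import Template

# Hypothetical fragment in the style of eval.sh.j2 (not the real template).
# ${task_id} is a template variable; $$ renders as a literal $ for bash.
EVAL_FRAGMENT = Template(
    'echo "Evaluating ${task_id}"\n'
    'status=$$?\n'
    'echo "exit code: $$status"\n'
)

rendered = EVAL_FRAGMENT.substitute(task_id="CCX-dep-trace-001")
print(rendered)

# substitute() raises KeyError when a variable is missing;
# safe_substitute() leaves unknown placeholders untouched instead.
partial = Template("id=$task_id suite=$mcp_suite").safe_substitute(
    task_id="CCX-dep-trace-001"
)
print(partial)
```

Using `substitute()` rather than `safe_substitute()` is probably the safer default for generation, since a missing variable then fails loudly instead of leaving a stray `$name` in the output.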
| 7 | +## Template Files |
| 8 | + |
| 9 | +| File | Purpose | |
| 10 | +|------|---------| |
| 11 | +| `task.toml.j2` | Harbor task metadata and verification config | |
| 12 | +| `instruction.md.j2` | Agent instruction with customer-framed prompt | |
| 13 | +| `eval.sh.j2` | Exit-code-first evaluator calling oracle_checks.py | |
| 14 | +| `task_spec.json.j2` | Full TaskSpec (oracle + evaluation checks) | |
| 15 | +| `Dockerfile.j2` | Baseline: clones local_checkout_repos | |
| 16 | +| `Dockerfile.sg_only.j2` | MCP-Full: no clone, marks /tmp/.sg_only_mode | |
| 17 | + |
| 18 | +## Template Variables |
| 19 | + |
| 20 | +### task.toml.j2 |
| 21 | + |
| 22 | +| Variable | Type | Example | Description | |
| 23 | +|----------|------|---------|-------------| |
| 24 | +| `$task_id` | string | `CCX-dep-trace-001` | Task identifier (CCX-<family>-<NNN>) | |
| 25 | +| `$task_description` | string | `Trace blast radius of client-go changes` | Short description | |
| 26 | +| `$primary_repo` | string | `kubernetes/kubernetes` | Main local repo (org/name) | |
| 27 | +| `$task_family` | string | `cross-repo-dep-trace` | Task family ID from registry | |
| 28 | +| `$language` | string | `go` | Primary programming language | |
| 29 | +| `$difficulty` | string | `medium` | Task difficulty: easy/medium/hard | |
| 30 | +| `$time_limit_sec` | int | `900` | Agent time limit in seconds | |
| 31 | +| `$mcp_suite` | string | `ccb_mcp_crossrepo_tracing` | CCB MCP suite name | |
| 32 | +| `$use_case_id` | int | `1` | Use case ID from registry (1-100) | |
| 33 | +| `$repo_set_id` | string | `kubernetes-ecosystem` | Fixture ID | |
| 34 | + |
| 35 | +### instruction.md.j2 |
| 36 | + |
| 37 | +| Variable | Type | Example | Description | |
| 38 | +|----------|------|---------|-------------| |
| 39 | +| `$task_title` | string | `Kubernetes Dependency Blast Radius` | Human-readable title | |
| 40 | +| `$customer_prompt` | string | `Find all repos that import...` | Customer-framed task prompt | |
| 41 | +| `$context_description` | string | `You are a platform engineer...` | Background/role context | |
| 42 | +| `$local_repo_description` | string | `The local /workspace contains kubernetes/kubernetes` | What's available locally | |
| 43 | +| `$mcp_repos_description` | string | `- sg-benchmarks/kubernetes-client-go...` | MCP-only repos bullet list | |
| 44 | +| `$evaluation_criteria` | string | `- Recall of affected repos...` | What the scoring checks | |
| 45 | + |
| 46 | +### eval.sh.j2 |
| 47 | + |
| 48 | +| Variable | Type | Example | Description | |
| 49 | +|----------|------|---------|-------------| |
| 50 | +| `$task_id` | string | `CCX-dep-trace-001` | Task ID for logging | |
| 51 | + |
| 52 | +### task_spec.json.j2 |
| 53 | + |
| 54 | +| Variable | Type | Example | Description | |
| 55 | +|----------|------|---------|-------------| |
| 56 | +| `$task_id` | string | `CCX-dep-trace-001` | Task identifier | |
| 57 | +| `$task_family` | string | `cross-repo-dep-trace` | Task family | |
| 58 | +| `$use_case_id` | int | `1` | Use case ID | |
| 59 | +| `$category` | string | `A` | Category A-J | |
| 60 | +| `$mcp_suite` | string | `ccb_mcp_crossrepo_tracing` | Suite name | |
| 61 | +| `$user_story` | string | `As a platform engineer...` | PRD user story | |
| 62 | +| `$constraints_json` | JSON array | `["Must cite file paths"]` | Constraint list as JSON string | |
| 63 | +| `$success_definition` | string | `Agent identifies all affected repos` | Success criteria | |
| 64 | +| `$seed_prompt` | string | `Find all repos importing...` | Curation seed prompt | |
| 65 | +| `$repo_set_id` | string | `kubernetes-ecosystem` | Fixture ID | |
| 66 | +| `$required_files_json` | JSON array | `[]` | Oracle file list (empty until curated) | |
| 67 | +| `$required_symbols_json` | JSON array | `[]` | Oracle symbol list (empty until curated) | |
| 68 | +| `$dependency_chains_json` | JSON array | `[]` | Oracle chains (empty until curated) | |
| 69 | +| `$evaluation_modes_json` | JSON array | `["deterministic"]` | Evaluation modes | |
| 70 | +| `$evaluation_checks_json` | JSON array | `[{"type":"file_set_match",...}]` | Check configurations | |
| 71 | + |
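The `*_json` variables in the table above are substituted as pre-serialized JSON strings. A minimal sketch of how such values might be prepared with `json.dumps` follows. The template text and its key names (`id`, `constraints`, `required_files`) are assumptions made for illustration, not the actual contents of `task_spec.json.j2`.

```python
import json
from string import Template

# Hypothetical, heavily abbreviated stand-in for task_spec.json.j2.
# The JSON key names here are assumptions, not the real schema.
SPEC_TEMPLATE = Template(
    '{\n'
    '  "id": "$task_id",\n'
    '  "use_case_id": $use_case_id,\n'
    '  "constraints": $constraints_json,\n'
    '  "required_files": $required_files_json\n'
    '}\n'
)

values = {
    "task_id": "CCX-dep-trace-001",
    "use_case_id": 1,
    # JSON-array variables are serialized before substitution.
    "constraints_json": json.dumps(["Must cite file paths"]),
    "required_files_json": json.dumps([]),  # empty until curated
}

# Parsing the rendered text confirms the result is valid JSON.
spec = json.loads(SPEC_TEMPLATE.substitute(values))
print(spec["constraints"])
```

Plain string values that are placed inside quotes in the template would break the JSON if they contained quotes or backslashes, so parsing the rendered output as a final check is a cheap safeguard.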
| 72 | +### Dockerfile.j2 |
| 73 | + |
| 74 | +| Variable | Type | Example | Description | |
| 75 | +|----------|------|---------|-------------| |
| 76 | +| `$language_packages` | string | `golang-go` | apt packages for the language | |
| 77 | +| `$clone_commands` | string | `RUN git clone...` | Dockerfile `RUN` instructions that clone local repos | |
| 78 | + |
| 79 | +### Dockerfile.sg_only.j2 |
| 80 | + |
| 81 | +| Variable | Type | Example | Description | |
| 82 | +|----------|------|---------|-------------| |
| 83 | +| `$task_id` | string | `CCX-dep-trace-001` | Task ID for Dockerfile comment | |
| 84 | + |
| 85 | +## Generation |
| 86 | + |
| 87 | +Templates are filled by `scripts/generate_mcp_unique_tasks.py`: |
| 88 | + |
| 89 | +```bash |
| 90 | +python3 scripts/generate_mcp_unique_tasks.py --use-case-ids 1 --out benchmarks/ |
| 91 | +python3 scripts/generate_mcp_unique_tasks.py --category A --dry-run |
| 92 | +python3 scripts/generate_mcp_unique_tasks.py --all --curate-oracle |
| 93 | +``` |
| 94 | + |
| 95 | +## Layout |
| 96 | + |
| 97 | +Generated task directories follow this layout: |
| 98 | + |
| 99 | +``` |
| 100 | +benchmarks/<mcp_suite>/<task_slug>/ |
| 101 | +├── environment/ |
| 102 | +│ ├── Dockerfile (baseline: clones local repos) |
| 103 | +│ └── Dockerfile.sg_only (MCP-Full: no clone) |
| 104 | +├── tests/ |
| 105 | +│ ├── eval.sh |
| 106 | +│ ├── oracle_checks.py (copied from scripts/ccb_metrics/oracle_checks.py) |
| 107 | +│ ├── task_spec.json |
| 108 | +│ └── oracle_answer.json (populated by curate_oracle.py) |
| 109 | +├── task.toml |
| 110 | +└── instruction.md |
| 111 | +``` |
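One way to sanity-check a generated directory against the layout above is a small helper like the sketch below. The function name `missing_files` is hypothetical and is not part of the repository's scripts. `oracle_answer.json` is left out of the required list here, on the assumption that it is absent until `curate_oracle.py` has run.

```python
from pathlib import Path

# Relative paths taken from the layout tree above.
# oracle_answer.json is omitted (assumed absent until curation).
EXPECTED_FILES = [
    "environment/Dockerfile",
    "environment/Dockerfile.sg_only",
    "tests/eval.sh",
    "tests/oracle_checks.py",
    "tests/task_spec.json",
    "task.toml",
    "instruction.md",
]


def missing_files(task_dir):
    """Return the expected files absent from task_dir, in layout order."""
    root = Path(task_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

An empty return value means every expected file is present. A non-empty list names what the generator did not write.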