Commit 7c931cb

sjarmak and claude committed
feat: US-008 - Agent-based oracle curation tool
Implements scripts/curate_oracle.py, a Sourcegraph-powered oracle discovery tool that automatically generates exhaustive oracle_answer.json files for MCP-unique benchmark tasks.

Key features:
- Calls SG GraphQL API (stdlib urllib) for file and symbol search
- Curates file_set_match, symbol_resolution, dependency_chain oracles
- Incremental: merges new findings with existing oracle_answer.json
- Rate limiting with exponential backoff on 429 responses
- --verify mode runs validate_mcp_task_instance.py post-curation
- --dry-run shows planned queries without API calls
- Writes oracle_curation_log.json for full query auditability

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent d7aff64 commit 7c931cb
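The incremental merge mentioned in the commit message could work roughly as in the sketch below. The body of `merge_oracle_answers()` is not shown on this page, so the helper name `merge_oracle_lists`, the choice to deduplicate by a sorted-key JSON fingerprint, and the rule that existing non-list values win are all assumptions made for illustration.

```python
import json


def merge_oracle_lists(existing: dict, new: dict) -> dict:
    """Merge list-valued oracle fields, keeping first-seen order and dropping duplicates.

    Hypothetical helper: the real merge_oracle_answers() may behave differently.
    """
    merged = dict(existing)
    for key, new_value in new.items():
        old_value = merged.get(key)
        if isinstance(old_value, list) and isinstance(new_value, list):
            seen = set()
            combined = []
            for item in old_value + new_value:
                # Serialize with sorted keys so equal dicts get the same fingerprint.
                fingerprint = json.dumps(item, sort_keys=True)
                if fingerprint not in seen:
                    seen.add(fingerprint)
                    combined.append(item)
            merged[key] = combined
        elif key not in merged:
            # Anything else: keep the existing value and only fill in missing keys.
            merged[key] = new_value
    return merged
```

Under these assumptions, re-running curation against an existing `oracle_answer.json` only appends entries that were not already present, which is what makes repeated runs safe.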

3 files changed

Lines changed: 788 additions & 1 deletion

ralph-mcp-unique/prd.json

Lines changed: 1 addition & 1 deletion
@@ -199,7 +199,7 @@
         "python3 -m py_compile scripts/curate_oracle.py succeeds"
       ],
       "priority": 8,
-      "passes": false,
+      "passes": true,
       "notes": "This is the key automation that makes closed-world oracles feasible without human involvement. The tool should be thorough: for a file_set_match oracle, it should search EVERY repo in the fixture, not just the ones it expects to find results in. The curation log provides auditability. May not find 100% of items but should get close for well-scoped tasks. Run it, review the log, re-run if gaps are spotted."
     },
     {
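
The `notes` field asks for every repo in the fixture to be searched, not only the likely ones. One way to plan such queries (for example, what a `--dry-run` might print) is sketched below. The helper name `plan_file_queries`, its parameters, and the exact query shape are assumptions. Only the `repo:`, `file:`, and `count:` filters are standard Sourcegraph search syntax.

```python
import re
from typing import List


def plan_file_queries(repos: List[str], file_pattern: str, max_results: int = 100) -> List[str]:
    """Build one Sourcegraph file-search query per fixture repo (hypothetical helper)."""
    queries = []
    for repo in repos:
        # Anchor the escaped repo name so "org/foo" does not also match "org/foo-bar".
        repo_filter = f"repo:^{re.escape(repo)}$"
        queries.append(f"{repo_filter} file:{file_pattern} count:{max_results}")
    return queries


if __name__ == "__main__":
    # Illustrative repo names only; a real run would read them from the fixture.
    for query in plan_file_queries(["org/repo.one", "org/two"], r"\.go$", 50):
        print(query)
```

Emitting exactly one query per repo also makes the curation log easy to audit: a repo with no logged query was never searched.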

ralph-mcp-unique/progress.txt

Lines changed: 23 additions & 0 deletions
@@ -160,6 +160,29 @@
 [2026-02-20 20:21:26 UTC] Iteration 1 complete
 [2026-02-20 20:21:28 UTC] Iteration 2 started
 
+## 2026-02-20 - US-008: Agent-based oracle curation tool
+- Created `scripts/curate_oracle.py` (stdlib-only: urllib for SG API)
+- CLI: `--task-dir DIR`, `--task-spec PATH`, `--verify`, `--verbose`, `--dry-run`, `--max-results`
+- Implements SourcegraphClient class with graphql(), search_files(), search_symbols() methods
+- Oracle curation strategies: curate_file_set_match, curate_symbol_resolution, curate_dependency_chain, curate_provenance, curate_keyword_presence
+- Writes oracle_answer.json: {files, symbols, chains, chain, text, _metadata} compatible with oracle_checks.py
+- Writes oracle_curation_log.json: {task_id, sg_url, curation_entries, sg_request_log}
+- Incremental mode: merge_oracle_answers() deduplicates and merges new findings into existing oracle
+- Rate limiting: 0.25s between requests, 3-retry exponential backoff on 429/URLError
+- --verify: runs validate_mcp_task_instance.py on the curated oracle
+- --dry-run: shows planned queries without calling SG API
+- Project root discovery: walks up from task_dir AND from CWD to find fixtures/ directory
+- py_compile: OK, --help works, dry-run works, live SG API tested (requires SOURCEGRAPH_ACCESS_TOKEN)
+- Files changed: `scripts/curate_oracle.py` (new)
+- **Learnings for future iterations:**
+  - SG GraphQL API: `/.api/graphql` with `Authorization: token {token}` header
+  - File search: `query SearchFiles($query: String!)` with `... on FileMatch` fragment
+  - Symbol search: prefix query with `type:symbol` for dedicated symbol search endpoint
+  - Project root detection must try from CWD, not just from task_dir (temp files break otherwise)
+  - oracle_answer.json "chain" (flat list) vs "chains" (array of chain objects); both needed for oracle_checks.py compat
+  - SG returns 403 without token, not 401; handle gracefully as empty results
+---
+
 ## 2026-02-20 - US-003: SG indexing verification (completion)
 - Verified all 7 sg-benchmarks mirrors are now indexed in Sourcegraph via list_repos + keyword_search
   - sg-benchmarks/kubernetes-client-go ✓ (indexed, searchable)
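
The request handling described in the progress notes (token header, retries with exponential backoff on 429/URLError, 403 treated as empty results) might look like the following sketch. It is not the code from `scripts/curate_oracle.py`. The function name `post_graphql`, the 1-second base delay, and the injectable `opener` and `sleep` parameters (added here so the retry logic can be exercised without a network) are assumptions, and the fixed 0.25s pause between requests is left out.

```python
import json
import time
import urllib.error
import urllib.request


def post_graphql(url, token, query, variables, max_retries=3, base_delay=1.0,
                 opener=urllib.request.urlopen, sleep=time.sleep):
    """POST a GraphQL query, retrying on HTTP 429 and network errors (sketch)."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"token {token}", "Content-Type": "application/json"},
    )
    for attempt in range(max_retries + 1):
        try:
            with opener(request) as response:
                return json.loads(response.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            # HTTPError must be caught before URLError, which is its parent class.
            if err.code == 403:
                # Missing or rejected token: report "no results" instead of crashing.
                return {}
            if err.code != 429 or attempt == max_retries:
                raise
        except urllib.error.URLError:
            if attempt == max_retries:
                raise
        # Exponential backoff: base_delay, then 2x, 4x, ...
        sleep(base_delay * (2 ** attempt))
    return {}  # Not reached: the last attempt either returns or raises.
```

A caller would pass the instance's `/.api/graphql` URL and the value of `SOURCEGRAPH_ACCESS_TOKEN`. Any HTTP error other than 403 or 429 is re-raised so that real failures stay visible in the curation log.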

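The project-root learning above (try the task directory first, then the current working directory) can be illustrated with a short sketch. The function name `find_project_root` and its signature are hypothetical. Only the rule of walking upward until a `fixtures/` directory is found comes from the progress notes.

```python
from pathlib import Path
from typing import Optional


def find_project_root(task_dir: Path, cwd: Optional[Path] = None) -> Optional[Path]:
    """Return the nearest ancestor containing fixtures/, trying task_dir, then CWD."""
    starts = [task_dir.resolve(), (cwd or Path.cwd()).resolve()]
    for start in starts:
        # Check the starting directory itself, then each parent up to the filesystem root.
        for candidate in [start, *start.parents]:
            if (candidate / "fixtures").is_dir():
                return candidate
    return None
```

The CWD fallback covers the case called out in the notes: when the task directory is a temporary path outside the repository, walking up from it alone never reaches `fixtures/`.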