Commit 46f42cc

sjarmak and claude committed
feat: US-018 - MCP retrieval metrics extractor (baseline + MCP compatible)
Implements scripts/ccb_metrics/retrieval.py, a stdlib-only oracle coverage extractor for both baseline and MCP-Full agent configs.

Key functions:
- load_oracle_items(task_spec_path): loads required_files, required_symbols, dependency_chain steps from task_spec.json artifacts.oracle
- extract_retrieval_metrics(task_dir, oracle_items): parses trajectory.json or claude-code.txt to compute oracle_coverage, time_to_first_oracle_hit_ms, unique_repos/orgs_touched, and tool_call_counts split by MCP vs local

MCP hit detection via read_file (repo+path), find_references/go_to_definition (repo+path+symbol). Local hit detection via Read (path suffix), Grep/Glob (symbol in pattern), Bash (path in cmd).

Repo names normalized to strip github.com/ host prefix for consistent deduplication.

18 doctests pass; py_compile succeeds; CLI writes retrieval_metrics.json.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 5969025 commit 46f42cc
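
The MCP vs local split of tool_call_counts described in the commit message could look roughly like the sketch below. It is not the code from retrieval.py: the function name `split_tool_counts`, the `mcp__` name prefix used to recognize MCP tools, and the example server name `sourcegraph` are all assumptions made for illustration.

```python
from collections import Counter


def split_tool_counts(tool_names, mcp_prefix="mcp__"):
    """Count tool calls and split them into MCP and local buckets.

    Assumption: MCP tools are recognizable by a name prefix. The real
    extractor may classify tools differently.
    """
    counts = Counter(tool_names)
    # Sort by tool name so the emitted JSON is deterministic.
    ordered = dict(sorted(counts.items()))
    mcp = {name: n for name, n in ordered.items() if name.startswith(mcp_prefix)}
    local = {name: n for name, n in ordered.items() if not name.startswith(mcp_prefix)}
    return {
        "tool_call_counts": ordered,
        "mcp_tool_counts": mcp,
        "local_tool_counts": local,
    }


# Hypothetical sequence of tool names pulled from a transcript.
calls = ["Read", "mcp__sourcegraph__read_file", "Grep", "Read"]
result = split_tool_counts(calls)
```

Sorting once before splitting keeps all three dicts in the same name order, which makes diffs between runs easier to read.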

3 files changed: +679 -1 lines changed

ralph-mcp-unique/prd.json

Lines changed: 1 addition & 1 deletion
@@ -569,7 +569,7 @@
 "python3 -m py_compile succeeds"
 ],
 "priority": 18,
-"passes": false,
+"passes": true,
 "notes": "Key decision (Q10): oracle coverage counts items found via any method. This means baseline CAN score non-zero if it finds oracle items in local repos. The comparison is fair: both configs measured by same metric. MCP advantage shows up in coverage of mcp_only repos (baseline can't access them)."
 },
 {
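
The Q10 decision in the notes above (coverage counts an oracle item as found regardless of which tool found it) can be illustrated with a small sketch. The item keys follow the {type, repo, path, symbol?} shape mentioned in this commit; the function name `oracle_coverage`, the helper `item_key`, and the sample repos are hypothetical and not taken from retrieval.py.

```python
def item_key(item):
    """Identity of an oracle item; symbol is optional."""
    return (item["type"], item["repo"], item["path"], item.get("symbol"))


def oracle_coverage(oracle_items, found_keys):
    """Fraction of oracle items found by any method (MCP or local)."""
    total = len(oracle_items)
    found = sum(1 for item in oracle_items if item_key(item) in found_keys)
    return {
        "oracle_coverage": found / total if total else 0.0,
        "oracle_items_found": found,
        "oracle_items_total": total,
    }


# Hypothetical oracle: one file in a locally cloned repo, one in an mcp_only repo.
oracle = [
    {"type": "file", "repo": "org/local-repo", "path": "pkg/a.go"},
    {"type": "file", "repo": "org/mcp-only-repo", "path": "lib/b.go"},
]

# A baseline run can only reach the local repo; an MCP run can reach both.
baseline = oracle_coverage(oracle, {item_key(oracle[0])})
mcp_full = oracle_coverage(oracle, {item_key(item) for item in oracle})
```

Under these assumptions the baseline scores non-zero on the local item, and the MCP advantage shows up only on the item it alone can reach.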

ralph-mcp-unique/progress.txt

Lines changed: 27 additions & 0 deletions
@@ -464,3 +464,30 @@
 - ccb_mcp_platform is a new suite directory; ensure aggregate_status.py and generate_manifest.py recognize it
 - criteria.json rubric uses AAA pattern: Accurate/Attributed/Actionable for each metric dimension
 ---
+[2026-02-20 21:27:23 UTC] Iteration 8 no story markers found
+[2026-02-20 21:27:23 UTC] Iteration 8 complete
+[2026-02-20 21:27:25 UTC] Iteration 9 started
+
+## 2026-02-20 - US-018: MCP retrieval metrics extractor (baseline + MCP compatible)
+- Created `scripts/ccb_metrics/retrieval.py`, a stdlib-only Python library
+- `load_oracle_items(task_spec_path)` → list of {type, repo, path, symbol?} dicts from artifacts.oracle
+  - Loads required_files, required_symbols, dependency_chain steps
+- `extract_retrieval_metrics(task_dir, oracle_items)` → dict with:
+  - oracle_coverage (float 0.0-1.0), oracle_items_found, oracle_items_total
+  - time_to_first_oracle_hit_ms (from first transcript timestamp to first hit)
+  - unique_repos_touched, unique_orgs_touched (both normalized, host prefix stripped)
+  - tool_call_counts, mcp_tool_counts, local_tool_counts
+- MCP hit detection: `read_file` (repo+path match), `find_references`/`go_to_definition` (repo+path+symbol)
+- Local hit detection: `Read` (path suffix match), `Grep`/`Glob` (symbol in pattern), `Bash` (path in cmd)
+- Parses both trajectory.json (primary) and claude-code.txt JSONL (fallback)
+- `_normalize_repo()`: strips github.com/ prefix from repo names in search queries for consistent dedup
+- Standalone CLI: `python3 retrieval.py --task-dir ... --task-spec ... --output ... --verbose`
+- 18 doctests all pass; py_compile succeeds
+- Files changed: `scripts/ccb_metrics/retrieval.py` (new)
+- **Learnings for future iterations:**
+  - SG keyword_search queries use `repo:^github.com/org/repo$` format but MCP read_file uses `org/repo`; normalize on extraction
+  - `_path_matches()` suffix matching is key for local tool hits (container absolute paths vs oracle relative paths)
+  - time_to_first_oracle_hit_ms = 0.0 when first tool call IS the hit (not None)
+  - tool_call_counts sorts by name for deterministic output
+  - For dependency_chain steps, dedup against required_files before adding to avoid double-counting
+---
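
Three of the learnings above (repo normalization, suffix path matching, and the 0.0 vs None distinction for time to first hit) can be sketched as follows. This is an illustration under stated assumptions, not the contents of retrieval.py: the public function names, the handling of `^`/`$` anchors and escaped dots, and the use of float seconds for timestamps are all hypothetical.

```python
def normalize_repo(repo):
    """Reduce 'repo:' filter values and bare names to 'org/repo'.

    Assumption: inputs look like '^github.com/org/repo$' or 'org/repo'.
    """
    repo = repo.strip().replace("\\.", ".")  # tolerate regex-escaped dots
    if repo.startswith("^"):
        repo = repo[1:]
    if repo.endswith("$"):
        repo = repo[:-1]
    host = "github.com/"
    if repo.startswith(host):
        repo = repo[len(host):]
    return repo


def path_matches(tool_path, oracle_path):
    """True if a (possibly absolute) tool path ends with the oracle path.

    The match must start at a path-segment boundary, so 'notpkg/a.go'
    does not match the oracle path 'pkg/a.go'.
    """
    tool_path = tool_path.rstrip("/")
    oracle_path = oracle_path.strip("/")
    return tool_path == oracle_path or tool_path.endswith("/" + oracle_path)


def time_to_first_hit_ms(first_ts, first_hit_ts):
    """Milliseconds from first transcript timestamp to first oracle hit.

    Returns None when there was no hit, and 0.0 when the very first
    tool call is the hit. Timestamps are assumed to be float seconds.
    """
    if first_hit_ts is None:
        return None
    return (first_hit_ts - first_ts) * 1000.0
```

Keeping "no hit" (None) distinct from "immediate hit" (0.0) matters downstream: a truthiness check such as `if metric:` would conflate the two, so consumers should compare against None explicitly.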

0 commit comments
