feat: US-018 - MCP retrieval metrics extractor (baseline + MCP compatible)

sjarmak · claude · sjarmak · commit 46f42ccb3ee7 · 2026-02-20T21:33:30.000Z
Implements scripts/ccb_metrics/retrieval.py — stdlib-only oracle coverage
extractor for both baseline and MCP-Full agent configs.

Key functions:
- load_oracle_items(task_spec_path): loads required_files, required_symbols,
  dependency_chain steps from task_spec.json artifacts.oracle
- extract_retrieval_metrics(task_dir, oracle_items): parses trajectory.json
  or claude-code.txt to compute oracle_coverage, time_to_first_oracle_hit_ms,
  unique_repos/orgs_touched, and tool_call_counts split by MCP vs local

MCP hit detection via read_file (repo+path), find_references/go_to_definition
(repo+path+symbol). Local hit detection via Read (path suffix), Grep/Glob
(symbol in pattern), Bash (path in cmd). Repo names normalized to strip
github.com/ host prefix for consistent deduplication.

18 doctests pass; py_compile succeeds; CLI writes retrieval_metrics.json.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
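
Reviewer note: the repo normalization and path-suffix matching described in the message can be illustrated with a minimal sketch. The helper names `normalize_repo` and `path_matches` are hypothetical and are not copied from `retrieval.py`. The sketch assumes that repo strings may carry regex anchors (`^`, `$`) and a `github.com/` host prefix, and that local tools report container-absolute paths while oracle paths are repo-relative.

```python
import re


def normalize_repo(repo: str) -> str:
    """Strip regex anchors and a leading github.com/ host prefix (assumed input shapes)."""
    repo = repo.strip().lstrip("^").rstrip("$")
    repo = re.sub(r"^(https?://)?github\.com/", "", repo)
    return repo.rstrip("/")


def path_matches(observed: str, oracle: str) -> bool:
    """Return True when `observed` ends with `oracle` on a path-component boundary."""
    observed = observed.replace("\\", "/").rstrip("/")
    oracle = oracle.replace("\\", "/").strip("/")
    if not oracle:
        return False
    return observed == oracle or observed.endswith("/" + oracle)
```

Matching on a component boundary, rather than with a bare `str.endswith`, keeps `/workspace/xpkg/a.go` from counting as a hit for the oracle path `pkg/a.go`.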
diff --git a/ralph-mcp-unique/prd.json b/ralph-mcp-unique/prd.json
@@ -569,7 +569,7 @@
         "python3 -m py_compile succeeds"
       ],
       "priority": 18,
-      "passes": false,
+      "passes": true,
       "notes": "Key decision (Q10): oracle coverage counts items found via any method. This means baseline CAN score non-zero if it finds oracle items in local repos. The comparison is fair: both configs measured by same metric. MCP advantage shows up in coverage of mcp_only repos (baseline can't access them)."
     },
     {
diff --git a/ralph-mcp-unique/progress.txt b/ralph-mcp-unique/progress.txt
@@ -464,3 +464,30 @@
   - ccb_mcp_platform is a new suite directory — ensure aggregate_status.py and generate_manifest.py recognize it
   - criteria.json rubric uses AAA pattern: Accurate/Attributed/Actionable for each metric dimension
 ---
+[2026-02-20 21:27:23 UTC] Iteration 8 no story markers found
+[2026-02-20 21:27:23 UTC] Iteration 8 complete
+[2026-02-20 21:27:25 UTC] Iteration 9 started
+
+## 2026-02-20 - US-018: MCP retrieval metrics extractor (baseline + MCP compatible)
+- Created `scripts/ccb_metrics/retrieval.py` — stdlib-only Python library
+- `load_oracle_items(task_spec_path)` → list of {type, repo, path, symbol?} dicts from artifacts.oracle
+  - Loads required_files, required_symbols, dependency_chain steps
+- `extract_retrieval_metrics(task_dir, oracle_items)` → dict with:
+  - oracle_coverage (float 0.0-1.0), oracle_items_found, oracle_items_total
+  - time_to_first_oracle_hit_ms (from first transcript timestamp to first hit)
+  - unique_repos_touched, unique_orgs_touched (both normalized, host prefix stripped)
+  - tool_call_counts, mcp_tool_counts, local_tool_counts
+- MCP hit detection: `read_file` (repo+path match), `find_references`/`go_to_definition` (repo+path+symbol)
+- Local hit detection: `Read` (path suffix match), `Grep`/`Glob` (symbol in pattern), `Bash` (path in cmd)
+- Parses both trajectory.json (primary) and claude-code.txt JSONL (fallback)
+- `_normalize_repo()`: strips github.com/ prefix from repo names in search queries for consistent dedup
+- Standalone CLI: `python3 retrieval.py --task-dir ... --task-spec ... --output ... --verbose`
+- 18 doctests all pass; py_compile succeeds
+- Files changed: `scripts/ccb_metrics/retrieval.py` (new)
+- **Learnings for future iterations:**
+  - SG keyword_search queries use `repo:^github.com/org/repo$` format but MCP read_file uses `org/repo` — normalize on extraction
+  - `_path_matches()` suffix matching is key for local tool hits (container absolute paths vs oracle relative paths)
+  - time_to_first_oracle_hit_ms = 0.0 when first tool call IS the hit (not None)
+  - tool_call_counts sorts by name for deterministic output
+  - For dependency_chain steps, dedup against required_files before adding to avoid double-counting
+---
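
Reviewer note: the coverage and time-to-first-hit semantics listed in the progress entry can be sketched as follows. This is an illustration under stated assumptions, not the code in `retrieval.py`: the function name `coverage_and_first_hit`, the `(timestamp, hits)` event shape, and the choice to report a coverage of 0.0 for an empty oracle are all hypothetical. Timestamps are assumed to be ISO 8601 strings without a trailing `Z`.

```python
from datetime import datetime


def coverage_and_first_hit(events, oracle_keys):
    """Compute coverage metrics from (iso_timestamp, set_of_keys_hit) pairs in transcript order."""
    found = set()
    first_ts = None       # timestamp of the first transcript event
    first_hit_ms = None   # stays None when no oracle item is ever hit
    for ts, hits in events:
        t = datetime.fromisoformat(ts)
        if first_ts is None:
            first_ts = t
        matched = hits & oracle_keys
        if matched and first_hit_ms is None:
            # 0.0 when the very first event is itself a hit
            first_hit_ms = (t - first_ts).total_seconds() * 1000.0
        found |= matched
    total = len(oracle_keys)
    return {
        "oracle_coverage": len(found) / total if total else 0.0,  # assumption for empty oracle
        "oracle_items_found": len(found),
        "oracle_items_total": total,
        "time_to_first_oracle_hit_ms": first_hit_ms,
    }
```

Keeping `None` distinct from `0.0` lets a downstream report tell "never hit an oracle item" apart from "hit one on the first tool call".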
diff --git a/scripts/ccb_metrics/retrieval.py b/scripts/ccb_metrics/retrieval.py
