feat: symbol ranking, smart snippets, and edit decision card (#40)

PatrickSys · web-flow · commit 03964b3f40cc · 2026-02-21T16:27:25.000+01:00

Cleaned up the edit decision card and sharpened search ranking.

When you search for a symbol name, the file that defines it now ranks above files
that just use it. Snippets include a scope header (// ClassName.methodName) so you
see context without reading extra lines. And the preflight response for edit intent
is now lean and actionable: ready, nextAction, patterns to follow/avoid, caller
coverage ("3/5 callers in results" so you know what you haven't looked at), and
concrete next steps in whatWouldHelp when you need more evidence.
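
The caller-coverage signal can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `covered`/`total` fields mirror the `impactCoverage` input added in this commit, the thresholds (more than 3 callers, under 40% covered) are the ones stated in the changelog, and the helper names `formatCoverage` and `isLowCoverage` are hypothetical.

```typescript
// Hypothetical shape; field names follow the impactCoverage input in this commit.
interface ImpactCoverage {
  covered: number; // known callers that appear in the search results
  total: number; // all known callers from the import graph
}

// Render the "X/Y callers in results" string shown on the card.
function formatCoverage(c: ImpactCoverage): string {
  return `${c.covered}/${c.total} callers in results`;
}

// Low coverage: more than 3 known callers and fewer than 40% of them in results.
function isLowCoverage(c: ImpactCoverage): boolean {
  return c.total > 3 && c.covered / c.total < 0.4;
}

const sampleCoverage: ImpactCoverage = { covered: 1, total: 5 };
console.log(formatCoverage(sampleCoverage)); // "1/5 callers in results"
console.log(isLowCoverage(sampleCoverage)); // true (20% covered)
console.log(isLowCoverage({ covered: 3, total: 5 })); // false (60% covered)
```

Requiring more than 3 callers keeps small modules from tripping the check just because one of two callers is missing.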

Removed the internal fields (evidenceLock, riskLevel, confidence) that leaked into
the output. The decision card is stable by design — agents can rely on field names
staying put.
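
As an illustration of what relying on those field names could look like on the consumer side, here is a small sketch. It assumes only the `ready`, `nextAction`, and `whatWouldHelp` fields described in this commit; the `DecisionCardView` type and the `pendingSteps` helper are hypothetical and not part of the tool.

```typescript
// Subset of the decision card that this sketch reads (hypothetical type name).
interface DecisionCardView {
  ready: boolean;
  nextAction?: string;
  whatWouldHelp?: string[];
}

// Collect what an agent should do before editing; empty when the card is ready.
function pendingSteps(card: DecisionCardView): string[] {
  if (card.ready) return [];
  const first = card.nextAction ? [card.nextAction] : [];
  return [...first, ...(card.whatWouldHelp ?? [])];
}

console.log(
  pendingSteps({
    ready: false,
    nextAction: "2 of 5 callers aren't in results",
    whatWouldHelp: ["Call get_team_patterns for auth/ injection patterns"],
  })
);
```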

- SEARCH-01: definition-first boost (+15%) for EXACT_NAME intent
- SEARCH-01: symbol-level dedup (keeps highest-scoring chunk per symbolPath)
- SEARCH-02: scope headers on symbol-aware snippets
- PREF-01-04: clean decision card with ready, nextAction, patterns, impact, whatWouldHelp
- PREF-02: caller coverage tracking ("X/Y callers in results")
- PREF-03: concrete next-step recommendations when evidence is thin
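
The two SEARCH-01 items can be sketched as below. This is a simplified illustration rather than the code in `src/core/search.ts`: the `RankedChunk` type, the helper names, and the `auth.service.ts` path are hypothetical, and the dedup here relies on the input already being sorted by score instead of replacing entries in place.

```typescript
// Minimal result shape for this sketch (hypothetical, not the project's SearchResult).
interface RankedChunk {
  file: string;
  score: number;
  symbolName?: string;
  symbolPath?: string[];
}

// Definition-first boost: +15% when the chunk's symbol name equals the query
// (case-insensitive), followed by a re-sort by score descending.
function boostDefinitions(chunks: RankedChunk[], query: string): RankedChunk[] {
  const wanted = query.toLowerCase();
  return chunks
    .map((c) => (c.symbolName?.toLowerCase() === wanted ? { ...c, score: c.score * 1.15 } : c))
    .sort((a, b) => b.score - a.score);
}

// Symbol-level dedup: keep one chunk per symbolPath.
// Assumes input sorted by score descending, so the first chunk seen is the best one.
function dedupeBySymbol(chunks: RankedChunk[]): RankedChunk[] {
  const seen = new Set<string>();
  const kept: RankedChunk[] = [];
  for (const c of chunks) {
    if (!c.symbolPath) {
      kept.push(c); // no symbol info, keep as-is
      continue;
    }
    const key = c.symbolPath.join('.');
    if (seen.has(key)) continue;
    seen.add(key);
    kept.push(c);
  }
  return kept;
}

const rankedDemo = dedupeBySymbol(
  boostDefinitions(
    [
      { file: 'src/app.module.ts', score: 0.8 },
      { file: 'src/auth/auth.service.ts', score: 0.75, symbolName: 'AuthService', symbolPath: ['AuthService'] },
    ],
    'authservice'
  )
);
console.log(rankedDemo.map((c) => c.file)); // defining file first, then the usage file
```

In the demo the defining chunk starts below the usage chunk (0.75 vs 0.8) and moves above it once the 15% boost is applied.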

Documentation updated to match the new output shape. No `any` types. 219 tests pass.
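
For reference, the scope header from SEARCH-02 amounts to something like the following sketch. It assumes `symbolPath` is an array of enclosing names, outermost first; the `withScopeHeader` helper and the `AuthService.getToken` example are hypothetical.

```typescript
// Prefix a snippet with a scope comment such as "// ClassName.methodName".
// Snippets without symbol information are returned unchanged.
function withScopeHeader(snippet: string, symbolPath?: string[]): string {
  if (!symbolPath || symbolPath.length === 0) return snippet;
  return `// ${symbolPath.join('.')}\n${snippet}`;
}

console.log(withScopeHeader('return this.token;', ['AuthService', 'getToken']));
// // AuthService.getToken
// return this.token;
```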
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,29 @@
 
 ## [Unreleased]
 
+## [Unshipped - Phase 09] - High-Signal Search + Decision Card
+
+Cleaned up the edit decision card and sharpened search ranking for exact-name queries.
+
+### Added
+
+- **Definition-first ranking (SEARCH-01)**: For exact-name queries (PascalCase/camelCase), the file that *defines* a symbol now ranks above files that merely use it. Symbol-level dedup ensures multiple methods from the same class don't clog the top slots.
+- **Smart snippets with scope headers (SEARCH-02)**: When `includeSnippets: true`, code chunks from symbol-aware analysis include a scope comment header (`// ClassName.methodName`) before the snippet, giving structural context without extra disk reads.
+- **Clean decision card (PREF-01-04)**: The preflight response for `intent="edit"|"refactor"|"migrate"` is now a decision card: `ready`, `nextAction` (if not ready), `warnings`, `patterns` (do/avoid capped at 3), `bestExample` (top golden file), `impact` (caller coverage + top files), and `whatWouldHelp`. Internal fields like `evidenceLock`, `riskLevel`, `confidence` are no longer exposed.
+- **Impact coverage gating (PREF-02)**: When result files have known callers (from import graph), the card shows caller coverage: "X/Y callers in results". Low coverage (< 40% with > 3 total callers) triggers an epistemic stress alert.
+- **whatWouldHelp recommendations (PREF-03)**: When `ready=false`, concrete next steps appear: search more specifically, call `get_team_patterns`, search for uncovered callers, or check memories. Each is actionable in 1-2 sentences.
+
+### Changed
+
+- **Preflight shape**: `{ ready, reason?, ... }` → `{ ready, nextAction?, warnings?, patterns?, bestExample?, impact?, whatWouldHelp? }`. `reason` renamed to `nextAction` for clarity. No breaking changes to `ready` (stays top-level).
+
+### Fixed
+
+- Agents no longer parse unstable internal fields. Preflight output is stable by design.
+- Snippets now include scope context, reducing ambiguity for symbol-heavy edits.
+
+## [Unreleased]
+
 ### Added
 
 - **Index versioning (Phase 06)**: Index artifacts are versioned via `index-meta.json`. Mixed-version indexes are never served; version mismatches or corruption trigger automatic rebuild.
diff --git a/MOTIVATION.md b/MOTIVATION.md
@@ -49,7 +49,7 @@ Correct the agent once. Record the decision. From then on, it surfaces in search
 
 ### Evidence gating
 
-Before an edit, the agent gets a curated "preflight" check from three sources (code, patterns, memories). If evidence is thin or contradictory, the response tells the AI Agent to look for more evidence with a concrete next step. This is the difference between "confident assumption" and "informed decision."
+Before an edit, the response includes a decision card. `ready: true` means there's enough evidence from the codebase, patterns, and team memory to proceed. `ready: false` comes with `whatWouldHelp` — specific searches to run, specific files to check, or calls to `get_team_patterns` that would close the gap. The card also surfaces caller coverage: if you're editing a function that five files import but only two of them appear in your results, you know which ones you haven't looked at yet (`coverage: "2/5 callers in results"`). This is the difference between "confident assumption" and "informed decision."
 
 ### Guardrails via frozen eval + regressions
 
diff --git a/README.md b/README.md
@@ -122,14 +122,30 @@ This is where it all comes together. One call returns:
 - **Relationships** per result: `importedByCount` and `hasTests` (condensed) + **hints** (capped ranked callers, consumers, tests)
 - **Related memories**: up to 3 team decisions, gotchas, and failures matched to the query
 - **Search quality**: `ok` or `low_confidence` with confidence score and `hint` when low
-- **Preflight**: `ready` (boolean) + `reason` when evidence is thin. Pass `intent="edit"` to get the full preflight card. If search quality is low, `ready` is always `false`.
+- **Preflight**: `ready` (boolean) with decision card when `intent="edit"|"refactor"|"migrate"`. Shows `nextAction` (if not ready), `warnings`, `patterns` (do/avoid), `bestExample`, `impact` (caller coverage), and `whatWouldHelp` (next steps). If search quality is low, `ready` is always `false`.
 
 Snippets are opt-in (`includeSnippets: true`). Default output is lean — if the agent wants code, it calls `read_file`.
 
 ```json
 {
   "searchQuality": { "status": "ok", "confidence": 0.72 },
-  "preflight": { "ready": true },
+  "preflight": {
+    "ready": false,
+    "nextAction": "2 of 5 callers aren't in results — search for src/app.module.ts",
+    "patterns": {
+      "do": ["HttpInterceptorFn — 97%", "standalone components — 84%"],
+      "avoid": ["constructor injection — 3% (declining)"]
+    },
+    "bestExample": "src/auth/auth.interceptor.ts",
+    "impact": {
+      "coverage": "3/5 callers in results",
+      "files": ["src/app.module.ts", "src/boot.ts"]
+    },
+    "whatWouldHelp": [
+      "Search for src/app.module.ts to cover the main caller",
+      "Call get_team_patterns for auth/ injection patterns"
+    ]
+  },
   "results": [
     {
       "file": "src/auth/auth.interceptor.ts:1-20",
@@ -171,7 +187,7 @@ Record a decision once. It surfaces automatically in search results and prefligh
 
 | Tool                           | What it does                                                                                |
 | ------------------------------ | ------------------------------------------------------------------------------------------- |
-| `search_codebase`              | Hybrid search with enrichment + preflight + ranked relationship hints. Pass `intent="edit"` for edit readiness check. |
+| `search_codebase`              | Hybrid search + decision card. Pass `intent="edit"` to get `ready`, `nextAction`, patterns, caller coverage, and `whatWouldHelp`. |
 | `get_team_patterns`            | Pattern frequencies, golden files, conflict detection                                      |
 | `get_symbol_references`        | Find concrete references to a symbol (usageCount + top snippets + confidence + completeness) |
 | `remember`                     | Record a convention, decision, gotcha, or failure                                          |
diff --git a/docs/capabilities.md b/docs/capabilities.md
@@ -10,7 +10,7 @@ Technical reference for what `codebase-context` ships today. For the user-facing
 
 | Tool                    | Input                                                             | Output                                                                                                                                                                                                                  |
 | ----------------------- | ----------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `search_codebase`       | `query`, optional `intent`, `limit`, `filters`, `includeSnippets` | Ranked results (`file`, `summary`, `score`, `type`, `trend`, `patternWarning`, `relationships`, `hints`) + `searchQuality` (with `hint` when low confidence) + `preflight` ({ready, reason}). Hints capped at 3 per category. |
+| `search_codebase`       | `query`, optional `intent`, `limit`, `filters`, `includeSnippets` | Ranked results (`file`, `summary`, `score`, `type`, `trend`, `patternWarning`, `relationships`, `hints`) + `searchQuality` + decision card (`ready`, `nextAction`, `warnings`, `patterns`, `bestExample`, `impact`, `whatWouldHelp`) when `intent` is `edit`, `refactor`, or `migrate`. Hints capped at 3 per category. |
 | `get_team_patterns`     | optional `category`                                               | Pattern frequencies, trends, golden files, conflicts                                                                                                                                 |
 | `get_symbol_references` | `symbol`, optional `limit`                                        | Concrete symbol usage evidence: `usageCount` + top usage snippets + `confidence` ("syntactic") + `isComplete` boolean                                                                |
 | `remember`              | `type`, `category`, `memory`, `reason`                            | Persists to `.codebase-context/memory.json`                                                                                                                                          |
@@ -34,11 +34,13 @@ Ordered by execution:
 2. **Query expansion** — bounded domain term expansion for conceptual queries.
 3. **Dual retrieval** — keyword (Fuse.js) + semantic (local embeddings or OpenAI).
 4. **RRF fusion** — Reciprocal Rank Fusion (k=60) across all retrieval channels.
-5. **Structure-aware boosting** — import centrality, composition root boost, path overlap, definition demotion for action queries.
-6. **Contamination control** — test file filtering for non-test queries.
-7. **File deduplication** — best chunk per file.
-8. **Stage-2 reranking** — cross-encoder (`Xenova/ms-marco-MiniLM-L-6-v2`) triggers when the score between the top files are very close. CPU-only, top-10 bounded.
-9. **Result enrichment** — compact type (`componentType:layer`), pattern momentum (`trend` Rising/Declining only, Stable omitted), `patternWarning`, condensed relationships (`importedByCount`/`hasTests`), structured hints (capped callers/consumers/tests ranked by frequency), related memories (capped to 3), search quality assessment with `hint` when low confidence.
+5. **Definition-first boost** — for EXACT_NAME intent, results whose `symbolName` matches the query (case-insensitive) get a +15% score boost, so the defining file ranks above files that only use the symbol.
+6. **Structure-aware boosting** — import centrality, composition root boost, path overlap, definition demotion for action queries.
+7. **Contamination control** — test file filtering for non-test queries.
+8. **File deduplication** — best chunk per file.
+9. **Symbol-level deduplication** — within each `symbolPath` group, keep only the highest-scoring chunk (prevents duplicate methods from the same class from clogging results).
+10. **Stage-2 reranking** — cross-encoder (`Xenova/ms-marco-MiniLM-L-6-v2`) triggers when the scores of the top files are very close. CPU-only, top-10 bounded.
+11. **Result enrichment** — compact type (`componentType:layer`), pattern momentum (`trend` Rising/Declining only, Stable omitted), `patternWarning`, condensed relationships (`importedByCount`/`hasTests`), structured hints (capped callers/consumers/tests ranked by frequency), scope header for symbol-aware snippets (`// ClassName.methodName`), related memories (capped to 3), search quality assessment with `hint` when low confidence.
 
 ### Defaults
 
@@ -47,29 +49,56 @@ Ordered by execution:
 - **Embedding model**: Granite (`ibm-granite/granite-embedding-30m-english`, 8192 token context) via `@huggingface/transformers` v3
 - **Vector DB**: LanceDB with cosine distance
 
-## Preflight (Edit Intent)
-
-Returned as `preflight` when search `intent` is `edit`, `refactor`, or `migrate`. Also returned for default searches when intelligence is available.
-
-Output: `{ ready: boolean, reason?: string }`
-
-- `ready`: whether evidence is sufficient to proceed with edits
-- `reason`: when `ready` is false, explains why (e.g., "Search quality is low", "Insufficient pattern evidence")
+## Decision Card (Edit Intent)
+
+Returned as `preflight` when search `intent` is `edit`, `refactor`, or `migrate`.
+
+**Output shape:**
+
+```typescript
+{
+  ready: boolean;
+  nextAction?: string;        // Only when ready=false; what to search for next
+  warnings?: string[];        // Failure memories (capped at 3)
+  patterns?: {
+    do: string[];             // Top 3 preferred patterns with adoption %
+    avoid: string[];          // Top 3 declining patterns
+  };
+  bestExample?: string;       // Top 1 golden file (path format)
+  impact?: {
+    coverage: string;         // "X/Y callers in results"
+    files: string[];          // Top 3 impact candidates (files importing results)
+  };
+  whatWouldHelp?: string[];   // Concrete next steps (max 4) when ready=false
+}
+```
+
+**Fields explained:**
+
+- `ready`: boolean, whether evidence is sufficient to proceed
+- `nextAction`: actionable reason why `ready=false` (e.g., "2 of 5 callers missing")
+- `warnings`: failure memories from team (auto-surfaces past mistakes)
+- `patterns.do`: patterns the team is adopting, ranked by adoption %
+- `patterns.avoid`: declining patterns, ranked by % (useful for migrations)
+- `bestExample`: exemplar file for the area under edit
+- `impact.coverage`: shows caller visibility ("3/5 callers in results" means 2 known callers do not appear in the results yet)
+- `impact.files`: which files import the results (helps find blind spots)
+- `whatWouldHelp`: specific next searches, tool calls, or files to check that would close evidence gaps
 
 ### How `ready` is determined
 
 1. **Evidence triangulation** — scores code match (45%), pattern alignment (30%), and memory support (25%). Needs combined score ≥ 40 to pass.
-2. **Epistemic stress check** — if pattern conflicts, stale memories, or thin evidence are detected, `ready` is set to false with an abstain signal.
-3. **Search quality gate** — if `searchQuality.status` is `low_confidence`, `ready` is forced to false regardless of evidence scores. This prevents the "confidently wrong" problem where evidence counts look good but retrieval quality is poor.
+2. **Epistemic stress check** — if pattern conflicts, stale memories, thin evidence, or low caller coverage are detected, `ready` is set to false.
+3. **Search quality gate** — if `searchQuality.status` is `low_confidence`, `ready` is forced to false regardless of evidence scores. This prevents the "confidently wrong" problem.
 
-### Internal analysis (not in output, used to compute `ready`)
+### Internal signals (not in output, feed `ready` computation)
 
-- Risk level from circular deps + impact breadth + failure memories
+- Risk level from circular deps, impact breadth, and failure memories
 - Preferred/avoid patterns from team pattern analysis
-- Golden files by pattern density
-- Impact candidates from import graph
-- Failure warnings from related memories
+- Golden files ranked by pattern density
+- Caller coverage from import graph (X of Y callers appearing in results)
 - Pattern conflicts when two patterns in the same category are both > 20% adoption
+- Confidence decay of related memories
 
 ## Memory System
 
diff --git a/src/core/search.ts b/src/core/search.ts
@@ -693,6 +693,21 @@ export class CodebaseSearcher {
       })
       .sort((a, b) => b.score - a.score);
 
+    // SEARCH-01: Definition-first boost for EXACT_NAME intent
+    // Boost results where symbolName matches query (case-insensitive)
+    if (intent === 'EXACT_NAME') {
+      const queryNormalized = query.toLowerCase();
+      for (const result of scoredResults) {
+        const symbolName = result.metadata?.symbolName;
+        if (symbolName && symbolName.toLowerCase() === queryNormalized) {
+          result.score *= 1.15; // +15% boost for definition
+        }
+      }
+      // Re-sort after boost
+      scoredResults.sort((a, b) => b.score - a.score);
+    }
+
+    // File-level deduplication
     const seenFiles = new Set<string>();
     const deduped: SearchResult[] = [];
     for (const result of scoredResults) {
@@ -702,7 +717,36 @@ export class CodebaseSearcher {
       deduped.push(result);
       if (deduped.length >= limit) break;
     }
-    const finalResults = deduped;
+
+    // SEARCH-01: Symbol-level deduplication
+    // Within each symbol group (symbolPath), keep only the highest-scoring chunk
+    const seenSymbols = new Map<string, SearchResult>();
+    const symbolDeduped: SearchResult[] = [];
+    for (const result of deduped) {
+      const symbolPath = result.metadata?.symbolPath;
+      if (!symbolPath) {
+        // No symbol info, keep as-is
+        symbolDeduped.push(result);
+        continue;
+      }
+
+      const symbolPathKey = Array.isArray(symbolPath) ? symbolPath.join('.') : String(symbolPath);
+      const existing = seenSymbols.get(symbolPathKey);
+      if (!existing || result.score > existing.score) {
+        if (existing) {
+          // Replace lower-scoring version
+          const idx = symbolDeduped.indexOf(existing);
+          if (idx >= 0) {
+            symbolDeduped[idx] = result;
+          }
+        } else {
+          symbolDeduped.push(result);
+        }
+        seenSymbols.set(symbolPathKey, result);
+      }
+    }
+
+    const finalResults = symbolDeduped;
 
     if (
       isNonTestQuery &&
diff --git a/src/preflight/evidence-lock.ts b/src/preflight/evidence-lock.ts
@@ -25,6 +25,7 @@ export interface EvidenceLock {
   gaps?: string[];
   nextAction?: string;
   epistemicStress?: EpistemicStress;
+  whatWouldHelp?: string[];
 }
 
 interface PatternConflict {
@@ -41,6 +42,8 @@ interface BuildEvidenceLockInput {
   patternConflicts?: PatternConflict[];
   /** When search quality is low_confidence, evidence lock MUST block edits. */
   searchQualityStatus?: 'ok' | 'low_confidence';
+  /** Impact coverage: number of known callers covered by results */
+  impactCoverage?: { covered: number; total: number };
 }
 
 function strengthFactor(strength: EvidenceStrength): number {
@@ -162,6 +165,17 @@ export function buildEvidenceLock(input: BuildEvidenceLockInput): EvidenceLock {
     stressTriggers.push('Insufficient evidence: most evidence sources are empty');
   }
 
+  // Trigger: low caller coverage
+  if (
+    input.impactCoverage &&
+    input.impactCoverage.total > 3 &&
+    input.impactCoverage.covered / input.impactCoverage.total < 0.4
+  ) {
+    stressTriggers.push(
+      `Low caller coverage: only ${input.impactCoverage.covered} of ${input.impactCoverage.total} callers appear in results`
+    );
+  }
+
   let epistemicStress: EpistemicStress | undefined;
   if (stressTriggers.length > 0) {
     const level: EpistemicStress['level'] =
@@ -195,6 +209,41 @@ export function buildEvidenceLock(input: BuildEvidenceLockInput): EvidenceLock {
     (!epistemicStress || !epistemicStress.abstain) &&
     input.searchQualityStatus !== 'low_confidence';
 
+  // Generate whatWouldHelp recommendations
+  const whatWouldHelp: string[] = [];
+  if (!readyToEdit) {
+    // Code evidence weak/missing
+    if (codeStrength === 'weak' || codeStrength === 'missing') {
+      whatWouldHelp.push(
+        'Search with a more specific query targeting the implementation files'
+      );
+    }
+
+    // Pattern evidence missing
+    if (patternsStrength === 'missing') {
+      whatWouldHelp.push('Call get_team_patterns to see what patterns apply to this area');
+    }
+
+    // Low caller coverage with many callers
+    if (
+      input.impactCoverage &&
+      input.impactCoverage.total > 3 &&
+      input.impactCoverage.covered / input.impactCoverage.total < 0.4
+    ) {
+      const uncoveredCallers = input.impactCoverage.total - input.impactCoverage.covered;
+      if (uncoveredCallers > 0) {
+        whatWouldHelp.push(
+          `Search specifically for uncovered callers to check ${Math.min(2, uncoveredCallers)} more files`
+        );
+      }
+    }
+
+    // Memory evidence missing + failure warnings
+    if (memoriesStrength === 'missing' && input.failureWarnings.length > 0) {
+      whatWouldHelp.push('Review related memories with get_memory to understand past issues');
+    }
+  }
+
   return {
     mode: 'triangulated',
     status,
@@ -203,6 +252,7 @@ export function buildEvidenceLock(input: BuildEvidenceLockInput): EvidenceLock {
     sources,
     ...(gaps.length > 0 && { gaps }),
     ...(nextAction && { nextAction }),
-    ...(epistemicStress && { epistemicStress })
+    ...(epistemicStress && { epistemicStress }),
+    ...(whatWouldHelp.length > 0 && { whatWouldHelp: whatWouldHelp.slice(0, 4) })
   };
 }
diff --git a/src/tools/search-codebase.ts b/src/tools/search-codebase.ts