
Commit 03964b3

feat: symbol ranking, smart snippets, and edit decision card (#40)
Cleaned up the edit decision card and sharpened search ranking. When you search for a symbol name, the file that defines it now ranks above files that just use it. Snippets include a scope header (`// ClassName.methodName`) so you see context without reading extra lines. And the preflight response for edit intent is now lean and actionable: `ready`, `nextAction`, patterns to follow/avoid, caller coverage ("3/5 callers in results" so you know what you haven't looked at), and concrete next steps in `whatWouldHelp` when you need more evidence. Removed the internal fields (`evidenceLock`, `riskLevel`, `confidence`) that leaked into the output. The decision card is stable by design: agents can rely on field names staying put.

- SEARCH-01: definition-first boost (+15%) for EXACT_NAME intent
- SEARCH-01: symbol-level dedup (keeps highest-scoring chunk per symbolPath)
- SEARCH-02: scope headers on symbol-aware snippets
- PREF-01-04: clean decision card with ready, nextAction, patterns, impact, whatWouldHelp
- PREF-02: caller coverage tracking ("X/Y callers in results")
- PREF-03: concrete next-step recommendations when evidence is thin

Documentation updated to match the new output shape. No `any` types. 219 tests pass.
1 parent 33616aa commit 03964b3

7 files changed: +314 -71 lines changed

CHANGELOG.md

Lines changed: 23 additions & 0 deletions
@@ -2,6 +2,29 @@
 
 ## [Unreleased]
 
+## [Unshipped - Phase 09] - High-Signal Search + Decision Card
+
+Cleaned up the edit decision card and sharpened search ranking for exact-name queries.
+
+### Added
+
+- **Definition-first ranking (SEARCH-01)**: For exact-name queries (PascalCase/camelCase), the file that *defines* a symbol now ranks above files that merely use it. Symbol-level dedup ensures multiple methods from the same class don't clog the top slots.
+- **Smart snippets with scope headers (SEARCH-02)**: When `includeSnippets: true`, code chunks from symbol-aware analysis include a scope comment header (`// ClassName.methodName`) before the snippet, giving structural context without extra disk reads.
+- **Clean decision card (PREF-01-04)**: The preflight response for `intent="edit"|"refactor"|"migrate"` is now a decision card: `ready`, `nextAction` (if not ready), `warnings`, `patterns` (do/avoid capped at 3), `bestExample` (top golden file), `impact` (caller coverage + top files), and `whatWouldHelp`. Internal fields like `evidenceLock`, `riskLevel`, `confidence` are no longer exposed.
+- **Impact coverage gating (PREF-02)**: When result files have known callers (from import graph), the card shows caller coverage: "X/Y callers in results". Low coverage (< 40% with > 3 total callers) triggers an epistemic stress alert.
+- **whatWouldHelp recommendations (PREF-03)**: When `ready=false`, concrete next steps appear: search more specifically, call `get_team_patterns`, search for uncovered callers, or check memories. Each is actionable in 1-2 sentences.
+
+### Changed
+
+- **Preflight shape**: `{ ready, reason?, ... }` → `{ ready, nextAction?, warnings?, patterns?, bestExample?, impact?, whatWouldHelp? }`. `reason` renamed to `nextAction` for clarity. No breaking changes to `ready` (stays top-level).
+
+### Fixed
+
+- Agents no longer parse unstable internal fields. Preflight output is stable by design.
+- Snippets now include scope context, reducing ambiguity for symbol-heavy edits.
+
+## [Unreleased]
+
 ### Added
 
 - **Index versioning (Phase 06)**: Index artifacts are versioned via `index-meta.json`. Mixed-version indexes are never served; version mismatches or corruption trigger automatic rebuild.

MOTIVATION.md

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ Correct the agent once. Record the decision. From then on, it surfaces in search
 
 ### Evidence gating
 
-Before an edit, the agent gets a curated "preflight" check from three sources (code, patterns, memories). If evidence is thin or contradictory, the response tells the AI Agent to look for more evidence with a concrete next step. This is the difference between "confident assumption" and "informed decision."
+Before an edit, the response includes a decision card. `ready: true` means there's enough evidence from the codebase, patterns, and team memory to proceed. `ready: false` comes with `whatWouldHelp` — specific searches to run, specific files to check, or calls to `get_team_patterns` that would close the gap. The card also surfaces caller coverage: if you're editing a function that five files import but only two of them appear in your results, you know which ones you haven't looked at yet (`coverage: "2/5 callers in results"`). This is the difference between "confident assumption" and "informed decision."
 
 ### Guardrails via frozen eval + regressions
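A minimal sketch of how a coverage string like the one in the new MOTIVATION.md paragraph could be derived, assuming the import graph is available as a map from a file to the files that import it. The names `ImportGraph` and `computeCallerCoverage` are hypothetical and do not come from the commit; only the "X/Y callers in results" format and the low-coverage rule (fewer than 40% covered with more than 3 total callers) are taken from the diff.

```typescript
// Hypothetical graph shape: file path -> files that import it.
type ImportGraph = Map<string, string[]>;

interface CallerCoverage {
  covered: number;
  total: number;
  label: string; // e.g. "2/5 callers in results"
  uncovered: string[]; // callers the search results did not include
  isLow: boolean; // mirrors the gate in the diff: total > 3 and ratio < 0.4
}

function computeCallerCoverage(
  target: string,
  resultFiles: string[],
  graph: ImportGraph
): CallerCoverage {
  const callers = graph.get(target) ?? [];
  const inResults = new Set(resultFiles);
  const uncovered = callers.filter((caller) => !inResults.has(caller));
  const covered = callers.length - uncovered.length;
  const total = callers.length;
  return {
    covered,
    total,
    label: `${covered}/${total} callers in results`,
    uncovered,
    isLow: total > 3 && covered / total < 0.4
  };
}

// Toy data: five files import the target, two of them are in the results.
const toyGraph: ImportGraph = new Map([
  ['src/auth/token.ts', ['src/a.ts', 'src/b.ts', 'src/c.ts', 'src/d.ts', 'src/e.ts']]
]);
const coverage = computeCallerCoverage(
  'src/auth/token.ts',
  ['src/a.ts', 'src/b.ts', 'src/unrelated.ts'],
  toyGraph
);
const sparse = computeCallerCoverage('src/auth/token.ts', ['src/a.ts'], toyGraph);
```

With two of five callers covered the ratio is exactly 0.4, so the low-coverage flag stays off; with one of five it turns on.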

README.md

Lines changed: 19 additions & 3 deletions
@@ -122,14 +122,30 @@ This is where it all comes together. One call returns:
 - **Relationships** per result: `importedByCount` and `hasTests` (condensed) + **hints** (capped ranked callers, consumers, tests)
 - **Related memories**: up to 3 team decisions, gotchas, and failures matched to the query
 - **Search quality**: `ok` or `low_confidence` with confidence score and `hint` when low
-- **Preflight**: `ready` (boolean) + `reason` when evidence is thin. Pass `intent="edit"` to get the full preflight card. If search quality is low, `ready` is always `false`.
+- **Preflight**: `ready` (boolean) with decision card when `intent="edit"|"refactor"|"migrate"`. Shows `nextAction` (if not ready), `warnings`, `patterns` (do/avoid), `bestExample`, `impact` (caller coverage), and `whatWouldHelp` (next steps). If search quality is low, `ready` is always `false`.
 
 Snippets are opt-in (`includeSnippets: true`). Default output is lean — if the agent wants code, it calls `read_file`.
 
 ```json
 {
   "searchQuality": { "status": "ok", "confidence": 0.72 },
-  "preflight": { "ready": true },
+  "preflight": {
+    "ready": false,
+    "nextAction": "2 of 5 callers aren't in results — search for src/app.module.ts",
+    "patterns": {
+      "do": ["HttpInterceptorFn — 97%", "standalone components — 84%"],
+      "avoid": ["constructor injection — 3% (declining)"]
+    },
+    "bestExample": "src/auth/auth.interceptor.ts",
+    "impact": {
+      "coverage": "3/5 callers in results",
+      "files": ["src/app.module.ts", "src/boot.ts"]
+    },
+    "whatWouldHelp": [
+      "Search for src/app.module.ts to cover the main caller",
+      "Call get_team_patterns for auth/ injection patterns"
+    ]
+  },
   "results": [
     {
       "file": "src/auth/auth.interceptor.ts:1-20",
@@ -171,7 +187,7 @@ Record a decision once. It surfaces automatically in search results and prefligh
 
 | Tool | What it does |
 | ------------------------------ | ------------------------------------------------------------------------------------------- |
-| `search_codebase` | Hybrid search with enrichment + preflight + ranked relationship hints. Pass `intent="edit"` for edit readiness check. |
+| `search_codebase` | Hybrid search + decision card. Pass `intent="edit"` to get `ready`, `nextAction`, patterns, caller coverage, and `whatWouldHelp`. |
 | `get_team_patterns` | Pattern frequencies, golden files, conflict detection |
 | `get_symbol_references` | Find concrete references to a symbol (usageCount + top snippets + confidence + completeness) |
 | `remember` | Record a convention, decision, gotcha, or failure |

docs/capabilities.md

Lines changed: 50 additions & 21 deletions
@@ -10,7 +10,7 @@ Technical reference for what `codebase-context` ships today. For the user-facing
 
 | Tool | Input | Output |
 | ----------------------- | ----------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `search_codebase` | `query`, optional `intent`, `limit`, `filters`, `includeSnippets` | Ranked results (`file`, `summary`, `score`, `type`, `trend`, `patternWarning`, `relationships`, `hints`) + `searchQuality` (with `hint` when low confidence) + `preflight` ({ready, reason}). Hints capped at 3 per category. |
+| `search_codebase` | `query`, optional `intent`, `limit`, `filters`, `includeSnippets` | Ranked results (`file`, `summary`, `score`, `type`, `trend`, `patternWarning`, `relationships`, `hints`) + `searchQuality` + decision card (`ready`, `nextAction`, `patterns`, `bestExample`, `impact`, `whatWouldHelp`) when `intent="edit"`. Hints capped at 3 per category. |
 | `get_team_patterns` | optional `category` | Pattern frequencies, trends, golden files, conflicts |
 | `get_symbol_references` | `symbol`, optional `limit` | Concrete symbol usage evidence: `usageCount` + top usage snippets + `confidence` ("syntactic") + `isComplete` boolean |
 | `remember` | `type`, `category`, `memory`, `reason` | Persists to `.codebase-context/memory.json` |
@@ -34,11 +34,13 @@ Ordered by execution:
 2. **Query expansion** — bounded domain term expansion for conceptual queries.
 3. **Dual retrieval** — keyword (Fuse.js) + semantic (local embeddings or OpenAI).
 4. **RRF fusion** — Reciprocal Rank Fusion (k=60) across all retrieval channels.
-5. **Structure-aware boosting** — import centrality, composition root boost, path overlap, definition demotion for action queries.
-6. **Contamination control** — test file filtering for non-test queries.
-7. **File deduplication** — best chunk per file.
-8. **Stage-2 reranking** — cross-encoder (`Xenova/ms-marco-MiniLM-L-6-v2`) triggers when the score between the top files are very close. CPU-only, top-10 bounded.
-9. **Result enrichment** — compact type (`componentType:layer`), pattern momentum (`trend` Rising/Declining only, Stable omitted), `patternWarning`, condensed relationships (`importedByCount`/`hasTests`), structured hints (capped callers/consumers/tests ranked by frequency), related memories (capped to 3), search quality assessment with `hint` when low confidence.
+5. **Definition-first boost** — for EXACT_NAME intent, results matching the symbol name get +15% score boost (e.g., defining file ranks above using files).
+6. **Structure-aware boosting** — import centrality, composition root boost, path overlap, definition demotion for action queries.
+7. **Contamination control** — test file filtering for non-test queries.
+8. **File deduplication** — best chunk per file.
+9. **Symbol-level deduplication** — within each `symbolPath` group, keep only the highest-scoring chunk (prevents duplicate methods from same class clogging results).
+10. **Stage-2 reranking** — cross-encoder (`Xenova/ms-marco-MiniLM-L-6-v2`) triggers when the score between the top files are very close. CPU-only, top-10 bounded.
+11. **Result enrichment** — compact type (`componentType:layer`), pattern momentum (`trend` Rising/Declining only, Stable omitted), `patternWarning`, condensed relationships (`importedByCount`/`hasTests`), structured hints (capped callers/consumers/tests ranked by frequency), scope header for symbol-aware snippets (`// ClassName.methodName`), related memories (capped to 3), search quality assessment with `hint` when low confidence.
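The RRF fusion step listed in the pipeline can be illustrated with the standard Reciprocal Rank Fusion formula, where each document scores the sum of `1 / (k + rank)` over every ranked list it appears in. This is a sketch under that textbook definition with `k = 60` as stated in the docs; the function name `reciprocalRankFusion` and the toy file lists are illustrative and are not taken from `src/core/search.ts`.

```typescript
// Fuse several ranked lists of document ids into one score per id.
// Ranks are 1-based, so the top hit of a list contributes 1 / (k + 1).
function reciprocalRankFusion(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

// Toy example: one keyword channel and one semantic channel.
const keywordHits = ['a.ts', 'b.ts', 'c.ts'];
const semanticHits = ['b.ts', 'd.ts', 'a.ts'];

const fusedOrder = Array.from(reciprocalRankFusion([keywordHits, semanticHits]).entries())
  .sort((x, y) => y[1] - x[1])
  .map(([id]) => id);
```

In this toy run `b.ts` edges out `a.ts` because it sits near the top of both lists, which is the property that makes rank-based fusion robust to channels with incomparable raw scores.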
 
 ### Defaults

@@ -47,29 +49,56 @@ Ordered by execution:
 - **Embedding model**: Granite (`ibm-granite/granite-embedding-30m-english`, 8192 token context) via `@huggingface/transformers` v3
 - **Vector DB**: LanceDB with cosine distance
 
-## Preflight (Edit Intent)
-
-Returned as `preflight` when search `intent` is `edit`, `refactor`, or `migrate`. Also returned for default searches when intelligence is available.
-
-Output: `{ ready: boolean, reason?: string }`
-
-- `ready`: whether evidence is sufficient to proceed with edits
-- `reason`: when `ready` is false, explains why (e.g., "Search quality is low", "Insufficient pattern evidence")
+## Decision Card (Edit Intent)
+
+Returned as `preflight` when search `intent` is `edit`, `refactor`, or `migrate`.
+
+**Output shape:**
+
+```typescript
+{
+  ready: boolean;
+  nextAction?: string; // Only when ready=false; what to search for next
+  warnings?: string[]; // Failure memories (capped at 3)
+  patterns?: {
+    do: string[]; // Top 3 preferred patterns with adoption %
+    avoid: string[]; // Top 3 declining patterns
+  };
+  bestExample?: string; // Top 1 golden file (path format)
+  impact?: {
+    coverage: string; // "X/Y callers in results"
+    files: string[]; // Top 3 impact candidates (files importing results)
+  };
+  whatWouldHelp?: string[]; // Concrete next steps (max 4) when ready=false
+}
+```
+
+**Fields explained:**
+
+- `ready`: boolean, whether evidence is sufficient to proceed
+- `nextAction`: actionable reason why `ready=false` (e.g., "2 of 5 callers missing")
+- `warnings`: failure memories from team (auto-surfaces past mistakes)
+- `patterns.do`: patterns the team is adopting, ranked by adoption %
+- `patterns.avoid`: declining patterns, ranked by % (useful for migrations)
+- `bestExample`: exemplar file for the area under edit
+- `impact.coverage`: shows caller visibility ("3/5 callers in results" means 2 callers weren't searched yet)
+- `impact.files`: which files import the results (helps find blind spots)
+- `whatWouldHelp`: specific next searches, tool calls, or files to check that would close evidence gaps
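As an illustration of how a client might act on the fields listed above, here is a small sketch that turns a decision card into an ordered list of next steps. The `DecisionCard` interface copies the documented output shape; the `planFromCard` helper and its wording are assumptions made for this example and are not part of the tool.

```typescript
// Mirrors the documented decision card shape.
interface DecisionCard {
  ready: boolean;
  nextAction?: string;
  warnings?: string[];
  patterns?: { do: string[]; avoid: string[] };
  bestExample?: string;
  impact?: { coverage: string; files: string[] };
  whatWouldHelp?: string[];
}

// Hypothetical helper: decide what an agent should do with a card.
function planFromCard(card: DecisionCard): string[] {
  if (card.ready) {
    const steps = ['Proceed with the edit'];
    if (card.bestExample) steps.push(`Follow the style of ${card.bestExample}`);
    return steps;
  }
  // Not ready: lead with nextAction, then the concrete recommendations.
  const steps: string[] = [];
  if (card.nextAction) steps.push(card.nextAction);
  for (const hint of card.whatWouldHelp ?? []) steps.push(hint);
  return steps.length > 0 ? steps : ['Gather more evidence before editing'];
}

// Card values adapted from the README example in this commit.
const exampleCard: DecisionCard = {
  ready: false,
  nextAction: 'Search for src/app.module.ts',
  bestExample: 'src/auth/auth.interceptor.ts',
  impact: { coverage: '3/5 callers in results', files: ['src/app.module.ts', 'src/boot.ts'] },
  whatWouldHelp: [
    'Search for src/app.module.ts to cover the main caller',
    'Call get_team_patterns for auth/ injection patterns'
  ]
};
const plan = planFromCard(exampleCard);
```

Because every field except `ready` is optional, the helper treats each one defensively rather than assuming it is present.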
 
 ### How `ready` is determined
 
 1. **Evidence triangulation** — scores code match (45%), pattern alignment (30%), and memory support (25%). Needs combined score ≥ 40 to pass.
-2. **Epistemic stress check** — if pattern conflicts, stale memories, or thin evidence are detected, `ready` is set to false with an abstain signal.
-3. **Search quality gate** — if `searchQuality.status` is `low_confidence`, `ready` is forced to false regardless of evidence scores. This prevents the "confidently wrong" problem where evidence counts look good but retrieval quality is poor.
+2. **Epistemic stress check** — if pattern conflicts, stale memories, thin evidence, or low caller coverage are detected, `ready` is set to false.
+3. **Search quality gate** — if `searchQuality.status` is `low_confidence`, `ready` is forced to false regardless of evidence scores. This prevents the "confidently wrong" problem.
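The three checks above compose into a single boolean. The sketch below shows one way that composition could look, assuming each evidence source has already been reduced to a number between 0 and 1; that normalization, the `ReadinessInput` type, and the `isReadyToEdit` name are assumptions for illustration. Only the weights (45/30/25), the threshold of 40, the abstain signal, and the `low_confidence` gate come from the documentation and diff.

```typescript
interface ReadinessInput {
  code: number; // assumed 0..1 strength of code match
  patterns: number; // assumed 0..1 strength of pattern alignment
  memories: number; // assumed 0..1 strength of memory support
  abstain: boolean; // true when the epistemic stress check says to abstain
  searchQuality: 'ok' | 'low_confidence';
}

// Weighted sum on a 0..100 scale using the documented weights.
function combinedEvidenceScore(input: ReadinessInput): number {
  return 45 * input.code + 30 * input.patterns + 25 * input.memories;
}

function isReadyToEdit(input: ReadinessInput): boolean {
  return (
    combinedEvidenceScore(input) >= 40 &&
    !input.abstain &&
    input.searchQuality !== 'low_confidence'
  );
}

const strongCode: ReadinessInput = {
  code: 1, patterns: 0, memories: 0, abstain: false, searchQuality: 'ok'
};
const thinEvidence: ReadinessInput = {
  code: 0.5, patterns: 0.5, memories: 0, abstain: false, searchQuality: 'ok'
};
const poorRetrieval: ReadinessInput = {
  code: 1, patterns: 1, memories: 1, abstain: false, searchQuality: 'low_confidence'
};
```

The last case shows why the search quality gate is a separate condition: a perfect evidence score still cannot produce `ready` when retrieval itself is flagged as unreliable.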
 
-### Internal analysis (not in output, used to compute `ready`)
+### Internal signals (not in output, feed `ready` computation)
 
-- Risk level from circular deps + impact breadth + failure memories
+- Risk level from circular deps, impact breadth, and failure memories
 - Preferred/avoid patterns from team pattern analysis
-- Golden files by pattern density
-- Impact candidates from import graph
-- Failure warnings from related memories
+- Golden files ranked by pattern density
+- Caller coverage from import graph (X of Y callers appearing in results)
 - Pattern conflicts when two patterns in the same category are both > 20% adoption
+- Confidence decay of related memories
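The pattern-conflict signal in the list above (two patterns in one category both above 20% adoption) can be sketched as a grouping step. The `PatternStat` and `ConflictSketch` types, the `findPatternConflicts` name, and the sample percentages are hypothetical; the commit does not show how the project represents pattern statistics.

```typescript
interface PatternStat {
  category: string;
  name: string;
  adoptionPct: number; // adoption as a percentage, 0..100
}

interface ConflictSketch {
  category: string;
  patterns: string[]; // every pattern in the category above the threshold
}

function findPatternConflicts(stats: PatternStat[], thresholdPct = 20): ConflictSketch[] {
  // Collect, per category, the patterns whose adoption exceeds the threshold.
  const byCategory = new Map<string, string[]>();
  for (const stat of stats) {
    if (stat.adoptionPct <= thresholdPct) continue;
    const names = byCategory.get(stat.category) ?? [];
    names.push(stat.name);
    byCategory.set(stat.category, names);
  }
  // A category conflicts when at least two patterns clear the threshold.
  const conflicts: ConflictSketch[] = [];
  byCategory.forEach((names, category) => {
    if (names.length >= 2) conflicts.push({ category, patterns: names });
  });
  return conflicts;
}

const sampleStats: PatternStat[] = [
  { category: 'di', name: 'constructor injection', adoptionPct: 55 },
  { category: 'di', name: 'inject() function', adoptionPct: 45 },
  { category: 'http', name: 'HttpInterceptorFn', adoptionPct: 97 },
  { category: 'http', name: 'class interceptor', adoptionPct: 3 }
];
const conflicts = findPatternConflicts(sampleStats);
```

Here only the `di` category is reported, since the `http` category has a single clearly dominant pattern.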
 
 ## Memory System

src/core/search.ts

Lines changed: 45 additions & 1 deletion
@@ -693,6 +693,21 @@ export class CodebaseSearcher {
       })
       .sort((a, b) => b.score - a.score);
 
+    // SEARCH-01: Definition-first boost for EXACT_NAME intent
+    // Boost results where symbolName matches query (case-insensitive)
+    if (intent === 'EXACT_NAME') {
+      const queryNormalized = query.toLowerCase();
+      for (const result of scoredResults) {
+        const symbolName = result.metadata?.symbolName;
+        if (symbolName && symbolName.toLowerCase() === queryNormalized) {
+          result.score *= 1.15; // +15% boost for definition
+        }
+      }
+      // Re-sort after boost
+      scoredResults.sort((a, b) => b.score - a.score);
+    }
+
+    // File-level deduplication
     const seenFiles = new Set<string>();
     const deduped: SearchResult[] = [];
     for (const result of scoredResults) {
@@ -702,7 +717,36 @@ export class CodebaseSearcher {
       deduped.push(result);
       if (deduped.length >= limit) break;
     }
-    const finalResults = deduped;
+
+    // SEARCH-01: Symbol-level deduplication
+    // Within each symbol group (symbolPath), keep only the highest-scoring chunk
+    const seenSymbols = new Map<string, SearchResult>();
+    const symbolDeduped: SearchResult[] = [];
+    for (const result of deduped) {
+      const symbolPath = result.metadata?.symbolPath;
+      if (!symbolPath) {
+        // No symbol info, keep as-is
+        symbolDeduped.push(result);
+        continue;
+      }
+
+      const symbolPathKey = Array.isArray(symbolPath) ? symbolPath.join('.') : String(symbolPath);
+      const existing = seenSymbols.get(symbolPathKey);
+      if (!existing || result.score > existing.score) {
+        if (existing) {
+          // Replace lower-scoring version
+          const idx = symbolDeduped.indexOf(existing);
+          if (idx >= 0) {
+            symbolDeduped[idx] = result;
+          }
+        } else {
+          symbolDeduped.push(result);
+        }
+        seenSymbols.set(symbolPathKey, result);
+      }
+    }
+
+    const finalResults = symbolDeduped;
 
     if (
       isNonTestQuery &&
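To see the two SEARCH-01 steps from this hunk in isolation, the following self-contained sketch applies the same +15% boost and symbol-level dedup to toy data. The `Hit` type and the function names are stand-ins for illustration, not the project's `SearchResult`, and the dedup is simplified: because the list is already sorted by descending score, the first chunk seen for a `symbolPath` is the highest-scoring one, so later duplicates are simply skipped.

```typescript
interface Hit {
  file: string;
  score: number;
  symbolName?: string;
  symbolPath?: string[];
}

// Boost hits whose symbol name equals the query (case-insensitive), then re-sort.
function boostDefinitions(hits: Hit[], query: string, factor = 1.15): Hit[] {
  const normalized = query.toLowerCase();
  return hits
    .map((hit) =>
      hit.symbolName !== undefined && hit.symbolName.toLowerCase() === normalized
        ? { ...hit, score: hit.score * factor }
        : hit
    )
    .sort((a, b) => b.score - a.score);
}

// Keep the first (highest-scoring) hit per symbolPath; hits without one pass through.
function dedupeBySymbol(sortedHits: Hit[]): Hit[] {
  const seen = new Set<string>();
  const kept: Hit[] = [];
  for (const hit of sortedHits) {
    if (!hit.symbolPath) {
      kept.push(hit);
      continue;
    }
    const key = hit.symbolPath.join('.');
    if (seen.has(key)) continue;
    seen.add(key);
    kept.push(hit);
  }
  return kept;
}

const toyHits: Hit[] = [
  { file: 'src/app.component.ts', score: 0.8, symbolName: 'ngOnInit', symbolPath: ['AppComponent', 'ngOnInit'] },
  { file: 'src/auth/auth.service.ts', score: 0.75, symbolName: 'AuthService', symbolPath: ['AuthService'] },
  { file: 'src/legacy/auth.service.ts', score: 0.6, symbolName: 'AuthService', symbolPath: ['AuthService'] }
];
const ranked = dedupeBySymbol(boostDefinitions(toyHits, 'authservice'));
```

In this toy run the defining file moves from second to first place once boosted, and the lower-scoring duplicate `AuthService` chunk is dropped.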

src/preflight/evidence-lock.ts

Lines changed: 51 additions & 1 deletion
@@ -25,6 +25,7 @@ export interface EvidenceLock {
   gaps?: string[];
   nextAction?: string;
   epistemicStress?: EpistemicStress;
+  whatWouldHelp?: string[];
 }
 
 interface PatternConflict {
@@ -41,6 +42,8 @@ interface BuildEvidenceLockInput {
   patternConflicts?: PatternConflict[];
   /** When search quality is low_confidence, evidence lock MUST block edits. */
   searchQualityStatus?: 'ok' | 'low_confidence';
+  /** Impact coverage: number of known callers covered by results */
+  impactCoverage?: { covered: number; total: number };
 }
 
 function strengthFactor(strength: EvidenceStrength): number {
@@ -162,6 +165,17 @@ export function buildEvidenceLock(input: BuildEvidenceLockInput): EvidenceLock {
     stressTriggers.push('Insufficient evidence: most evidence sources are empty');
   }
 
+  // Trigger: low caller coverage
+  if (
+    input.impactCoverage &&
+    input.impactCoverage.total > 3 &&
+    input.impactCoverage.covered / input.impactCoverage.total < 0.4
+  ) {
+    stressTriggers.push(
+      `Low caller coverage: only ${input.impactCoverage.covered} of ${input.impactCoverage.total} callers appear in results`
+    );
+  }
+
   let epistemicStress: EpistemicStress | undefined;
   if (stressTriggers.length > 0) {
     const level: EpistemicStress['level'] =
@@ -195,6 +209,41 @@ export function buildEvidenceLock(input: BuildEvidenceLockInput): EvidenceLock {
     (!epistemicStress || !epistemicStress.abstain) &&
     input.searchQualityStatus !== 'low_confidence';
 
+  // Generate whatWouldHelp recommendations
+  const whatWouldHelp: string[] = [];
+  if (!readyToEdit) {
+    // Code evidence weak/missing
+    if (codeStrength === 'weak' || codeStrength === 'missing') {
+      whatWouldHelp.push(
+        'Search with a more specific query targeting the implementation files'
+      );
+    }
+
+    // Pattern evidence missing
+    if (patternsStrength === 'missing') {
+      whatWouldHelp.push('Call get_team_patterns to see what patterns apply to this area');
+    }
+
+    // Low caller coverage with many callers
+    if (
+      input.impactCoverage &&
+      input.impactCoverage.total > 3 &&
+      input.impactCoverage.covered / input.impactCoverage.total < 0.4
+    ) {
+      const uncoveredCallers = input.impactCoverage.total - input.impactCoverage.covered;
+      if (uncoveredCallers > 0) {
+        whatWouldHelp.push(
+          `Search specifically for uncovered callers to check ${Math.min(2, uncoveredCallers)} more files`
+        );
+      }
+    }
+
+    // Memory evidence missing + failure warnings
+    if (memoriesStrength === 'missing' && input.failureWarnings.length > 0) {
+      whatWouldHelp.push('Review related memories with get_memory to understand past issues');
+    }
+  }
+
   return {
     mode: 'triangulated',
     status,
@@ -203,6 +252,7 @@ export function buildEvidenceLock(input: BuildEvidenceLockInput): EvidenceLock {
     sources,
     ...(gaps.length > 0 && { gaps }),
     ...(nextAction && { nextAction }),
-    ...(epistemicStress && { epistemicStress })
+    ...(epistemicStress && { epistemicStress }),
+    ...(whatWouldHelp.length > 0 && { whatWouldHelp: whatWouldHelp.slice(0, 4) })
   };
 }
