feat: symbol ranking, smart snippets, and edit decision card (#40)
Cleaned up the edit decision card and sharpened search ranking.
When you search for a symbol name, the file that defines it now ranks above files
that just use it. Snippets include a scope header (// ClassName.methodName) so you
see context without reading extra lines. And the preflight response for edit intent
is now lean and actionable: ready, nextAction, patterns to follow/avoid, caller
coverage ("3/5 callers in results" so you know what you haven't looked at), and
concrete next steps in whatWouldHelp when you need more evidence.
Removed the internal fields (evidenceLock, riskLevel, confidence) that leaked into
the output. The decision card is stable by design — agents can rely on field names
staying put.
- SEARCH-01: definition-first boost (+15%) for EXACT_NAME intent
- SEARCH-01: symbol-level dedup (keeps highest-scoring chunk per symbolPath)
- SEARCH-02: scope headers on symbol-aware snippets
- PREF-01-04: clean decision card with ready, nextAction, patterns, impact, whatWouldHelp
- PREF-02: caller coverage tracking ("X/Y callers in results")
- PREF-03: concrete next-step recommendations when evidence is thin
Documentation updated to match the new output shape. No `any` types. 219 tests pass.
+Cleaned up the edit decision card and sharpened search ranking for exact-name queries.
+
+### Added
+
+- **Definition-first ranking (SEARCH-01)**: For exact-name queries (PascalCase/camelCase), the file that *defines* a symbol now ranks above files that merely use it. Symbol-level dedup ensures multiple methods from the same class don't clog the top slots.
+- **Smart snippets with scope headers (SEARCH-02)**: When `includeSnippets: true`, code chunks from symbol-aware analysis include a scope comment header (`// ClassName.methodName`) before the snippet, giving structural context without extra disk reads.
+- **Clean decision card (PREF-01-04)**: The preflight response for `intent="edit"|"refactor"|"migrate"` is now a decision card: `ready`, `nextAction` (if not ready), `warnings`, `patterns` (do/avoid capped at 3), `bestExample` (top golden file), `impact` (caller coverage + top files), and `whatWouldHelp`. Internal fields like `evidenceLock`, `riskLevel`, `confidence` are no longer exposed.
+- **Impact coverage gating (PREF-02)**: When result files have known callers (from import graph), the card shows caller coverage: "X/Y callers in results". Low coverage (< 40% with > 3 total callers) triggers an epistemic stress alert.
+- **whatWouldHelp recommendations (PREF-03)**: When `ready=false`, concrete next steps appear: search more specifically, call `get_team_patterns`, search for uncovered callers, or check memories. Each is actionable in 1-2 sentences.
+
+### Changed
+
+- **Preflight shape**: `{ ready, reason?, ... }` → `{ ready, nextAction?, warnings?, patterns?, bestExample?, impact?, whatWouldHelp? }`. `reason` renamed to `nextAction` for clarity. No breaking changes to `ready` (stays top-level).
+
+### Fixed
+
+- Agents no longer parse unstable internal fields. Preflight output is stable by design.
+- Snippets now include scope context, reducing ambiguity for symbol-heavy edits.
+
+## [Unreleased]
+
 ### Added
 
 - **Index versioning (Phase 06)**: Index artifacts are versioned via `index-meta.json`. Mixed-version indexes are never served; version mismatches or corruption trigger automatic rebuild.
MOTIVATION.md (+1 -1: 1 addition & 1 deletion)
@@ -49,7 +49,7 @@ Correct the agent once. Record the decision. From then on, it surfaces in search
 
 ### Evidence gating
 
-Before an edit, the agent gets a curated "preflight" check from three sources (code, patterns, memories). If evidence is thin or contradictory, the response tells the AI Agent to look for more evidence with a concrete next step. This is the difference between "confident assumption" and "informed decision."
+Before an edit, the response includes a decision card. `ready: true` means there's enough evidence from the codebase, patterns, and team memory to proceed. `ready: false` comes with `whatWouldHelp` — specific searches to run, specific files to check, or calls to `get_team_patterns` that would close the gap. The card also surfaces caller coverage: if you're editing a function that five files import but only two of them appear in your results, you know which ones you haven't looked at yet (`coverage: "2/5 callers in results"`). This is the difference between "confident assumption" and "informed decision."
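
The caller-coverage string described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `callerCoverage`, `CoverageReport`, and the file names are hypothetical, and the low-coverage rule (under 40% with more than 3 total callers) is borrowed from the PREF-02 changelog entry.

```typescript
// Hypothetical shape; only the "X/Y callers in results" string comes from the docs.
interface CoverageReport {
  coverage: string;     // e.g. "2/5 callers in results"
  uncovered: string[];  // callers that do not appear in the search results
  lowCoverage: boolean; // assumed rule: < 40% covered and more than 3 callers
}

function callerCoverage(callers: string[], resultFiles: string[]): CoverageReport {
  // De-duplicate callers while preserving their original order.
  const unique = callers.filter((file, i) => callers.indexOf(file) === i);
  const uncovered = unique.filter((file) => resultFiles.indexOf(file) === -1);
  const covered = unique.length - uncovered.length;
  return {
    coverage: `${covered}/${unique.length} callers in results`,
    uncovered,
    // Integer comparison avoids floating-point surprises at exactly 40%.
    lowCoverage: unique.length > 3 && covered * 100 < unique.length * 40,
  };
}

const report = callerCoverage(
  ["a.ts", "b.ts", "c.ts", "d.ts", "e.ts"],
  ["a.ts", "b.ts", "util.ts"],
);
console.log(report.coverage, report.uncovered);
```

The `uncovered` list is what makes the string actionable: it names the callers that still need a search or a read before the edit.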
README.md (+19 -3: 19 additions & 3 deletions)
@@ -122,14 +122,30 @@ This is where it all comes together. One call returns:
 - **Relationships** per result: `importedByCount` and `hasTests` (condensed) + **hints** (capped ranked callers, consumers, tests)
 - **Related memories**: up to 3 team decisions, gotchas, and failures matched to the query
 - **Search quality**: `ok` or `low_confidence` with confidence score and `hint` when low
-- **Preflight**: `ready` (boolean) + `reason` when evidence is thin. Pass `intent="edit"` to get the full preflight card. If search quality is low, `ready` is always `false`.
+- **Preflight**: `ready` (boolean) with decision card when `intent="edit"|"refactor"|"migrate"`. Shows `nextAction` (if not ready), `warnings`, `patterns` (do/avoid), `bestExample`, `impact` (caller coverage), and `whatWouldHelp` (next steps). If search quality is low, `ready` is always `false`.
 
 Snippets are opt-in (`includeSnippets: true`). Default output is lean — if the agent wants code, it calls `read_file`.
-6. **Contamination control** — test file filtering for non-test queries.
-7. **File deduplication** — best chunk per file.
-8. **Stage-2 reranking** — cross-encoder (`Xenova/ms-marco-MiniLM-L-6-v2`) triggers when the score between the top files are very close. CPU-only, top-10 bounded.
-9. **Result enrichment** — compact type (`componentType:layer`), pattern momentum (`trend` Rising/Declining only, Stable omitted), `patternWarning`, condensed relationships (`importedByCount`/`hasTests`), structured hints (capped callers/consumers/tests ranked by frequency), related memories (capped to 3), search quality assessment with `hint` when low confidence.
+5. **Definition-first boost** — for EXACT_NAME intent, results matching the symbol name get +15% score boost (e.g., defining file ranks above using files).
+7. **Contamination control** — test file filtering for non-test queries.
+8. **File deduplication** — best chunk per file.
+9. **Symbol-level deduplication** — within each `symbolPath` group, keep only the highest-scoring chunk (prevents duplicate methods from same class clogging results).
+10. **Stage-2 reranking** — cross-encoder (`Xenova/ms-marco-MiniLM-L-6-v2`) triggers when the score between the top files are very close. CPU-only, top-10 bounded.
+11. **Result enrichment** — compact type (`componentType:layer`), pattern momentum (`trend` Rising/Declining only, Stable omitted), `patternWarning`, condensed relationships (`importedByCount`/`hasTests`), structured hints (capped callers/consumers/tests ranked by frequency), scope header for symbol-aware snippets (`// ClassName.methodName`), related memories (capped to 3), search quality assessment with `hint` when low confidence.
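
The definition-first boost and symbol-level deduplication steps listed above might look something like the sketch below. Everything here is illustrative: `Chunk`, `definesSymbol`, `boostDefinitions`, and `dedupeBySymbol` are hypothetical names, and only the +15% figure and the `symbolPath` grouping come from the text. In the real pipeline other steps run between these two; they are chained directly here only to keep the example short.

```typescript
// Hypothetical chunk shape; only `symbolPath` is named in the pipeline description.
interface Chunk {
  file: string;
  symbolPath?: string;    // e.g. "AuthService.login"
  definesSymbol: boolean; // assumed flag: this chunk declares the queried symbol
  score: number;
}

const DEFINITION_BOOST = 1.15; // the +15% boost described for EXACT_NAME intent

// Boost chunks that define the queried symbol (only for exact-name queries).
function boostDefinitions(chunks: Chunk[], exactNameIntent: boolean): Chunk[] {
  if (!exactNameIntent) return chunks;
  return chunks.map((c) =>
    c.definesSymbol ? { ...c, score: c.score * DEFINITION_BOOST } : c,
  );
}

// Keep the highest-scoring chunk per symbolPath; chunks without one pass through.
function dedupeBySymbol(chunks: Chunk[]): Chunk[] {
  const best: Record<string, Chunk> = {};
  const withoutSymbol: Chunk[] = [];
  for (const chunk of chunks) {
    if (chunk.symbolPath === undefined) {
      withoutSymbol.push(chunk);
      continue;
    }
    const current = best[chunk.symbolPath];
    if (current === undefined || chunk.score > current.score) {
      best[chunk.symbolPath] = chunk;
    }
  }
  const kept = Object.keys(best).map((key) => best[key]).concat(withoutSymbol);
  return kept.sort((a, b) => b.score - a.score);
}

const ranked = dedupeBySymbol(
  boostDefinitions(
    [
      { file: "checkout.ts", symbolPath: "Checkout.run", definesSymbol: false, score: 0.8 },
      { file: "auth.ts", symbolPath: "AuthService.login", definesSymbol: true, score: 0.72 },
      { file: "auth.ts", symbolPath: "AuthService.login", definesSymbol: true, score: 0.6 },
    ],
    true,
  ),
);
console.log(ranked.map((c) => c.file)); // the defining file now outranks the using file
```

With the boost, the defining chunk (0.72 before boosting) overtakes the usage chunk at 0.8, and the second, lower-scoring chunk for the same `symbolPath` is dropped.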
 
 ### Defaults
 
@@ -47,29 +49,56 @@ Ordered by execution:
 - **Embedding model**: Granite (`ibm-granite/granite-embedding-30m-english`, 8192 token context) via `@huggingface/transformers` v3
 - **Vector DB**: LanceDB with cosine distance
 
-## Preflight (Edit Intent)
-
-Returned as `preflight` when search `intent` is `edit`, `refactor`, or `migrate`. Also returned for default searches when intelligence is available.
-
-Output: `{ ready: boolean, reason?: string }`
-
-- `ready`: whether evidence is sufficient to proceed with edits
-- `reason`: when `ready` is false, explains why (e.g., "Search quality is low", "Insufficient pattern evidence")
+## Decision Card (Edit Intent)
+
+Returned as `preflight` when search `intent` is `edit`, `refactor`, or `migrate`.
+
+**Output shape:**
+
+```typescript
+{
+  ready: boolean;
+  nextAction?: string; // Only when ready=false; what to search for next
+  warnings?: string[]; // Failure memories (capped at 3)
+  patterns?: {
+    do: string[]; // Top 3 preferred patterns with adoption %
+    avoid: string[]; // Top 3 declining patterns
+  };
+  bestExample?: string; // Top 1 golden file (path format)
+  impact?: {
+    coverage: string; // "X/Y callers in results"
+    files: string[]; // Top 3 impact candidates (files importing results)
+  };
+  whatWouldHelp?: string[]; // Concrete next steps (max 4) when ready=false
+}
+```
+
+**Fields explained:**
+
+- `ready`: boolean, whether evidence is sufficient to proceed
+- `warnings`: failure memories from team (auto-surfaces past mistakes)
+- `patterns.do`: patterns the team is adopting, ranked by adoption %
+- `patterns.avoid`: declining patterns, ranked by % (useful for migrations)
+- `bestExample`: exemplar file for the area under edit
+- `impact.coverage`: shows caller visibility ("3/5 callers in results" means 2 callers weren't searched yet)
+- `impact.files`: which files import the results (helps find blind spots)
+- `whatWouldHelp`: specific next searches, tool calls, or files to check that would close evidence gaps
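
As an illustration of how an agent might consume these fields, here is a small sketch. The `DecisionCard` interface mirrors the documented output shape, but its name, the `planFromCard` helper, and the sample values are assumptions made for this example rather than part of the project's API.

```typescript
// Mirrors the documented output shape; the interface name itself is an assumption.
interface DecisionCard {
  ready: boolean;
  nextAction?: string;
  warnings?: string[];
  patterns?: { do: string[]; avoid: string[] };
  bestExample?: string;
  impact?: { coverage: string; files: string[] };
  whatWouldHelp?: string[];
}

// Hypothetical consumer: turn a card into an ordered to-do list for the agent.
function planFromCard(card: DecisionCard): string[] {
  const plan: string[] = [];
  for (const warning of card.warnings ?? []) {
    plan.push(`Heed warning: ${warning}`);
  }
  if (card.ready) {
    plan.push(card.bestExample ? `Edit, following ${card.bestExample}` : "Edit");
    return plan;
  }
  if (card.nextAction) plan.push(card.nextAction);
  for (const step of card.whatWouldHelp ?? []) {
    if (plan.indexOf(step) === -1) plan.push(step); // skip steps already listed
  }
  for (const file of card.impact?.files ?? []) {
    plan.push(`Review caller: ${file}`);
  }
  return plan;
}

const notReady: DecisionCard = {
  ready: false,
  nextAction: "Search for uncovered callers",
  impact: { coverage: "1/5 callers in results", files: ["src/a.ts"] },
  whatWouldHelp: ["Search for uncovered callers", "Call get_team_patterns"],
};
console.log(planFromCard(notReady));
```

Because every field except `ready` is optional, the consumer defaults each one to an empty list instead of assuming it is present.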
 
 ### How `ready` is determined
 
 1. **Evidence triangulation** — scores code match (45%), pattern alignment (30%), and memory support (25%). Needs combined score ≥ 40 to pass.
-2. **Epistemic stress check** — if pattern conflicts, stale memories, or thin evidence are detected, `ready` is set to false with an abstain signal.
-3. **Search quality gate** — if `searchQuality.status` is `low_confidence`, `ready` is forced to false regardless of evidence scores. This prevents the "confidently wrong" problem where evidence counts look good but retrieval quality is poor.
+2. **Epistemic stress check** — if pattern conflicts, stale memories, thin evidence, or low caller coverage are detected, `ready` is set to false.
+3. **Search quality gate** — if `searchQuality.status` is `low_confidence`, `ready` is forced to false regardless of evidence scores. This prevents the "confidently wrong" problem.
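
The three checks above could be combined along these lines. This is a sketch under stated assumptions, not the actual implementation: the type and function names are hypothetical, each evidence score is assumed to be on a 0 to 100 scale, and the stress signals are modeled as plain booleans. Only the weights (45/30/25), the threshold of 40, and the `low_confidence` status come from the text.

```typescript
// Assumed inputs: each evidence score is on a 0..100 scale.
interface EvidenceScores {
  code: number;
  patterns: number;
  memories: number;
}

// Assumed boolean form of the epistemic stress signals listed above.
interface StressSignals {
  patternConflict: boolean;
  staleMemories: boolean;
  thinEvidence: boolean;
  lowCallerCoverage: boolean;
}

type SearchQualityStatus = "ok" | "low_confidence";

// Weighted combination: code 45%, patterns 30%, memories 25%.
function combinedScore(e: EvidenceScores): number {
  return (45 * e.code + 30 * e.patterns + 25 * e.memories) / 100;
}

function isReady(
  evidence: EvidenceScores,
  stress: StressSignals,
  searchQuality: SearchQualityStatus,
): boolean {
  // Search quality gate: low-confidence retrieval always blocks.
  if (searchQuality === "low_confidence") return false;
  // Epistemic stress check: any single signal blocks.
  if (
    stress.patternConflict ||
    stress.staleMemories ||
    stress.thinEvidence ||
    stress.lowCallerCoverage
  ) {
    return false;
  }
  // Evidence triangulation: combined score must reach 40.
  return combinedScore(evidence) >= 40;
}

const calm: StressSignals = {
  patternConflict: false,
  staleMemories: false,
  thinEvidence: false,
  lowCallerCoverage: false,
};
console.log(isReady({ code: 80, patterns: 40, memories: 20 }, calm, "ok"));
```

The gates are checked first purely for readability; since each check can only force `ready` to false, the evaluation order does not change the result.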
 
-### Internal analysis (not in output, used to compute `ready`)
+### Internal signals (not in output, feed `ready` computation)