CHANGELOG.md: 12 additions & 15 deletions
```diff
@@ -8,30 +8,27 @@
 - **Scope headers in code snippets**: When requesting snippets (`includeSnippets: true`), each code block now starts with a comment like `// UserService.login()` so agents know where the code lives without extra file reads.
 - **Edit decision card**: When searching with `intent="edit"`, `intent="refactor"`, or `intent="migrate"`, results now include a decision card telling you whether there's enough evidence to proceed safely. The card shows: whether you're ready (`ready: true/false`), what to do next if not (`nextAction`), relevant team patterns to follow, a top example file, how many callers appear in results (`impact.coverage`), and what searches would help close gaps (`whatWouldHelp`).
 - **Caller coverage tracking**: The decision card shows how many of a symbol's callers are in your search results. Low coverage (less than 40% when there are lots of callers) triggers an alert so you know to search more before editing.
+- **Index versioning**: Index artifacts are versioned via `index-meta.json`. Mixed-version indexes are never served; version mismatches or corruption trigger automatic rebuild.
+- **Crash-safe rebuilds**: Full rebuilds write to `.staging/` and swap atomically only on success. Failed rebuilds don't corrupt the active index.
+- **Relationship sidecar**: New `relationships.json` artifact containing file import graph, reverse imports, and symbol export index. Updated incrementally alongside the main index.
+- **References confidence + hints**: `get_symbol_references` now includes `confidence: "syntactic"` and `isComplete: boolean` to help agents assess result completeness. `search_codebase` results now include a structured `hints` object (capped callers/consumers/tests ranked by frequency) drawn from the relationships sidecar. **`get_component_usage` removed from MCP surface (11→10 tools).** If you previously used `get_component_usage`, use `get_symbol_references` for symbol usage evidence (usageCount, top snippets, callers/consumers).
+- Tree-sitter-backed symbol extraction is now used by the Generic analyzer when available (with safe fallbacks).
 - **Preflight response shape**: Renamed `reason` to `nextAction` for clarity. Removed internal fields (`evidenceLock`, `riskLevel`, `confidence`) so the output is stable and doesn't change shape unexpectedly.
-
+
 ### Fixed
 
 - Null-pointer crash in GenericAnalyzer when chunk content is undefined.
 - Tree-sitter symbol extraction now treats node offsets as UTF-8 byte ranges and evicts cached parsers on failures/timeouts.
 
-### More improvements (Phases 06–08)
-
-- **Index versioning (Phase 06)**: Index artifacts are versioned via `index-meta.json`. Mixed-version indexes are never served; version mismatches or corruption trigger automatic rebuild.
-- **Crash-safe rebuilds (Phase 06)**: Full rebuilds write to `.staging/` and swap atomically only on success. Failed rebuilds don't corrupt the active index.
-- **Relationship sidecar (Phase 07)**: New `relationships.json` artifact containing file import graph, reverse imports, and symbol export index. Updated incrementally alongside the main index.
-- **References confidence + hints (Phase 08)**: `get_symbol_references` now includes `confidence: "syntactic"` and `isComplete: boolean` to help agents assess result completeness. `search_codebase` results now include a structured `hints` object (capped callers/consumers/tests ranked by frequency) drawn from the relationships sidecar. `get_component_usage` removed from MCP surface (11→10 tools).
-- Tree-sitter-backed symbol extraction is now used by the Generic analyzer when available (with safe fallbacks).
```
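The relationship sidecar and the caller-coverage alert fit together: reverse imports answer "which files depend on this one?", and coverage is the share of those dependents already present in the search results. Below is a minimal TypeScript sketch of that idea, not the project's implementation. Assumptions: the `ImportGraph` shape, approximating a symbol's callers by the files that import its file, and the `MANY_CALLERS` threshold (the changelog only says "lots of callers"); the 40% cutoff is the one the changelog states.

```typescript
// Hypothetical shapes: the real layout of relationships.json is not documented here.
type ImportGraph = Record<string, string[]>; // file -> files it imports

// Invert the import graph: file -> files that import it ("reverse imports").
function buildReverseImports(graph: ImportGraph): Map<string, Set<string>> {
  const reverse = new Map<string, Set<string>>();
  for (const [importer, targets] of Object.entries(graph)) {
    for (const target of targets) {
      const importers = reverse.get(target) ?? new Set<string>();
      importers.add(importer);
      reverse.set(target, importers);
    }
  }
  return reverse;
}

const LOW_COVERAGE = 0.4; // from the changelog: "less than 40%"
const MANY_CALLERS = 10;  // assumption: the changelog does not define "lots of callers"

// Share of a file's importers that already appear in the search results.
function callerCoverage(
  reverse: Map<string, Set<string>>,
  file: string,
  resultFiles: Set<string>,
): { coverage: number; total: number; alert: boolean } {
  const callers = Array.from(reverse.get(file) ?? new Set<string>());
  const seen = callers.filter((c) => resultFiles.has(c)).length;
  const total = callers.length;
  // With no known callers there is nothing to miss, so coverage counts as complete.
  const coverage = total === 0 ? 1 : seen / total;
  return { coverage, total, alert: total >= MANY_CALLERS && coverage < LOW_COVERAGE };
}

// Example: 12 files import auth.ts, but only 2 of them showed up in the results.
const graph: ImportGraph = {};
for (let i = 0; i < 12; i++) graph[`feature${i}.ts`] = ["auth.ts"];
const report = callerCoverage(
  buildReverseImports(graph),
  "auth.ts",
  new Set(["auth.ts", "feature0.ts", "feature1.ts"]),
);
console.log(report.total, report.alert); // 12 true
```

Counting importing files rather than call sites keeps the sketch small; a symbol-level version would key the reverse index by exported symbol instead of by file.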
````diff
 - **Pattern signals** per result: `trend` (Rising/Declining — Stable is omitted) and `patternWarning` when using legacy code
-- **Relationships** per result: `importedByCount` and `hasTests` (condensed) + **hints** (capped ranked callers, consumers, tests)
+- **Relationships** per result: `importedByCount` and `hasTests` (condensed) + **hints** (capped ranked callers, consumers, tests) — so you see suggested next reads and know what you haven't looked at yet
 - **Related memories**: up to 3 team decisions, gotchas, and failures matched to the query
 - **Search quality**: `ok` or `low_confidence` with confidence score and `hint` when low
 - **Preflight**: `ready` (boolean) with decision card when `intent="edit"|"refactor"|"migrate"`. Shows `nextAction` (if not ready), `warnings`, `patterns` (do/avoid), `bestExample`, `impact` (caller coverage), and `whatWouldHelp` (next steps). If search quality is low, `ready` is always `false`.
 
-Snippets are opt-in (`includeSnippets: true`). Default output is lean — if the agent wants code, it calls `read_file`.
+Snippets are optional (`includeSnippets: true`). When enabled, snippets that have symbol metadata (e.g. from the Generic analyzer's AST chunking or Angular component chunks) start with a scope header so you know where the code lives (e.g. `// AuthService.getToken()` or `// SpotifyApiService`). Example:
+
+```ts
+// AuthService.getToken()
+getToken(): string {
+  return this.token;
+}
+```
+
+Default output is lean — if the agent wants code, it calls `read_file`.
 
 ```json
 {
````
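A scope header is a single comment line prepended to a snippet, built from the chunk's symbol metadata. A minimal sketch, assuming a hypothetical `ChunkSymbol` shape (a name, an optional enclosing container, and a kind); the docs show only the rendered headers, such as `// AuthService.getToken()` and `// SpotifyApiService`, not the metadata behind them.

```typescript
// Hypothetical metadata shape; the project's real chunk metadata may differ.
interface ChunkSymbol {
  name: string;       // e.g. "getToken"
  container?: string; // enclosing class, e.g. "AuthService"
  kind: "class" | "function" | "method";
}

// Render a header such as "// AuthService.getToken()" or "// SpotifyApiService".
function scopeHeader(sym: ChunkSymbol): string {
  const qualified = sym.container ? `${sym.container}.${sym.name}` : sym.name;
  const callable = sym.kind === "class" ? "" : "()";
  return `// ${qualified}${callable}`;
}

// Prepend the header only when metadata exists; otherwise return the snippet unchanged.
function withScopeHeader(snippet: string, sym?: ChunkSymbol): string {
  return sym ? `${scopeHeader(sym)}\n${snippet}` : snippet;
}

console.log(
  withScopeHeader("getToken(): string {\n  return this.token;\n}", {
    name: "getToken",
    container: "AuthService",
    kind: "method",
  }),
);
```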
```diff
@@ -189,7 +198,7 @@ Record a decision once. It surfaces automatically in search results and prefligh
 | `search_codebase` | Hybrid search + decision card. Pass `intent="edit"` to get `ready`, `nextAction`, patterns, caller coverage, and `whatWouldHelp`. |
 | `get_team_patterns` | Pattern frequencies, golden files, conflict detection |
-| `get_symbol_references` | Find concrete references to a symbol (usageCount + top snippets + confidence + completeness) |
+| `get_symbol_references` | Find concrete references to a symbol (usageCount + top snippets). `confidence: "syntactic"` = static/source-based only; no runtime or dynamic dispatch. |
 | `remember` | Record a convention, decision, gotcha, or failure |
 | `get_memory` | Query team memory with confidence decay scoring |
```
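`confidence: "syntactic"` means references are found by reading source, so calls that only exist at runtime are invisible. The deliberately naive sketch below (a regex over source text, not how the project finds references) shows the gap: a direct call to `getToken` is counted, while the same method invoked through a computed property name is not, because its name never appears at the call site.

```typescript
// Naive syntactic reference finder: counts `name(` call sites in source text.
// Illustration only; a real implementation would walk a syntax tree instead of using a regex.
function countSyntacticCalls(source: string, name: string): number {
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`\\b${escaped}\\s*\\(`, "g");
  return (source.match(pattern) ?? []).length;
}

const source = `
const svc = new AuthService();
svc.getToken();                 // direct call: visible in the source text
const method = "get" + "Token";
(svc as any)[method]();         // dynamic dispatch: the method name is built at runtime
`;

console.log(countSyntacticCalls(source, "getToken")); // 1
```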
```diff
@@ -200,7 +209,7 @@ Record a decision once. It surfaces automatically in search results and prefligh
 
 ## Evaluation Harness (`npm run eval`)
 
-Reproducible evaluation with frozen fixtures so ranking/chunking changes are measured honestly and regressions get caught.
+Reproducible evaluation with frozen fixtures so ranking/chunking changes are measured honestly and regressions get caught. **For contributors and CI:** run before releases or after changing search/ranking/chunking to guard against regressions.
 
 - Two codebases: `npm run eval -- <codebaseA> <codebaseB>`
 - Defaults: fixture A = `tests/fixtures/eval-angular-spotify.json`, fixture B = `tests/fixtures/eval-controlled.json`
```
```diff
@@ -214,11 +223,13 @@ npm run eval -- tests/fixtures/codebases/eval-controlled tests/fixtures/codebase
 - To save a report for later comparison, redirect stdout (e.g. `pnpm run eval -- <path-to-angular-spotify> --skip-reindex > internal-docs/tests/eval-runs/angular-spotify-YYYY-MM-DD.txt`).
 
 ## How the Search Works
 
 The retrieval pipeline is designed around one goal: give the agent the right context, not just any file that matches.
 
+- **Definition-first ranking** - for exact-name lookups (e.g. a symbol name), the file that *defines* the symbol ranks above files that only use it.
 - **Intent classification** - knows whether "AuthService" is a name lookup or "how does auth work" is conceptual. Adjusts keyword/semantic weights accordingly.
 - **Hybrid fusion (RRF)** - combines keyword and semantic search using Reciprocal Rank Fusion instead of brittle score averaging.
```
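Reciprocal Rank Fusion gives each result `w / (k + rank)` for every ranked list it appears in and sums the contributions, so only positions matter and keyword and semantic scores never need a common scale. A minimal sketch assuming best-first lists of file paths, `k = 60` (the constant commonly used for RRF; the project's value is not stated), and optional per-list weights standing in for the intent-dependent keyword/semantic weighting, whose actual form is also an assumption.

```typescript
// Reciprocal Rank Fusion over best-first ranked lists (rank 1 = best).
// Assumptions: k = 60 and optional per-list weights (default 1) for intent-based weighting.
function rrf(lists: string[][], k = 60, weights: number[] = []): [string, number][] {
  const scores = new Map<string, number>();
  lists.forEach((list, i) => {
    const w = weights[i] ?? 1;
    list.forEach((doc, idx) => {
      const rank = idx + 1;
      scores.set(doc, (scores.get(doc) ?? 0) + w / (k + rank));
    });
  });
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

const keyword = ["auth.service.ts", "login.component.ts", "auth.guard.ts"];
const semantic = ["auth.guard.ts", "auth.service.ts", "session.store.ts"];
console.log(rrf([keyword, semantic]).map(([doc]) => doc));
```

Because `auth.service.ts` sits near the top of both lists, it outranks `auth.guard.ts`, which leads only the semantic list and is last in the keyword list.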
```diff
@@ -229,13 +240,15 @@ The retrieval pipeline is designed around one goal: give the agent the right con
 - **Version gating** - index artifacts are versioned; mismatches trigger automatic rebuild so mixed-version data is never served.
 - **Auto-heal** - if the index corrupts, search triggers a full re-index automatically.
 
+**Index reliability:** Rebuilds write to a staging directory and swap atomically only on success, so a failed rebuild never corrupts the active index. Version mismatches or corruption trigger an automatic full re-index (no user action required).
+
 ## Language Support
 
-Over **30+ languages** are supported for indexing + retrieval: TypeScript/JavaScript, Python (incl `.pyi`), PHP, Ruby, Java, Kotlin (`.kt`/`.kts`), Go, Rust, C/C++ (incl `.cc`/`.cxx`), C#, Swift, Scala, Shell, plus common config/markup formats (JSON/YAML/TOML/XML, etc.).
+**10 languages** have full symbol extraction (Tree-sitter): TypeScript, JavaScript, Python, Java, Kotlin, C, C++, C#, Go, Rust. **30+ languages** have indexing and retrieval coverage (keyword + semantic), including PHP, Ruby, Swift, Scala, Shell, and config/markup (JSON/YAML/TOML/XML, etc.).
 
 Enrichment is framework-specific: right now only **Angular** has a dedicated analyzer for rich conventions/context (signals, standalone components, control flow, DI patterns).
 
-For non-Angular projects, the **Generic** analyzer still provides broad coverage, and will use Tree-sitter symbol extraction when a grammar is available (otherwise it falls back to safe parsing).
+For non-Angular projects, the **Generic** analyzer uses **AST-aligned chunking** when a Tree-sitter grammar is available: symbol-bounded chunks with **scope-aware prefixes** (e.g. `// ClassName.methodName`) so snippets show where code lives. Without a grammar it falls back to safe line-based chunking.
```
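Version gating reduces to one check at load time: if `index-meta.json` is missing, unreadable, or reports a version other than the one the running code expects, nothing is served from the old artifacts and a rebuild is triggered. A sketch under assumptions: the `indexVersion` field name and the `INDEX_VERSION` constant are hypothetical, since the docs name the file but not its contents, and the staging-directory swap that follows a rebuild is not shown.

```typescript
// Hypothetical: the artifact version the running server expects.
const INDEX_VERSION = 3;

type GateDecision = { action: "serve" } | { action: "rebuild"; reason: string };

// Decide from the raw text of index-meta.json (undefined when the file is missing).
function gateIndex(metaJson: string | undefined): GateDecision {
  if (metaJson === undefined) return { action: "rebuild", reason: "missing index-meta.json" };
  let meta: unknown;
  try {
    meta = JSON.parse(metaJson);
  } catch {
    // Unparseable metadata is treated as corruption.
    return { action: "rebuild", reason: "corrupt index-meta.json" };
  }
  const version = (meta as { indexVersion?: unknown } | null)?.indexVersion;
  if (version !== INDEX_VERSION) {
    return { action: "rebuild", reason: `version mismatch: ${String(version)} != ${INDEX_VERSION}` };
  }
  return { action: "serve" };
}

console.log(gateIndex('{"indexVersion": 3}').action); // serve
console.log(gateIndex('{"indexVersion": 2}').action); // rebuild
console.log(gateIndex("{not json").action);           // rebuild
```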
```diff
 10 MCP tools + 1 optional resource (`codebase://context`). **Migration:** `get_component_usage` was removed; use `get_symbol_references` for symbol usage evidence.
 | `get_team_patterns` | optional `category` | Pattern frequencies, trends, golden files, conflicts |
-| `get_symbol_references` | `symbol`, optional `limit` | Concrete symbol usage evidence: `usageCount` + top usage snippets + `confidence` ("syntactic") + `isComplete` boolean |
+| `get_symbol_references` | `symbol`, optional `limit` | Concrete symbol usage evidence: `usageCount` + top usage snippets + `confidence` + `isComplete`. `confidence: "syntactic"` means static/source-based only (no runtime or dynamic dispatch). Replaces the removed `get_component_usage`. |
 | `remember` | `type`, `category`, `memory`, `reason` | Persists to `.codebase-context/memory.json` |
```
```diff
 - **Generic**: 30+ have indexing/retrieval coverage including PHP, Ruby, Swift, Scala, Shell, config/markup., 10 languages have full symbol extraction (Tree-sitter: TypeScript, JavaScript, Python, Java, Kotlin, C, C++, C#, Go, Rust).
 
 Notes:
 
 - Language detection covers common extensions including `.pyi`, `.kt`/`.kts`, `.cc`/`.cxx`, and config formats like `.toml`/`.xml`.
-- When Tree-sitter grammars are present, the Generic analyzer can derive symbol components from Tree-sitter extraction (with fallbacks).
+- When Tree-sitter grammars are present, the Generic analyzer uses AST-aligned chunking and scope-aware prefixes for symbol-aware snippets (with fallbacks).
```
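The two chunking modes can be pictured as one function: when a grammar yields symbol spans, emit one chunk per symbol; otherwise cut the file into fixed-size line windows. A hypothetical sketch: the `SymbolSpan` shape and the 40-line window are assumptions, and it ignores text that falls outside every symbol (such as imports), which a real chunker would still need to cover.

```typescript
// Hypothetical span from a syntax tree: 0-based, inclusive line range of one symbol.
interface SymbolSpan {
  startLine: number;
  endLine: number;
}

interface Chunk {
  startLine: number;
  endLine: number;
  text: string;
}

const WINDOW = 40; // assumption: lines per chunk in line-based fallback mode

function chunkFile(source: string, spans?: SymbolSpan[]): Chunk[] {
  const lines = source.split("\n");
  const slice = (start: number, end: number): Chunk => ({
    startLine: start,
    endLine: end,
    text: lines.slice(start, end + 1).join("\n"),
  });
  // AST-aligned mode: one chunk per symbol, so chunk boundaries follow declarations.
  if (spans && spans.length > 0) {
    return spans.map((s) => slice(s.startLine, s.endLine));
  }
  // Fallback: consecutive windows of WINDOW lines; the last window may be shorter.
  const chunks: Chunk[] = [];
  for (let start = 0; start < lines.length; start += WINDOW) {
    chunks.push(slice(start, Math.min(start + WINDOW, lines.length) - 1));
  }
  return chunks;
}

const file = Array.from({ length: 100 }, (_, i) => `line ${i + 1}`).join("\n");
// Three windows without a grammar: lines 0-39, 40-79, 80-99.
console.log(chunkFile(file).map((c) => `${c.startLine}-${c.endLine}`));
```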