Skip to content

Commit e98a5a1

Browse files
author
razvan
committed
refactor: extract pkg/scoring + smart_search Execute pipeline
- pkg/scoring: consolidate duplicate scoring primitives from 3 packages - FilterTokens, LexicalMatchScore, TokenMatchRatio (from search/ and engine/) - ScopeDir, PathProximity, LongestCommonPath (from tools/path_scope) - smart_search: break 309-line Execute() into pipeline stages - normalizeInput, runParallelSearch, applyFilters (mode+score+path) - buildResponseMeta, serializeResults, resultToMap, noResultsResponse - Execute() reduced to ~33 lines - smart_search_path_scope: thin adapter delegating to pkg/scoring - tools/README: simplified to navigation index pointing to docs/tools/ - docs/tools/: moved from internal/service/tools/ (by user)
1 parent cd66566 commit e98a5a1

20 files changed

Lines changed: 1286 additions & 663 deletions
Lines changed: 306 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,306 @@
1+
# SmartSearch Execute() Refactor Plan
2+
3+
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
4+
5+
**Goal:** Break the 309-line `Execute()` monolith into small, testable, single-responsibility functions following Go community standards (Uber Guide, Effective Go). Extract shared scoring logic into `pkg/scoring`.
6+
7+
**Architecture:**
8+
1. Create `pkg/scoring` for reusable pure functions duplicated across packages
9+
2. Extract 6 logical pipeline stages into named functions in separate files
10+
3. `Execute()` becomes a ~40-line orchestrator calling each stage
11+
12+
**Tech Stack:** Pure Go refactoring, no new dependencies.
13+
14+
---
15+
16+
## Duplications Found (via RagCode MCP)
17+
18+
| Function | Location 1 | Location 2 | Action |
19+
|----------|-----------|-----------|--------|
20+
| `filterTokens()` | `internal/service/search/search.go:214` | `engine/engine_fallback_search.go:248` |`pkg/scoring.FilterTokens()` |
21+
| `lexicalMatchScore()` | `internal/service/search/search.go:225` | `engine/engine_fallback_search.go:239` |`pkg/scoring.LexicalMatchScore()` |
22+
| `fallbackTokenMatchRatio()` | `engine/engine_fallback_search.go:225` ||`pkg/scoring.TokenMatchRatio()` |
23+
| `pathProximity()` + helpers | `tools/smart_search_path_scope.go` ||`pkg/scoring.PathProximity()` |
24+
| `longestCommonPath()` | `tools/smart_search_path_scope.go` ||`pkg/scoring.LongestCommonPath()` |
25+
26+
---
27+
28+
### Task 0: Create `pkg/scoring` — shared scoring primitives
29+
30+
**Files:**
31+
- Create: `pkg/scoring/scoring.go` — exported pure functions
32+
- Create: `pkg/scoring/path.go` — path proximity scoring
33+
- Create: `pkg/scoring/scoring_test.go` — tests
34+
- Create: `pkg/scoring/path_test.go` — path tests
35+
- Modify: `internal/service/search/search.go` — replace local with `scoring.FilterTokens`, `scoring.LexicalMatchScore`
36+
- Modify: `internal/service/engine/engine_fallback_search.go` — replace local with `scoring.*`
37+
- Modify: `internal/service/tools/smart_search_path_scope.go` — import from `scoring` or move there
38+
- Delete: `internal/service/tools/smart_search_path_scope_test.go` — tests move to `pkg/scoring/path_test.go`
39+
40+
**`pkg/scoring/scoring.go`:**
41+
```go
42+
package scoring
43+
44+
// FilterTokens removes very short tokens (≤2 chars) from a token list.
45+
func FilterTokens(tokens []string) []string
46+
47+
// LexicalMatchScore counts total token occurrences in content (frequency-weighted).
48+
func LexicalMatchScore(content string, tokens []string) float64
49+
50+
// TokenMatchRatio returns the fraction of tokens found in text [0, 1].
51+
func TokenMatchRatio(text string, tokens []string) float64
52+
```
53+
54+
**`pkg/scoring/path.go`:**
55+
```go
56+
package scoring
57+
58+
// PathProximity computes a score multiplier based on how close
59+
// a result's file path is to a reference scope directory.
60+
func PathProximity(resultPath, scopePath string) float64
61+
62+
// ScopeDir extracts the reference directory from a file path.
63+
func ScopeDir(filePath string) string
64+
65+
// LongestCommonPath returns the longest shared directory prefix.
66+
func LongestCommonPath(a, b string) string
67+
```
68+
69+
**Step 1:** Create `pkg/scoring/scoring.go` with exported functions
70+
**Step 2:** Create `pkg/scoring/path.go` with path functions (from smart_search_path_scope.go)
71+
**Step 3:** Create tests
72+
**Step 4:** Update `search/search.go` to use `scoring.FilterTokens`, `scoring.LexicalMatchScore`
73+
**Step 5:** Update `engine/engine_fallback_search.go` to use `scoring.*`
74+
**Step 6:** Update `tools/smart_search_path_scope.go` to delegate to `scoring.PathProximity`
75+
**Step 7:** Run: `go test ./... -count=1 -race`
76+
**Step 8:** Commit: `refactor: extract pkg/scoring for shared scoring primitives`
77+
78+
---
79+
80+
## Current Structure Analysis
81+
82+
The `Execute()` method in `smart_search.go` (lines 100-408) has these responsibilities crammed into one function:
83+
84+
| Lines | Responsibility | Target |
85+
|-------|---------------|--------|
86+
| 100-115 | Input validation + defaults | `normalizeInput()` |
87+
| 117-186 | Parallel search fan-out + result collection | `runParallelSearch()` |
88+
| 188-191 | Error handling | `handleSearchError()` (exists) |
89+
| 193-238 | Post-processing pipeline (merge, mode filter, score filter, path scope) | `applyFilters()` |
90+
| 243-254 | Empty results check | inline (3 lines) |
91+
| 257-306 | Response metadata construction (compact/full detection, fallback, warnings) | `buildResponseMeta()` |
92+
| 308-406 | Result serialization (compact vs full) + telemetry + stale detection | `serializeResults()` |
93+
94+
---
95+
96+
### Task 1: Extract `normalizeInput()`
97+
98+
**Files:**
99+
- Modify: `internal/service/tools/smart_search.go`
100+
- Test: `internal/service/tools/smart_search_test.go` (add test)
101+
102+
**What:** Extract lines 100-115 into a pure function that validates and normalizes input.
103+
104+
```go
105+
// normalizeInput validates the search input and applies defaults.
106+
// Returns the effective query, limit, and modified input.
107+
func normalizeInput(input SmartSearchInput, defaultLimit int) (string, int, SmartSearchInput, error) {
108+
query := strings.TrimSpace(input.Query)
109+
if query == "" {
110+
return "", 0, input, fmt.Errorf("query parameter is required")
111+
}
112+
limit := defaultLimit
113+
if input.Limit > 0 {
114+
limit = input.Limit
115+
}
116+
if input.Mode == "strict_docs" {
117+
input.IncludeDocs = true
118+
}
119+
return query, limit, input, nil
120+
}
121+
```
122+
123+
**Step 1:** Extract the function (pure, no receiver needed)
124+
**Step 2:** Replace lines 100-115 with call to `normalizeInput`
125+
**Step 3:** Run: `go test ./internal/service/tools/... -count=1 -race`
126+
**Step 4:** Commit
127+
128+
---
129+
130+
### Task 2: Extract `searchMetadata` struct + `runParallelSearch()`
131+
132+
**Files:**
133+
- Modify: `internal/service/tools/smart_search.go`
134+
135+
**What:** Extract lines 117-191 into a method. The local `searchResult` type and all the goroutine + channel logic moves out.
136+
137+
```go
138+
// searchMetadata holds workspace context extracted from search results.
139+
type searchMetadata struct {
140+
workspaceRoot string
141+
workspaceID string
142+
collection string
143+
language string
144+
detectionSource string
145+
mismatchRisk string
146+
}
147+
148+
// parallelSearchResult holds the output of runParallelSearch.
149+
type parallelSearchResult struct {
150+
semantic *engine.SearchCodeResult
151+
hybrid *engine.SearchCodeResult
152+
meta searchMetadata
153+
err error // first non-nil error from search strategies
154+
}
155+
156+
// runParallelSearch executes semantic and hybrid searches concurrently,
157+
// collects results, and extracts workspace metadata.
158+
func (t *SmartSearchTool) runParallelSearch(ctx context.Context, filePath, query string, limit int, includeDocs bool) parallelSearchResult
159+
```
160+
161+
**Step 1:** Extract struct + function
162+
**Step 2:** Replace lines 117-191 in `Execute()` with one call
163+
**Step 3:** Run tests
164+
**Step 4:** Commit
165+
166+
---
167+
168+
### Task 3: Extract `applyFilters()`
169+
170+
**Files:**
171+
- Modify: `internal/service/tools/smart_search.go`
172+
- Test: `internal/service/tools/smart_search_test.go` (add test)
173+
174+
**What:** Extract lines 193-241 (merge + mode filter + score filter + path scope + doc grouping) into a single pipeline function.
175+
176+
```go
177+
// filterConfig holds the filtering parameters for post-processing.
178+
type filterConfig struct {
179+
Mode string
180+
MinScore float32
181+
FilePath string // for path scoping
182+
}
183+
184+
// applyFilters runs the full post-processing pipeline on merged results:
185+
// mode filtering → score threshold → path scoping → doc grouping.
186+
func (t *SmartSearchTool) applyFilters(merged []mergedResult, cfg filterConfig) []mergedResult
187+
```
188+
189+
This function is ~40 lines calling the existing helpers: `applyModeFilter`, `applyScoreFilter`, `applyPathScoping`, `groupDocsByTree`.
190+
191+
**Step 1:** Extract `applyModeFilter()` (pure function, ~15 lines)
192+
**Step 2:** Extract `applyScoreFilter()` (pure function, ~20 lines)
193+
**Step 3:** Create `applyFilters()` that chains them
194+
**Step 4:** Replace lines 193-241 in Execute() with one call
195+
**Step 5:** Run tests
196+
**Step 6:** Commit
197+
198+
---
199+
200+
### Task 4: Extract `buildResponseMeta()`
201+
202+
**Files:**
203+
- Modify: `internal/service/tools/smart_search.go`
204+
205+
**What:** Extract lines 257-306 (response object construction, fallback detection, warnings) into a function.
206+
207+
```go
208+
// buildResponseMeta constructs the ToolResponse shell with metadata, warnings,
209+
// and messaging based on whether results are from fallback or vector search.
210+
func (t *SmartSearchTool) buildResponseMeta(meta searchMetadata, merged []mergedResult, useCompact bool) ToolResponse
211+
```
212+
213+
**Step 1:** Extract function
214+
**Step 2:** Replace lines 257-306 in Execute()
215+
**Step 3:** Run tests
216+
**Step 4:** Commit
217+
218+
---
219+
220+
### Task 5: Extract `serializeResults()`
221+
222+
**Files:**
223+
- Modify: `internal/service/tools/smart_search.go`
224+
225+
**What:** Extract lines 308-406 (compact vs full serialization, telemetry, stale file detection) into a function.
226+
227+
```go
228+
// serializeResults populates the ToolResponse with either compact or full result data,
229+
// calculates telemetry savings, and detects stale indexed files.
230+
func serializeResults(response *ToolResponse, merged []mergedResult, useCompact, isFallback bool)
231+
```
232+
233+
**Step 1:** Extract function
234+
**Step 2:** Replace lines 308-406 in Execute()
235+
**Step 3:** Run tests
236+
**Step 4:** Commit
237+
238+
---
239+
240+
### Task 6: Verify final `Execute()` orchestrator
241+
242+
**Files:**
243+
- Verify: `internal/service/tools/smart_search.go`
244+
245+
**What:** The final `Execute()` should be ~40 lines:
246+
247+
```go
248+
func (t *SmartSearchTool) Execute(ctx context.Context, input SmartSearchInput) (string, error) {
249+
query, limit, input, err := normalizeInput(input, t.searchLimit)
250+
if err != nil {
251+
return "", err
252+
}
253+
254+
sr := t.runParallelSearch(ctx, input.FilePath, query, limit, input.IncludeDocs)
255+
if sr.semantic == nil && sr.hybrid == nil {
256+
return t.handleSearchError(sr.err, sr.meta.workspaceRoot, sr.meta.workspaceID)
257+
}
258+
259+
merged := t.mergeResults(sr.semantic, sr.hybrid, limit)
260+
merged = t.applyFilters(merged, filterConfig{
261+
Mode: input.Mode, MinScore: input.MinScore, FilePath: input.FilePath,
262+
})
263+
264+
if len(merged) == 0 {
265+
return noResultsResponse(query, sr.meta)
266+
}
267+
268+
useCompact := len(merged) > compactResultCap || merged[0].score < highConfidenceThreshold
269+
if input.IncludeFullContent {
270+
useCompact = false
271+
}
272+
273+
response := t.buildResponseMeta(sr.meta, merged, useCompact)
274+
serializeResults(&response, merged, useCompact, sr.meta.collection == "fallback")
275+
276+
return response.JSON()
277+
}
278+
```
279+
280+
**Step 1:** Verify Execute() is ~40 lines
281+
**Step 2:** Run full test suite: `go test ./... -count=1 -race`
282+
**Step 3:** Commit: `refactor: break SmartSearch Execute into pipeline stages`
283+
284+
---
285+
286+
## File Organization (after refactor)
287+
288+
```
289+
pkg/scoring/
290+
├── scoring.go # FilterTokens, LexicalMatchScore, TokenMatchRatio
291+
├── scoring_test.go # Tests for scoring functions
292+
├── path.go # ScopeDir, PathProximity, LongestCommonPath, CountSeparators
293+
└── path_test.go # Tests for path proximity (moved from tools/)
294+
295+
internal/service/tools/
296+
├── smart_search.go # Execute orchestrator + Register + types (~120 lines)
297+
├── smart_search_pipeline.go # normalizeInput, runParallelSearch, applyFilters, buildResponseMeta, serializeResults
298+
├── smart_search_path_scope.go # thin wrapper calling scoring.PathProximity (applyPathScoping for []mergedResult)
299+
├── smart_search_test.go # Existing + new pipeline tests
300+
└── smart_search_doc_grouping.go # groupDocsByTree + readLines (moved from smart_search.go)
301+
```
302+
303+
**Packages updated to use `pkg/scoring`:**
304+
- `internal/service/search/search.go``scoring.FilterTokens`, `scoring.LexicalMatchScore`
305+
- `internal/service/engine/engine_fallback_search.go``scoring.FilterTokens`, `scoring.LexicalMatchScore`, `scoring.TokenMatchRatio`
306+
- `internal/service/tools/smart_search_path_scope.go``scoring.PathProximity`, `scoring.ScopeDir`

docs/tools/doc_call_hierarchy.md

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,36 @@
1+
# Tool: `rag_call_hierarchy`
2+
3+
**Source File**: `call_hierarchy.go`
4+
5+
`rag_call_hierarchy` is an advanced navigation tool that allows for automated, recursive exploration of the code's call graph. It bridges the gap between individual symbol lookups and full architectural understanding.
6+
7+
## Query Mechanisms
8+
9+
This tool operates as a stateful, recursive orchestrator over the **Code Graph Relations**. It primarily uses vector-less `ExactSearch` to traverse AST edges.
10+
11+
1. **Incoming Mode (`direction="incoming"`)**:
12+
* Identifies all "Caller" nodes by querying the database for any chunk where the `Relations` array contains a `target_name` matching the current symbol.
13+
* DB Filter: `{ "relations[].target_name": "<SymbolName>" }`
14+
15+
2. **Outgoing Mode (`direction="outgoing"`)**:
16+
* Identifies "Callee" nodes by searching for the symbol itself and extracting all `target_name` entries from its own `Relations` payload metadata.
17+
18+
## Recursive Resolution Engine
19+
20+
The tool implements a depth-first search (DFS) with a configurable `depth` (default 2, max 5).
21+
22+
- **Circular Dependency Detection**: It maintains a `visited` map per-session to detect and flag recursions (e.g., Function A calling Function B which calls Function A). These are visualized with a 🔄 marker.
23+
- **Polyglot Stitching**: The engine automatically searches across multiple language collections (Go, Python, JS, PHP). If a Go function calls a JavaScript microservice endpoint (indexed as a relation), the tool will attempt to bridge the call graph across languages.
24+
25+
## Features
26+
- **Structural Tree Metadata**: Returns a JSON-ready tree structure (`CallNode`) containing the signature, type, package, and file path for every node in the hierarchy.
27+
- **Visual Markdown Representation**: Provides an indented tree view using standard tree-drawing characters (`└─`) for immediate human readability in the AI's chat window.
28+
- **Context Preservation**: Like all search tools, it attaches `ContextMetadata` with detection sources and potential branch mismatch risks.
29+
30+
## Use Cases
31+
- **Impact Analysis**: *"If I change this core utility function, which high-level services will be affected 3 levels up?"*
32+
- **Execution Flow Tracking**: *"How does a request travel from the Gin Router to the SQL Repository?"*
33+
- **Refactoring**: Understanding the complexity of a module's dependencies before attempting to decouple it.
34+
35+
## Telemetry
36+
This tool does not currently log byte-savings telemetry, as its primary value is "Cognitive Speed" (reducing the number of manual search turns) rather than direct data minimization.

docs/tools/doc_evaluate_ragcode.md

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
1+
# Tool: `rag_evaluate`
2+
3+
**Source File**: `evaluate_ragcode.go`
4+
5+
`rag_evaluate` is a unique introspective tool designed for continuous improvement of the RagCode MCP server from the perspective of the very AI instances that use it.
6+
7+
## Query Mechanisms
8+
9+
This tool does not query the code graph repository. Instead, it queries the **internal state and configuration** of the running MCP Engine and dynamically orchestrates a prompt requesting self-reflection from the AI context.
10+
11+
1. **State Diagnostics**: Resolves the current workspace boundary.
12+
2. **Health Aggregation**: Pings the underlying Ollama embeddings server and Qdrant backend to log their ping/status responses.
13+
3. **Prompt Generation**: Dynamically constructs a Markdown questionnaire prompting the AI to write down its pain points, cache misses, or hallucination events during the active session.
14+
15+
## Features
16+
- **Unchained Execution**: Unlike search tools, `rag_evaluate` executes successfully even if the `DetectContext` phase fails or falls-back. It intentionally allows evaluation independent of workspace bounds.
17+
- **Diagnostic Visibility**: It dumps the active system configuration (e.g. `LLM=qwen3ro`, `Embed=nomic-embed-text`) into the LLM's context window. This teaches the AI exactly what hardware/models are actively providing its context, allowing for highly specific telemetry feedbacks like *"The nomic embeddings model seems to fail mapping the PHP namespaces correctly"*.
18+
- **Agentic Loop**: Acts as an anchor point for automated End-to-End tests where test agents utilize tools, and run `rag_evaluate` at the end of their task to assert tool qualities.

docs/tools/doc_find_usages.md

Lines changed: 35 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,35 @@
1+
# Tool: `rag_find_usages`
2+
3+
**Source File**: `find_usages.go`
4+
5+
`rag_find_usages` provides deterministic identification of where and how a symbol (function, class, module or type) is utilized across the entirety of a workspace.
6+
7+
## Heritage
8+
This tool replaced the legacy fuzzy-search tool `find_implementations`. Previously, discovering where an interface was implemented relied heavily on string proximity and heuristic keywords embedded in LLM queries (e.g. "Engine implementation usage"). This proved brittle and imprecise.
9+
10+
## Query Mechanisms
11+
12+
`rag_find_usages` is built directly on top of the **Code Graph AST (Relations Matrix)** stored inside Qdrant during index time.
13+
14+
It completely **bypasses the Semantic Embedder (LLM)** and issues an Exact Match database filter:
15+
16+
```json
17+
{
18+
"relations[].target_name": "<SymbolName>"
19+
}
20+
```
21+
22+
By querying the exact path of the payload JSON, the database operates efficiently and precisely, scanning the `relations` arrays of every indexed AST node (both methods, classes or standalone functions) to identify callers, implementers, and dependent types referencing the `SymbolName`.
23+
24+
## Features
25+
- **Deterministic and Robust**: Resolves complex usage paths correctly regardless of lexical variations or generic terms used by developers. Zero semantic hallucinations!
26+
- **Relation Context Mapping**: Not only does it return the line numbers and snippets of where something is used, but it also extracts the specific `<type>` of the relation match (e.g. `call` or `implements`), explaining **why** the snippet was returned.
27+
- **Polyglot Execution**: Iterates across possible language indexes (`go`, `python`, `javascript`, `php`) mapped to the requested workspace dynamically.
28+
29+
## Use Cases
30+
- Perfect for safely finding references prior to refactoring a function.
31+
- Finding which structs satisfy a given Go interface.
32+
- Discovering what modules call a legacy utility function.
33+
34+
## Telemetry
35+
This tool measures the baseline sizes of all parent source files associated with every snippet returned. The byte-savings are calculated dynamically against the slim snippets included in the ToolResponse and sent via the `Context.Telemetry` pipeline.

0 commit comments

Comments
 (0)