Skip to content

Commit 52ae5d2

Browse files
author
razvan
committed
fix: address Copilot PR #40 review comments
- Fix nil panic in ContextFromWorkspaceWithStatus when wctx is nil (#7) - Fix indentation in smart_search_pipeline.go (#1) - Use loaded idx instead of nil in call_hierarchy.go and find_usages.go (#3, #9) - Add backward-compat comment on JSON tag mismatch (#6) - Create fresh IndexStatus when LoadIndexStatus returns nil (#8) - Populate Elapsed field at completed/failed transitions (#2) - Throttle progress I/O writes to every 10 files (#4) - Fix test cleanup for .ragcode dir in TempDir
1 parent c10b6bf commit 52ae5d2

File tree

7 files changed

+37
-82
lines changed

7 files changed

+37
-82
lines changed

SUGGESTIONS.md

Lines changed: 4 additions & 68 deletions
Original file line numberDiff line numberDiff line change
@@ -1,71 +1,7 @@
1-
# Analysis: Feature Proposals & Implementations
1+
# Suggestions
22

3-
## 💡 Standout Ideas and Incremental Enhancements
3+
## Incremental indexing resets status to "starting"
44

5-
### 1. 🔄 Live Tracking for "Token Savings" & "Cost Avoided"
6-
**Concept:**
7-
- Instead of just calculating saved tokens per request ephemerally, maintain a global tracker (`~/.ragcode/savings.json`) that cumulatively stores `total_tokens_saved` across all sessions.
8-
- Provide a feature that calculates the real-world USD value of the saved tokens based on standard LLM pricing (e.g., Claude 3.5 Sonnet token costs).
9-
- Send this telemetry back to the AI under the MCP `_meta` response so the user directly sees the financial value RagCode generates (e.g., "RagCode saved you $42 this month").
5+
Când se re-indexează incremental un singur fișier, `StartIndexingAsync` suprascrie statusul la `state: "starting"` cu totul de la zero, ștergând informația că 99% din index e deja acolo și funcțional. AI-ul vede `"starting"` + `"processed": 0` și crede că nu are date.
106

11-
### 2. 🔄 O(1) Fetch via Byte Offsets
12-
**Concept:**
13-
- While extracting AST symbols, store exact `Byte Offsets` (start and end) in addition to Line Numbers.
14-
- When `rag_read_file_context` is called, instead of reading the file line-by-line or using regex, perform a strict `seek()` operation to jump straight to the exact byte. This prevents loading massive files strictly into RAM.
15-
16-
### 3. 🔄 Stable Symbol IDs
17-
**Concept:**
18-
- Expose fixed, semantic ID targets for every AST Node, such as `{file_path}::{qualified_name}#{kind}` (e.g., `pkg/parser/php/laravel/adapter.go::Parser.Extract#method`).
19-
- Instead of searching, an Agent could request the direct structure of a known unique ID.
20-
21-
### 5. 🔄 Active Symbol Summarization (During Indexing)
22-
**Concept:**
23-
- If a function or class lacks a Docstring, forward the chunk asynchronously to a cheap LLM (like Gemini Flash or Claude Haiku) *during* the indexing phase.
24-
- Pre-generate a "One Line Summary" and embed that summary instead of the raw cryptic code. This drastically improves the semantic vector matching quality for poorly documented code.
25-
26-
---
27-
28-
## 🤖 AI Agent Validated Implementations (Already Deployed)
29-
30-
Based on rigorous real-world Agent usage, the following core features have been definitively implemented to drastically reduce LLM "decision fatigue".
31-
32-
### 6. ✅ IMPLEMENTED — `rag_search`: Dual Search + Adaptive Response
33-
**Status:** Deployed in `internal/service/tools/smart_search.go`
34-
35-
**Challenge:** Agents used to guess between `mode: "exact"` and `"discovery"`. Even when they found results, pulling 5 full files instantly maxed out the context window.
36-
**Solution:**
37-
1. **Parallel Dual Search**: Executes `SearchCode` (Semantic Vector Qdrant) and `HybridSearchCode` (Exact Path/Substrings) simultaneously across Goroutines.
38-
2. **Merging & Deduplication**: Vector IDs are matched, and results are tagged by provenance (`_source: "semantic" | "hybrid" | "both"`).
39-
3. **Adaptive Formatting**:
40-
- **Compact Mode**: If >4 results are found, returns only the signatures, paths, and scores (costs ~500 tokens).
41-
- **Full Source**: Returns raw source code *only* for highly-confident, tight matches.
42-
43-
### 7. ✅ IMPLEMENTED — Indexing Status & Health Metrics + Lazy Stale Cleanup
44-
**Status:** Deployed across all `internal/service/tools/` endpoints.
45-
46-
**Challenge:** Agents would search and hallucinate code that had actually been deleted by the user simply because the Qdrant index was stale.
47-
**Solution:**
48-
- **Pre-flight Disk Verification**: `rag_search` verifies `os.Stat` before returning matches.
49-
- **Lazy Stale Cleanup**: Stale results are **filtered out** from the response (they never reach the AI). Additionally, the engine triggers an **async deletion** of all vectors for the stale file from every language collection in the workspace — a self-healing mechanism with a 10-minute dedup cooldown.
50-
- **Auto-Cleanup Warning**: The response includes a `🧹 N stale file(s) detected and filtered out. Auto-cleanup triggered.` warning giving the AI full observability.
51-
- **Chronological Awareness**: The response schema appends `index_age` (e.g., `"3 minutes ago"`) and `indexing_progress` strictly to maintain absolute validity.
52-
53-
### 8. 🔄 PROPOSED — Migrate from `langchaingo` to Native Ollama Client
54-
**Challenge:** Using `langchaingo` masks underlying context cancellations, causing deadlocks.
55-
**Solution:** Replace it fully with the native `github.com/ollama/ollama/api` which provides direct HTTP keep-alive manipulation, native batch embedding capabilities, and proper Context Propagation timeouts.
56-
57-
### 9. ✅ IMPLEMENTED — Smart Search Consolidation
58-
**Challenge:** Agents suffered from "tool overwhelm" when attempting code searches.
59-
**Solution:** Deprecated `rag_search_code` and moved everything explicitly to `rag_search`. Input schemas were simplified to `query` + `include_full_content` boolean overrides.
60-
61-
### 10. ✅ IMPLEMENTED — Markdown Documentation Indexing
62-
**Challenge:** The engine only understood codebase logic, completely blinding the AI to `README.md` architectural guidelines or implementation plans.
63-
**Solution:** Integrated advanced hierarchical chunking (`MarkdownHeaderTextSplitter`) that natively indexes Headings, Tables, and Lists while keeping overlapping sliding windows for vectors. When an AI searches via `include_docs: true`, it searches the markdown chunks simultaneously with source code.
64-
65-
### 11. ✅ IMPLEMENTED — Deep WordPress & WooCommerce Native Parsers
66-
**Challenge:** The baseline PHP Tree-sitter AST parser could not navigate the massive WordPress hook ecosystem.
67-
**Solution:** Created `pkg/parser/php/wordpress/`, a hyper-specialized sub-package that detects explicit CMS structures:
68-
- Native extraction of **Hooks** (`add_action`, `add_filter`, `do_action`).
69-
- Automatic identification of **Custom Post Types**, **Taxonomies**, and **Shortcodes**.
70-
- **WooCommerce Integration**: Specifically isolates `woocommerce_` hooks and shopping cart overrides.
71-
- **Oxygen Builder**: AST scanning for `extends OxyEl`, rendering layouts, and `ct_builder_json` dynamic components.
7+
Fix-ul corect ar fi: la indexare incrementală, nu reseta starea la `"starting"` — folosește ceva gen `"updating"` sau păstrează `"completed"` cu un sub-status. Dar asta e un issue separat, nu din PR review-ul curent.

internal/service/engine/engine.go

Lines changed: 23 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -809,19 +809,29 @@ func (e *Engine) StartIndexingAsync(root, id string, changedFiles []string, recr
809809

810810
if err != nil {
811811
logger.Instance.Error("[IDX] ws=%s Background indexing failed: %v", filepath.Base(root), err)
812-
if s := indexer.LoadIndexStatus(root); s != nil {
813-
s.State = "failed"
814-
s.Error = err.Error()
815-
s.EndedAt = time.Now().UTC().Format(time.RFC3339)
816-
indexer.SaveIndexStatus(root, s)
812+
s := indexer.LoadIndexStatus(root)
813+
if s == nil {
814+
s = &indexer.IndexStatus{State: "starting"}
817815
}
816+
s.State = "failed"
817+
s.Error = err.Error()
818+
s.EndedAt = time.Now().UTC().Format(time.RFC3339)
819+
if started, pErr := time.Parse(time.RFC3339, s.StartedAt); pErr == nil {
820+
s.Elapsed = time.Since(started).Round(time.Second).String()
821+
}
822+
indexer.SaveIndexStatus(root, s)
818823
} else {
819824
logger.Instance.Info("[IDX] ✅ ws=%s Background indexing completed", filepath.Base(root))
820-
if s := indexer.LoadIndexStatus(root); s != nil {
821-
s.State = "completed"
822-
s.EndedAt = time.Now().UTC().Format(time.RFC3339)
823-
indexer.SaveIndexStatus(root, s)
825+
s := indexer.LoadIndexStatus(root)
826+
if s == nil {
827+
s = &indexer.IndexStatus{State: "starting"}
828+
}
829+
s.State = "completed"
830+
s.EndedAt = time.Now().UTC().Format(time.RFC3339)
831+
if started, pErr := time.Parse(time.RFC3339, s.StartedAt); pErr == nil {
832+
s.Elapsed = time.Since(started).Round(time.Second).String()
824833
}
834+
indexer.SaveIndexStatus(root, s)
825835
}
826836
}()
827837
}
@@ -921,6 +931,10 @@ func (e *Engine) IndexWorkspace(ctx context.Context, path string, recreate bool)
921931
ExcludePatterns: excludePatterns,
922932
Recreate: recreate,
923933
Progress: func(doneFiles, totalFiles int) {
934+
// Throttle disk I/O: write every 10 files or on the last file
935+
if doneFiles%10 != 0 && doneFiles != totalFiles {
936+
return
937+
}
924938
if s := indexer.LoadIndexStatus(wctx.Root); s != nil {
925939
s.State = "running"
926940
if s.Languages == nil {

internal/service/engine/engine_searchcode_test.go

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,8 @@ package engine
22

33
import (
44
"context"
5+
"os"
6+
"path/filepath"
57
"sync/atomic"
68
"testing"
79

@@ -278,6 +280,9 @@ func TestSearchCodeResumeInterruptedIndexing(t *testing.T) {
278280
rootDir := t.TempDir()
279281
eng.SetResolver(resolver.New(resolver.Dependencies{Detector: &mockDirDetector{root: rootDir}}))
280282

283+
// Clean up .ragcode dir created by auto-triggered StartIndexingAsync
284+
t.Cleanup(func() { os.RemoveAll(filepath.Join(rootDir, ".ragcode")) })
285+
281286
// Get workspace ID early
282287
wctx, _ := eng.DetectContext(context.Background(), "dummy.go")
283288
if wctx == nil {

internal/service/tools/call_hierarchy.go

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -168,7 +168,7 @@ func (t *CallHierarchyTool) Execute(ctx context.Context, args map[string]interfa
168168
WorkspaceRoot: wctx.Root,
169169
DetectionSource: wctx.DetectionSource,
170170
Telemetry: telemetry.CalculateSavings(baselineBytes, actualBytes),
171-
IndexingStatus: nil,
171+
IndexingStatus: idx,
172172
},
173173
}
174174
return resp.JSON()

internal/service/tools/find_usages.go

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -256,7 +256,7 @@ func (t *FindUsagesTool) Execute(ctx context.Context, args map[string]interface{
256256
WorkspaceRoot: wctx.Root,
257257
DetectionSource: wctx.DetectionSource,
258258
Telemetry: telemetry.CalculateSavings(baselineBytes, actualBytes),
259-
IndexingStatus: nil,
259+
IndexingStatus: idx,
260260
},
261261
}
262262
return resp.JSON()

internal/service/tools/response.go

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -26,7 +26,7 @@ type ContextMetadata struct {
2626
Language string `json:"language,omitempty"`
2727
Collection string `json:"collection,omitempty"`
2828
Telemetry *telemetry.Savings `json:"telemetry,omitempty"`
29-
IndexingStatus *indexer.IndexStatus `json:"indexing_progress,omitempty"` // present when indexing is in progress or just completed
29+
IndexingStatus *indexer.IndexStatus `json:"indexing_progress,omitempty"` // JSON tag kept as "indexing_progress" for backward compatibility; present when indexing is in progress or just completed
3030
SessionMetrics *telemetry.AggregatedMetrics `json:"session_metrics,omitempty"` // cumulative search stats from .ragcode/search_metrics.jsonl
3131
}
3232

@@ -60,7 +60,7 @@ func ContextFromWorkspace(wctx *engine.WorkspaceContext) ContextMetadata {
6060
// ContextFromWorkspaceWithStatus builds ContextMetadata and attaches indexing status from disk.
6161
func ContextFromWorkspaceWithStatus(wctx *engine.WorkspaceContext, eng *engine.Engine) ContextMetadata {
6262
ctx := ContextFromWorkspace(wctx)
63-
if eng != nil {
63+
if eng != nil && wctx != nil {
6464
ctx.IndexingStatus = eng.GetIndexStatus(wctx.Root)
6565
}
6666
return ctx

internal/service/tools/smart_search_pipeline.go

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -205,7 +205,7 @@ func (t *SmartSearchTool) buildResponseMeta(meta searchMetadata) ToolResponse {
205205

206206
var idxStatus *indexer.IndexStatus
207207
if meta.workspaceRoot != "" {
208-
idxStatus = indexer.LoadIndexStatus(meta.workspaceRoot)
208+
idxStatus = indexer.LoadIndexStatus(meta.workspaceRoot)
209209
}
210210

211211
response := ToolResponse{

0 commit comments

Comments
 (0)