|
1 | | -# Analysis: Feature Proposals & Implementations |
| 1 | +# Suggestions |
2 | 2 |
|
3 | | -## 💡 Standout Ideas and Incremental Enhancements |
| 3 | +## Incremental indexing resets status to "starting" |
4 | 4 |
|
5 | | -### 1. 🔄 Live Tracking for "Token Savings" & "Cost Avoided" |
6 | | -**Concept:** |
7 | | -- Instead of just calculating saved tokens per request ephemerally, maintain a global tracker (`~/.ragcode/savings.json`) that cumulatively stores `total_tokens_saved` across all sessions. |
8 | | -- Provide a feature that calculates the real-world USD value of the saved tokens based on standard LLM pricing (e.g., Claude 3.5 Sonnet token costs). |
9 | | -- Send this telemetry back to the AI under the MCP `_meta` response so the user directly sees the financial value RagCode generates (e.g., "RagCode saved you $42 this month"). |
| 5 | +When a single file is re-indexed incrementally, `StartIndexingAsync` overwrites the status with a from-scratch `state: "starting"`, discarding the fact that 99% of the index is already present and functional. The AI sees `"starting"` + `"processed": 0` and concludes it has no data to work with.
10 | 6 |
|
11 | | -### 2. 🔄 O(1) Fetch via Byte Offsets |
12 | | -**Concept:** |
13 | | -- While extracting AST symbols, store exact `Byte Offsets` (start and end) in addition to Line Numbers. |
14 | | -- When `rag_read_file_context` is called, instead of reading the file line-by-line or using regex, perform a strict `seek()` operation to jump straight to the exact byte. This prevents loading massive files strictly into RAM. |
15 | | - |
16 | | -### 3. 🔄 Stable Symbol IDs |
17 | | -**Concept:** |
18 | | -- Expose fixed, semantic ID targets for every AST Node, such as `{file_path}::{qualified_name}#{kind}` (e.g., `pkg/parser/php/laravel/adapter.go::Parser.Extract#method`). |
19 | | -- Instead of searching, an Agent could request the direct structure of a known unique ID. |
20 | | - |
21 | | -### 5. 🔄 Active Symbol Summarization (During Indexing) |
22 | | -**Concept:** |
23 | | -- If a function or class lacks a Docstring, forward the chunk asynchronously to a cheap LLM (like Gemini Flash or Claude Haiku) *during* the indexing phase. |
24 | | -- Pre-generate a "One Line Summary" and embed that summary instead of the raw cryptic code. This drastically improves the semantic vector matching quality for poorly documented code. |
25 | | - |
26 | | ---- |
27 | | - |
28 | | -## 🤖 AI Agent Validated Implementations (Already Deployed) |
29 | | - |
30 | | -Based on rigorous real-world Agent usage, the following core features have been definitively implemented to drastically reduce LLM "decision fatigue". |
31 | | - |
32 | | -### 6. ✅ IMPLEMENTED — `rag_search`: Dual Search + Adaptive Response |
33 | | -**Status:** Deployed in `internal/service/tools/smart_search.go` |
34 | | - |
35 | | -**Challenge:** Agents used to guess between `mode: "exact"` and `"discovery"`. Even when they found results, pulling 5 full files instantly maxed out the context window. |
36 | | -**Solution:** |
37 | | -1. **Parallel Dual Search**: Executes `SearchCode` (Semantic Vector Qdrant) and `HybridSearchCode` (Exact Path/Substrings) simultaneously across Goroutines. |
38 | | -2. **Merging & Deduplication**: Vector IDs are matched, and results are tagged by provenance (`_source: "semantic" | "hybrid" | "both"`). |
39 | | -3. **Adaptive Formatting**: |
40 | | - - **Compact Mode**: If >4 results are found, returns only the signatures, paths, and scores (costs ~500 tokens). |
41 | | - - **Full Source**: Returns raw source code *only* for highly-confident, tight matches. |
42 | | - |
43 | | -### 7. ✅ IMPLEMENTED — Indexing Status & Health Metrics + Lazy Stale Cleanup |
44 | | -**Status:** Deployed across all `internal/service/tools/` endpoints. |
45 | | - |
46 | | -**Challenge:** Agents would search and hallucinate code that had actually been deleted by the user simply because the Qdrant index was stale. |
47 | | -**Solution:** |
48 | | -- **Pre-flight Disk Verification**: `rag_search` verifies `os.Stat` before returning matches. |
49 | | -- **Lazy Stale Cleanup**: Stale results are **filtered out** from the response (they never reach the AI). Additionally, the engine triggers an **async deletion** of all vectors for the stale file from every language collection in the workspace — a self-healing mechanism with a 10-minute dedup cooldown. |
50 | | -- **Auto-Cleanup Warning**: The response includes a `🧹 N stale file(s) detected and filtered out. Auto-cleanup triggered.` warning giving the AI full observability. |
51 | | -- **Chronological Awareness**: The response schema appends `index_age` (e.g., `"3 minutes ago"`) and `indexing_progress` strictly to maintain absolute validity. |
52 | | - |
53 | | -### 8. 🔄 PROPOSED — Migrate from `langchaingo` to Native Ollama Client |
54 | | -**Challenge:** Using `langchaingo` masks underlying context cancellations, causing deadlocks. |
55 | | -**Solution:** Replace it fully with the native `github.com/ollama/ollama/api` which provides direct HTTP keep-alive manipulation, native batch embedding capabilities, and proper Context Propagation timeouts. |
56 | | - |
57 | | -### 9. ✅ IMPLEMENTED — Smart Search Consolidation |
58 | | -**Challenge:** Agents suffered from "tool overwhelm" when attempting code searches. |
59 | | -**Solution:** Deprecated `rag_search_code` and moved everything explicitly to `rag_search`. Input schemas were simplified to `query` + `include_full_content` boolean overrides. |
60 | | - |
61 | | -### 10. ✅ IMPLEMENTED — Markdown Documentation Indexing |
62 | | -**Challenge:** The engine only understood codebase logic, completely blinding the AI to `README.md` architectural guidelines or implementation plans. |
63 | | -**Solution:** Integrated advanced hierarchical chunking (`MarkdownHeaderTextSplitter`) that natively indexes Headings, Tables, and Lists while keeping overlapping sliding windows for vectors. When an AI searches via `include_docs: true`, it searches the markdown chunks simultaneously with source code. |
64 | | - |
65 | | -### 11. ✅ IMPLEMENTED — Deep WordPress & WooCommerce Native Parsers |
66 | | -**Challenge:** The baseline PHP Tree-sitter AST parser could not navigate the massive WordPress hook ecosystem. |
67 | | -**Solution:** Created `pkg/parser/php/wordpress/`, a hyper-specialized sub-package that detects explicit CMS structures: |
68 | | -- Native extraction of **Hooks** (`add_action`, `add_filter`, `do_action`). |
69 | | -- Automatic identification of **Custom Post Types**, **Taxonomies**, and **Shortcodes**. |
70 | | -- **WooCommerce Integration**: Specifically isolates `woocommerce_` hooks and shopping cart overrides. |
71 | | -- **Oxygen Builder**: AST scanning for `extends OxyEl`, rendering layouts, and `ct_builder_json` dynamic components. |
| 7 | +The correct fix: on an incremental re-index, do not reset the state to `"starting"`. Use something like `"updating"`, or keep `"completed"` with a sub-status. That said, this is a separate issue, not part of the current PR review.
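
A minimal Go sketch of the suggested behavior. The type `IndexStatus`, the method `BeginIndexing`, and the state strings are assumptions for illustration; the project's actual `StartIndexingAsync` signature and status record are not shown here.

```go
package main

import (
	"fmt"
	"sync"
)

// IndexStatus is a hypothetical status record for the indexer; field names
// and state values are assumptions, not the project's real API.
type IndexStatus struct {
	mu        sync.Mutex
	State     string // "starting" | "updating" | "completed"
	Processed int
	Total     int
}

// BeginIndexing records the start of an indexing run. For an incremental run
// over an already-completed index it reports "updating" and keeps the existing
// progress counters, instead of wiping everything back to "starting" + 0.
func (s *IndexStatus) BeginIndexing(incremental bool, delta int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if incremental && s.State == "completed" {
		s.State = "updating" // index is still live; only a delta is re-processed
		s.Total += delta     // extend the totals rather than resetting them
		return
	}
	s.State = "starting" // full rebuild: resetting from zero is correct here
	s.Processed = 0
	s.Total = delta
}

func main() {
	s := &IndexStatus{State: "completed", Processed: 312, Total: 312}
	s.BeginIndexing(true, 1) // one changed file re-indexed
	fmt.Println(s.State, s.Processed, s.Total)
}
```

With this split, a consumer polling the status can still tell that the bulk of the index is usable while the single-file update runs, and only a genuine full rebuild ever reports `"starting"`.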