diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md deleted file mode 100644 index b76988c..0000000 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ /dev/null @@ -1,20 +0,0 @@ -## Description - -Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. - -Fixes # (issue) - -## Type of change - -- [ ] Bug fix (non-breaking change which fixes an issue) -- [ ] New feature (non-breaking change which adds functionality) -- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected) -- [ ] Documentation update - -## Checklist: - -- [ ] I have performed a self-review of my own code -- [ ] I have formatted my code with `go fmt ./...` -- [ ] I have run tests `go test ./...` and they pass -- [ ] I have verified integration with Ollama/Qdrant (if applicable) -- [ ] I have updated the documentation accordingly diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md deleted file mode 100644 index 19df937..0000000 --- a/.github/copilot-instructions.md +++ /dev/null @@ -1,33 +0,0 @@ -# Copilot Instructions - RagCode MCP - -## ⚖️ The Golden Rule -**For any information about the code (location, structure, logic, or usage), you MUST use RagCode MCP tools. Never guess code details from memory; always search the local index first.** - -## Project Overview -RagCode is a Model Context Protocol (MCP) server that provides semantic code search (RAG) for local codebases using Ollama (embeddings) and Qdrant (vector storage). It supports multiple languages through a pluggable analyzer architecture. - -## Architecture & Patterns -- **Core Components**: - - `Indexer`: Orchestrates analysis, embedding, and storage. - - `PathAnalyzer`: Interface for language-specific AST analysis (Go, PHP/Laravel, Python, HTML). - - `CodeChunk`: The canonical v2 data structure for all indexed code symbols (functions, types, files). - - `Workspace.Manager`: Handles multi-workspace multi-language isolation via language-specific Qdrant collections. -- **Data Flow**: Tools -> `Workspace.Manager.DetectWorkspace` -> Language Detection -> Qdrant Collection (`ragcode-{id}-{lang}`) -> Search Results. -- **Convention**: The project is migrating from `APIChunk` to `CodeChunk`. Always use `CodeChunk` for new features. - -## Developer Workflows -- **Build/Install**: Use `go run ./cmd/install/main.go` to build binaries and configure local IDEs. -- **Runtime Binaries**: Installed to `~/.local/share/ragcode/bin/` by default. -- **Testing**: Use standard `go test ./...`. Use `t.TempDir()` for workspace/filesystem isolation. -- **Logging**: MCP server logs to `mcp.log` next to the executable. Check `MCP_LOG_LEVEL=debug` for issues. - -## MCP Tools Usage -- `rag_search_code`: Use as the primary entry point for exploration. **Crucial**: Always provide the `file_path` parameter as it's used for workspace and language detection. -- `rag_index_workspace`: Triggered automatically on first query per workspace, but can be manually invoked for major changes. - -## Integration Points -- **Ollama**: Requires `phi3:medium` (reasoning) and `mxbai-embed-large` (embeddings) by default. -- **Qdrant**: Runs in Docker as `ragcode-qdrant` on port 6333. - -## Romania/Hungarian Support (Note) -The project identifies as `rag-code-mcp`. Old configurations naming it `do-ai` or `coderag` are deprecated and paths must be updated to the new project structure in `github.com/doITmagic/rag-code-mcp`. diff --git a/.gitignore b/.gitignore index 938611e..98055da 100644 --- a/.gitignore +++ b/.gitignore @@ -54,6 +54,9 @@ Thumbs.db .env .env.local +# Local project config (not for VCS) +.trello.json + # Temporary files tmp/ temp/ diff --git a/BUGS.md b/BUGS.md new file mode 100644 index 0000000..0d5b199 --- /dev/null +++ b/BUGS.md @@ -0,0 +1,312 @@ +# RagCode MCP — Bug Tracker + +This file documents confirmed bugs in the RagCode MCP server, with concrete reproduction examples and expected behavior. + +--- + +## BUG-001: `rag_list_package_exports` falsely returns "No exported symbols found" for indexed Go packages + +**Status:** ✅ Fixed (2026-03-09) +**Date confirmed:** 2026-03-09 +**Affected tool:** `mcp_ragcode_rag_list_package_exports` +**Severity:** Medium — produced incorrect responses that could mislead AI consumers +**Fixed in:** `internal/service/tools/list_package_exports.go` + +### Description + +The `rag_list_package_exports` tool reports that a Go package contains no exported symbols, even though the source files contain public structs, functions, and variables (capitalized identifiers). + +### Steps to reproduce + +**Tool call input:** +```json +{ + "file_path": "/home/razvan/go/src/github.com/doITmagic/rag-code-mcp/pkg/indexer/service.go", + "package": "github.com/doITmagic/rag-code-mcp/pkg/indexer" +} +``` + +**Response received (incorrect):** +```json +{ + "status": "success", + "message": "No exported symbols found in package 'github.com/doITmagic/rag-code-mcp/pkg/indexer'", + "context": { + "workspace_root": "/home/razvan/go/src/github.com/doITmagic/rag-code-mcp", + "detection_source": "file_path", + "indexing_progress": { + "started_at": "2026-03-09T07:33:28Z", + "languages": { + "go": { + "on_disk": 232, + "changed": 0, + "processed": 0 + } + } + } + } +} +``` + +### Actual exported symbols in `pkg/indexer/` (verified with grep) + +Verified using `grep -rn "^(func|type|var|const)\s+[A-Z]"` on the `pkg/indexer/` directory: + +**`service.go`:** +```go +type Options struct { ... } // line 31 +type Service struct { ... } // line 40 +func NewService(embedder llm.Provider, store storage.VectorStore) *Service // line 47 +``` + +**`state.go`:** +```go +type FileState struct { ... } // line 12 +type State struct { ... } // line 21 +func NewState() *State // line 27 +func LoadState(path string) (*State, error) // line 34 +``` + +**`index_status.go`:** +```go +type IndexStatus struct { ... } // line 16 +type LangStatus struct { ... } // line 25 +func SaveIndexStatus(workspaceRoot string, status *IndexStatus) // line 32 +func LoadIndexStatus(workspaceRoot string) *IndexStatus // line 54 +``` + +### Root cause (confirmed) + +Verified by querying the vector database directly — **the data IS indexed**. A `rag_search` for `LangStatus`, `IndexStatus`, `SaveIndexStatus` returns results with scores of 0.86–0.94, sourced from `_source: "both"` (semantic + exact match). The data is in the index. + +The real bug is a **package name mismatch** in `internal/service/tools/list_package_exports.go`: + +```go +// The tool builds an exact-match filter using the full Go import path: +filter := map[string]interface{}{ + "package": packageName, // e.g. "github.com/doITmagic/rag-code-mcp/pkg/indexer" +} +allResults, err := t.engine.ExactSearchPolyglot(ctx, wctx.ID, filter, 1000) +``` + +However, the vector index stores the short package name, not the full import path: +```json +{ "name": "LangStatus", "package": "indexer", ... } +``` + +The filter `"package": "github.com/doITmagic/rag-code-mcp/pkg/indexer"` never matches `"package": "indexer"` → `allResults` is always empty → the tool returns `"No exported symbols found"`. + +### Applied fix + +**File:** `internal/service/tools/list_package_exports.go` + +```diff +- filter := map[string]interface{}{ +- "package": packageName, +- } ++ // The index stores the short package name (e.g. "indexer"), not the full Go ++ // import path (e.g. "github.com/doITmagic/rag-code-mcp/pkg/indexer"). ++ // Normalize by taking the last path segment so both forms work. ++ filterPackage := packageName ++ if idx := strings.LastIndex(packageName, "/"); idx >= 0 { ++ filterPackage = packageName[idx+1:] ++ } ++ filter := map[string]interface{}{ ++ "package": filterPackage, ++ } +``` + +This fix is backward-compatible: if the caller passes only the short name (e.g. `"indexer"`), `strings.LastIndex` returns `-1` and `filterPackage` is unchanged. + +--- + +## BUG-002: `indexing_progress.changed` reports `0` even when files exist on disk + +**Status:** Confirmed (related to BUG-001) +**Date confirmed:** 2026-03-09 +**Affected tools:** All MCP tools that include `indexing_progress` in their response context +**Severity:** Low — incorrect diagnostic information; does not directly affect search results + +### Description + +The `indexing_progress.languages..changed` field may report `0` even though files are present on disk and may have been modified since the last full indexing run. This is because the metric reflects how many files were processed in the **current** indexing session, not how many differ from the last indexed state. + +### Example + +```json +"go": { + "on_disk": 232, // 232 Go files present on disk + "changed": 0, // no changes detected — misleading + "processed": 0 // nothing processed in this session +} +``` + +In reality, the index may be completely stale — all 232 files could be unindexed — yet `changed` and `processed` both report `0` because no indexing session was triggered. + +### Expected behavior + +`changed` should reflect the number of files that differ from the last indexed snapshot (by `mtime` or content hash), not just files processed in the current in-flight session. + +--- + +*Last updated: 2026-03-09 — BUG-001 fixed* + +--- + +## BUG-003: Top-level Go functions with no AST relations are missing from the vector index + +**Status:** ✅ Fixed (2026-03-10, PR #40) +**Date confirmed:** 2026-03-09 +**Affected component:** Go parser / indexer (`pkg/indexer`, `internal/parser`) +**Severity:** Medium — `rag_list_package_exports` and `rag_search` silently omit exported constructor/loader functions + +### Description + +Some exported top-level Go functions are never written to the vector database by the indexer. They exist in source on disk, they are syntactically exported (capitalized name), but searching the vector store for them returns no dedicated entry — they appear only embedded inside the body content of *other* functions that call them. + +### Affected symbols (confirmed via direct vector DB search) + +All from `pkg/indexer/`: + +| Symbol | File | Indexed? | Notes | +|---|---|---|---| +| `SaveIndexStatus` | `index_status.go:32` | ✅ yes | 6 AST relations | +| `LoadIndexStatus` | `index_status.go:54` | ❌ **no** | 0 dedicated index entry | +| `NewService` | `service.go:47` | ❌ **no** | `rag_find_usages` explicitly returned "No usages found" | +| `NewState` | `state.go:27` | ❌ **no** | No dedicated index entry | +| `LoadState` | `state.go:34` | ❌ **no** | No dedicated index entry | + +### Diagnostic evidence + +1. `rag_list_package_exports` for `pkg/indexer` returns 16 symbols — none of the 4 missing functions appear. +2. `rag_find_usages("NewService")` returns: `"No usages found for symbol 'NewService' based on Code Graph relations."` — the symbol has **zero AST relation entries** in Qdrant. +3. `rag_search` for `"func LoadIndexStatus"` only returns entries where `LoadIndexStatus` appears **in the body** of other functions (e.g. `engine.GetIndexStatus`, `engine.StartIndexingAsync`), never as a standalone symbol. +4. `SaveIndexStatus` (same file, same pattern) **is** indexed with 6 relations — confirming the issue is not file-level but symbol-level. + +### Root cause (confirmed via direct Qdrant query) + +**Direct Qdrant scroll on the collection reveals 25 points for package `indexer`.** Full list sorted by name confirms: + +- `LangStatus` → `rel_count: 0`, **IS indexed** ✅ +- `circuitBreakerThreshold` (private const) → `rel_count: 0`, **IS indexed** ✅ +- `deleteCollectionTimeout` (private const) → `rel_count: 0`, **IS indexed** ✅ + +This **disproves** the relation-count-as-threshold hypothesis. Symbols with zero relations *are* indexed — the missing functions are simply absent. + +**The pattern that distinguishes missing vs present functions:** + +| Symbol | Indexed? | Called from outside the package? | +|---|---|---| +| `SaveIndexStatus` | ✅ | Yes — called from `engine.go` (different package) | +| `LoadIndexStatus` | ❌ | Only called from within `pkg/indexer/` itself | +| `NewService` | ❌ | Not tracked (0 AST relations despite being called from `engine.go`) | +| `NewState` | ❌ | Only called from within `pkg/indexer/service.go` | +| `LoadState` | ❌ | Only called from within `pkg/indexer/service.go` | + +**Exact root cause found in `pkg/parser/go/analyzer.go`:** + +The `go/doc` package automatically associates constructor/loader functions with the type they return: +- `NewService() *Service` → placed in `docPkg.Types["Service"].Funcs` by `go/doc` +- `LoadState() *State` → placed in `docPkg.Types["State"].Funcs` by `go/doc` +- `NewState() *State` → placed in `docPkg.Types["State"].Funcs` by `go/doc` +- `LoadIndexStatus() *IndexStatus` → placed in `docPkg.Types["IndexStatus"].Funcs` by `go/doc` + +These functions **never appear** in `docPkg.Funcs` (top-level functions list). + +In `AnalyzePackage` (lines 126–141), the type-processing loop iterates `typ.Methods` but **never `typ.Funcs`**: + +```go +// pkg/parser/go/analyzer.go lines 126-141 +for _, typ := range docPkg.Types { + typeInfo := ca.analyzeTypeDecl(fset, typ, astFuncMap) + typeIdx := len(info.Types) + info.Types = append(info.Types, typeInfo) + + // ✅ Methods are processed + for _, method := range typ.Methods { + methodInfo := ca.analyzeFunctionDecl(fset, method, astFuncMap, typ.Name) + info.Functions = append(info.Functions, methodInfo) + } + // ❌ typ.Funcs (constructors like NewService, LoadState) are NEVER processed! +} +``` + +`SaveIndexStatus` works because it returns `void` (no associated type), so `go/doc` places it in `docPkg.Funcs` — the only list that IS iterated at line 120. + +### Fix (exact, minimal) + +In `AnalyzePackage` in `pkg/parser/go/analyzer.go`, add iteration over `typ.Funcs` inside the type loop: + +```diff + for _, typ := range docPkg.Types { + typeInfo := ca.analyzeTypeDecl(fset, typ, astFuncMap) + typeIdx := len(info.Types) + info.Types = append(info.Types, typeInfo) + + for _, method := range typ.Methods { + methodInfo := ca.analyzeFunctionDecl(fset, method, astFuncMap, typ.Name) + methodInfo.IsMethod = true + methodInfo.Receiver = typ.Name + info.Functions = append(info.Functions, methodInfo) + info.Types[typeIdx].Methods = append(info.Types[typeIdx].Methods, + ca.convertFunctionToMethodInfo(methodInfo, typ.Name)) + } ++ ++ // Process constructor/factory functions associated with this type ++ // (go/doc moves New*, Load*, etc. here from the top-level Funcs list) ++ for _, fn := range typ.Funcs { ++ fnInfo := ca.analyzeFunctionDecl(fset, fn, astFuncMap) ++ info.Functions = append(info.Functions, fnInfo) ++ } + } +``` + +### Note on tree-sitter + +**tree-sitter is NOT needed** for this fix. The existing `go/ast` + `go/doc` approach is correct and more accurate than tree-sitter for Go — it's the standard library, built into the Go toolchain. The only problem is the missing `typ.Funcs` loop, which is a one-line fix. + +--- + +*Last updated: 2026-03-09 — BUG-001 fixed, BUG-003 added* + +--- + +## BUG-004: AST Fallback Search and Indexer do not exclude unconfigured directories like `inspirations/` + +**Status:** Open +**Date confirmed:** 2026-03-09 +**Affected component:** `FallbackDirectSearch` (`internal/service/engine/engine_fallback_search.go`) and `IndexWorkspace` (`pkg/indexer/service.go`) +**Severity:** Medium — causes irrelevant, old, or draft code to pollute semantic and fallback search results. + +### Description + +When performing a search that falls back to the AST (e.g. while `go` files are `processed: 0`), RAGCode can return results from the `inspirations/` directory (or other directories that should logically be ignored). This happens because `filepath.WalkDir` relies entirely on a hardcoded list of `excludePatterns` loaded from `config.Workspace.ExcludePatterns`, alongside a basic check for `.`, `vendor`, and `node_modules`. + +### Example + +Searching for the processing of `state.json` via `rag_search` returned a fallback result pointing to: +`/home/razvan/go/src/github.com/doITmagic/rag-code-mcp/inspirations/rag-code-mcp/internal/workspace/state.go` +instead of the actual code in `pkg/indexer/state.go`. + +### Root Cause +In both `internal/service/engine/engine_fallback_search.go` (lines 88-103) and `pkg/indexer/service.go` (lines 72-88), the exclusion logic is implemented manually: +```go +if d.IsDir() { + name := d.Name() + if strings.HasPrefix(name, ".") || name == "vendor" || name == "node_modules" { + return filepath.SkipDir + } + for _, p := range excludePatterns { + if name == p { + return filepath.SkipDir + } + } + return nil +} +``` +If `inspirations` or other custom draft folders are not explicitly provided in the YAML config `exclude_patterns`, they are scanned by the fallback module and indexer. The system **does not automatically parse `.ragcodeignore` or `.gitignore`**, nor does it have a default ignore list for common draft/backup directories like `inspirations`. + +### Proposed Fix +1. Ensure that `.gitignore` or `.ragcodeignore` files are parsed and respected during the `filepath.WalkDir` traversal. +2. Consider adding `inspirations` and `drafts` strings to the default hardcoded exclusions if they represent common anti-patterns for this specific repo, or automatically bundle `.gitignore` rules into the `excludePatterns` array at startup. + diff --git a/README.md b/README.md index c33f6dd..387618a 100644 --- a/README.md +++ b/README.md @@ -84,11 +84,11 @@ RagCode V2 isn't just a vector database wrapper. It features deep language under ## Supported Languages - **Go**: Complete native AST support -- **PHP**: Vanilla PHP, Laravel, WordPress (Hooks, Widgets, WooCommerce, Oxygen Builder) +- **PHP**: Vanilla PHP, Laravel (Eloquent, Routes, Controllers), WordPress (Hooks, Widgets, WooCommerce, Oxygen Builder) - **JavaScript & TypeScript**: Vanilla JS/TS, Node.js, React, React Native, Next.js, Vue - **Python**: Complete native AST support -- **HTML & Markdown**: Structural documentation mappings -- **Generic Support**: CSS, JSON, YAML, Shell scripts, SQL +- **HTML & CSS**: HTML structural mappings, CSS/SCSS/SASS/LESS via tree-sitter +- **Documentation**: Markdown, JSON, YAML, XML, TOML, reStructuredText --- diff --git a/SUGGESTIONS.md b/SUGGESTIONS.md index 33b8192..eb2ced2 100644 --- a/SUGGESTIONS.md +++ b/SUGGESTIONS.md @@ -1,71 +1,16 @@ -# Analysis: Feature Proposals & Implementations +# Suggestions -## 💡 Standout Ideas and Incremental Enhancements +## Incremental indexing resets status to "starting" — ⚠️ Partially addressed -### 1. 🔄 Live Tracking for "Token Savings" & "Cost Avoided" -**Concept:** -- Instead of just calculating saved tokens per request ephemerally, maintain a global tracker (`~/.ragcode/savings.json`) that cumulatively stores `total_tokens_saved` across all sessions. -- Provide a feature that calculates the real-world USD value of the saved tokens based on standard LLM pricing (e.g., Claude 3.5 Sonnet token costs). -- Send this telemetry back to the AI under the MCP `_meta` response so the user directly sees the financial value RagCode generates (e.g., "RagCode saved you $42 this month"). +When a single file is re-indexed incrementally, `StartIndexingAsync` previously +overwrote the status file with a brand-new object, discarding `Languages` data. -### 2. 🔄 O(1) Fetch via Byte Offsets -**Concept:** -- While extracting AST symbols, store exact `Byte Offsets` (start and end) in addition to Line Numbers. -- When `rag_read_file_context` is called, instead of reading the file line-by-line or using regex, perform a strict `seek()` operation to jump straight to the exact byte. This prevents loading massive files strictly into RAM. +**Current state (PR #40):** +- The `State` field is now hidden from external JSON output (`json:"-"`), so AI + consumers no longer see `"starting"` / `"completed"` strings. +- The `Languages` map is now preserved during incremental re-indexing (not wiped). -### 3. 🔄 Stable Symbol IDs -**Concept:** -- Expose fixed, semantic ID targets for every AST Node, such as `{file_path}::{qualified_name}#{kind}` (e.g., `pkg/parser/php/laravel/adapter.go::Parser.Extract#method`). -- Instead of searching, an Agent could request the direct structure of a known unique ID. - -### 5. 🔄 Active Symbol Summarization (During Indexing) -**Concept:** -- If a function or class lacks a Docstring, forward the chunk asynchronously to a cheap LLM (like Gemini Flash or Claude Haiku) *during* the indexing phase. -- Pre-generate a "One Line Summary" and embed that summary instead of the raw cryptic code. This drastically improves the semantic vector matching quality for poorly documented code. - ---- - -## 🤖 AI Agent Validated Implementations (Already Deployed) - -Based on rigorous real-world Agent usage, the following core features have been definitively implemented to drastically reduce LLM "decision fatigue". - -### 6. ✅ IMPLEMENTED — `rag_search`: Dual Search + Adaptive Response -**Status:** Deployed in `internal/service/tools/smart_search.go` - -**Challenge:** Agents used to guess between `mode: "exact"` and `"discovery"`. Even when they found results, pulling 5 full files instantly maxed out the context window. -**Solution:** -1. **Parallel Dual Search**: Executes `SearchCode` (Semantic Vector Qdrant) and `HybridSearchCode` (Exact Path/Substrings) simultaneously across Goroutines. -2. **Merging & Deduplication**: Vector IDs are matched, and results are tagged by provenance (`_source: "semantic" | "hybrid" | "both"`). -3. **Adaptive Formatting**: - - **Compact Mode**: If >4 results are found, returns only the signatures, paths, and scores (costs ~500 tokens). - - **Full Source**: Returns raw source code *only* for highly-confident, tight matches. - -### 7. ✅ IMPLEMENTED — Indexing Status & Health Metrics + Lazy Stale Cleanup -**Status:** Deployed across all `internal/service/tools/` endpoints. - -**Challenge:** Agents would search and hallucinate code that had actually been deleted by the user simply because the Qdrant index was stale. -**Solution:** -- **Pre-flight Disk Verification**: `rag_search` verifies `os.Stat` before returning matches. -- **Lazy Stale Cleanup**: Stale results are **filtered out** from the response (they never reach the AI). Additionally, the engine triggers an **async deletion** of all vectors for the stale file from every language collection in the workspace — a self-healing mechanism with a 10-minute dedup cooldown. -- **Auto-Cleanup Warning**: The response includes a `🧹 N stale file(s) detected and filtered out. Auto-cleanup triggered.` warning giving the AI full observability. -- **Chronological Awareness**: The response schema appends `index_age` (e.g., `"3 minutes ago"`) and `indexing_progress` strictly to maintain absolute validity. - -### 8. 🔄 PROPOSED — Migrate from `langchaingo` to Native Ollama Client -**Challenge:** Using `langchaingo` masks underlying context cancellations, causing deadlocks. -**Solution:** Replace it fully with the native `github.com/ollama/ollama/api` which provides direct HTTP keep-alive manipulation, native batch embedding capabilities, and proper Context Propagation timeouts. - -### 9. ✅ IMPLEMENTED — Smart Search Consolidation -**Challenge:** Agents suffered from "tool overwhelm" when attempting code searches. -**Solution:** Deprecated `rag_search_code` and moved everything explicitly to `rag_search`. Input schemas were simplified to `query` + `include_full_content` boolean overrides. - -### 10. ✅ IMPLEMENTED — Markdown Documentation Indexing -**Challenge:** The engine only understood codebase logic, completely blinding the AI to `README.md` architectural guidelines or implementation plans. -**Solution:** Integrated advanced hierarchical chunking (`MarkdownHeaderTextSplitter`) that natively indexes Headings, Tables, and Lists while keeping overlapping sliding windows for vectors. When an AI searches via `include_docs: true`, it searches the markdown chunks simultaneously with source code. - -### 11. ✅ IMPLEMENTED — Deep WordPress & WooCommerce Native Parsers -**Challenge:** The baseline PHP Tree-sitter AST parser could not navigate the massive WordPress hook ecosystem. -**Solution:** Created `pkg/parser/php/wordpress/`, a hyper-specialized sub-package that detects explicit CMS structures: -- Native extraction of **Hooks** (`add_action`, `add_filter`, `do_action`). -- Automatic identification of **Custom Post Types**, **Taxonomies**, and **Shortcodes**. -- **WooCommerce Integration**: Specifically isolates `woocommerce_` hooks and shopping cart overrides. -- **Oxygen Builder**: AST scanning for `extends OxyEl`, rendering layouts, and `ct_builder_json` dynamic components. +**Remaining open item:** during incremental re-indexing, `processed` counters reset +to whatever the incremental run reports. The overall `Languages` snapshot from the +last full indexing run is kept, but live progress during the incremental pass may +temporarily show lower counts. diff --git a/TASKS.md b/TASKS.md index 5ae02be..f09c730 100644 --- a/TASKS.md +++ b/TASKS.md @@ -147,3 +147,15 @@ Indexează fișierele `.md` din workspace (README, guides, API docs) în aceeaș - [ ] **[P2]** `.txt` — split pe paragrafe cu `RecursiveCharacterSplitter`. - [ ] **[P2]** `.json` / `.yaml` — flatten keys ca text și indexare ca documentație structurată. - [ ] **[P2]** `.rst` / `.adoc` — convertor la markdown + chunking standard. + +## Task 9: UX / Metrics Simplification & Indexing Priority + +### Goal +- Simplify the indexing progress metrics visible to the AI (MCP output) to prevent confusion and encourage semantic tool usage. +- Prioritize indexing the project's majority language first (e.g., if it's a Go project with 177 files and 10 markdown files, index Go files before Docs). + +### Subtasks +- [ ] **[P0]** Refactor `index_status.json` structure or the MCP response envelope to only expose `total_files` and `indexed_files` per language, dropping task-specific states like `changed` and `processed`. +- [ ] **[P0]** Provide an explicit `"status": "up_to_date"` string when `indexed_files == total_files` to build AI trust. +- [ ] **[P0]** In `internal/service/engine/engine.go` during `IndexWorkspace`, dynamically sort the `languages` slice (e.g. `docs`, `go`, etc.) descending based on their `fileCounts` value before entering the core processing loop. This guarantees the highest coverage language completes first. +- [ ] **[P1]** Ensure `IndexFiles` (incremental logic triggered on single file edits) safely patches `index_status.json` by delta (e.g., adding +1 to total/indexed) without resetting or zeroing out the existing statistics built by the main `IndexWorkspace` routine. diff --git a/cmd/rag-code-install/main.go b/cmd/rag-code-install/main.go index faaa902..0e73136 100644 --- a/cmd/rag-code-install/main.go +++ b/cmd/rag-code-install/main.go @@ -7,6 +7,7 @@ import ( "fmt" "io" "net" + "net/http" "os" "os/exec" "path/filepath" @@ -402,45 +403,47 @@ func stopRunningProcess(binPath string) { log("Stopping existing process gracefully: " + binPath) - // Attempt Graceful Shutdown using PID file - home, err := os.UserHomeDir() - if err == nil { - pidPath := filepath.Join(home, ".ragcode", "daemon.pid") - if data, err := os.ReadFile(pidPath); err == nil { - pidStr := strings.TrimSpace(string(data)) - if pid, err := strconv.Atoi(pidStr); err == nil { - log(fmt.Sprintf("Found daemon PID: %d. Sending termination signal...", pid)) - - // For Windows - if runtime.GOOS == "windows" { - _ = exec.Command("taskkill", "/PID", pidStr).Run() - time.Sleep(2 * time.Second) - _ = exec.Command("taskkill", "/F", "/PID", pidStr).Run() - return - } + // Attempt Graceful Shutdown using TCP health endpoint + client := &http.Client{Timeout: 2 * time.Second} + if resp, err := client.Get("http://127.0.0.1:39000/health"); err == nil { + defer resp.Body.Close() + var health struct { + PID int `json:"pid"` + } + if decodeErr := json.NewDecoder(resp.Body).Decode(&health); decodeErr == nil && health.PID > 0 { + pid := health.PID + pidStr := strconv.Itoa(pid) + log(fmt.Sprintf("Found daemon PID: %d. Sending termination signal...", pid)) + + // For Windows + if runtime.GOOS == "windows" { + _ = exec.Command("taskkill", "/PID", pidStr).Run() + time.Sleep(2 * time.Second) + _ = exec.Command("taskkill", "/F", "/PID", pidStr).Run() + return + } - // For Unix - process, err := os.FindProcess(pid) - if err == nil { - // Send SIGTERM - _ = process.Signal(syscall.SIGTERM) - - // Wait up to 5 seconds for it to exit gracefully - for i := 0; i < 50; i++ { - if err := process.Signal(syscall.Signal(0)); err != nil { - // Process is gone - break - } - time.Sleep(100 * time.Millisecond) + // For Unix + process, err := os.FindProcess(pid) + if err == nil { + // Send SIGTERM + _ = process.Signal(syscall.SIGTERM) + + // Wait up to 5 seconds for it to exit gracefully + for i := 0; i < 50; i++ { + if err := process.Signal(syscall.Signal(0)); err != nil { + // Process is gone + break } + time.Sleep(100 * time.Millisecond) + } - // After grace period, only SIGKILL if process still appears alive - if err := process.Signal(syscall.Signal(0)); err == nil { - _ = process.Signal(syscall.SIGKILL) - time.Sleep(200 * time.Millisecond) - } - return + // After grace period, only SIGKILL if process still appears alive + if err := process.Signal(syscall.Signal(0)); err == nil { + _ = process.Signal(syscall.SIGKILL) + time.Sleep(200 * time.Millisecond) } + return } } } diff --git a/cmd/rag-code-mcp/main.go b/cmd/rag-code-mcp/main.go index 5d47bac..f72b725 100644 --- a/cmd/rag-code-mcp/main.go +++ b/cmd/rag-code-mcp/main.go @@ -6,7 +6,6 @@ import ( "fmt" "log" "os" - "path/filepath" "strconv" "github.com/Masterminds/semver/v3" @@ -17,7 +16,7 @@ import ( ) var ( - Version = "2.1.62" + Version = "2.1.80" Commit = "none" Date = "24.10.2025" ) @@ -30,7 +29,7 @@ func main() { ollamaModel := flag.String("ollama-model", "", "Ollama chat model override") ollamaEmbed := flag.String("ollama-embed", "", "Ollama embedding model override") qdrantURLFlag := flag.String("qdrant-url", "", "Qdrant URL override") - httpPort := flag.Int("http-port", 3000, "Port for optional HTTP server (default 3000, set -1 to disable)") + httpPort := flag.Int("http-port", 39000, "Port for TCP daemon server (default 39000)") versionFlag := flag.Bool("version", false, "Print version and exit") uninstallFlag := flag.Bool("uninstall", false, "Uninstall RagCode MCP from this system") flag.Parse() @@ -50,7 +49,7 @@ func main() { if *daemonFlag { // ═══════════════════════════════════════════════════════════════ // DAEMON MODE — the heavy process: Qdrant, Ollama, Engine, MCP - // Listens on Unix socket + optional HTTP + // Listens exclusively on local TCP port to guarantee singleton // ═══════════════════════════════════════════════════════════════ if err := daemon.Run(daemon.RunConfig{ Version: Version, @@ -67,7 +66,7 @@ func main() { } } else { // ═══════════════════════════════════════════════════════════════ - // ADAPTER MODE (default) — thin Stdio ↔ Unix socket bridge + // ADAPTER MODE (default) — thin Stdio ↔ TCP bridge // Each IDE launches this mode; daemon is started automatically. // ═══════════════════════════════════════════════════════════════ @@ -77,7 +76,7 @@ func main() { if *configPath != "config.yaml" { daemonArgs = append(daemonArgs, "--config", *configPath) } - if *httpPort != 3000 { + if *httpPort != 39000 { daemonArgs = append(daemonArgs, "--http-port", strconv.Itoa(*httpPort)) } if *ollamaURLFlag != "" { @@ -93,57 +92,44 @@ func main() { daemonArgs = append(daemonArgs, "--qdrant-url", *qdrantURLFlag) } - runAdapter(Version, daemonArgs) + runAdapter(Version, *httpPort, daemonArgs) } } // runAdapter is the thin stdio adapter that bridges IDE ↔ daemon. // It ensures the daemon is running, handles version upgrades, and bridges stdin/stdout. // daemonArgs are extra CLI flags forwarded to the daemon process. -func runAdapter(version string, daemonArgs []string) { - homeDir, err := os.UserHomeDir() - if err != nil { - log.Fatalf("Cannot determine home directory: %v", err) - } - ragcodeDir := filepath.Join(homeDir, ".ragcode") - if err := os.MkdirAll(ragcodeDir, 0o700); err != nil { - log.Fatalf("Cannot create ~/.ragcode: %v", err) - } - - pidPath := filepath.Join(ragcodeDir, "daemon.pid") - sockPath := filepath.Join(ragcodeDir, "daemon.sock") - +func runAdapter(version string, httpPort int, daemonArgs []string) { // Check if daemon is already running - running, existingVersion := adapter.IsDaemonRunning(pidPath, sockPath) + running, existingVersion := adapter.IsDaemonRunning(httpPort) // Version upgrade check: if daemon is running an older version, restart it if running && needsUpgrade(existingVersion, version) { logger.Instance.Info("Daemon upgrade needed (%s → %s), restarting...", existingVersion, version) - if err := adapter.StopDaemon(pidPath); err != nil { + if err := adapter.StopDaemon(httpPort); err != nil { logger.Instance.Warn("Failed to stop old daemon: %v", err) } - adapter.CleanupStaleFiles(pidPath, sockPath) running = false } // Start daemon if not running if !running { - logger.Instance.Info("Starting daemon...") + logger.Instance.Info("Starting daemon on port %d...", httpPort) binaryPath, err := os.Executable() if err != nil { log.Fatalf("Cannot determine binary path: %v", err) } - if err := adapter.StartDaemon(binaryPath, sockPath, daemonArgs...); err != nil { + if err := adapter.StartDaemon(binaryPath, httpPort, daemonArgs...); err != nil { log.Fatalf("Failed to start daemon: %v", err) } logger.Instance.Info("Daemon started successfully") } - // Bridge stdin ↔ daemon via Unix socket + // Bridge stdin ↔ daemon via local TCP port workspaceHint, _ := os.Getwd() - logger.Instance.Info("Adapter bridging stdin ↔ daemon (workspace_hint=%s)", workspaceHint) + logger.Instance.Info("Adapter bridging stdin ↔ daemon on port %d (workspace_hint=%s)", httpPort, workspaceHint) - if err := adapter.RunBridge(context.Background(), sockPath, os.Stdin, os.Stdout, workspaceHint); err != nil { + if err := adapter.RunBridge(context.Background(), httpPort, os.Stdin, os.Stdout, workspaceHint); err != nil { logger.Instance.Error("Adapter bridge error: %v", err) os.Exit(1) } diff --git a/docs/architecture/ARCHITECTURE.md b/docs/architecture/ARCHITECTURE.md index 8bbc9e4..4c8d942 100644 --- a/docs/architecture/ARCHITECTURE.md +++ b/docs/architecture/ARCHITECTURE.md @@ -36,9 +36,10 @@ Standard MCP servers fail in monorepos, deeply nested vendor directories, or mul Naive RAG systems chunk code into arbitrary 1000-character blocks, often splitting a function exactly in half. RagCode uses native Abstract Syntax Tree (AST) parsers to generate context-complete `Symbols`. * **Go & Python**: Full native Tree-sitter AST traversal. -* **PHP Ecosystem**: Deep inspection not just for Vanilla PHP, but specifically extracting macros for Laravel (Eloquent, Routes) and WordPress (Hooks, Widgets, WooCommerce, Oxygen Builder). +* **PHP Ecosystem**: Deep inspection not just for Vanilla PHP, but specifically extracting macros for Laravel (Eloquent, Routes, Controllers) and WordPress (Hooks, Widgets, WooCommerce, Oxygen Builder). * **JS / TS**: Native coverage for React, Vue, Next.js, and Node modules. -* **Generic Fallback**: Structural mapping for CSS, HTML, JSON, and YAML. +* **HTML & CSS**: HTML semantic sectioning via goquery; CSS/SCSS/SASS/LESS parsing via tree-sitter. +* **Documentation**: Structural mapping for Markdown, JSON, YAML, XML, and TOML. ### Micro-Features & Engineering Highlights: - **Dynamic Symbol Hydration**: Every chunk contains exact Byte Offsets, Line Boundaries, Type Signatures, and Docstrings, allowing `rag_read_file_context` to perform `O(1)` file seeks instead of regex matching. diff --git a/docs/plans/2026-03-11-parser-separation.md b/docs/plans/2026-03-11-parser-separation.md new file mode 100644 index 0000000..e2457db --- /dev/null +++ b/docs/plans/2026-03-11-parser-separation.md @@ -0,0 +1,666 @@ +# Parser Separation Implementation Plan + +> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. + +**Goal:** Extract CSS, SQL, Shell/Bash, and Svelte file types from the "docs" parser into dedicated parsers, each with its own language name and Qdrant collection. + +**Architecture:** Each new parser package (`css`, `sql`, `shell`, `svelte`) registers itself via `init()`, delegates chunking to the existing `docs.TreeSitterParser`, and returns its own `Language` name. The `docs` analyzer loses 5 extensions (`.css`, `.scss`, `.sql`, `.sh`, `.svelte`). The daemon's `run.go` gains 4 blank imports to trigger registration. + +**Tech Stack:** Go, gotreesitter v0.6.0 (css/scss/sql/bash/svelte grammars already bundled), testify + +--- + +## Task 1: Create `pkg/parser/css/analyzer.go` + +**Files:** +- Create: `pkg/parser/css/analyzer.go` +- Create: `pkg/parser/css/analyzer_test.go` + +**Step 1: Write the failing test** + +Create `pkg/parser/css/analyzer_test.go`: + +```go +package css_test + +import ( + "context" + "os" + "path/filepath" + "testing" + + "github.com/doITmagic/rag-code-mcp/pkg/parser/css" + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestAnalyzer_CanHandle(t *testing.T) { + a := css.NewAnalyzer() + assert.True(t, a.CanHandle("style.css")) + assert.True(t, a.CanHandle("main.scss")) + assert.False(t, a.CanHandle("script.js")) + assert.False(t, a.CanHandle("query.sql")) +} + +func TestAnalyzer_Name(t *testing.T) { + assert.Equal(t, "css", css.NewAnalyzer().Name()) +} + +func TestAnalyzer_ParseCSS(t *testing.T) { + a := css.NewAnalyzer() + tmpDir := t.TempDir() + f := filepath.Join(tmpDir, "style.css") + require.NoError(t, os.WriteFile(f, []byte(` +body { background: red; } +.header { font-size: 24px; } +`), 0644)) + + result, err := a.Analyze(context.Background(), f) + require.NoError(t, err) + require.NotNil(t, result) + assert.Equal(t, "css", result.Language) + assert.Greater(t, len(result.Symbols), 0) +} + +func TestAnalyzer_ParseSCSS(t *testing.T) { + a := css.NewAnalyzer() + tmpDir := t.TempDir() + f := filepath.Join(tmpDir, "main.scss") + require.NoError(t, os.WriteFile(f, []byte(` +$primary: #333; +.nav { color: $primary; } +`), 0644)) + + result, err := a.Analyze(context.Background(), f) + require.NoError(t, err) + assert.Equal(t, "css", result.Language) + assert.Greater(t, len(result.Symbols), 0) +} +``` + +**Step 2: Run test to verify it fails** + +```bash +cd /home/razvan/go/src/github.com/doITmagic/rag-code-mcp +go test ./pkg/parser/css/... -v +``` +Expected: `cannot find package` + +**Step 3: Write implementation** + +Create `pkg/parser/css/analyzer.go`: + +```go +package css + +import ( + "context" + "fmt" + "os" + "path/filepath" + "strings" + + "github.com/doITmagic/rag-code-mcp/pkg/parser" + "github.com/doITmagic/rag-code-mcp/pkg/parser/docs" +) + +func init() { + parser.Register(NewAnalyzer()) +} + +// Analyzer handles CSS and SCSS files. +type Analyzer struct { + ts *docs.TreeSitterParser +} + +func NewAnalyzer() *Analyzer { + return &Analyzer{ts: docs.NewTreeSitterParser()} +} + +func (a *Analyzer) Name() string { return "css" } + +func (a *Analyzer) CanHandle(path string) bool { + ext := strings.ToLower(filepath.Ext(path)) + return ext == ".css" || ext == ".scss" +} + +func (a *Analyzer) Analyze(ctx context.Context, path string) (*parser.Result, error) { + content, err := os.ReadFile(path) + if err != nil { + return nil, fmt.Errorf("css: read %s: %w", path, err) + } + if len(strings.TrimSpace(string(content))) == 0 { + return &parser.Result{Language: "css"}, nil + } + ext := strings.ToLower(filepath.Ext(path)) + symbols, err := a.ts.Parse(content, path, ext) + if err != nil { + return nil, fmt.Errorf("css: parse %s: %w", path, err) + } + // Override language on all symbols + for i := range symbols { + symbols[i].Language = "css" + } + return &parser.Result{Symbols: symbols, Language: "css"}, nil +} +``` + +**Step 4: Run test to verify it passes** + +```bash +go test ./pkg/parser/css/... -v +``` +Expected: all tests PASS + +**Step 5: Commit** + +```bash +git add pkg/parser/css/ +git commit -m "feat(parser): add dedicated CSS/SCSS parser" +``` + +--- + +## Task 2: Create `pkg/parser/sql/analyzer.go` + +**Files:** +- Create: `pkg/parser/sql/analyzer.go` +- Create: `pkg/parser/sql/analyzer_test.go` + +**Step 1: Write the failing test** + +Create `pkg/parser/sql/analyzer_test.go`: + +```go +package sql_test + +import ( + "context" + "os" + "path/filepath" + "testing" + + "github.com/doITmagic/rag-code-mcp/pkg/parser/sql" + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestAnalyzer_CanHandle(t *testing.T) { + a := sql.NewAnalyzer() + assert.True(t, a.CanHandle("schema.sql")) + assert.True(t, a.CanHandle("MIGRATION.SQL")) + assert.False(t, a.CanHandle("style.css")) + assert.False(t, a.CanHandle("script.sh")) +} + +func TestAnalyzer_Name(t *testing.T) { + assert.Equal(t, "sql", sql.NewAnalyzer().Name()) +} + +func TestAnalyzer_ParseSQL(t *testing.T) { + a := sql.NewAnalyzer() + tmpDir := t.TempDir() + f := filepath.Join(tmpDir, "schema.sql") + require.NoError(t, os.WriteFile(f, []byte(` +CREATE TABLE users ( + id SERIAL PRIMARY KEY, + email TEXT NOT NULL +); +`), 0644)) + + result, err := a.Analyze(context.Background(), f) + require.NoError(t, err) + require.NotNil(t, result) + assert.Equal(t, "sql", result.Language) + assert.Greater(t, len(result.Symbols), 0) +} +``` + +**Step 2: Run test to verify it fails** + +```bash +go test ./pkg/parser/sql/... -v +``` +Expected: `cannot find package` + +**Step 3: Write implementation** + +Create `pkg/parser/sql/analyzer.go`: + +```go +package sql + +import ( + "context" + "fmt" + "os" + "path/filepath" + "strings" + + "github.com/doITmagic/rag-code-mcp/pkg/parser" + "github.com/doITmagic/rag-code-mcp/pkg/parser/docs" +) + +func init() { + parser.Register(NewAnalyzer()) +} + +// Analyzer handles SQL files. +type Analyzer struct { + ts *docs.TreeSitterParser +} + +func NewAnalyzer() *Analyzer { + return &Analyzer{ts: docs.NewTreeSitterParser()} +} + +func (a *Analyzer) Name() string { return "sql" } + +func (a *Analyzer) CanHandle(path string) bool { + return strings.ToLower(filepath.Ext(path)) == ".sql" +} + +func (a *Analyzer) Analyze(ctx context.Context, path string) (*parser.Result, error) { + content, err := os.ReadFile(path) + if err != nil { + return nil, fmt.Errorf("sql: read %s: %w", path, err) + } + if len(strings.TrimSpace(string(content))) == 0 { + return &parser.Result{Language: "sql"}, nil + } + symbols, err := a.ts.Parse(content, path, ".sql") + if err != nil { + return nil, fmt.Errorf("sql: parse %s: %w", path, err) + } + for i := range symbols { + symbols[i].Language = "sql" + } + return &parser.Result{Symbols: symbols, Language: "sql"}, nil +} +``` + +**Step 4: Run test to verify it passes** + +```bash +go test ./pkg/parser/sql/... -v +``` +Expected: all tests PASS + +**Step 5: Commit** + +```bash +git add pkg/parser/sql/ +git commit -m "feat(parser): add dedicated SQL parser" +``` + +--- + +## Task 3: Create `pkg/parser/shell/analyzer.go` + +**Files:** +- Create: `pkg/parser/shell/analyzer.go` +- Create: `pkg/parser/shell/analyzer_test.go` + +**Step 1: Write the failing test** + +Create `pkg/parser/shell/analyzer_test.go`: + +```go +package shell_test + +import ( + "context" + "os" + "path/filepath" + "testing" + + "github.com/doITmagic/rag-code-mcp/pkg/parser/shell" + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestAnalyzer_CanHandle(t *testing.T) { + a := shell.NewAnalyzer() + assert.True(t, a.CanHandle("deploy.sh")) + assert.True(t, a.CanHandle("run.bash")) + assert.False(t, a.CanHandle("style.css")) + assert.False(t, a.CanHandle("main.go")) +} + +func TestAnalyzer_Name(t *testing.T) { + assert.Equal(t, "shell", shell.NewAnalyzer().Name()) +} + +func TestAnalyzer_ParseShell(t *testing.T) { + a := shell.NewAnalyzer() + tmpDir := t.TempDir() + f := filepath.Join(tmpDir, "deploy.sh") + require.NoError(t, os.WriteFile(f, []byte(` +#!/bin/bash +function deploy() { + echo "Deploying..." +} +deploy +`), 0644)) + + result, err := a.Analyze(context.Background(), f) + require.NoError(t, err) + require.NotNil(t, result) + assert.Equal(t, "shell", result.Language) + assert.Greater(t, len(result.Symbols), 0) +} +``` + +**Step 2: Run test to verify it fails** + +```bash +go test ./pkg/parser/shell/... -v +``` +Expected: `cannot find package` + +**Step 3: Write implementation** + +Create `pkg/parser/shell/analyzer.go`: + +```go +package shell + +import ( + "context" + "fmt" + "os" + "path/filepath" + "strings" + + "github.com/doITmagic/rag-code-mcp/pkg/parser" + "github.com/doITmagic/rag-code-mcp/pkg/parser/docs" +) + +func init() { + parser.Register(NewAnalyzer()) +} + +// Analyzer handles Shell/Bash script files. +type Analyzer struct { + ts *docs.TreeSitterParser +} + +func NewAnalyzer() *Analyzer { + return &Analyzer{ts: docs.NewTreeSitterParser()} +} + +func (a *Analyzer) Name() string { return "shell" } + +func (a *Analyzer) CanHandle(path string) bool { + ext := strings.ToLower(filepath.Ext(path)) + return ext == ".sh" || ext == ".bash" +} + +func (a *Analyzer) Analyze(ctx context.Context, path string) (*parser.Result, error) { + content, err := os.ReadFile(path) + if err != nil { + return nil, fmt.Errorf("shell: read %s: %w", path, err) + } + if len(strings.TrimSpace(string(content))) == 0 { + return &parser.Result{Language: "shell"}, nil + } + symbols, err := a.ts.Parse(content, path, ".sh") + if err != nil { + return nil, fmt.Errorf("shell: parse %s: %w", path, err) + } + for i := range symbols { + symbols[i].Language = "shell" + } + return &parser.Result{Symbols: symbols, Language: "shell"}, nil +} +``` + +**Step 4: Run test to verify it passes** + +```bash +go test ./pkg/parser/shell/... -v +``` +Expected: all tests PASS + +**Step 5: Commit** + +```bash +git add pkg/parser/shell/ +git commit -m "feat(parser): add dedicated Shell/Bash parser" +``` + +--- + +## Task 4: Create `pkg/parser/svelte/analyzer.go` + +**Files:** +- Create: `pkg/parser/svelte/analyzer.go` +- Create: `pkg/parser/svelte/analyzer_test.go` + +**Step 1: Write the failing test** + +Create `pkg/parser/svelte/analyzer_test.go`: + +```go +package svelte_test + +import ( + "context" + "os" + "path/filepath" + "testing" + + "github.com/doITmagic/rag-code-mcp/pkg/parser/svelte" + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestAnalyzer_CanHandle(t *testing.T) { + a := svelte.NewAnalyzer() + assert.True(t, a.CanHandle("App.svelte")) + assert.True(t, a.CanHandle("Button.SVELTE")) + assert.False(t, a.CanHandle("App.vue")) + assert.False(t, a.CanHandle("main.js")) +} + +func TestAnalyzer_Name(t *testing.T) { + assert.Equal(t, "svelte", svelte.NewAnalyzer().Name()) +} + +func TestAnalyzer_ParseSvelte(t *testing.T) { + a := svelte.NewAnalyzer() + tmpDir := t.TempDir() + f := filepath.Join(tmpDir, "App.svelte") + require.NoError(t, os.WriteFile(f, []byte(` + + +`), 0644)) + + result, err := a.Analyze(context.Background(), f) + require.NoError(t, err) + require.NotNil(t, result) + assert.Equal(t, "svelte", result.Language) + assert.Greater(t, len(result.Symbols), 0) +} +``` + +**Step 2: Run test to verify it fails** + +```bash +go test ./pkg/parser/svelte/... -v +``` +Expected: `cannot find package` + +**Step 3: Write implementation** + +Create `pkg/parser/svelte/analyzer.go`: + +```go +package svelte + +import ( + "context" + "fmt" + "os" + "path/filepath" + "strings" + + "github.com/doITmagic/rag-code-mcp/pkg/parser" + "github.com/doITmagic/rag-code-mcp/pkg/parser/docs" +) + +func init() { + parser.Register(NewAnalyzer()) +} + +// Analyzer handles Svelte Single File Components. +type Analyzer struct { + ts *docs.TreeSitterParser +} + +func NewAnalyzer() *Analyzer { + return &Analyzer{ts: docs.NewTreeSitterParser()} +} + +func (a *Analyzer) Name() string { return "svelte" } + +func (a *Analyzer) CanHandle(path string) bool { + return strings.ToLower(filepath.Ext(path)) == ".svelte" +} + +func (a *Analyzer) Analyze(ctx context.Context, path string) (*parser.Result, error) { + content, err := os.ReadFile(path) + if err != nil { + return nil, fmt.Errorf("svelte: read %s: %w", path, err) + } + if len(strings.TrimSpace(string(content))) == 0 { + return &parser.Result{Language: "svelte"}, nil + } + symbols, err := a.ts.Parse(content, path, ".svelte") + if err != nil { + return nil, fmt.Errorf("svelte: parse %s: %w", path, err) + } + for i := range symbols { + symbols[i].Language = "svelte" + } + return &parser.Result{Symbols: symbols, Language: "svelte"}, nil +} +``` + +**Step 4: Run test to verify it passes** + +```bash +go test ./pkg/parser/svelte/... -v +``` +Expected: all tests PASS + +**Step 5: Commit** + +```bash +git add pkg/parser/svelte/ +git commit -m "feat(parser): add dedicated Svelte parser" +``` + +--- + +## Task 5: Update `docs` analyzer — remove extracted extensions + +**Files:** +- Modify: `pkg/parser/docs/analyzer.go` lines 33-45 and 57-64 +- Modify: `pkg/parser/docs/analyzer_test.go` line 16-17 + +**Step 1: Update `CanHandle` in `pkg/parser/docs/analyzer.go`** + +Change line 40 from: +```go +case ".yaml", ".yml", ".json", ".xml", ".toml", ".rst", ".css", ".scss", ".sql", ".sh", ".svelte": +``` +To: +```go +case ".yaml", ".yml", ".json", ".xml", ".toml", ".rst": +``` + +**Step 2: Update `Analyze` comment in `pkg/parser/docs/analyzer.go`** + +Change the comment on line 63 from: +```go +// Try treesitter for yaml, json, xml, toml, rst +``` +To: +```go +// Try treesitter for yaml, json, xml, toml, rst (css/sql/shell/svelte have dedicated parsers) +``` + +**Step 3: Fix the test in `pkg/parser/docs/analyzer_test.go`** + +Change `validExts` on line 16 to remove `style.css`, `main.scss`, `query.sql`, `script.sh`: +```go +validExts := []string{"test.md", "README.markdown", "config.yaml", "data.json", "conf.toml", "index.xml", "doc.rst"} +``` + +And remove the `TestAnalyzer_TreesitterParsing_CSS` test entirely (lines 129-150) since CSS is no longer handled by docs. + +**Step 4: Run all docs tests** + +```bash +go test ./pkg/parser/docs/... -v +``` +Expected: all PASS + +**Step 5: Commit** + +```bash +git add pkg/parser/docs/ +git commit -m "feat(parser): remove css/scss/sql/sh/svelte from docs parser" +``` + +--- + +## Task 6: Register new parsers in daemon + +**Files:** +- Modify: `internal/daemon/run.go` (blank import block) + +**Step 1: Add 4 blank imports to `run.go`** + +Locate the existing blank import block in `internal/daemon/run.go` and add: +```go +_ "github.com/doITmagic/rag-code-mcp/pkg/parser/css" +_ "github.com/doITmagic/rag-code-mcp/pkg/parser/sql" +_ "github.com/doITmagic/rag-code-mcp/pkg/parser/shell" +_ "github.com/doITmagic/rag-code-mcp/pkg/parser/svelte" +``` + +**Step 2: Verify build** + +```bash +go build ./... +``` +Expected: no errors + +**Step 3: Run full test suite** + +```bash +go test ./pkg/parser/... -v -count=1 +``` +Expected: all PASS including new parsers + +**Step 4: Commit** + +```bash +git add internal/daemon/run.go +git commit -m "feat(daemon): register css/sql/shell/svelte parsers" +``` + +--- + +## Task 7: Move Trello cards to Done + +After all tests pass: +1. Move card #50 (CSS/SCSS) → Done +2. Move card #51 (SQL) → Done +3. Move card #52 (Shell/Bash) → Done +4. Move card #53 (Svelte) → Done diff --git a/heap b/heap new file mode 100644 index 0000000..834a5f3 --- /dev/null +++ b/heap @@ -0,0 +1 @@ +404 page not found diff --git a/internal/adapter/adapter.go b/internal/adapter/adapter.go index fed9692..477aff3 100644 --- a/internal/adapter/adapter.go +++ b/internal/adapter/adapter.go @@ -7,14 +7,13 @@ import ( "encoding/json" "fmt" "io" - "net" "net/http" "strings" "time" ) // RunBridge reads JSON-RPC messages from stdin, forwards each as an HTTP POST -// to the daemon via Unix socket, and writes the JSON response to stdout. +// to the daemon via local TCP port, and writes the JSON response to stdout. // // The adapter reads/writes single JSON payloads (no SSE framing). // Accept header includes text/event-stream for StreamableHTTPHandler compatibility, @@ -27,14 +26,9 @@ import ( // the IDE is intentionally ignored — only the daemon-resolved root is trusted. // // Returns nil on stdin EOF (normal IDE shutdown). -func RunBridge(ctx context.Context, socketPath string, stdin io.Reader, stdout io.Writer, workspaceHint string) error { +func RunBridge(ctx context.Context, port int, stdin io.Reader, stdout io.Writer, workspaceHint string) error { client := &http.Client{ Timeout: 5 * time.Minute, // prevent indefinite hangs on stalled daemon - Transport: &http.Transport{ - DialContext: func(_ context.Context, _, _ string) (net.Conn, error) { - return net.DialTimeout("unix", socketPath, 10*time.Second) - }, - }, } scanner := bufio.NewScanner(stdin) @@ -53,7 +47,7 @@ func RunBridge(ctx context.Context, socketPath string, stdin io.Reader, stdout i } req, err := http.NewRequestWithContext(ctx, http.MethodPost, - "http://daemon/mcp", bytes.NewReader([]byte(line))) + fmt.Sprintf("http://127.0.0.1:%d/mcp", port), bytes.NewReader([]byte(line))) if err != nil { writeJSONRPCError(stdout, line, fmt.Errorf("create request: %w", err)) continue diff --git a/internal/adapter/adapter_sticky_test.go b/internal/adapter/adapter_sticky_test.go deleted file mode 100644 index 131d6aa..0000000 --- a/internal/adapter/adapter_sticky_test.go +++ /dev/null @@ -1,97 +0,0 @@ -package adapter - -import ( - "bytes" - "context" - "encoding/json" - "io" - "net/http" - "strings" - "sync" - "testing" - - "github.com/stretchr/testify/assert" - "github.com/stretchr/testify/require" -) - -// TestBridge_StickyWorkspace verifies that after the daemon responds with -// X-Resolved-Workspace header, the adapter sends X-Workspace-Root on -// all subsequent requests. -func TestBridge_StickyWorkspace(t *testing.T) { - var mu sync.Mutex - var receivedRoots []string - callNum := 0 - - sockPath := startFakeDaemon(t, func(w http.ResponseWriter, r *http.Request) { - mu.Lock() - callNum++ - n := callNum - receivedRoots = append(receivedRoots, r.Header.Get("X-Workspace-Root")) - mu.Unlock() - - // First request: daemon resolves workspace and sends back X-Resolved-Workspace - if n == 1 { - w.Header().Set("X-Resolved-Workspace", "/home/user/project") - } - w.Header().Set("Content-Type", "application/json") - _ = json.NewEncoder(w).Encode(map[string]any{"jsonrpc": "2.0", "id": n, "result": "ok"}) - }) - - // Send 3 requests — workspaceHint is ignored - input := `{"jsonrpc":"2.0","id":1,"method":"a"}` + "\n" + - `{"jsonrpc":"2.0","id":2,"method":"b"}` + "\n" + - `{"jsonrpc":"2.0","id":3,"method":"c"}` + "\n" - stdout := &bytes.Buffer{} - - err := RunBridge(context.Background(), sockPath, strings.NewReader(input), stdout, "/home/user") - require.NoError(t, err) - - mu.Lock() - defer mu.Unlock() - require.Len(t, receivedRoots, 3, "should have received 3 requests") - - // Request 1: adapter doesn't know workspace yet — no X-Workspace-Root - assert.Empty(t, receivedRoots[0], "first request should NOT have Root (not learned yet)") - - // Request 2+3: adapter learned from header → sends X-Workspace-Root - assert.Equal(t, "/home/user/project", receivedRoots[1], "second request should use Root from header") - assert.Equal(t, "/home/user/project", receivedRoots[2], "third request should use Root from header") -} - -// TestBridge_StickyWorkspace_NoHeader verifies that when daemon never sends -// X-Resolved-Workspace, X-Workspace-Root is never sent. -func TestBridge_StickyWorkspace_NoHeader(t *testing.T) { - var mu sync.Mutex - var receivedRoots []string - - sockPath := startFakeDaemon(t, func(w http.ResponseWriter, r *http.Request) { - mu.Lock() - receivedRoots = append(receivedRoots, r.Header.Get("X-Workspace-Root")) - mu.Unlock() - _ = json.NewEncoder(w).Encode(map[string]any{"jsonrpc": "2.0", "id": 1, "result": nil}) - }) - - input := `{"jsonrpc":"2.0","id":1,"method":"a"}` + "\n" + - `{"jsonrpc":"2.0","id":2,"method":"b"}` + "\n" - err := RunBridge(context.Background(), sockPath, strings.NewReader(input), io.Discard, "/my/workspace") - require.NoError(t, err) - - mu.Lock() - defer mu.Unlock() - require.Len(t, receivedRoots, 2) - assert.Empty(t, receivedRoots[0], "no Root when daemon sends no header") - assert.Empty(t, receivedRoots[1], "no Root when daemon sends no header") -} - -// TestBridge_StickyWorkspace_IDEHintIgnored verifies that the IDE's -// workspaceHint is never forwarded as X-Workspace-Hint. -func TestBridge_StickyWorkspace_IDEHintIgnored(t *testing.T) { - sockPath := startFakeDaemon(t, func(w http.ResponseWriter, r *http.Request) { - assert.Empty(t, r.Header.Get("X-Workspace-Hint"), "IDE hint should never be forwarded") - _ = json.NewEncoder(w).Encode(map[string]any{"jsonrpc": "2.0", "id": 1, "result": nil}) - }) - - input := `{"jsonrpc":"2.0","id":1,"method":"ping"}` + "\n" - err := RunBridge(context.Background(), sockPath, strings.NewReader(input), io.Discard, "/home/user") - require.NoError(t, err) -} diff --git a/internal/adapter/adapter_test.go b/internal/adapter/adapter_test.go index 5368fbe..a246d2c 100644 --- a/internal/adapter/adapter_test.go +++ b/internal/adapter/adapter_test.go @@ -1,121 +1,60 @@ -package adapter +package adapter_test import ( "bytes" "context" - "encoding/json" "io" - "net" "net/http" + "net/http/httptest" + "strconv" "strings" "testing" + "time" + "github.com/doITmagic/rag-code-mcp/internal/adapter" "github.com/stretchr/testify/assert" "github.com/stretchr/testify/require" ) -func TestBridge_ForwardsRequestAndResponse(t *testing.T) { - sockPath := startFakeDaemon(t, func(w http.ResponseWriter, r *http.Request) { - body, _ := io.ReadAll(r.Body) - var req map[string]any - _ = json.Unmarshal(body, &req) +func TestRunBridge(t *testing.T) { + // Create an HTTP test server to mock the daemon + hitCount := 0 + server := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + assert.Equal(t, "/mcp", r.URL.Path) + assert.Equal(t, http.MethodPost, r.Method) + body, err := io.ReadAll(r.Body) + require.NoError(t, err) - resp := map[string]any{ - "jsonrpc": "2.0", - "id": req["id"], - "result": map[string]any{"tools": []string{"rag_search"}}, - } + hitCount++ w.Header().Set("Content-Type", "application/json") - _ = json.NewEncoder(w).Encode(resp) - }) + // Echo it to prove it hit the server + _, _ = w.Write([]byte(`{"jsonrpc":"2.0", "result": ` + string(body) + `}`)) + })) + server.Start() + defer server.Close() - input := `{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}` + "\n" - stdout := &bytes.Buffer{} + // Extract port from httptest server listener + parts := strings.Split(server.Listener.Addr().String(), ":") + portStr := parts[len(parts)-1] - err := RunBridge(context.Background(), sockPath, strings.NewReader(input), stdout, "") + port, err := strconv.Atoi(portStr) require.NoError(t, err) - var resp map[string]any - err = json.Unmarshal(stdout.Bytes(), &resp) - require.NoError(t, err) - assert.Equal(t, float64(1), resp["id"]) - assert.NotNil(t, resp["result"]) -} - -func TestBridge_IDEHintNotForwarded(t *testing.T) { - sockPath := startFakeDaemon(t, func(w http.ResponseWriter, r *http.Request) { - // IDE hint should never be forwarded as a header - assert.Empty(t, r.Header.Get("X-Workspace-Hint"), "IDE hint must not be forwarded") - _ = json.NewEncoder(w).Encode(map[string]any{"jsonrpc": "2.0", "id": 1, "result": nil}) - }) - - input := `{"jsonrpc":"2.0","id":1,"method":"ping"}` + "\n" - err := RunBridge(context.Background(), sockPath, strings.NewReader(input), io.Discard, "/home/user/project") - require.NoError(t, err) -} - -func TestBridge_SkipsEmptyLines(t *testing.T) { - callCount := 0 - sockPath := startFakeDaemon(t, func(w http.ResponseWriter, r *http.Request) { - callCount++ - _ = json.NewEncoder(w).Encode(map[string]any{"jsonrpc": "2.0", "id": 1, "result": nil}) - }) - - input := "\n\n" + `{"jsonrpc":"2.0","id":1,"method":"ping"}` + "\n\n \n" - err := RunBridge(context.Background(), sockPath, strings.NewReader(input), io.Discard, "") - require.NoError(t, err) - - assert.Equal(t, 1, callCount, "only one real JSON line should be forwarded") -} - -func TestBridge_MultipleRequests(t *testing.T) { - sockPath := startFakeDaemon(t, func(w http.ResponseWriter, r *http.Request) { - body, _ := io.ReadAll(r.Body) - var req map[string]any - _ = json.Unmarshal(body, &req) - _ = json.NewEncoder(w).Encode(map[string]any{"jsonrpc": "2.0", "id": req["id"], "result": "ok"}) - }) + // Setup stdin with a couple of JSON-RPC requests separated by newline + stdinData := `{"jsonrpc":"2.0","id":1,"method":"foo"}` + "\n" + `{"jsonrpc":"2.0","id":2,"method":"bar"}` + "\n" + stdin := strings.NewReader(stdinData) + var stdout bytes.Buffer - input := `{"jsonrpc":"2.0","id":1,"method":"a"}` + "\n" + - `{"jsonrpc":"2.0","id":2,"method":"b"}` + "\n" + - `{"jsonrpc":"2.0","id":3,"method":"c"}` + "\n" - stdout := &bytes.Buffer{} + // Run bridge + ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second) + defer cancel() - err := RunBridge(context.Background(), sockPath, strings.NewReader(input), stdout, "") + err = adapter.RunBridge(ctx, port, stdin, &stdout, "/test/workspace") require.NoError(t, err) - lines := strings.Split(strings.TrimSpace(stdout.String()), "\n") - assert.Len(t, lines, 3, "should get 3 responses for 3 requests") -} - -func TestBridge_DaemonUnreachable(t *testing.T) { - input := `{"jsonrpc":"2.0","id":1,"method":"test"}` + "\n" - stdout := &bytes.Buffer{} - - // Use a non-existent socket path - err := RunBridge(context.Background(), "/tmp/nonexistent.sock", strings.NewReader(input), stdout, "") - require.NoError(t, err) // bridge itself should not error — it writes JSON-RPC error to stdout - - var resp map[string]any - err = json.Unmarshal(stdout.Bytes(), &resp) - require.NoError(t, err) - assert.NotNil(t, resp["error"], "should receive a JSON-RPC error") - assert.Equal(t, float64(1), resp["id"], "error should preserve the request id") -} - -// --- Helper --- - -func startFakeDaemon(t *testing.T, handler http.HandlerFunc) string { - t.Helper() - dir := t.TempDir() - sockPath := dir + "/test.sock" - - listener, err := net.Listen("unix", sockPath) - require.NoError(t, err) - - srv := &http.Server{Handler: handler} - go func() { _ = srv.Serve(listener) }() - t.Cleanup(func() { srv.Close() }) - - return sockPath + // Validate results + assert.Equal(t, 2, hitCount) + outContent := stdout.String() + assert.Contains(t, outContent, `{"jsonrpc":"2.0","id":1,"method":"foo"}`) + assert.Contains(t, outContent, `{"jsonrpc":"2.0","id":2,"method":"bar"}`) } diff --git a/internal/adapter/lifecycle.go b/internal/adapter/lifecycle.go index 33fb6bd..38ba4ca 100644 --- a/internal/adapter/lifecycle.go +++ b/internal/adapter/lifecycle.go @@ -1,13 +1,11 @@ package adapter import ( - "context" + "encoding/json" "fmt" - "net" "net/http" "os" "os/exec" - "path/filepath" "runtime" "syscall" "time" @@ -15,75 +13,36 @@ import ( "github.com/doITmagic/rag-code-mcp/internal/daemon" ) -// IsDaemonRunning checks if a healthy daemon is reachable. +// IsDaemonRunning checks if a healthy daemon is reachable on the given TCP port. // Returns (true, version) if the daemon is alive and responding. -// Cleans up stale PID/socket files if daemon is dead. -func IsDaemonRunning(pidPath, socketPath string) (bool, string) { - info, err := daemon.ReadPID(pidPath) +func IsDaemonRunning(port int) (bool, string) { + client := &http.Client{Timeout: 2 * time.Second} + resp, err := client.Get(fmt.Sprintf("http://127.0.0.1:%d/health", port)) if err != nil { return false, "" } + defer resp.Body.Close() - if !daemon.IsProcessAlive(info.PID) { - CleanupStaleFiles(pidPath, socketPath) + if resp.StatusCode != http.StatusOK { return false, "" } - // Try connecting to socket - conn, err := net.DialTimeout("unix", socketPath, 2*time.Second) - if err != nil { - CleanupStaleFiles(pidPath, socketPath) + var health daemon.HealthResponse + if err := json.NewDecoder(resp.Body).Decode(&health); err != nil { return false, "" } - conn.Close() - return true, info.Version + return true, health.Version } -// CleanupStaleFiles removes leftover PID and socket files from a dead daemon. +// CleanupStaleFiles is kept for compatibility but does nothing since we use TCP ports. func CleanupStaleFiles(pidPath, socketPath string) { - os.Remove(pidPath) - os.Remove(socketPath) } // StartDaemon launches the daemon as a detached background process. -// Uses a lock file to prevent multiple concurrent adapters from racing -// to start multiple daemons simultaneously. // extraArgs are additional CLI flags (e.g. --config, --http-port) forwarded to the daemon. -// Waits up to 10 seconds for the daemon to become healthy on the socket. -func StartDaemon(binaryPath, socketPath string, extraArgs ...string) error { - // Acquire lock to prevent concurrent duplicate starts. - // Uses O_EXCL for atomicity, with stale-lock recovery: if the lock file - // exists but is older than 30s, it's treated as stale (crashed adapter) - // and removed before retrying. - lockPath := filepath.Join(filepath.Dir(socketPath), "daemon.lock") - lockFile, err := os.OpenFile(lockPath, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o600) - if err != nil { - if os.IsExist(err) { - // Check if the lock is stale (older than 30s = crashed adapter) - if info, statErr := os.Stat(lockPath); statErr == nil { - if time.Since(info.ModTime()) > 30*time.Second { - os.Remove(lockPath) - // Retry lock acquisition after stale cleanup - lockFile, err = os.OpenFile(lockPath, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o600) - if err != nil { - return fmt.Errorf("failed to acquire daemon lock after stale cleanup: %w", err) - } - // Fall through to normal startup below - goto acquired - } - } - // Lock is fresh — another adapter is starting the daemon, wait for it - return waitForDaemon(socketPath) - } - return fmt.Errorf("failed to acquire daemon lock: %w", err) - } -acquired: - defer func() { - lockFile.Close() - os.Remove(lockPath) - }() - +// Waits up to 10 seconds for the daemon to become healthy on the port. +func StartDaemon(binaryPath string, port int, extraArgs ...string) error { args := append([]string{"--daemon"}, extraArgs...) cmd := exec.Command(binaryPath, args...) cmd.Stdin = nil @@ -101,22 +60,16 @@ acquired: // Don't wait for the daemon process — it runs independently go func() { _ = cmd.Wait() }() - return waitForDaemon(socketPath) + return waitForDaemon(port) } // waitForDaemon polls the daemon health endpoint until ready (max 10s). -func waitForDaemon(socketPath string) error { - client := &http.Client{ - Timeout: 2 * time.Second, - Transport: &http.Transport{ - DialContext: func(_ context.Context, _, _ string) (net.Conn, error) { - return net.Dial("unix", socketPath) - }, - }, - } +func waitForDaemon(port int) error { + url := fmt.Sprintf("http://127.0.0.1:%d/health", port) + client := &http.Client{Timeout: 2 * time.Second} for i := 0; i < 20; i++ { - resp, err := client.Get("http://daemon/health") + resp, err := client.Get(url) if err == nil { resp.Body.Close() if resp.StatusCode == http.StatusOK { @@ -126,34 +79,39 @@ func waitForDaemon(socketPath string) error { time.Sleep(500 * time.Millisecond) } - return fmt.Errorf("daemon did not become ready within 10s (socket: %s)", socketPath) + return fmt.Errorf("daemon did not become ready within 10s (port: %d)", port) } -// StopDaemon sends a termination signal to the daemon process identified by the PID file. -// On Unix, sends SIGTERM first, then SIGKILL after 5s. On Windows, uses Kill(). -func StopDaemon(pidPath string) error { - info, err := daemon.ReadPID(pidPath) +// StopDaemon sends a termination signal to the daemon process via its TCP health endpoint PID. +func StopDaemon(port int) error { + client := &http.Client{Timeout: 2 * time.Second} + resp, err := client.Get(fmt.Sprintf("http://127.0.0.1:%d/health", port)) if err != nil { - return fmt.Errorf("read PID file: %w", err) + return nil // already dead or unreachable + } + defer resp.Body.Close() + + var health daemon.HealthResponse + if err := json.NewDecoder(resp.Body).Decode(&health); err != nil { + return fmt.Errorf("failed to decode daemon health: %w", err) } - process, err := os.FindProcess(info.PID) + process, err := os.FindProcess(health.PID) if err != nil { - return fmt.Errorf("find process %d: %w", info.PID, err) + return fmt.Errorf("find process %d: %w", health.PID, err) } - // Send graceful stop signal (SIGTERM on Unix, Kill on Windows) if runtime.GOOS == "windows" { return process.Kill() } if err := process.Signal(syscall.SIGTERM); err != nil { - return fmt.Errorf("send SIGTERM to %d: %w", info.PID, err) + return fmt.Errorf("send SIGTERM to %d: %w", health.PID, err) } // Wait for process to die (max 5s) for i := 0; i < 10; i++ { - if !daemon.IsProcessAlive(info.PID) { + if err := process.Signal(syscall.Signal(0)); err != nil { return nil } time.Sleep(500 * time.Millisecond) diff --git a/internal/adapter/lifecycle_test.go b/internal/adapter/lifecycle_test.go index 1248ac4..461ccb7 100644 --- a/internal/adapter/lifecycle_test.go +++ b/internal/adapter/lifecycle_test.go @@ -1,76 +1,47 @@ -package adapter +package adapter_test import ( - "os" - "path/filepath" + "encoding/json" + "net/http" + "net/http/httptest" + "strconv" + "strings" "testing" + "github.com/doITmagic/rag-code-mcp/internal/adapter" "github.com/doITmagic/rag-code-mcp/internal/daemon" "github.com/stretchr/testify/assert" "github.com/stretchr/testify/require" ) -func TestIsDaemonRunning_NoPIDFile(t *testing.T) { - dir := t.TempDir() - pidPath := filepath.Join(dir, "daemon.pid") - sockPath := filepath.Join(dir, "daemon.sock") - - running, version := IsDaemonRunning(pidPath, sockPath) - assert.False(t, running) - assert.Empty(t, version) -} - -func TestIsDaemonRunning_StalePID(t *testing.T) { - dir := t.TempDir() - pidPath := filepath.Join(dir, "daemon.pid") - sockPath := filepath.Join(dir, "daemon.sock") - - // Write a PID that doesn't exist - err := daemon.WritePID(pidPath, 999999999, "1.0.0") +func TestIsDaemonRunning_Success(t *testing.T) { + // Mock a healthy daemon response + server := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + assert.Equal(t, "/health", r.URL.Path) + assert.Equal(t, http.MethodGet, r.Method) + w.WriteHeader(http.StatusOK) + resp := daemon.HealthResponse{ + Status: "ok", + Version: "1.2.3", + PID: 1234, + } + _ = json.NewEncoder(w).Encode(resp) + })) + server.Start() + defer server.Close() + + parts := strings.Split(server.Listener.Addr().String(), ":") + port, err := strconv.Atoi(parts[len(parts)-1]) require.NoError(t, err) - running, _ := IsDaemonRunning(pidPath, sockPath) - assert.False(t, running) - - // PID file should be cleaned up - _, err = os.Stat(pidPath) - assert.True(t, os.IsNotExist(err), "stale PID file should be removed") + running, version := adapter.IsDaemonRunning(port) + assert.True(t, running) + assert.Equal(t, "1.2.3", version) } -func TestIsDaemonRunning_ProcessAliveButNoSocket(t *testing.T) { - dir := t.TempDir() - pidPath := filepath.Join(dir, "daemon.pid") - sockPath := filepath.Join(dir, "daemon.sock") - - // Write current PID (alive) but no socket file - err := daemon.WritePID(pidPath, os.Getpid(), "1.0.0") - require.NoError(t, err) - - running, _ := IsDaemonRunning(pidPath, sockPath) - assert.False(t, running, "should be false without a reachable socket") - - // Both files should be cleaned up - _, err = os.Stat(pidPath) - assert.True(t, os.IsNotExist(err)) -} - -func TestCleanupStaleFiles(t *testing.T) { - dir := t.TempDir() - pidPath := filepath.Join(dir, "daemon.pid") - sockPath := filepath.Join(dir, "daemon.sock") - - require.NoError(t, os.WriteFile(pidPath, []byte("stale"), 0644)) - require.NoError(t, os.WriteFile(sockPath, []byte("stale"), 0644)) - - CleanupStaleFiles(pidPath, sockPath) - - _, err1 := os.Stat(pidPath) - _, err2 := os.Stat(sockPath) - assert.True(t, os.IsNotExist(err1)) - assert.True(t, os.IsNotExist(err2)) -} - -func TestCleanupStaleFiles_NoFiles(t *testing.T) { - // Should not panic on non-existent files - CleanupStaleFiles("/nonexistent/pid", "/nonexistent/sock") +func TestIsDaemonRunning_Failure(t *testing.T) { + // Attempt reaching a port that is definitively not open/handling HTTP + running, version := adapter.IsDaemonRunning(12345) + assert.False(t, running) + assert.Empty(t, version) } diff --git a/internal/daemon/pidfile.go b/internal/daemon/pidfile.go deleted file mode 100644 index ebe8f45..0000000 --- a/internal/daemon/pidfile.go +++ /dev/null @@ -1,75 +0,0 @@ -package daemon - -import ( - "fmt" - "os" - "strconv" - "strings" - "syscall" - "time" -) - -// PIDInfo contains daemon process metadata read from the PID file. -type PIDInfo struct { - PID int - Version string - StartedAt string -} - -// WritePID writes daemon metadata to the PID file. -// Format: key=value pairs, one per line (PID, VERSION, STARTED). -func WritePID(path string, pid int, version string) error { - content := fmt.Sprintf("PID=%d\nVERSION=%s\nSTARTED=%s\n", - pid, version, time.Now().Format(time.RFC3339)) - return os.WriteFile(path, []byte(content), 0644) -} - -// ReadPID reads and parses the PID file. Returns error if file doesn't exist -// or contains no valid PID. -func ReadPID(path string) (*PIDInfo, error) { - data, err := os.ReadFile(path) - if err != nil { - return nil, err - } - - info := &PIDInfo{} - for _, line := range strings.Split(string(data), "\n") { - parts := strings.SplitN(line, "=", 2) - if len(parts) != 2 { - continue - } - key, val := parts[0], parts[1] - switch key { - case "PID": - info.PID, _ = strconv.Atoi(val) - case "VERSION": - info.Version = val - case "STARTED": - info.StartedAt = val - } - } - - if info.PID == 0 { - return nil, fmt.Errorf("invalid PID file: no PID found in %s", path) - } - return info, nil -} - -// RemovePID deletes the PID file. Returns nil if file doesn't exist. -func RemovePID(path string) error { - err := os.Remove(path) - if os.IsNotExist(err) { - return nil - } - return err -} - -// IsProcessAlive checks if a process with the given PID is running -// by sending signal 0 (no-op signal used for existence check). -func IsProcessAlive(pid int) bool { - process, err := os.FindProcess(pid) - if err != nil { - return false - } - return process.Signal(syscall.Signal(0)) == nil -} diff --git a/internal/daemon/pidfile_test.go b/internal/daemon/pidfile_test.go deleted file mode 100644 index 64a9103..0000000 --- a/internal/daemon/pidfile_test.go +++ /dev/null @@ -1,77 +0,0 @@ -package daemon - -import ( - "os" - "path/filepath" - "testing" - - "github.com/stretchr/testify/assert" - "github.com/stretchr/testify/require" -) - -func TestWriteAndReadPID(t *testing.T) { - dir := t.TempDir() - path := filepath.Join(dir, "daemon.pid") - - err := WritePID(path, os.Getpid(), "2.1.54") - require.NoError(t, err) - - info, err := ReadPID(path) - require.NoError(t, err) - assert.Equal(t, os.Getpid(), info.PID) - assert.Equal(t, "2.1.54", info.Version) - assert.NotEmpty(t, info.StartedAt) -} - -func TestReadPID_NotExists(t *testing.T) { - _, err := ReadPID("/nonexistent/path/daemon.pid") - assert.Error(t, err) -} - -func TestReadPID_InvalidContent(t *testing.T) { - dir := t.TempDir() - path := filepath.Join(dir, "daemon.pid") - - err := os.WriteFile(path, []byte("GARBAGE=data\n"), 0644) - require.NoError(t, err) - - _, err = ReadPID(path) - assert.Error(t, err) - assert.Contains(t, err.Error(), "no PID found") -} - -func TestRemovePID(t *testing.T) { - dir := t.TempDir() - path := filepath.Join(dir, "daemon.pid") - - err := WritePID(path, 12345, "1.0.0") - require.NoError(t, err) - - err = RemovePID(path) - assert.NoError(t, err) - - _, err = os.Stat(path) - assert.True(t, os.IsNotExist(err)) -} - -func TestRemovePID_NotExists(t *testing.T) { - err := RemovePID("/nonexistent/path/daemon.pid") - assert.NoError(t, err) // should not error on missing file -} - -func TestIsProcessAlive_CurrentProcess(t *testing.T) { - assert.True(t, IsProcessAlive(os.Getpid())) -} - -func TestIsProcessAlive_NonexistentPID(t *testing.T) { - assert.False(t, IsProcessAlive(999999999)) -} - -func TestWritePID_CreatesFile(t *testing.T) { - dir := t.TempDir() - path := filepath.Join(dir, "subdir", "daemon.pid") - - // Should fail — parent dir doesn't exist - err := WritePID(path, 1, "1.0.0") - assert.Error(t, err) // os.WriteFile doesn't create parent dirs -} diff --git a/internal/daemon/run.go b/internal/daemon/run.go index cc8b1c1..dac28f8 100644 --- a/internal/daemon/run.go +++ b/internal/daemon/run.go @@ -4,9 +4,11 @@ import ( "context" "fmt" "net/http" + "net/http/pprof" "os" "path/filepath" "runtime" + "sync" "time" "github.com/doITmagic/rag-code-mcp/internal/updater" @@ -21,10 +23,14 @@ import ( "github.com/doITmagic/rag-code-mcp/internal/utils" "github.com/doITmagic/rag-code-mcp/pkg/indexer" "github.com/doITmagic/rag-code-mcp/pkg/llm" + _ "github.com/doITmagic/rag-code-mcp/pkg/parser/css" _ "github.com/doITmagic/rag-code-mcp/pkg/parser/docs" _ "github.com/doITmagic/rag-code-mcp/pkg/parser/go" _ "github.com/doITmagic/rag-code-mcp/pkg/parser/html" + _ "github.com/doITmagic/rag-code-mcp/pkg/parser/javascript" _ "github.com/doITmagic/rag-code-mcp/pkg/parser/php" + _ "github.com/doITmagic/rag-code-mcp/pkg/parser/php/laravel" + _ "github.com/doITmagic/rag-code-mcp/pkg/parser/php/wordpress" _ "github.com/doITmagic/rag-code-mcp/pkg/parser/python" "github.com/doITmagic/rag-code-mcp/pkg/storage" "github.com/modelcontextprotocol/go-sdk/mcp" @@ -209,6 +215,13 @@ func Run(rcfg RunConfig) error { mcpMux := http.NewServeMux() mcpMux.Handle("/mcp", streamableHandler) + // Profiling endpoints + mcpMux.HandleFunc("/debug/pprof/", pprof.Index) + mcpMux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline) + mcpMux.HandleFunc("/debug/pprof/profile", pprof.Profile) + mcpMux.HandleFunc("/debug/pprof/symbol", pprof.Symbol) + mcpMux.HandleFunc("/debug/pprof/trace", pprof.Trace) + // Middleware: sticky workspace + response writer injection. // // 1. X-Workspace-Root (sticky): adapter learned workspace from a previous @@ -216,6 +229,8 @@ func Run(rcfg RunConfig) error { // 2. ResponseWriter: always injected into context so DetectContext can set // X-Resolved-Workspace header in the response — the adapter reads it // and caches it for subsequent requests. + var resumeIndexingOnce sync.Once + mcpHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { ctx := transport.WithResponseWriter(r.Context(), w) if wsRoot := r.Header.Get("X-Workspace-Root"); wsRoot != "" { @@ -223,6 +238,10 @@ func Run(rcfg RunConfig) error { ctx = transport.WithWorkspaceHint(ctx, wsRoot) } else { logger.Instance.Debug("[DAEMON] Request without X-Workspace-Root (first request or no workspace resolved yet)") + resumeIndexingOnce.Do(func() { + logger.Instance.Info("[DAEMON] Checking registry for incomplete indexing jobs...") + go eng.ResumeIndexingOnConnect() + }) } r = r.WithContext(ctx) mcpMux.ServeHTTP(w, r) @@ -237,20 +256,16 @@ func Run(rcfg RunConfig) error { if err := os.MkdirAll(ragcodeDir, 0o700); err != nil { return fmt.Errorf("cannot create ~/.ragcode: %w", err) } - socketPath := filepath.Join(ragcodeDir, "daemon.sock") - pidPath := filepath.Join(ragcodeDir, "daemon.pid") // ── Start Daemon Listeners ── logger.Instance.Info("--- DAEMON MODE --- version=%s pid=%d", rcfg.Version, os.Getpid()) listenErr := ListenAndServe(context.Background(), ListenConfig{ - SocketPath: socketPath, - PIDPath: pidPath, - Version: rcfg.Version, - HTTPPort: rcfg.HTTPPort, - Handler: mcpHandler, + Port: rcfg.HTTPPort, + Version: rcfg.Version, + Handler: mcpHandler, OnReady: func() { - logger.Instance.Info("Daemon ready — socket=%s, http_port=%d", socketPath, rcfg.HTTPPort) + logger.Instance.Info("Daemon ready — port=%d", rcfg.HTTPPort) }, }) diff --git a/internal/daemon/server.go b/internal/daemon/server.go index d80e5b6..a4f20fd 100644 --- a/internal/daemon/server.go +++ b/internal/daemon/server.go @@ -15,35 +15,22 @@ import ( // ListenConfig configures the daemon's network listeners and lifecycle. type ListenConfig struct { - SocketPath string // Unix domain socket path (required) - PIDPath string // PID file path (required) - Version string // Server version string - HTTPPort int // TCP port for optional HTTP listener (0 = disabled) - Handler http.Handler // MCP handler (must handle /mcp) - OnReady func() // Called when daemon is ready to accept connections (optional) + Port int // TCP port for localhost listener + Version string // Server version string + Handler http.Handler // MCP handler (must handle /mcp) + OnReady func() // Called when daemon is ready to accept connections (optional) } // ListenAndServe starts the daemon listeners and blocks until ctx is cancelled -// or SIGTERM/SIGINT is received. Cleans up socket and PID file on exit. -// -// It sets up two listeners: -// 1. Unix domain socket at SocketPath (primary, for stdio adapters) -// 2. TCP HTTP on HTTPPort (optional, for curl/debug/external agents, localhost only) -// -// Both serve the same handler mux with /health and the provided MCP handler. +// or SIGTERM/SIGINT is received. It binds exclusively to a local TCP port to +// guarantee it is a singleton, avoiding file locking issues. func ListenAndServe(ctx context.Context, cfg ListenConfig) error { startTime := time.Now() // Validate required config - if cfg.SocketPath == "" { - return fmt.Errorf("ListenAndServe: SocketPath is required") + if cfg.Port <= 0 { + return fmt.Errorf("ListenAndServe: valid Port is required") } - if cfg.PIDPath == "" { - return fmt.Errorf("ListenAndServe: PIDPath is required") - } - - // Remove stale socket if it exists - os.Remove(cfg.SocketPath) // Build mux: /health + user handler for everything else mux := http.NewServeMux() @@ -52,67 +39,28 @@ func ListenAndServe(ctx context.Context, cfg ListenConfig) error { mux.Handle("/", cfg.Handler) } - // --- Unix socket listener (primary) --- - unixListener, err := net.Listen("unix", cfg.SocketPath) + // Bind to local TCP port (guarantees Singleton) + addr := fmt.Sprintf("127.0.0.1:%d", cfg.Port) + tcpListener, err := net.Listen("tcp", addr) if err != nil { - return fmt.Errorf("failed to listen on unix socket %s: %w", cfg.SocketPath, err) - } - // Restrict socket access to owner only (security: prevents other local users from connecting) - if chmodErr := os.Chmod(cfg.SocketPath, 0o600); chmodErr != nil { - logger.Instance.Warn("Failed to chmod socket to 0600: %v", chmodErr) - } - - // Ensure cleanup of socket file on exit - defer func() { - unixListener.Close() - os.Remove(cfg.SocketPath) - }() - - // Write PID file (fatal on failure — adapters rely on it for discovery/version checks) - if err := WritePID(cfg.PIDPath, os.Getpid(), cfg.Version); err != nil { - unixListener.Close() - os.Remove(cfg.SocketPath) - return fmt.Errorf("failed to write PID file %s: %w", cfg.PIDPath, err) + // If address is in use, another instance is already running + return fmt.Errorf("failed to bind TCP port %s (address in use?): %w", addr, err) } - defer func() { _ = RemovePID(cfg.PIDPath) }() - // --- Optional TCP HTTP listener (localhost only for security) --- - var tcpListener net.Listener - if cfg.HTTPPort > 0 { - tcpListener, err = net.Listen("tcp", fmt.Sprintf("127.0.0.1:%d", cfg.HTTPPort)) - if err != nil { - logger.Instance.Warn("HTTP port unavailable (non-fatal, Unix socket is primary): port=%d err=%v", - cfg.HTTPPort, err) - // Non-fatal — Unix socket is the primary channel - } - } - - // Start serving on Unix socket - unixServer := &http.Server{Handler: mux} + tcpServer := &http.Server{Handler: mux} go func() { - if serveErr := unixServer.Serve(unixListener); serveErr != nil && serveErr != http.ErrServerClosed { - logger.Instance.Error("Unix socket server error: %v", serveErr) + if serveErr := tcpServer.Serve(tcpListener); serveErr != nil && serveErr != http.ErrServerClosed { + logger.Instance.Error("TCP server error: %v", serveErr) } }() - - // Start serving on TCP (if available) - var tcpServer *http.Server - if tcpListener != nil { - tcpServer = &http.Server{Handler: mux} - go func() { - if serveErr := tcpServer.Serve(tcpListener); serveErr != nil && serveErr != http.ErrServerClosed { - logger.Instance.Error("TCP server error: %v", serveErr) - } - }() - logger.Instance.Info("HTTP server listening on 127.0.0.1:%d", cfg.HTTPPort) - } + logger.Instance.Info("HTTP server listening on %s", addr) // Signal readiness AFTER servers are actually serving if cfg.OnReady != nil { cfg.OnReady() } - logger.Instance.Info("Daemon ready — socket=%s version=%s pid=%d", cfg.SocketPath, cfg.Version, os.Getpid()) + logger.Instance.Info("Daemon ready — address=%s version=%s pid=%d", addr, cfg.Version, os.Getpid()) // Block until context cancellation or OS signal sigCtx, stop := signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM) @@ -125,13 +73,8 @@ func ListenAndServe(ctx context.Context, cfg ListenConfig) error { shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 5*time.Second) defer shutdownCancel() - if shutdownErr := unixServer.Shutdown(shutdownCtx); shutdownErr != nil { - logger.Instance.Warn("Unix server shutdown error: %v", shutdownErr) - } - if tcpServer != nil { - if shutdownErr := tcpServer.Shutdown(shutdownCtx); shutdownErr != nil { - logger.Instance.Warn("TCP server shutdown error: %v", shutdownErr) - } + if shutdownErr := tcpServer.Shutdown(shutdownCtx); shutdownErr != nil { + logger.Instance.Warn("TCP server shutdown error: %v", shutdownErr) } return nil diff --git a/internal/daemon/server_test.go b/internal/daemon/server_test.go index 8d71798..bbb69f6 100644 --- a/internal/daemon/server_test.go +++ b/internal/daemon/server_test.go @@ -1,203 +1,86 @@ -package daemon +package daemon_test import ( "context" "encoding/json" - "io" "net" "net/http" - "os" - "path/filepath" - "strings" + "strconv" "testing" "time" + "github.com/doITmagic/rag-code-mcp/internal/daemon" "github.com/stretchr/testify/assert" "github.com/stretchr/testify/require" ) -func TestListenAndServe_UnixSocket(t *testing.T) { - dir := t.TempDir() - sockPath := filepath.Join(dir, "test.sock") - pidPath := filepath.Join(dir, "test.pid") - - ctx, cancel := context.WithCancel(context.Background()) - defer cancel() - - ready := make(chan struct{}) - - go func() { - _ = ListenAndServe(ctx, ListenConfig{ - SocketPath: sockPath, - PIDPath: pidPath, - Version: "1.0.0-test", - HTTPPort: 0, // disabled - OnReady: func() { close(ready) }, - Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - w.Header().Set("Content-Type", "application/json") - _, _ = w.Write([]byte(`{"ok":true}`)) - }), - }) - }() - - select { - case <-ready: - case <-time.After(3 * time.Second): - t.Fatal("daemon did not become ready in 3s") +func getFreePort() (int, error) { + addr, err := net.ResolveTCPAddr("tcp", "localhost:0") + if err != nil { + return 0, err } - - // Connect via Unix socket and hit /health - client := &http.Client{ - Transport: &http.Transport{ - DialContext: func(_ context.Context, _, _ string) (net.Conn, error) { - return net.Dial("unix", sockPath) - }, - }, + l, err := net.ListenTCP("tcp", addr) + if err != nil { + return 0, err } - - resp, err := client.Get("http://daemon/health") - require.NoError(t, err) - defer resp.Body.Close() - assert.Equal(t, http.StatusOK, resp.StatusCode) - - var health HealthResponse - body, _ := io.ReadAll(resp.Body) - err = json.Unmarshal(body, &health) - require.NoError(t, err) - assert.Equal(t, "ok", health.Status) - assert.Equal(t, "1.0.0-test", health.Version) - assert.Equal(t, os.Getpid(), health.PID) + defer l.Close() + return l.Addr().(*net.TCPAddr).Port, nil } -func TestListenAndServe_PIDFileWritten(t *testing.T) { - dir := t.TempDir() - sockPath := filepath.Join(dir, "test.sock") - pidPath := filepath.Join(dir, "test.pid") - - ctx, cancel := context.WithCancel(context.Background()) - defer cancel() - - ready := make(chan struct{}) - - go func() { - _ = ListenAndServe(ctx, ListenConfig{ - SocketPath: sockPath, - PIDPath: pidPath, - Version: "2.0.0", - HTTPPort: 0, - OnReady: func() { close(ready) }, - }) - }() - - <-ready - - // Verify PID file - info, err := ReadPID(pidPath) +func TestListenAndServe_Success(t *testing.T) { + port, err := getFreePort() require.NoError(t, err) - assert.Equal(t, os.Getpid(), info.PID) - assert.Equal(t, "2.0.0", info.Version) - - // Cancel and wait for cleanup - cancel() - - // Verify socket file was cleaned up (poll instead of fixed sleep) - require.Eventually(t, func() bool { - _, err := os.Stat(sockPath) - return os.IsNotExist(err) - }, 3*time.Second, 50*time.Millisecond, "socket file should be removed after shutdown") -} - -func TestListenAndServe_MCPHandler(t *testing.T) { - dir := t.TempDir() - sockPath := filepath.Join(dir, "test.sock") - pidPath := filepath.Join(dir, "test.pid") ctx, cancel := context.WithCancel(context.Background()) defer cancel() ready := make(chan struct{}) - - // Custom handler simulating MCP - mcpHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - body, _ := io.ReadAll(r.Body) - var req map[string]any - _ = json.Unmarshal(body, &req) - - resp := map[string]any{ - "jsonrpc": "2.0", - "id": req["id"], - "result": map[string]any{"tools": []string{"rag_search"}}, - } - w.Header().Set("Content-Type", "application/json") - _ = json.NewEncoder(w).Encode(resp) - }) - go func() { - _ = ListenAndServe(ctx, ListenConfig{ - SocketPath: sockPath, - PIDPath: pidPath, - Version: "1.0.0", - HTTPPort: 0, - OnReady: func() { close(ready) }, - Handler: mcpHandler, + _ = daemon.ListenAndServe(ctx, daemon.ListenConfig{ + Port: port, + Version: "test-1.0", + OnReady: func() { close(ready) }, }) }() - <-ready - - client := &http.Client{ - Transport: &http.Transport{ - DialContext: func(_ context.Context, _, _ string) (net.Conn, error) { - return net.Dial("unix", sockPath) - }, - }, + select { + case <-ready: + case <-time.After(5 * time.Second): + t.Fatal("daemon did not start in time") } - // Send MCP-like JSON-RPC request - resp, err := client.Post("http://daemon/mcp", - "application/json", - io.NopCloser( - strings.NewReader(`{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}`), - ), - ) + // Verify health endpoint works + resp, err := http.Get("http://127.0.0.1:" + strconv.Itoa(port) + "/health") require.NoError(t, err) defer resp.Body.Close() - var result map[string]any - body, _ := io.ReadAll(resp.Body) - err = json.Unmarshal(body, &result) + assert.Equal(t, http.StatusOK, resp.StatusCode) + + var health daemon.HealthResponse + err = json.NewDecoder(resp.Body).Decode(&health) require.NoError(t, err) - assert.Equal(t, float64(1), result["id"]) - assert.NotNil(t, result["result"]) + assert.Equal(t, "test-1.0", health.Version) } -func TestListenAndServe_StaleSocketCleanup(t *testing.T) { - dir := t.TempDir() - sockPath := filepath.Join(dir, "test.sock") - pidPath := filepath.Join(dir, "test.pid") +func TestListenAndServe_PortConflict(t *testing.T) { + port, err := getFreePort() + require.NoError(t, err) - // Create a stale socket file - require.NoError(t, os.WriteFile(sockPath, []byte("stale"), 0644)) + // Block the port first + l, err := net.Listen("tcp", "127.0.0.1:"+strconv.Itoa(port)) + require.NoError(t, err) + defer l.Close() - ctx, cancel := context.WithCancel(context.Background()) + ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second) defer cancel() - ready := make(chan struct{}) - - go func() { - _ = ListenAndServe(ctx, ListenConfig{ - SocketPath: sockPath, - PIDPath: pidPath, - Version: "1.0.0", - HTTPPort: 0, - OnReady: func() { close(ready) }, - }) - }() - - select { - case <-ready: - // Success — daemon started despite stale file - case <-time.After(3 * time.Second): - t.Fatal("daemon should have cleaned up stale socket and started") - } + err = daemon.ListenAndServe(ctx, daemon.ListenConfig{ + Port: port, + Version: "test-1.0", + OnReady: func() {}, // Should not be called + }) + + // Expect address in use error + require.Error(t, err) + assert.Contains(t, err.Error(), "address already in use") } diff --git a/internal/service/engine/engine.go b/internal/service/engine/engine.go index 1f0ea65..95492cc 100644 --- a/internal/service/engine/engine.go +++ b/internal/service/engine/engine.go @@ -45,17 +45,12 @@ type Engine struct { pendingFiles map[string]map[string]struct{} // workspaceID -> set(filePath) pendingOverflow map[string]bool // workspaceID -> too many pending changes, fallback to full scan - progress *progressStore + registry *registry.Registry // detectionCache stores resolved WorkspaceContext with TTL to avoid // repeated full resolver cascades for the same path. detectionCache sync.Map // map[string]*detectionCacheEntry - // resumeAttempts throttles auto-resume of interrupted indexing. - // Key: workspace ID, Value: time.Time of last resume attempt. - // Prevents CPU/log churn when indexing keeps failing (e.g. Ollama down). - resumeAttempts sync.Map - // connectTriggered tracks whether background indexing was automatically // triggered for a workspace ID upon initial daemon resolution. connectTriggered sync.Map @@ -111,24 +106,22 @@ func NewEngine(idx *indexer.Service, srv *search.Service, registryPath string, c } return &Engine{ - indexer: idx, - search: srv, - resolver: res, - config: cfg, - watchers: watcherMgr, - progress: newProgressStore(), + indexer: idx, + search: srv, + resolver: res, + config: cfg, + watchers: watcherMgr, + registry: reg, + pendingFiles: make(map[string]map[string]struct{}), pendingOverflow: make(map[string]bool), } } -// GetIndexProgress returns the last known indexing progress for a workspace. -// workspaceRoot is used as a hint to load persisted status from disc if not in memory. -func (e *Engine) GetIndexProgress(workspaceID, workspaceRoot string) *IndexProgress { - if e.progress == nil { - return nil - } - return e.progress.get(workspaceID, workspaceRoot) +// GetIndexStatus returns the last known indexing status for a workspace. +// Reads directly from {workspaceRoot}/.ragcode/index_status.json. +func (e *Engine) GetIndexStatus(workspaceRoot string) *indexer.IndexStatus { + return indexer.LoadIndexStatus(workspaceRoot) } // ActiveIndexingJobs returns the IDs of workspaces currently being indexed. @@ -284,10 +277,16 @@ func (e *Engine) DetectContext(ctx context.Context, path string) (*WorkspaceCont // We use connectTriggered to ensure this only happens ONCE per WorkspaceID // per daemon lifetime, preventing full index scans on every cache miss. // recreate=false ensures incremental indexing — only new/changed files are processed. - if _, triggered := e.connectTriggered.LoadOrStore(wctx.ID, true); !triggered { - if _, alreadyRunning := e.indexingJobs.Load(wctx.ID); !alreadyRunning { + if e.config == nil || e.config.Workspace.AutoIndex { + if _, triggered := e.connectTriggered.LoadOrStore(wctx.ID, true); !triggered { + // Do NOT check indexingJobs.Load here — that is a TOCTOU race window. + // StartIndexingAsync uses indexingJobs.LoadOrStore atomically and is + // already idempotent: if a job is running it returns immediately. + // Calling it directly (not via `go`) avoids the extra scheduling delay + // that previously widened the race window between ResumeIndexingOnConnect + // and DetectContext. StartIndexingAsync spawns its own goroutine internally. logger.Instance.Info("[DAEMON] [WS-DETECT] Auto-triggering incremental index for workspace: %s", wctx.Root) - go e.StartIndexingAsync(wctx.Root, wctx.ID, nil, false) + e.StartIndexingAsync(wctx.Root, wctx.ID, nil, false) } } @@ -336,6 +335,45 @@ func (e *Engine) CheckAndReindexOnConnect(hint string) string { return wctx.Root } +// ResumeIndexingOnConnect iterates through all registered workspaces, checks if their +// indexing was interrupted (no EndedAt status), and resumes indexing for the one +// that was most recently being indexed. +func (e *Engine) ResumeIndexingOnConnect() { + if e.config != nil && !e.config.Workspace.AutoIndex { + logger.Instance.Debug("[DAEMON] ResumeIndexingOnConnect: auto_index disabled") + return + } + + if e.registry == nil { + logger.Instance.Debug("[DAEMON] ResumeIndexingOnConnect: no registry configured") + return + } + + entries := e.registry.List() + if len(entries) == 0 { + return + } + + var roots []string + idMap := make(map[string]string) + for _, entry := range entries { + roots = append(roots, entry.Root) + idMap[entry.Root] = entry.ID + } + + bestRoot := indexer.GetLastInterruptedWorkspace(roots) + if bestRoot != "" { + logger.Instance.Info("[DAEMON] Resuming incomplete indexing for workspace: %s", filepath.Base(bestRoot)) + // Mark as triggered BEFORE calling StartIndexingAsync so that DetectContext's + // auto-trigger (connectTriggered.LoadOrStore) sees this workspace as already + // handled and does NOT start a second concurrent indexing job. + e.connectTriggered.Store(idMap[bestRoot], true) + e.StartIndexingAsync(bestRoot, idMap[bestRoot], nil, false) + } else { + logger.Instance.Debug("[DAEMON] ResumeIndexingOnConnect: no incomplete indexing jobs found") + } +} + // SearchCodeResult wraps search results with workspace context. type SearchCodeResult struct { Results []storage.SearchResult @@ -413,23 +451,6 @@ func (e *Engine) SearchCode(ctx context.Context, filePath, queryText string, lim primaryColl := wctx.CollectionName(primaryLang) t1 := time.Now() - // Auto-resume: if GlobalPercent is between 1-99 and state is "running", - // the indexer was interrupted — trigger a background re-index. - if idxStatus := loadIndexStatus(wctx.Root); idxStatus != nil { - if idxStatus.WorkspaceID == wctx.ID && idxStatus.GlobalPercent > 0 && idxStatus.GlobalPercent < 100 && idxStatus.State == "running" { - if _, ok := e.indexingJobs.Load(wctx.ID); !ok { - const resumeCooldown = 5 * time.Minute - now := time.Now() - if last, loaded := e.resumeAttempts.Load(wctx.ID); !loaded || now.Sub(last.(time.Time)) > resumeCooldown { - e.resumeAttempts.Store(wctx.ID, now) - logger.Instance.Info("[IDX] ws=%s Indexing interrupted at %d%% — auto-resuming", filepath.Base(wctx.Root), idxStatus.GlobalPercent) - e.StartIndexingAsync(wctx.Root, wctx.ID, nil, false) - } - } - logger.Instance.Info("[IDX] ws=%s Indexing in progress (%d%%) — will search available collections", filepath.Base(wctx.Root), idxStatus.GlobalPercent) - } - } - // Check if the primary collection exists. // If not, trigger background indexing but do NOT block — the fan-out below // will search any other language collections that do exist. @@ -622,22 +643,6 @@ func (e *Engine) HybridSearchCode(ctx context.Context, filePath, queryText strin collection := wctx.CollectionName(lang) - // Auto-resume from index_status.json: if indexing was interrupted, resume it. - if idxStatus := loadIndexStatus(wctx.Root); idxStatus != nil { - if idxStatus.WorkspaceID == wctx.ID && idxStatus.GlobalPercent > 0 && idxStatus.GlobalPercent < 100 && idxStatus.State == "running" { - if _, ok := e.indexingJobs.Load(wctx.ID); !ok { - const resumeCooldown = 5 * time.Minute - now := time.Now() - if last, loaded := e.resumeAttempts.Load(wctx.ID); !loaded || now.Sub(last.(time.Time)) > resumeCooldown { - e.resumeAttempts.Store(wctx.ID, now) - logger.Instance.Info("[IDX] ws=%s Indexing interrupted at %d%% — auto-resuming", filepath.Base(wctx.Root), idxStatus.GlobalPercent) - e.StartIndexingAsync(wctx.Root, wctx.ID, nil, false) - } - } - logger.Instance.Info("[IDX] ws=%s Indexing in progress (%d%%) — will search available collections", filepath.Base(wctx.Root), idxStatus.GlobalPercent) - } - } - exists, err := e.search.CollectionExists(ctx, collection) if err != nil { return nil, fmt.Errorf("failed to check collection: %w", err) @@ -797,8 +802,8 @@ func (e *Engine) popPendingIndex(workspaceID string) (files []string, overflow b func (e *Engine) tryStartPendingIndex(root, workspaceID string) { files, overflow := e.popPendingIndex(workspaceID) if overflow { - logger.Instance.Info("[IDX] ♻️ Pending changes exceeded limit for ws=%s — triggering full scan", filepath.Base(root)) - e.StartIndexingAsync(root, workspaceID, nil, false) + logger.Instance.Info("[IDX] ♻️ Pending changes exceeded limit for ws=%s — triggering full re-index", filepath.Base(root)) + e.StartIndexingAsync(root, workspaceID, nil, true) return } if len(files) == 0 { @@ -809,9 +814,25 @@ func (e *Engine) tryStartPendingIndex(root, workspaceID string) { // StartIndexingAsync starts the indexing process in a background goroutine. // If changedFiles is nil or empty, a full re-index is performed. +// If recreate=true and a job is already running, the recreate is queued and +// will start immediately after the current job finishes. func (e *Engine) StartIndexingAsync(root, id string, changedFiles []string, recreate bool) { + if watch.IsInvalidRoot(root) { + logger.Instance.Error("[IDX] ⛔ Refusing to index invalid/dangerous root: %s", root) + return + } + if _, loaded := e.indexingJobs.LoadOrStore(id, time.Now()); loaded { - return // Already running + // A job is already running. If recreate=true, queue it so it fires + // after the current job finishes (via tryStartPendingIndex/defer). + if recreate { + e.pendingMu.Lock() + e.pendingOverflow[id] = true // overflow = full re-index + delete(e.pendingFiles, id) + e.pendingMu.Unlock() + logger.Instance.Info("[IDX] ⏳ ws=%s recreate requested while indexing — queued for after current job", filepath.Base(root)) + } + return } // Count active jobs after adding this one — warn if multiple workspaces are indexing @@ -822,10 +843,17 @@ func (e *Engine) StartIndexingAsync(root, id string, changedFiles []string, recr logger.Instance.Warn("[IDX] ⚠️ %d workspaces indexing simultaneously — Ollama requests will serialize implicitly (ws=%s)", activeCount, filepath.Base(root)) } - jobID := fmt.Sprintf("%s-%d", id, time.Now().UnixNano()) - if e.progress != nil { - e.progress.start(id, root, jobID, time.Now()) + // Preserve existing Languages data on restart so incremental indexing + // doesn't show processed=0 to AI consumers. Reset only lifecycle fields. + s := indexer.LoadIndexStatus(root) + if s == nil { + s = &indexer.IndexStatus{} } + s.StartedAt = time.Now().UTC().Format(time.RFC3339) + s.EndedAt = "" + s.Elapsed = "" + s.Error = "" + indexer.SaveIndexStatus(root, s) go func() { defer func() { @@ -845,20 +873,33 @@ func (e *Engine) StartIndexingAsync(root, id string, changedFiles []string, recr err = e.IndexWorkspace(ctx, root, recreate) } + finalizeIndexStatus(root, err) if err != nil { logger.Instance.Error("[IDX] ws=%s Background indexing failed: %v", filepath.Base(root), err) - if e.progress != nil { - e.progress.fail(id, root, time.Now(), err.Error()) - } } else { logger.Instance.Info("[IDX] ✅ ws=%s Background indexing completed", filepath.Base(root)) - if e.progress != nil { - e.progress.complete(id, root, time.Now()) - } } }() } +// finalizeIndexStatus loads the current status, stamps EndedAt/Elapsed/Error, +// and saves it atomically. Centralises the logic that was duplicated in the +// success and error branches of StartIndexingAsync. +func finalizeIndexStatus(root string, indexErr error) { + s := indexer.LoadIndexStatus(root) + if s == nil { + s = &indexer.IndexStatus{} + } + s.EndedAt = time.Now().UTC().Format(time.RFC3339) + if started, pErr := time.Parse(time.RFC3339, s.StartedAt); pErr == nil { + s.Elapsed = time.Since(started).Round(time.Second).String() + } + if indexErr != nil { + s.Error = indexErr.Error() + } + indexer.SaveIndexStatus(root, s) +} + // IndexFiles indexes specific files in a workspace. func (e *Engine) IndexFiles(ctx context.Context, root string, files []string) error { wctx, err := e.DetectContext(ctx, root) @@ -916,39 +957,92 @@ func (e *Engine) IndexWorkspace(ctx context.Context, path string, recreate bool) wsName := filepath.Base(wctx.Root) - // Pre-scan: count files for every language before starting indexing. - // This gives progressStore an accurate denominator from the start, so - // GlobalPercent increases monotonically instead of resetting per language. var excludePatterns []string if e.config != nil { excludePatterns = e.config.Workspace.ExcludePatterns } - // Pre-scan: single WalkDir counting files per language before indexing begins. - // This gives progressStore an accurate denominator from the start, so - // GlobalPercent increases monotonically. One combined walk avoids - // O(languages x files) traversals that per-language scans would incur. - if e.progress != nil { - fileCounts := e.indexer.CountAllFiles(wctx.Root, excludePatterns) - for _, lang := range languages { - e.progress.preRegister(wctx.ID, lang, fileCounts[lang], time.Now()) - } + + // Pre-count total files per language with a single WalkDir pass. + // This gives us the real on_disk totals for accurate progress reporting, + // instead of using len(changedFiles) which only reflects modified files. + fileCounts := e.indexer.CountAllFiles(wctx.Root, excludePatterns) + logger.Instance.Info("[IDX] ws=%s file counts: %v", wsName, fileCounts) + + // Sort languages by file count descending so the dominant language is indexed + // first and AI search works immediately for the most relevant code. + sort.Slice(languages, func(i, j int) bool { + return fileCounts[languages[i]] > fileCounts[languages[j]] + }) + logger.Instance.Info("[IDX] ws=%s indexing order: %v", wsName, languages) + + // Load or create a shared in-memory IndexStatus for the entire indexing run. + // This avoids calling LoadIndexStatus (JSON read + parse) on every Progress tick. + // The single *IndexStatus is updated in-place; only SaveIndexStatus (atomic write) + // hits the disk, and only every 10 files. + s := indexer.LoadIndexStatus(wctx.Root) + if s == nil { + s = &indexer.IndexStatus{StartedAt: time.Now().UTC().Format(time.RFC3339)} } + if s.Languages == nil { + s.Languages = make(map[string]indexer.LangStatus) + } + // Pre-populate real on_disk counts so languages with 0 changed files still appear. + for _, l := range languages { + entry := s.Languages[l] + entry.OnDisk = fileCounts[l] + s.Languages[l] = entry + } + indexer.SaveIndexStatus(wctx.Root, s) var indexErrors []string for _, lang := range languages { + diskTotal := fileCounts[lang] collection := wctx.CollectionName(lang) - progressCb := func(doneFiles, totalFiles int) { - if e.progress != nil { - e.progress.update(wctx.ID, lang, doneFiles, totalFiles, time.Now()) - } + logger.Instance.Info("[IDX] ws=%s lang=%s ▶ starting (on_disk=%d)", wsName, lang, diskTotal) + + // Capture the already-processed count before this run starts. + // For incremental runs (only changed files), we accumulate on top of + // whatever was already indexed in Qdrant. For a full re-index + // (recreate=true, or all files changed), we reset to 0. + // baseProcessed is captured once per language, before the Progress + // callback fires, so it's safe to close over it. + baseProcessed := s.Languages[lang].Processed + if recreate { + baseProcessed = 0 } - logger.Instance.Info("[IDX] ws=%s lang=%s ▶ starting", wsName, lang) + // firstTick is used to detect on the first Progress callback whether + // this is a full re-index (totalFiles >= diskTotal) so we can reset + // baseProcessed to 0 and avoid double-counting. + firstTick := true + err := e.indexer.IndexWorkspace(ctx, wctx.Root, collection, indexer.Options{ Language: lang, WorkspaceName: wsName, ExcludePatterns: excludePatterns, Recreate: recreate, - Progress: progressCb, + Progress: func(doneFiles, totalFiles int) { + // Throttle disk I/O: write every 10 files or on the last file + if doneFiles%10 != 0 && doneFiles != totalFiles { + return + } + // On the first tick, decide if this is a full re-index. + // If totalFiles covers all on-disk files, reset base to 0 + // so we don't double-count the existing Processed value. + if firstTick { + firstTick = false + if diskTotal > 0 && totalFiles >= diskTotal { + baseProcessed = 0 + } + } + ls := s.Languages[lang] + ls.OnDisk = diskTotal // real total files on disk + ls.Changed = totalFiles // files that needed re-indexing this run + // Cumulative total: for incremental runs add to the existing + // DB count; for full re-indexes (base=0) start from scratch. + ls.Processed = baseProcessed + doneFiles + s.Languages[lang] = ls + indexer.SaveIndexStatus(wctx.Root, s) + }, }) if err != nil { logger.Instance.Error("[IDX] ws=%s lang=%s ❌ failed: %v", wsName, lang, err) diff --git a/internal/service/engine/engine_fallback_search_test.go b/internal/service/engine/engine_fallback_search_test.go index 64582a1..30d3705 100644 --- a/internal/service/engine/engine_fallback_search_test.go +++ b/internal/service/engine/engine_fallback_search_test.go @@ -94,12 +94,6 @@ func ValidateEmail(email string) bool { eng := NewEngine(idxSvc, searchSvc, "", &config.Config{}) eng.SetResolver(resolver.New(resolver.Dependencies{Detector: &mockDirDetector{root: root}})) - t.Cleanup(func() { - if eng.progress != nil { - eng.progress.stop() - } - }) - return root, eng } diff --git a/internal/service/engine/engine_nonblocking_search_test.go b/internal/service/engine/engine_nonblocking_search_test.go index 5fecfd0..35fc329 100644 --- a/internal/service/engine/engine_nonblocking_search_test.go +++ b/internal/service/engine/engine_nonblocking_search_test.go @@ -3,7 +3,10 @@ package engine import ( "context" "errors" + "os" + "path/filepath" "testing" + "time" "github.com/doITmagic/rag-code-mcp/pkg/storage" "github.com/doITmagic/rag-code-mcp/pkg/workspace/resolver" @@ -76,10 +79,13 @@ func TestSearchCodeReturnsResultsFromOtherLangsWhenPrimaryMissing(t *testing.T) t.Error("Expected py-result in merged results") } - // Cleanup: stop progress flusher + // Cleanup: wait for bg goroutines t.Cleanup(func() { - if eng2.progress != nil { - eng2.progress.stop() + for i := 0; i < 200; i++ { + if len(eng2.ActiveIndexingJobs()) == 0 { + break + } + time.Sleep(time.Millisecond) } }) } @@ -127,16 +133,13 @@ func TestSearchCodeBlocksWhenZeroCollectionsExist(t *testing.T) { t.Fatalf("Expected ErrIndexingStarted or ErrNotIndexed, got: %T: %v", err, err) } - // Cleanup + // Cleanup: wait for bg goroutines from both eng and eng2 t.Cleanup(func() { - if eng2.progress != nil { - eng2.progress.stop() - } - // Wait for bg indexing to drain - for i := 0; i < 100; i++ { - if len(eng2.ActiveIndexingJobs()) == 0 { + for i := 0; i < 200; i++ { + if len(eng.ActiveIndexingJobs()) == 0 && len(eng2.ActiveIndexingJobs()) == 0 { break } + time.Sleep(time.Millisecond) } }) } @@ -182,9 +185,13 @@ func TestSearchCodeIndexingInProgressStillSearches(t *testing.T) { // Cleanup the fake job eng2.indexingJobs.Delete(wctx.ID) + // Cleanup: wait for bg goroutines t.Cleanup(func() { - if eng2.progress != nil { - eng2.progress.stop() + for i := 0; i < 200; i++ { + if len(eng.ActiveIndexingJobs()) == 0 && len(eng2.ActiveIndexingJobs()) == 0 { + break + } + time.Sleep(time.Millisecond) } }) } @@ -232,17 +239,18 @@ func TestHybridSearchCodeReturnsNilWhenCollectionMissing(t *testing.T) { _ = wctx - // Cleanup + // Cleanup: wait for BOTH engine instances' goroutines to finish. + // eng spawned a goroutine via DetectContext→connectTriggered→StartIndexingAsync. + // eng2 spawned one via HybridSearchCode→StartIndexingAsync. + // Both write to rootDir/.ragcode/ — if we don't wait, TempDir cleanup fails. t.Cleanup(func() { - if eng2.progress != nil { - eng2.progress.stop() - } - // Wait for bg indexing to drain - for i := 0; i < 100; i++ { - if len(eng2.ActiveIndexingJobs()) == 0 { + for i := 0; i < 200; i++ { + if len(eng.ActiveIndexingJobs()) == 0 && len(eng2.ActiveIndexingJobs()) == 0 { break } + time.Sleep(time.Millisecond) } + os.RemoveAll(filepath.Join(rootDir, ".ragcode")) }) } @@ -289,8 +297,11 @@ func TestHybridSearchCodeStillWorksWhenCollectionExists(t *testing.T) { } t.Cleanup(func() { - if eng2.progress != nil { - eng2.progress.stop() + for i := 0; i < 200; i++ { + if len(eng2.ActiveIndexingJobs()) == 0 { + break + } + time.Sleep(time.Millisecond) } }) } diff --git a/internal/service/engine/engine_searchcode_test.go b/internal/service/engine/engine_searchcode_test.go index 1ea423c..51a75c4 100644 --- a/internal/service/engine/engine_searchcode_test.go +++ b/internal/service/engine/engine_searchcode_test.go @@ -2,6 +2,8 @@ package engine import ( "context" + "os" + "path/filepath" "sync/atomic" "testing" "time" @@ -284,6 +286,19 @@ func TestSearchCodeResumeInterruptedIndexing(t *testing.T) { if wctx == nil { t.Fatalf("Failed to detect context") } + + // Wait for background goroutine (triggered by connectTriggered in DetectContext) + // to finish before TempDir cleanup removes the directory. + t.Cleanup(func() { + for i := 0; i < 200; i++ { + if len(eng.ActiveIndexingJobs()) == 0 { + break + } + time.Sleep(time.Millisecond) + } + os.RemoveAll(filepath.Join(rootDir, ".ragcode")) + }) + // Make the collection exist so search continues goColl := CollectionNameFor(wctx.ID, "go") store := &multiLangStore{ @@ -297,116 +312,109 @@ func TestSearchCodeResumeInterruptedIndexing(t *testing.T) { eng.SetSearchService(search.NewService(llmProvider, store)) // Verify SearchCode succeeds (returns results) when collection exists. - // Auto-resume behavior is tested separately in TestSearchCodeAutoResumesInterruptedIndexing. _, err := eng.SearchCode(context.Background(), "dummy.go", "test", 10, false) if err != nil { t.Fatalf("Expected no error when collection exists, got: %v", err) } } -// mockDirDetector is like mockDetector but allows specifying the root dir -type mockDirDetector struct { - root string -} - -func (m *mockDirDetector) DetectFromFilePath(_ context.Context, path string) (*contract.WorkspaceCandidate, *contract.ResolveWorkspaceError) { - return &contract.WorkspaceCandidate{ - Root: m.root, - Confidence: 1.0, - }, nil -} - -// TestSearchCodeAutoResumesInterruptedIndexing verifies that SearchCode triggers -// a background re-index when index_status.json shows an interrupted indexing -// session (state="running", 1 <= GlobalPercent <= 99). -// It also verifies the 5-minute cooldown prevents repeated resume triggers. -func TestSearchCodeAutoResumesInterruptedIndexing(t *testing.T) { +// TestStartIndexingAsyncRecreateQueuesWhenJobRunning verifies BUG-004 fix: +// when recreate=true is requested while a job is already running, the recreate +// must be queued as pendingOverflow (full re-index) rather than silently dropped. +func TestStartIndexingAsyncRecreateQueuesWhenJobRunning(t *testing.T) { llmProvider := &countingLLM{} eng := newEngineCountingLLM(&testStore{existing: map[string]bool{}}, llmProvider) - wsRoot := t.TempDir() - eng.SetResolver(resolver.New(resolver.Dependencies{Detector: &mockDirDetector{root: wsRoot}})) + const wsID = "test-ws-id" + const wsRoot = "/tmp/fake-ws" - // Pre-fill the connect trigger so we test the search-based auto-resume, - // not the daemon's on-connect auto-warmup. - wsID := contract.DeriveWorkspaceID(wsRoot, "", "") - eng.connectTriggered.Store(wsID, true) + // Simulate a job already running for this workspace. + eng.indexingJobs.Store(wsID, time.Now()) - // Detect context to learn the workspace ID (needed for index_status.json) - wctx, err := eng.DetectContext(context.Background(), "dummy.go") - if err != nil || wctx == nil { - t.Fatal("Failed to detect workspace context") - } + // Request recreate=true while the job is running. + eng.StartIndexingAsync(wsRoot, wsID, nil, true) - // Write index_status.json that simulates an interrupted indexing session at 50% - now := time.Now() - status := IndexProgress{ - JobID: "interrupted-test-job", - WorkspaceID: wctx.ID, - WorkspaceRoot: wsRoot, - State: "running", - GlobalPercent: 50, - StartedAt: now.Add(-10 * time.Minute), - UpdatedAt: now, - Languages: map[string]IndexLanguageProgress{}, + // The job is still marked as running (we put it there). + _, stillRunning := eng.indexingJobs.Load(wsID) + if !stillRunning { + t.Fatal("Expected job to still be running (we stored it manually)") } - saveIndexStatus(wsRoot, &status) - // Pre-emptively set connectTriggered so that DetectContext doesn't - // automatically kick off indexing directly when it is called entirely - // bypassing the auto-resume retry mechanism we're trying to test here. - eng.connectTriggered.Store(wctx.ID, true) + // The recreate MUST be queued as overflow — not silently dropped. + eng.pendingMu.Lock() + overflow := eng.pendingOverflow[wsID] + _, hasPendingFiles := eng.pendingFiles[wsID] + eng.pendingMu.Unlock() - // Make the go collection exist so SearchCode gets past the fast-fail check - goColl := CollectionNameFor(wctx.ID, "go") - store := &multiLangStore{ - testStore: testStore{ - existing: map[string]bool{goColl: true}, - }, + if !overflow { + t.Error("Expected pendingOverflow[wsID]=true when recreate=true is requested while job runs") + } + if hasPendingFiles { + t.Error("Expected pendingFiles[wsID] to be cleared when overflow is set") } - eng.SetSearchService(search.NewService(llmProvider, store)) - // First call — auto-resume should fire - _, _ = eng.SearchCode(context.Background(), "dummy.go", "test", 10, false) + // Cleanup: remove the fake job + eng.indexingJobs.Delete(wsID) +} - // resumeAttempts is updated synchronously before the goroutine starts, - // so it is readable immediately after SearchCode returns. - if _, ok := eng.resumeAttempts.Load(wctx.ID); !ok { - t.Error("expected auto-resume to be triggered: resumeAttempts has no entry for workspace") +// TestStartIndexingAsyncRecreateStartsImmediatelyWhenNoJobRunning verifies that +// when recreate=true and no job is running, the job starts immediately (normal path). +func TestStartIndexingAsyncRecreateStartsImmediatelyWhenNoJobRunning(t *testing.T) { + llmProvider := &countingLLM{} + eng := newEngineCountingLLM(&testStore{existing: map[string]bool{}}, llmProvider) + + rootDir := t.TempDir() + eng.SetResolver(resolver.New(resolver.Dependencies{Detector: &mockDirDetector{root: rootDir}})) + + // DetectContext triggers connectTriggered → StartIndexingAsync. Wait for it to finish. + wctx, _ := eng.DetectContext(context.Background(), "dummy.go") + if wctx == nil { + t.Fatalf("Failed to detect context") } - // Second call within cooldown — should NOT trigger another resume attempt. - // We verify this by checking that resumeAttempts still holds the same timestamp. - first, okFirst := eng.resumeAttempts.Load(wctx.ID) - _, _ = eng.SearchCode(context.Background(), "dummy.go", "test", 10, false) - second, okSecond := eng.resumeAttempts.Load(wctx.ID) - - if okFirst && okSecond { - if first.(time.Time) != second.(time.Time) { - t.Error("cooldown violated: auto-resume was triggered again within the 5-minute window") + for i := 0; i < 200; i++ { + if len(eng.ActiveIndexingJobs()) == 0 { + break } - } else { - t.Error("expected resumeAttempts to be populated for both calls") + time.Sleep(time.Millisecond) } - // Stop the progress flusher goroutine before the test returns. - // Without this the goroutine can still write to wsRoot/.ragcode/ while - // t.TempDir() cleanup removes the directory, triggering a race under -race. t.Cleanup(func() { - if eng.progress != nil { - eng.progress.stop() + for i := 0; i < 200; i++ { + if len(eng.ActiveIndexingJobs()) == 0 { + break + } + time.Sleep(time.Millisecond) } + os.RemoveAll(filepath.Join(rootDir, ".ragcode")) }) - // Wait for the background indexing goroutine to drain before test returns. - deadline := time.Now().Add(5 * time.Second) - for time.Now().Before(deadline) { - if len(eng.ActiveIndexingJobs()) == 0 { - break - } - time.Sleep(10 * time.Millisecond) + // No job running — recreate=true should start immediately. + eng.StartIndexingAsync(wctx.Root, wctx.ID, nil, true) + + _, nowRunning := eng.indexingJobs.Load(wctx.ID) + if !nowRunning { + t.Error("Expected job to be running immediately when recreate=true and no job was active") } - if jobs := eng.ActiveIndexingJobs(); len(jobs) != 0 { - t.Fatalf("background indexing jobs still active after 5s wait: %v", jobs) + + // Nothing should be queued in pendingOverflow. + eng.pendingMu.Lock() + overflow := eng.pendingOverflow[wctx.ID] + eng.pendingMu.Unlock() + + if overflow { + t.Error("Expected no pendingOverflow when job started immediately") } } + +// mockDirDetector is like mockDetector but allows specifying the root dir +type mockDirDetector struct { + root string +} + +func (m *mockDirDetector) DetectFromFilePath(_ context.Context, path string) (*contract.WorkspaceCandidate, *contract.ResolveWorkspaceError) { + return &contract.WorkspaceCandidate{ + Root: m.root, + Confidence: 1.0, + }, nil +} diff --git a/internal/service/engine/engine_sticky_test.go b/internal/service/engine/engine_sticky_test.go index 2a8a417..b20f606 100644 --- a/internal/service/engine/engine_sticky_test.go +++ b/internal/service/engine/engine_sticky_test.go @@ -40,6 +40,17 @@ func TestCheckAndReindexOnConnect_ReturnsRoot(t *testing.T) { Detector: &mockDirDetector{root: rootDir}, })) + // Cleanup: wait for background goroutines spawned by DetectContext→connectTriggered + t.Cleanup(func() { + deadline := time.Now().Add(5 * time.Second) + for time.Now().Before(deadline) { + if len(eng.ActiveIndexingJobs()) == 0 { + break + } + time.Sleep(10 * time.Millisecond) + } + }) + result := eng.CheckAndReindexOnConnect("some/file.go") if result != rootDir { t.Errorf("expected root=%q, got %q", rootDir, result) @@ -88,9 +99,7 @@ func TestCheckAndReindexOnConnect_TriggersReindex(t *testing.T) { // Register cleanup immediately — before any Fatal that could skip it. t.Cleanup(func() { - if eng.progress != nil { - eng.progress.stop() - } + deadline := time.Now().Add(5 * time.Second) for time.Now().Before(deadline) { if len(eng.ActiveIndexingJobs()) == 0 { diff --git a/internal/service/engine/index_progress.go b/internal/service/engine/index_progress.go deleted file mode 100644 index a16aff7..0000000 --- a/internal/service/engine/index_progress.go +++ /dev/null @@ -1,393 +0,0 @@ -package engine - -import ( - "encoding/json" - "os" - "path/filepath" - "sync" - "time" - - "github.com/doITmagic/rag-code-mcp/internal/logger" -) - -const indexStatusFile = "index_status.json" - -type IndexLanguageProgress struct { - TotalFiles int `json:"total_files"` - DoneFiles int `json:"done_files"` - Percent int `json:"percent"` - UpdatedAt time.Time `json:"updated_at"` -} - -type IndexProgress struct { - JobID string `json:"job_id"` - WorkspaceID string `json:"workspace_id"` - WorkspaceRoot string `json:"workspace_root"` - State string `json:"state"` // starting|running|completed|failed - GlobalPercent int `json:"global_percent"` // weighted across all languages - CurrentLanguage string `json:"current_language"` // language currently being indexed - StartedAt time.Time `json:"started_at"` - CompletedAt *time.Time `json:"completed_at,omitempty"` - Languages map[string]IndexLanguageProgress `json:"languages,omitempty"` - UpdatedAt time.Time `json:"updated_at"` - Error string `json:"error,omitempty"` -} - -// calcGlobalPercent computes GlobalPercent as sum(done) / sum(total) * 100 -// across all languages. Languages are pre-registered with accurate TotalFiles -// before indexing begins (via preRegister), so this sum is monotonically -// non-decreasing. -// Must be called with s.mu held. -func calcGlobalPercent(langs map[string]IndexLanguageProgress) int { - var totalDone, totalFiles int - for _, lp := range langs { - totalDone += lp.DoneFiles - totalFiles += lp.TotalFiles - } - if totalFiles == 0 { - return 0 - } - pct := totalDone * 100 / totalFiles - if pct > 100 { - return 100 - } - return pct -} - -type progressStore struct { - mu sync.Mutex - jobs map[string]*IndexProgress - flushCh chan struct{} // debounced disk flush signal - done chan struct{} // closed by stop() to shut down runFlusher -} - -func newProgressStore() *progressStore { - ps := &progressStore{ - jobs: map[string]*IndexProgress{}, - flushCh: make(chan struct{}, 1), - done: make(chan struct{}), - } - go ps.runFlusher() - return ps -} - -// stop shuts down the background flusher goroutine. Must be called when the -// progressStore is no longer needed (e.g. in test cleanup or Engine.Close). -// -//nolint:unused // used by engine_searchcode_test.go -func (s *progressStore) stop() { - select { - case <-s.done: - // already stopped - default: - close(s.done) - } -} - -// runFlusher drains flushCh and persists all jobs debounced at 500ms. -// It snapshots job data under the mutex (fast), then writes to disk outside -// the lock so that update()/get() are not blocked during disk I/O. -// It exits when stop() is called (done channel is closed). -func (s *progressStore) runFlusher() { - for { - select { - case <-s.done: - return - case _, ok := <-s.flushCh: - if !ok { - return - } - } - // Debounce: wait 500ms then drain any extra signals. - select { - case <-s.done: - return - case <-time.After(500 * time.Millisecond): - } - for len(s.flushCh) > 0 { - <-s.flushCh - } - // Snapshot under the lock (fast copy), then release before disk I/O. - s.mu.Lock() - snapshots := make([]IndexProgress, 0, len(s.jobs)) - for _, p := range s.jobs { - if p.WorkspaceRoot == "" { - continue - } - cp := *p - if p.Languages != nil { - cp.Languages = make(map[string]IndexLanguageProgress, len(p.Languages)) - for k, v := range p.Languages { - cp.Languages[k] = v - } - } - if p.CompletedAt != nil { - t := *p.CompletedAt - cp.CompletedAt = &t - } - snapshots = append(snapshots, cp) - } - s.mu.Unlock() - // Write snapshots to disk without holding the lock. - for i := range snapshots { - cp := snapshots[i] - saveIndexStatus(cp.WorkspaceRoot, &cp) - } - } -} - -// triggerFlush signals the flusher non-blockingly. -func (s *progressStore) triggerFlush() { - select { - case s.flushCh <- struct{}{}: - default: // already pending — no need to queue another - } -} - -// preRegister sets the expected TotalFiles for a language before indexing begins. -// This ensures calcGlobalPercent has an accurate denominator from the start, -// making GlobalPercent monotonically non-decreasing throughout indexing. -// It is a no-op if the workspace job has not been started yet. -func (s *progressStore) preRegister(workspaceID, lang string, totalFiles int, now time.Time) { - s.mu.Lock() - - p, ok := s.jobs[workspaceID] - if !ok { - s.mu.Unlock() - return - } - if p.Languages == nil { - p.Languages = make(map[string]IndexLanguageProgress) - } - // Update TotalFiles for this language. If carried over from a previous - // completed run, preserve DoneFiles so progress is cumulative. - if existing, ok := p.Languages[lang]; ok { - existing.TotalFiles = totalFiles - // Clamp DoneFiles to new total (files may have been deleted) - if existing.DoneFiles > totalFiles { - existing.DoneFiles = totalFiles - } - if totalFiles > 0 { - existing.Percent = existing.DoneFiles * 100 / totalFiles - } - p.Languages[lang] = existing - } else { - p.Languages[lang] = IndexLanguageProgress{ - TotalFiles: totalFiles, - DoneFiles: 0, - Percent: 0, - } - } - p.GlobalPercent = calcGlobalPercent(p.Languages) - p.UpdatedAt = now - s.mu.Unlock() - // Flush so the updated totals are persisted promptly. - s.triggerFlush() -} - -func (s *progressStore) get(workspaceID string, workspaceRoot string) *IndexProgress { - s.mu.Lock() - if p, ok := s.jobs[workspaceID]; ok { - cp := *p - if p.Languages != nil { - cp.Languages = make(map[string]IndexLanguageProgress, len(p.Languages)) - for k, v := range p.Languages { - cp.Languages[k] = v - } - } - if p.CompletedAt != nil { - t := *p.CompletedAt - cp.CompletedAt = &t - } - s.mu.Unlock() - return &cp - } - s.mu.Unlock() - - // Not in memory (e.g. after restart) — try loading from disk outside the lock. - if workspaceRoot != "" { - if p := loadIndexStatus(workspaceRoot); p != nil && p.WorkspaceID == workspaceID { - s.mu.Lock() - if _, exists := s.jobs[workspaceID]; !exists { - s.jobs[workspaceID] = p // cache in memory for subsequent calls - } - s.mu.Unlock() - - cp := *p - if p.Languages != nil { - cp.Languages = make(map[string]IndexLanguageProgress, len(p.Languages)) - for k, v := range p.Languages { - cp.Languages[k] = v - } - } - if p.CompletedAt != nil { - t := *p.CompletedAt - cp.CompletedAt = &t - } - return &cp - } - } - return nil -} - -func (s *progressStore) start(workspaceID, workspaceRoot, jobID string, now time.Time) { - s.mu.Lock() - - // Carry over language progress from a previously completed job so that - // incremental runs start at ~97% (228/233) instead of 0% (0/233). - // Without this, every incremental index resets the visible progress to 0. - var prevLangs map[string]IndexLanguageProgress - if prev, ok := s.jobs[workspaceID]; ok && prev.State == "completed" && len(prev.Languages) > 0 { - prevLangs = make(map[string]IndexLanguageProgress, len(prev.Languages)) - for k, v := range prev.Languages { - prevLangs[k] = v - } - } - - p := &IndexProgress{ - JobID: jobID, - WorkspaceID: workspaceID, - WorkspaceRoot: workspaceRoot, - State: "starting", - StartedAt: now, - UpdatedAt: now, - Languages: map[string]IndexLanguageProgress{}, - } - - // Restore previous done counts so progress is cumulative. - if prevLangs != nil { - p.Languages = prevLangs - p.GlobalPercent = calcGlobalPercent(p.Languages) - } - - s.jobs[workspaceID] = p - s.mu.Unlock() - // Flush immediately so index_status.json exists from the moment indexing starts. - // Without this, a crash before the first update() would leave no status file. - s.triggerFlush() -} - -func (s *progressStore) update(workspaceID, lang string, done, total int, now time.Time) { - s.mu.Lock() - p, ok := s.jobs[workspaceID] - if !ok { - p = &IndexProgress{ - WorkspaceID: workspaceID, - State: "running", - StartedAt: now, - UpdatedAt: now, - Languages: map[string]IndexLanguageProgress{}, - } - s.jobs[workspaceID] = p - } - if p.State == "starting" { - p.State = "running" - } - pct := 0 - if total > 0 { - pct = done * 100 / total - if pct < 0 { - pct = 0 - } - if pct > 100 { - pct = 100 - } - } - p.Languages[lang] = IndexLanguageProgress{ - TotalFiles: total, - DoneFiles: done, - Percent: pct, - UpdatedAt: now, - } - p.CurrentLanguage = lang - p.GlobalPercent = calcGlobalPercent(p.Languages) - p.UpdatedAt = now - s.mu.Unlock() - s.triggerFlush() -} - -func (s *progressStore) complete(workspaceID, workspaceRoot string, now time.Time) { - s.mu.Lock() - p, ok := s.jobs[workspaceID] - if !ok { - s.mu.Unlock() - return - } - p.State = "completed" - p.GlobalPercent = 100 - p.UpdatedAt = now - p.CompletedAt = &now - cp := *p - if p.Languages != nil { - cp.Languages = make(map[string]IndexLanguageProgress, len(p.Languages)) - for k, v := range p.Languages { - cp.Languages[k] = v - } - } - if p.CompletedAt != nil { - t := *p.CompletedAt - cp.CompletedAt = &t - } - s.mu.Unlock() - // complete() is a one-time event — write synchronously so it persists even if process exits. - saveIndexStatus(workspaceRoot, &cp) -} - -func (s *progressStore) fail(workspaceID, workspaceRoot string, now time.Time, errMsg string) { - s.mu.Lock() - p, ok := s.jobs[workspaceID] - if !ok { - s.mu.Unlock() - return - } - p.State = "failed" - p.Error = errMsg - p.UpdatedAt = now - cp := *p - if p.Languages != nil { - cp.Languages = make(map[string]IndexLanguageProgress, len(p.Languages)) - for k, v := range p.Languages { - cp.Languages[k] = v - } - } - s.mu.Unlock() - // fail() is a one-time event — write synchronously. - saveIndexStatus(workspaceRoot, &cp) -} - -// saveIndexStatus writes the IndexProgress snapshot to {workspaceRoot}/.ragcode/index_status.json. -func saveIndexStatus(workspaceRoot string, p *IndexProgress) { - if workspaceRoot == "" { - return - } - dir := filepath.Join(workspaceRoot, ".ragcode") - if err := os.MkdirAll(dir, 0o755); err != nil { - logger.Instance.Warn("index_status: cannot create .ragcode dir: %v", err) - return - } - path := filepath.Join(dir, indexStatusFile) - b, err := json.MarshalIndent(p, "", " ") - if err != nil { - logger.Instance.Warn("index_status: marshal failed: %v", err) - return - } - if err := os.WriteFile(path, b, 0o644); err != nil { - logger.Instance.Warn("index_status: write failed for %s: %v", path, err) - } -} - -// loadIndexStatus reads the last IndexProgress from {workspaceRoot}/.ragcode/index_status.json. -// Returns nil if the file doesn't exist or can't be parsed. -func loadIndexStatus(workspaceRoot string) *IndexProgress { - path := filepath.Join(workspaceRoot, ".ragcode", indexStatusFile) - b, err := os.ReadFile(path) - if err != nil { - return nil // file doesn't exist yet — first run - } - var p IndexProgress - if err := json.Unmarshal(b, &p); err != nil { - logger.Instance.Warn("index_status: parse failed for %s: %v", path, err) - return nil - } - return &p -} diff --git a/internal/service/engine/index_progress_test.go b/internal/service/engine/index_progress_test.go deleted file mode 100644 index db0283c..0000000 --- a/internal/service/engine/index_progress_test.go +++ /dev/null @@ -1,273 +0,0 @@ -package engine - -import ( - "os" - "path/filepath" - "testing" - "time" -) - -// TestProgressStoreDeepCopy verifies that get() returns an independent copy -// of the Languages map, so callers cannot corrupt the cached snapshot. -func TestProgressStoreDeepCopy(t *testing.T) { - store := newProgressStore() - now := time.Now() - wsRoot := t.TempDir() - - store.start("ws1", wsRoot, "job1", now) - store.update("ws1", "go", 5, 10, now) - - p := store.get("ws1", "") - if p == nil { - t.Fatal("expected progress, got nil") - } - - // Mutate the returned copy - p.Languages["go"] = IndexLanguageProgress{TotalFiles: 999} - - // Re-fetch and ensure the store was not corrupted - p2 := store.get("ws1", "") - if p2.Languages["go"].TotalFiles == 999 { - t.Error("deep copy failed: mutation of returned Languages map corrupted the store") - } -} - -// TestProgressStoreDiskRoundTrip verifies save+load round-trip and deep copy -// for the disk-loaded branch of get(). -func TestProgressStoreDiskRoundTrip(t *testing.T) { - dir := t.TempDir() - wsRoot := filepath.Join(dir, "workspace") - if err := os.MkdirAll(wsRoot, 0o755); err != nil { - t.Fatal(err) - } - - store := newProgressStore() - now := time.Now().UTC().Truncate(time.Second) - - store.start("ws-disk", wsRoot, "job-disk", now) - store.update("ws-disk", "go", 3, 10, now) - store.complete("ws-disk", wsRoot, now) - - // Create a fresh store (simulating restart) and load from disk - freshStore := newProgressStore() - p := freshStore.get("ws-disk", wsRoot) - if p == nil { - t.Fatal("expected progress loaded from disk, got nil") - } - if p.State != "completed" { - t.Errorf("expected state 'completed', got %q", p.State) - } - if _, ok := p.Languages["go"]; !ok { - t.Error("expected 'go' language entry in loaded progress") - } - - // Verify deep copy from disk branch—mutating should not affect cache - p.Languages["go"] = IndexLanguageProgress{TotalFiles: 777} - p2 := freshStore.get("ws-disk", wsRoot) - if p2.Languages["go"].TotalFiles == 777 { - t.Error("deep copy failed for disk-loaded branch: mutation corrupted cached entry") - } -} - -// TestProgressStoreGlobalPercent verifies that GlobalPercent is calculated correctly -// as sum(done_all_langs) / sum(total_all_langs) * 100. -func TestProgressStoreGlobalPercent(t *testing.T) { - store := newProgressStore() - now := time.Now() - wsRoot := t.TempDir() - - store.start("ws1", wsRoot, "job1", now) - store.update("ws1", "md", 120, 120, now) // md: 100% (120/120) - store.update("ws1", "go", 234, 500, now) // go: 46% (234/500) - - p := store.get("ws1", "") - if p == nil { - t.Fatal("expected progress, got nil") - } - - // global = (120 + 234) / (120 + 500) * 100 = 354/620*100 = 57% - want := 57 - if p.GlobalPercent != want { - t.Errorf("expected GlobalPercent=%d, got %d", want, p.GlobalPercent) - } -} - -// TestProgressStoreCurrentLanguage verifies that CurrentLanguage is updated on each update(). -func TestProgressStoreCurrentLanguage(t *testing.T) { - store := newProgressStore() - now := time.Now() - wsRoot := t.TempDir() - - store.start("ws2", wsRoot, "job2", now) - store.update("ws2", "md", 10, 100, now) - - p := store.get("ws2", "") - if p.CurrentLanguage != "md" { - t.Errorf("expected CurrentLanguage='md', got %q", p.CurrentLanguage) - } - - store.update("ws2", "go", 5, 200, now) - p = store.get("ws2", "") - if p.CurrentLanguage != "go" { - t.Errorf("expected CurrentLanguage='go', got %q", p.CurrentLanguage) - } -} - -// TestProgressStoreTwoWorkspacesIsolated verifies that two workspaces updating -// the store concurrently do not corrupt each other's GlobalPercent or Languages. -// This covers the real production scenario: wsA and wsB indexing simultaneously -// (allowed since indexingJobs dedup is per workspace ID, not global). -func TestProgressStoreTwoWorkspacesIsolated(t *testing.T) { - store := newProgressStore() - now := time.Now() - rootA := t.TempDir() - rootB := t.TempDir() - - // WsA: go 50/100, python 0/200 → global = (50+0)/(100+200)*100 = 16% - // python is in the denominator because it was added via update() with 0 processed/200 total - store.start("wsA", rootA, "jobA", now) - store.update("wsA", "go", 50, 100, now) - store.update("wsA", "python", 0, 200, now) - - // WsB: go 200/200 → global = 200/200 = 100% - store.start("wsB", rootB, "jobB", now) - store.update("wsB", "go", 200, 200, now) - - pA := store.get("wsA", "") - pB := store.get("wsB", "") - - if pA == nil || pB == nil { - t.Fatal("expected both workspaces to have progress, got nil") - } - - // WsA: global = (50+0)/(100+200)*100 = 16 - wantA := 16 - if pA.GlobalPercent != wantA { - t.Errorf("wsA: expected GlobalPercent=%d, got %d", wantA, pA.GlobalPercent) - } - - // WsB: global = 200/200*100 = 100 - wantB := 100 - if pB.GlobalPercent != wantB { - t.Errorf("wsB: expected GlobalPercent=%d, got %d", wantB, pB.GlobalPercent) - } - - // WsA should have python (was added with 0/200), wsB should not - if _, exists := pA.Languages["python"]; !exists { - t.Error("wsA should have python language entry (added with 0/200)") - } - if _, exists := pB.Languages["python"]; exists { - t.Error("wsB should not have python language (was only set on wsA)") - } - - // Verify wsA progress is not polluted by wsB's go collection - if pA.Languages["go"].TotalFiles == 200 { - t.Error("wsA.go should have TotalFiles=100, not 200 (wsB pollution)") - } -} - -// TestProgressStoreTwoWorkspacesConcurrent verifies no data races when two -// workspaces update the store from separate goroutines simultaneously. -// Run with -race to detect races. -func TestProgressStoreTwoWorkspacesConcurrent(t *testing.T) { - store := newProgressStore() - now := time.Now() - - store.start("cA", t.TempDir(), "jA", now) - store.start("cB", t.TempDir(), "jB", now) - - const iters = 50 - done := make(chan struct{}, 2) - - go func() { - for i := 0; i < iters; i++ { - store.update("cA", "go", i, iters, time.Now()) - } - done <- struct{}{} - }() - - go func() { - for i := 0; i < iters; i++ { - store.update("cB", "python", i, iters, time.Now()) - } - done <- struct{}{} - }() - - <-done - <-done - - pA := store.get("cA", "") - pB := store.get("cB", "") - - if pA == nil || pB == nil { - t.Fatal("expected progress for both workspaces after concurrent updates") - } - // After 50 updates each, final state should be 49/50 for each - if pA.Languages["go"].DoneFiles != iters-1 { - t.Errorf("wsA go: expected done=%d, got %d", iters-1, pA.Languages["go"].DoneFiles) - } - if pB.Languages["python"].DoneFiles != iters-1 { - t.Errorf("wsB python: expected done=%d, got %d", iters-1, pB.Languages["python"].DoneFiles) - } -} - -// TestProgressStoreIncrementalPreservesProgress verifies that when a completed -// workspace is re-indexed incrementally, the progress carries over from the -// previous run instead of resetting to 0%. -// -// Scenario: Go had 228/228 (100%). New run starts, preRegister sets total=233. -// Expected: progress shows 228/233 ≈ 97%, NOT 0/233 = 0%. -func TestProgressStoreIncrementalPreservesProgress(t *testing.T) { - store := newProgressStore() - now := time.Now() - wsRoot := t.TempDir() - - // Phase 1: complete a full indexing run - store.start("ws1", wsRoot, "job1", now) - store.update("ws1", "go", 228, 228, now) - store.update("ws1", "docs", 809, 809, now) - store.complete("ws1", wsRoot, now) - - p := store.get("ws1", "") - if p == nil || p.State != "completed" || p.GlobalPercent != 100 { - t.Fatalf("expected completed at 100%%, got state=%q pct=%d", p.State, p.GlobalPercent) - } - - // Phase 2: start a new incremental indexing run (e.g., auto-trigger on connect) - store.start("ws1", wsRoot, "job2", now) - - // preRegister with updated totals (5 new Go files appeared after git pull) - store.preRegister("ws1", "go", 233, now) - store.preRegister("ws1", "docs", 809, now) - - p2 := store.get("ws1", "") - if p2 == nil { - t.Fatal("expected progress after restart, got nil") - } - - // Go: 228 done out of 233 total = 97% - goLang := p2.Languages["go"] - if goLang.DoneFiles != 228 { - t.Errorf("go: expected DoneFiles=228 (carried over), got %d", goLang.DoneFiles) - } - if goLang.TotalFiles != 233 { - t.Errorf("go: expected TotalFiles=233 (from preRegister), got %d", goLang.TotalFiles) - } - - // Docs: 809/809 unchanged - docsLang := p2.Languages["docs"] - if docsLang.DoneFiles != 809 { - t.Errorf("docs: expected DoneFiles=809 (carried over), got %d", docsLang.DoneFiles) - } - - // GlobalPercent = (228+809) / (233+809) * 100 = 1037/1042*100 = 99% - wantGlobal := 99 - if p2.GlobalPercent != wantGlobal { - t.Errorf("expected GlobalPercent=%d, got %d", wantGlobal, p2.GlobalPercent) - } - - // Verify state is "starting", not "completed" - if p2.State != "starting" { - t.Errorf("expected state='starting', got %q", p2.State) - } -} diff --git a/internal/service/tools/call_hierarchy.go b/internal/service/tools/call_hierarchy.go index e8658af..14ee722 100644 --- a/internal/service/tools/call_hierarchy.go +++ b/internal/service/tools/call_hierarchy.go @@ -107,7 +107,7 @@ func (t *CallHierarchyTool) Execute(ctx context.Context, args map[string]interfa return resp.JSON() } - idx := t.engine.GetIndexProgress(wctx.ID, wctx.Root) + idx := t.engine.GetIndexStatus(wctx.Root) visited := make(map[string]bool) @@ -128,7 +128,7 @@ func (t *CallHierarchyTool) Execute(ctx context.Context, args map[string]interfa resp := ToolResponse{ Status: "indexing_required", Message: fmt.Sprintf("⏳ Workspace '%s' is not indexed yet. Indexing is required for complete call hierarchy results.", wctx.Root), - Context: ContextFromWorkspaceWithProgress(wctx, t.engine), + Context: ContextFromWorkspaceWithStatus(wctx, t.engine), } if idx != nil { resp.Status = "indexing_in_progress" @@ -165,10 +165,10 @@ func (t *CallHierarchyTool) Execute(ctx context.Context, args map[string]interfa Message: sb.String(), Data: rootNode, Context: ContextMetadata{ - WorkspaceRoot: wctx.Root, - DetectionSource: wctx.DetectionSource, - Telemetry: telemetry.CalculateSavings(baselineBytes, actualBytes), - IndexingProgress: BuildIndexingProgress(t.engine, wctx.ID, wctx.Root), + WorkspaceRoot: wctx.Root, + DetectionSource: wctx.DetectionSource, + Telemetry: telemetry.CalculateSavings(baselineBytes, actualBytes), + IndexingStatus: idx, }, } return resp.JSON() diff --git a/internal/service/tools/evaluate_ragcode.go b/internal/service/tools/evaluate_ragcode.go index fd562e3..1d8cf2c 100644 --- a/internal/service/tools/evaluate_ragcode.go +++ b/internal/service/tools/evaluate_ragcode.go @@ -126,21 +126,11 @@ func (t *EvaluateRagCodeTool) Execute(ctx context.Context, args map[string]inter } } - var wctxID, wctxRoot string - if wctx != nil { - wctxID = wctx.ID - wctxRoot = wctx.Root - } - response := ToolResponse{ Status: "success", Message: b.String(), Data: data, - Context: ContextMetadata{ - WorkspaceRoot: workspaceRoot, - DetectionSource: source, - IndexingProgress: BuildIndexingProgress(t.engine, wctxID, wctxRoot), - }, + Context: ContextFromWorkspaceWithStatus(wctx, t.engine), } return response.JSON() diff --git a/internal/service/tools/find_usages.go b/internal/service/tools/find_usages.go index afa0193..d996de9 100644 --- a/internal/service/tools/find_usages.go +++ b/internal/service/tools/find_usages.go @@ -90,7 +90,7 @@ func (t *FindUsagesTool) Execute(ctx context.Context, args map[string]interface{ "relations[].target_name": symbolName, } - idx := t.engine.GetIndexProgress(wctx.ID, wctx.Root) + idx := t.engine.GetIndexStatus(wctx.Root) allResults, err := t.engine.ExactSearchPolyglot(ctx, wctx.ID, filter, 100) if err != nil { var noCollections *engine.ErrNoCollectionsFound @@ -98,7 +98,7 @@ func (t *FindUsagesTool) Execute(ctx context.Context, args map[string]interface{ resp := ToolResponse{ Status: "indexing_required", Message: fmt.Sprintf("⏳ Workspace '%s' is not indexed yet. Indexing is required for complete results.", wctx.Root), - Context: ContextFromWorkspaceWithProgress(wctx, t.engine), + Context: ContextFromWorkspaceWithStatus(wctx, t.engine), } if idx != nil { resp.Status = "indexing_in_progress" @@ -113,7 +113,7 @@ func (t *FindUsagesTool) Execute(ctx context.Context, args map[string]interface{ resp := ToolResponse{ Status: "success", Message: fmt.Sprintf("No usages found for symbol '%s' based on Code Graph relations.", symbolName), - Context: ContextFromWorkspaceWithProgress(wctx, t.engine), + Context: ContextFromWorkspaceWithStatus(wctx, t.engine), } return resp.JSON() } @@ -198,7 +198,7 @@ func (t *FindUsagesTool) Execute(ctx context.Context, args map[string]interface{ resp := ToolResponse{ Status: "success", Message: fmt.Sprintf("No explicit usages found for symbol '%s'", symbolName), - Context: ContextFromWorkspaceWithProgress(wctx, t.engine), + Context: ContextFromWorkspaceWithStatus(wctx, t.engine), } return resp.JSON() } @@ -253,10 +253,10 @@ func (t *FindUsagesTool) Execute(ctx context.Context, args map[string]interface{ Message: "Found symbol usages\n\n" + response.String(), Data: usages, Context: ContextMetadata{ - WorkspaceRoot: wctx.Root, - DetectionSource: wctx.DetectionSource, - Telemetry: telemetry.CalculateSavings(baselineBytes, actualBytes), - IndexingProgress: BuildIndexingProgress(t.engine, wctx.ID, wctx.Root), + WorkspaceRoot: wctx.Root, + DetectionSource: wctx.DetectionSource, + Telemetry: telemetry.CalculateSavings(baselineBytes, actualBytes), + IndexingStatus: idx, }, } return resp.JSON() diff --git a/internal/service/tools/index_workspace.go b/internal/service/tools/index_workspace.go index 7794600..1ce1606 100644 --- a/internal/service/tools/index_workspace.go +++ b/internal/service/tools/index_workspace.go @@ -8,6 +8,7 @@ import ( "github.com/doITmagic/rag-code-mcp/internal/logger" "github.com/doITmagic/rag-code-mcp/internal/service/engine" + "github.com/doITmagic/rag-code-mcp/pkg/indexer" "github.com/modelcontextprotocol/go-sdk/mcp" ) @@ -109,9 +110,9 @@ func (t *IndexWorkspaceTool) Execute(ctx context.Context, params map[string]inte } } - // Include current indexing progress so the AI knows how many files are left - if prog := BuildIndexingProgress(t.engine, wctx.ID, wctx.Root); prog != nil { - data["indexing_progress"] = prog + // Include current indexing status + if s := indexer.LoadIndexStatus(wctx.Root); s != nil { + data["indexing_progress"] = s } response.Data = data diff --git a/internal/service/tools/list_package_exports.go b/internal/service/tools/list_package_exports.go index d6bbf91..e45a0c4 100644 --- a/internal/service/tools/list_package_exports.go +++ b/internal/service/tools/list_package_exports.go @@ -94,11 +94,18 @@ func (t *ListPackageExportsTool) Execute(ctx context.Context, args map[string]in // Fan-out to all language collections in parallel — zero embedding // Filter only by package; is_public check happens in the results loop // (with graceful fallback for older index entries that predate the is_public field). + // + // The index stores the short package name (e.g. "indexer"), not the full Go + // import path (e.g. "github.com/doITmagic/rag-code-mcp/pkg/indexer"). + // Normalize by taking the last path segment so both forms work. + filterPackage := packageName + if idx := strings.LastIndex(packageName, "/"); idx >= 0 { + filterPackage = packageName[idx+1:] + } filter := map[string]interface{}{ - "package": packageName, + "package": filterPackage, } - idx := t.engine.GetIndexProgress(wctx.ID, wctx.Root) allResults, err := t.engine.ExactSearchPolyglot(ctx, wctx.ID, filter, 1000) if err != nil { var noCollections *engine.ErrNoCollectionsFound @@ -106,11 +113,7 @@ func (t *ListPackageExportsTool) Execute(ctx context.Context, args map[string]in resp := ToolResponse{ Status: "indexing_required", Message: fmt.Sprintf("⏳ Workspace '%s' is not indexed yet. Indexing is required for complete results.", wctx.Root), - Context: ContextFromWorkspaceWithProgress(wctx, t.engine), - } - if idx != nil { - resp.Status = "indexing_in_progress" - resp.Data = map[string]any{"indexing": idx} + Context: ContextFromWorkspaceWithStatus(wctx, t.engine), } return resp.JSON() } @@ -123,7 +126,7 @@ func (t *ListPackageExportsTool) Execute(ctx context.Context, args map[string]in resp := ToolResponse{ Status: "success", Message: fmt.Sprintf("No exported symbols found in package '%s'", packageName), - Context: ContextFromWorkspaceWithProgress(wctx, t.engine), + Context: ContextFromWorkspaceWithStatus(wctx, t.engine), } return resp.JSON() } @@ -203,7 +206,7 @@ func (t *ListPackageExportsTool) Execute(ctx context.Context, args map[string]in resp := ToolResponse{ Status: "success", Message: fmt.Sprintf("No exported symbols found in package '%s' (after filtering)", packageName), - Context: ContextFromWorkspaceWithProgress(wctx, t.engine), + Context: ContextFromWorkspaceWithStatus(wctx, t.engine), } return resp.JSON() } @@ -252,15 +255,12 @@ func (t *ListPackageExportsTool) Execute(ctx context.Context, args map[string]in Message: "Found package exports\n\n" + response.String(), Data: exports, Context: ContextMetadata{ - WorkspaceRoot: wctx.Root, - DetectionSource: wctx.DetectionSource, - Telemetry: telemetry.CalculateSavings(baselineBytes, actualBytes), - IndexingProgress: BuildIndexingProgress(t.engine, wctx.ID, wctx.Root), + WorkspaceRoot: wctx.Root, + DetectionSource: wctx.DetectionSource, + Telemetry: telemetry.CalculateSavings(baselineBytes, actualBytes), + IndexingStatus: t.engine.GetIndexStatus(wctx.Root), }, } - if idx != nil && (idx.State == "starting" || idx.State == "running") { - resp.Data = map[string]any{"exports": exports, "indexing": idx} - } return resp.JSON() } diff --git a/internal/service/tools/read_file_context.go b/internal/service/tools/read_file_context.go index 772c2d3..4fb1780 100644 --- a/internal/service/tools/read_file_context.go +++ b/internal/service/tools/read_file_context.go @@ -311,11 +311,7 @@ func (t *ReadFileContextTool) buildResponse(wctx *engine.WorkspaceContext, res C Message: fmt.Sprintf("Extracted %s context for lines %d-%d from %s", res.ContextType, res.StartLine, res.EndLine, res.FilePath), } if wctx != nil { - resp.Context = ContextMetadata{ - WorkspaceRoot: wctx.Root, - DetectionSource: wctx.DetectionSource, - IndexingProgress: BuildIndexingProgress(t.engine, wctx.ID, wctx.Root), - } + resp.Context = ContextFromWorkspaceWithStatus(wctx, t.engine) } baselineBytes := int64(0) diff --git a/internal/service/tools/response.go b/internal/service/tools/response.go index 3712e7c..11f2c12 100644 --- a/internal/service/tools/response.go +++ b/internal/service/tools/response.go @@ -3,10 +3,9 @@ package tools import ( "encoding/json" "fmt" - "strings" - "time" "github.com/doITmagic/rag-code-mcp/internal/service/engine" + "github.com/doITmagic/rag-code-mcp/pkg/indexer" "github.com/doITmagic/rag-code-mcp/pkg/telemetry" ) @@ -20,85 +19,15 @@ type ToolResponse struct { Data interface{} `json:"data,omitempty"` // Tool-specific output data } -// IndexingProgressSummary is a compact view of the current indexing job. -type IndexingProgressSummary struct { - State string `json:"state"` // starting|running|completed|failed - Elapsed string `json:"elapsed"` // e.g. "1m23s" - IndexAge string `json:"index_age,omitempty"` // e.g. "3 minutes ago" — populated when indexing is completed - Error string `json:"error,omitempty"` // user-facing error message when state="failed" - Languages map[string]LangProgressItem `json:"languages,omitempty"` // per-language stats -} - -// LangProgressItem holds progress stats for a single language. -type LangProgressItem struct { - DoneFiles int `json:"done_files"` - TotalFiles int `json:"total_files"` - Percent int `json:"percent"` -} - // ContextMetadata provides info about which workspace and sources were used. type ContextMetadata struct { - WorkspaceRoot string `json:"workspace_root,omitempty"` - DetectionSource string `json:"detection_source,omitempty"` // "explicit_file_path", "registry_fallback", "cwd_detection" - Language string `json:"language,omitempty"` - Collection string `json:"collection,omitempty"` - Telemetry *telemetry.Savings `json:"telemetry,omitempty"` - IndexingProgress *IndexingProgressSummary `json:"indexing_progress,omitempty"` // present when indexing is in progress or just completed - SessionMetrics *telemetry.AggregatedMetrics `json:"session_metrics,omitempty"` // cumulative search stats from .ragcode/search_metrics.jsonl -} - -// BuildIndexingProgress reads the live progress for a workspace and returns a summary. -// workspaceRoot is used to load persisted status from disc when not in memory (e.g. after restart). -// Returns nil if no indexing job has been tracked for this workspace. -func BuildIndexingProgress(eng *engine.Engine, workspaceID, workspaceRoot string) *IndexingProgressSummary { - if eng == nil { - return nil - } - prog := eng.GetIndexProgress(workspaceID, workspaceRoot) - if prog == nil { - return nil - } - elapsed := time.Since(prog.StartedAt).Round(time.Second).String() - var indexAge string - if prog.CompletedAt != nil { - elapsed = prog.CompletedAt.Sub(prog.StartedAt).Round(time.Second).String() - indexAge = formatAge(time.Since(*prog.CompletedAt)) - } - langs := make(map[string]LangProgressItem, len(prog.Languages)) - for lang, lp := range prog.Languages { - langs[lang] = LangProgressItem{ - DoneFiles: lp.DoneFiles, - TotalFiles: lp.TotalFiles, - Percent: lp.Percent, - } - } - summary := &IndexingProgressSummary{ - State: prog.State, - Elapsed: elapsed, - IndexAge: indexAge, - Languages: langs, - } - - // Surface the error message so users know WHY indexing failed - if prog.State == "failed" && prog.Error != "" { - summary.Error = prog.Error - } - - return summary -} - -// formatAge returns a human-readable string like "just now", "5 minutes ago", "2 hours ago". -func formatAge(d time.Duration) string { - switch { - case d < 2*time.Minute: - return "just now" - case d < time.Hour: - return fmt.Sprintf("%d minutes ago", int(d.Minutes())) - case d < 24*time.Hour: - return fmt.Sprintf("%d hours ago", int(d.Hours())) - default: - return fmt.Sprintf("%d days ago", int(d.Hours()/24)) - } + WorkspaceRoot string `json:"workspace_root,omitempty"` + DetectionSource string `json:"detection_source,omitempty"` // "explicit_file_path", "registry_fallback", "cwd_detection" + Language string `json:"language,omitempty"` + Collection string `json:"collection,omitempty"` + Telemetry *telemetry.Savings `json:"telemetry,omitempty"` + IndexingStatus *indexer.IndexStatus `json:"indexing_progress,omitempty"` // JSON tag kept as "indexing_progress" for backward compatibility; present when indexing is in progress or just completed + SessionMetrics *telemetry.AggregatedMetrics `json:"session_metrics,omitempty"` // cumulative search stats from .ragcode/search_metrics.jsonl } // JSON returns the marshaled JSON string of the response. @@ -117,28 +46,6 @@ func (r *ToolResponse) SetFallbackWarning(inferred bool) { } } -// buildIndexingMessage constructs a rich message for indexing-in-progress or indexing-started -// scenarios that includes progress info and explicit retry guidance for agents. -func buildIndexingMessage(emoji, workspace string, progress *IndexingProgressSummary) string { - var sb strings.Builder - sb.WriteString(fmt.Sprintf("%s Workspace '%s' is currently being indexed.", emoji, workspace)) - - if progress != nil { - if progress.Elapsed != "" { - sb.WriteString(fmt.Sprintf(" Elapsed: %s.", progress.Elapsed)) - } - // Summarize per-language progress - for lang, lp := range progress.Languages { - if lp.TotalFiles > 0 { - sb.WriteString(fmt.Sprintf(" [%s: %d/%d files (%d%%)]", lang, lp.DoneFiles, lp.TotalFiles, lp.Percent)) - } - } - } - - sb.WriteString(" Please wait a moment and try your search again.") - return sb.String() -} - // ContextFromWorkspace builds a ContextMetadata from a resolved WorkspaceContext. func ContextFromWorkspace(wctx *engine.WorkspaceContext) ContextMetadata { if wctx == nil { @@ -150,9 +57,11 @@ func ContextFromWorkspace(wctx *engine.WorkspaceContext) ContextMetadata { } } -// ContextFromWorkspaceWithProgress builds ContextMetadata and attaches live indexing progress. -func ContextFromWorkspaceWithProgress(wctx *engine.WorkspaceContext, eng *engine.Engine) ContextMetadata { +// ContextFromWorkspaceWithStatus builds ContextMetadata and attaches indexing status from disk. +func ContextFromWorkspaceWithStatus(wctx *engine.WorkspaceContext, eng *engine.Engine) ContextMetadata { ctx := ContextFromWorkspace(wctx) - ctx.IndexingProgress = BuildIndexingProgress(eng, wctx.ID, wctx.Root) + if eng != nil && wctx != nil { + ctx.IndexingStatus = eng.GetIndexStatus(wctx.Root) + } return ctx } diff --git a/internal/service/tools/skills.go b/internal/service/tools/skills.go index 81f4b1c..4251c38 100644 --- a/internal/service/tools/skills.go +++ b/internal/service/tools/skills.go @@ -80,12 +80,8 @@ func (t *ListSkillsTool) Execute(ctx context.Context, args map[string]interface{ } response := ToolResponse{ - Status: "success", - Context: ContextMetadata{ - WorkspaceRoot: workspaceRoot, - DetectionSource: source, - IndexingProgress: BuildIndexingProgress(t.engine, wctx.ID, wctx.Root), - }, + Status: "success", + Context: ContextFromWorkspaceWithStatus(wctx, t.engine), } response.SetFallbackWarning(source == "registry_fallback") @@ -211,12 +207,8 @@ func (t *InstallSkillTool) Execute(ctx context.Context, args map[string]interfac } response := ToolResponse{ - Status: "success", - Context: ContextMetadata{ - WorkspaceRoot: workspaceRoot, - DetectionSource: source, - IndexingProgress: BuildIndexingProgress(t.engine, wctx.ID, wctx.Root), - }, + Status: "success", + Context: ContextFromWorkspaceWithStatus(wctx, t.engine), } response.SetFallbackWarning(source != "explicit_file_path") diff --git a/internal/service/tools/smart_search.go b/internal/service/tools/smart_search.go index 5c2374d..cae49c1 100644 --- a/internal/service/tools/smart_search.go +++ b/internal/service/tools/smart_search.go @@ -267,20 +267,16 @@ func (t *SmartSearchTool) handleSearchError(err error, workspaceRoot, workspaceI if errors.As(err, &indexingStarted) { response.Status = "indexing_started" response.Context.WorkspaceRoot = indexingStarted.WorkspaceRoot - if indexingStarted.WorkspaceID != "" { - response.Context.IndexingProgress = BuildIndexingProgress(t.engine, indexingStarted.WorkspaceID, indexingStarted.WorkspaceRoot) - } - response.Message = buildIndexingMessage("🚀", indexingStarted.WorkspaceRoot, response.Context.IndexingProgress) + response.Context.IndexingStatus = t.engine.GetIndexStatus(indexingStarted.WorkspaceRoot) + response.Message = fmt.Sprintf("🚀 Indexing started for workspace '%s'. Results will appear as indexing progresses.", indexingStarted.WorkspaceRoot) return response.JSON() } if errors.As(err, &indexingInProgress) { response.Status = "indexing_in_progress" response.Context.WorkspaceRoot = indexingInProgress.WorkspaceRoot - if indexingInProgress.WorkspaceID != "" { - response.Context.IndexingProgress = BuildIndexingProgress(t.engine, indexingInProgress.WorkspaceID, indexingInProgress.WorkspaceRoot) - } - response.Message = buildIndexingMessage("⏳", indexingInProgress.WorkspaceRoot, response.Context.IndexingProgress) + response.Context.IndexingStatus = t.engine.GetIndexStatus(indexingInProgress.WorkspaceRoot) + response.Message = fmt.Sprintf("⏳ Indexing in progress for workspace '%s'. Results will improve as indexing completes.", indexingInProgress.WorkspaceRoot) return response.JSON() } @@ -481,4 +477,3 @@ func (t *SmartSearchTool) groupDocsByTree(results []mergedResult) []mergedResult return out } - diff --git a/internal/service/tools/smart_search_pipeline.go b/internal/service/tools/smart_search_pipeline.go index 37dc050..4fcd258 100644 --- a/internal/service/tools/smart_search_pipeline.go +++ b/internal/service/tools/smart_search_pipeline.go @@ -10,6 +10,7 @@ import ( "github.com/doITmagic/rag-code-mcp/internal/logger" "github.com/doITmagic/rag-code-mcp/internal/service/engine" + "github.com/doITmagic/rag-code-mcp/pkg/indexer" "github.com/doITmagic/rag-code-mcp/pkg/scoring" "github.com/doITmagic/rag-code-mcp/pkg/telemetry" ) @@ -202,23 +203,20 @@ func applyScoreFilter(merged []mergedResult, minScore float32) []mergedResult { func (t *SmartSearchTool) buildResponseMeta(meta searchMetadata) ToolResponse { isFallback := meta.collection == "fallback" - idxProgress := BuildIndexingProgress(t.engine, meta.workspaceID, meta.workspaceRoot) - if isFallback && idxProgress == nil { - idxProgress = &IndexingProgressSummary{ - State: "starting", - Elapsed: "0s", - } + var idxStatus *indexer.IndexStatus + if meta.workspaceRoot != "" { + idxStatus = indexer.LoadIndexStatus(meta.workspaceRoot) } response := ToolResponse{ Status: "success", Context: ContextMetadata{ - WorkspaceRoot: meta.workspaceRoot, - DetectionSource: meta.detectionSource, - Language: meta.language, - Collection: meta.collection, - IndexingProgress: idxProgress, - SessionMetrics: telemetry.ReadAggregatedMetrics(meta.workspaceRoot), + WorkspaceRoot: meta.workspaceRoot, + DetectionSource: meta.detectionSource, + Language: meta.language, + Collection: meta.collection, + IndexingStatus: idxStatus, + SessionMetrics: telemetry.ReadAggregatedMetrics(meta.workspaceRoot), }, } @@ -240,61 +238,16 @@ func (t *SmartSearchTool) buildResponseMeta(meta searchMetadata) ToolResponse { } if isFallback { - fallbackNote := buildFallbackNote(idxProgress) + fallbackNote := "⚡ Fallback results (AST/lexical, not vector). Indexing in background — retry shortly for semantic vector results. Current results may miss semantically related code." if response.Warning != "" { response.Warning += " | " + fallbackNote } else { response.Warning = fallbackNote } } - return response } -// buildFallbackNote constructs a dynamic fallback warning that includes -// live indexing progress data (per-language %, elapsed, ready languages). -func buildFallbackNote(progress *IndexingProgressSummary) string { - var sb strings.Builder - sb.WriteString("⚡ Fallback results (AST/lexical, not vector). ") - - if progress != nil && progress.Elapsed != "" && progress.Elapsed != "0s" { - sb.WriteString(fmt.Sprintf("Indexing elapsed: %s. ", progress.Elapsed)) - } - - // Report per-language progress and collect fully-indexed langs - var readyLangs []string - if progress != nil && len(progress.Languages) > 0 { - var langParts []string - for lang, lp := range progress.Languages { - if lp.TotalFiles == 0 { - continue - } - entry := fmt.Sprintf("%s %d%%", lang, lp.Percent) - if lp.Percent == 100 { - entry += " ✓" - readyLangs = append(readyLangs, lang) - } - langParts = append(langParts, entry) - } - if len(langParts) > 0 { - sb.WriteString("Progress: ") - sb.WriteString(strings.Join(langParts, " · ")) - sb.WriteString(". ") - } - } - - // Actionable hint: tell agent which langs can use vector search now - if len(readyLangs) > 0 { - sb.WriteString(fmt.Sprintf("Vector search ready for: %s — retry for higher-quality results on those files. ", - strings.Join(readyLangs, ", "))) - } else { - sb.WriteString("Indexing in background — retry shortly for semantic vector results. ") - } - - sb.WriteString("Current results may miss semantically related code.") - return sb.String() -} - // ─── Result Serialization ──────────────────────────────────────────────────── // resultToMap converts a mergedResult to the output map format. diff --git a/internal/service/tools/smart_search_test.go b/internal/service/tools/smart_search_test.go index b78673d..da2c3d7 100644 --- a/internal/service/tools/smart_search_test.go +++ b/internal/service/tools/smart_search_test.go @@ -11,7 +11,7 @@ func TestGroupDocsByTree(t *testing.T) { // Create a temporary file for testing disk reads tempDir := t.TempDir() tempFilePath := filepath.Join(tempDir, "test_doc.md") - + // Create a dummy document with 20 lines lines := make([]string, 20) for i := 0; i < 20; i++ { @@ -72,22 +72,22 @@ func TestGroupDocsByTree(t *testing.T) { if len(results) != 2 { t.Fatalf("Expected 2 results, got %d", len(results)) } - + // results are sorted by score. The merged "### Intro" should have max score 0.9 if results[0].score != 0.9 { t.Errorf("Expected max score 0.9, got %v", results[0].score) } - + // Check start/end line bounds for the merged "### Intro" if results[0].startLine != 1 || results[0].endLine != 7 { t.Errorf("Expected lines 1-7, got %d-%d", results[0].startLine, results[0].endLine) } - + // Verify the content was loaded from disk and contains Lines A to G (1 to 7) if !strings.Contains(results[0].content, "Line A") || !strings.Contains(results[0].content, "Line G") { t.Errorf("Merged content does not match expected lines from disk") } - + // Setup should be untouched if results[1].signature != "### Setup" { t.Errorf("Expected second result to be '### Setup'") @@ -125,7 +125,7 @@ func TestGroupDocsByTree(t *testing.T) { startLine: 19, endLine: 20, score: 0.9, // Line S, T }, }, - expected: 1, + expected: 1, check: func(t *testing.T, results []mergedResult) { if len(results) != 1 { t.Fatalf("Expected 1 result after merge, got %d", len(results)) diff --git a/internal/service/tools/tests/health_metrics_test.go b/internal/service/tools/tests/health_metrics_test.go index 637ed92..8caef86 100644 --- a/internal/service/tools/tests/health_metrics_test.go +++ b/internal/service/tools/tests/health_metrics_test.go @@ -5,7 +5,6 @@ import ( "encoding/json" "os" "path/filepath" - "time" . "github.com/onsi/ginkgo/v2" . "github.com/onsi/gomega" @@ -29,63 +28,6 @@ var _ = Describe("Health Metrics & Index Status", func() { ctx = context.Background() }) - // ─── 1. index_age field ───────────────────────────────────────────────────── - - Describe("BuildIndexingProgress — index_age field", func() { - It("returns nil when no indexing has been started", func() { - eng := setupTestEngine(mockStore) - result := tools.BuildIndexingProgress(eng, "non-existent-workspace", "") - Expect(result).To(BeNil()) - }) - - It("populates index_age as 'just now' immediately after completion", func() { - // We test the IndexingProgressSummary struct behavior directly - // since engine state requires internal access. - - // Use a real duration of 0s — just now - summary := &tools.IndexingProgressSummary{ - State: "completed", - Elapsed: "5s", - IndexAge: "just now", - } - Expect(summary.IndexAge).To(Equal("just now")) - Expect(summary.State).To(Equal("completed")) - }) - - It("returns empty index_age when indexing is still running", func() { - // IndexAge is only set when CompletedAt != nil - summary := &tools.IndexingProgressSummary{ - State: "running", - Elapsed: "10s", - // IndexAge left empty intentionally - } - Expect(summary.IndexAge).To(BeEmpty()) - }) - - It("serializes index_age correctly as omitempty", func() { - // When IndexAge is empty, it should NOT appear in JSON - summary := &tools.IndexingProgressSummary{ - State: "running", - Elapsed: "10s", - } - b, err := json.Marshal(summary) - Expect(err).NotTo(HaveOccurred()) - Expect(string(b)).NotTo(ContainSubstring("index_age")) - }) - - It("serializes index_age when present", func() { - summary := &tools.IndexingProgressSummary{ - State: "completed", - Elapsed: "5s", - IndexAge: "3 minutes ago", - } - b, err := json.Marshal(summary) - Expect(err).NotTo(HaveOccurred()) - Expect(string(b)).To(ContainSubstring("index_age")) - Expect(string(b)).To(ContainSubstring("3 minutes ago")) - }) - }) - // ─── 2. Stale chunk detection ──────────────────────────────────────────────── Describe("rag_search (SmartSearchTool) — stale chunk detection", func() { @@ -204,80 +146,4 @@ var _ = Describe("Health Metrics & Index Status", func() { Expect(resp.Warning).To(ContainSubstring("2 stale file(s)")) }) }) - - // ─── 3. Uniform IndexingProgress exposure ─────────────────────────────────── - - Describe("Uniform IndexingProgress across tools", func() { - It("ContextMetadata contains indexing_progress field (JSON tag verified)", func() { - meta := tools.ContextMetadata{ - WorkspaceRoot: "/test", - DetectionSource: "explicit_file_path", - IndexingProgress: &tools.IndexingProgressSummary{ - State: "running", - Elapsed: "5s", - }, - } - b, err := json.Marshal(meta) - Expect(err).NotTo(HaveOccurred()) - Expect(string(b)).To(ContainSubstring("indexing_progress")) - Expect(string(b)).To(ContainSubstring("running")) - }) - - It("ContextMetadata omits indexing_progress when nil (no noise in responses)", func() { - meta := tools.ContextMetadata{ - WorkspaceRoot: "/test", - DetectionSource: "explicit_file_path", - } - b, err := json.Marshal(meta) - Expect(err).NotTo(HaveOccurred()) - Expect(string(b)).NotTo(ContainSubstring("indexing_progress")) - }) - - It("rag_read_file_context returns indexing_progress in context when no indexing is running", func() { - eng := setupTestEngine(mockStore) - tool := tools.NewReadFileContextTool(eng) - - // Create a real temp file to read - tmpFile, err := os.CreateTemp("", "ragcode_test_read_*.go") - Expect(err).NotTo(HaveOccurred()) - _, _ = tmpFile.WriteString("package main\n\nfunc Hello() {}\n") - tmpFile.Close() - defer os.Remove(tmpFile.Name()) - - resJSON, err := tool.Execute(ctx, map[string]interface{}{ - "file_path": tmpFile.Name(), - "line_number": 3, - }) - Expect(err).NotTo(HaveOccurred()) - - var resp tools.ToolResponse - Expect(json.Unmarshal([]byte(resJSON), &resp)).NotTo(HaveOccurred()) - Expect(resp.Status).To(Equal("success")) - - // Context should have workspace root but indexing_progress nil (not running) - ctxB, err := json.Marshal(resp.Context) - Expect(err).NotTo(HaveOccurred()) - // No active indexing → field omitted - Expect(string(ctxB)).NotTo(ContainSubstring("indexing_progress")) - }) - }) - - // ─── 4. formatAge helper (via IndexingProgressSummary) ─────────────────────── - - Describe("formatAge human-readable durations", func() { - DescribeTable("formats duration correctly", - func(d time.Duration, expected string) { - summary := &tools.IndexingProgressSummary{ - State: "completed", - Elapsed: "1s", - IndexAge: expected, // we just verify the struct accepts it - } - Expect(summary.IndexAge).To(Equal(expected)) - }, - Entry("just now for <2 min", 30*time.Second, "just now"), - Entry("minutes ago for 2-59 min", 5*time.Minute, "5 minutes ago"), - Entry("hours ago for 1-23h", 3*time.Hour, "3 hours ago"), - Entry("days ago for >=24h", 48*time.Hour, "2 days ago"), - ) - }) }) diff --git a/internal/service/tools/tests/tools_suite_test.go b/internal/service/tools/tests/tools_suite_test.go index aa9e7fe..8e2d64f 100644 --- a/internal/service/tools/tests/tools_suite_test.go +++ b/internal/service/tools/tests/tools_suite_test.go @@ -4,7 +4,10 @@ import ( "testing" _ "github.com/doITmagic/rag-code-mcp/pkg/parser/go" + _ "github.com/doITmagic/rag-code-mcp/pkg/parser/javascript" _ "github.com/doITmagic/rag-code-mcp/pkg/parser/php" + _ "github.com/doITmagic/rag-code-mcp/pkg/parser/php/laravel" + _ "github.com/doITmagic/rag-code-mcp/pkg/parser/php/wordpress" _ "github.com/doITmagic/rag-code-mcp/pkg/parser/python" . "github.com/onsi/ginkgo/v2" diff --git a/pkg/indexer/index_status.go b/pkg/indexer/index_status.go new file mode 100644 index 0000000..e60cea0 --- /dev/null +++ b/pkg/indexer/index_status.go @@ -0,0 +1,157 @@ +package indexer + +import ( + "encoding/json" + "fmt" + "os" + "path/filepath" + "runtime" + "strings" + + "github.com/doITmagic/rag-code-mcp/internal/logger" +) + +const indexStatusFile = "index_status.json" + +// IndexStatus represents the current state of indexing for a workspace. +// Written by the indexer to {workspaceRoot}/.ragcode/index_status.json. +// Read by tools to include progress in MCP responses. +type IndexStatus struct { + StartedAt string `json:"started_at"` // RFC3339 + EndedAt string `json:"ended_at,omitempty"` // RFC3339 + Elapsed string `json:"elapsed,omitempty"` // human-readable duration + Error string `json:"error,omitempty"` + Languages map[string]LangStatus `json:"languages,omitempty"` +} + +// LangStatus holds indexing stats for a single language. +type LangStatus struct { + OnDisk int `json:"on_disk"` // total files on disk for this language + Changed int `json:"-"` // internal: files that need processing (hidden from AI consumers) + Processed int `json:"processed"` // files processed so far +} + +// callerChain returns a compact caller stack (skipping skip frames) for debugging. +func callerChain(skip, depth int) string { + pcs := make([]uintptr, depth) + n := runtime.Callers(skip+1, pcs) + if n == 0 { + return "" + } + frames := runtime.CallersFrames(pcs[:n]) + var parts []string + for { + frame, more := frames.Next() + // Use short function name + fn := frame.Function + if idx := strings.LastIndex(fn, "/"); idx >= 0 { + fn = fn[idx+1:] + } + parts = append(parts, fmt.Sprintf("%s:%d", fn, frame.Line)) + if !more { + break + } + } + return strings.Join(parts, " ← ") +} + +// SaveIndexStatus writes the IndexStatus to {workspaceRoot}/.ragcode/index_status.json. +// The write is atomic: data is written to a temp file first, then renamed into place, +// so concurrent readers always see a complete JSON file. +func SaveIndexStatus(workspaceRoot string, status *IndexStatus) { + if workspaceRoot == "" || status == nil { + return + } + dir := filepath.Join(workspaceRoot, ".ragcode") + dirExisted := true + if _, statErr := os.Stat(dir); os.IsNotExist(statErr) { + dirExisted = false + } + if err := os.MkdirAll(dir, 0o755); err != nil { + logger.Instance.Warn("index_status: cannot create .ragcode dir: %v", err) + return + } + + // Log every write with caller stack to trace the source of spurious .ragcode directories + callers := callerChain(2, 4) + if !dirExisted { + logger.Instance.Warn("[INDEX_STATUS] 🆕 CREATED .ragcode dir: workspace=%s, started_at=%s, ended_at=%s, callers=[%s]", + workspaceRoot, status.StartedAt, status.EndedAt, callers) + } else { + logger.Instance.Debug("[INDEX_STATUS] 📝 Writing index_status.json: workspace=%s, started_at=%s, ended_at=%s, langs=%d, callers=[%s]", + workspaceRoot, status.StartedAt, status.EndedAt, len(status.Languages), callers) + } + b, err := json.MarshalIndent(status, "", " ") + if err != nil { + logger.Instance.Warn("index_status: marshal failed: %v", err) + return + } + // Write to a temp file in the same directory so that rename is atomic. + tmp, err := os.CreateTemp(dir, indexStatusFile+".tmp-*") + if err != nil { + logger.Instance.Warn("index_status: cannot create temp file: %v", err) + return + } + tmpName := tmp.Name() + if _, err := tmp.Write(b); err != nil { + tmp.Close() + os.Remove(tmpName) + logger.Instance.Warn("index_status: write to temp failed: %v", err) + return + } + if err := tmp.Close(); err != nil { + os.Remove(tmpName) + logger.Instance.Warn("index_status: close temp failed: %v", err) + return + } + path := filepath.Join(dir, indexStatusFile) + if err := os.Rename(tmpName, path); err != nil { + // Windows-safe fallback: os.Rename fails when dest exists on Windows. + _ = os.Remove(path) + if err2 := os.Rename(tmpName, path); err2 != nil { + os.Remove(tmpName) + logger.Instance.Warn("index_status: rename failed for %s: %v", path, err2) + } + } +} + +// LoadIndexStatus reads the IndexStatus from {workspaceRoot}/.ragcode/index_status.json. +// Returns nil if the file doesn't exist or can't be parsed. +func LoadIndexStatus(workspaceRoot string) *IndexStatus { + path := filepath.Join(workspaceRoot, ".ragcode", indexStatusFile) + b, err := os.ReadFile(path) + if err != nil { + return nil + } + var s IndexStatus + if err := json.Unmarshal(b, &s); err != nil { + logger.Instance.Warn("index_status: parse failed for %s: %v", path, err) + return nil + } + return &s +} + +// GetLastInterruptedWorkspace checks a list of roots and picks the one +// that is incomplete (StartedAt without EndedAt) with the most recent Start time. +func GetLastInterruptedWorkspace(roots []string) string { + var bestRoot string + var bestStartedAt string + + for _, root := range roots { + status := LoadIndexStatus(root) + if status == nil { + continue + } + if status.EndedAt != "" { + continue // Already finished + } + + if status.StartedAt != "" && status.StartedAt > bestStartedAt { + // Basic lexicographical comparison works for RFC3339 timestamps + bestStartedAt = status.StartedAt + bestRoot = root + } + } + + return bestRoot +} diff --git a/pkg/indexer/index_status_test.go b/pkg/indexer/index_status_test.go new file mode 100644 index 0000000..9a0275a --- /dev/null +++ b/pkg/indexer/index_status_test.go @@ -0,0 +1,43 @@ +package indexer + +import ( + "testing" +) + +func TestIndexStatusRoundTrip(t *testing.T) { + wsRoot := t.TempDir() + + // Ensure no file exists yet + if s := LoadIndexStatus(wsRoot); s != nil { + t.Fatal("expected nil before first save") + } + + status := &IndexStatus{ + StartedAt: "2025-01-01T00:00:00Z", + Elapsed: "5s", + Languages: map[string]LangStatus{ + // Changed is json:"-" (internal-only field) and is not persisted. + "go": {OnDisk: 100, Processed: 5}, + }, + } + + SaveIndexStatus(wsRoot, status) + + loaded := LoadIndexStatus(wsRoot) + if loaded == nil { + t.Fatal("expected non-nil after save") + } + if loaded.Languages["go"].OnDisk != 100 { + t.Errorf("OnDisk: got %d, want 100", loaded.Languages["go"].OnDisk) + } + if loaded.Languages["go"].Processed != 5 { + t.Errorf("Processed: got %d, want 5", loaded.Languages["go"].Processed) + } +} + +func TestLoadIndexStatusMissing(t *testing.T) { + s := LoadIndexStatus(t.TempDir()) + if s != nil { + t.Fatal("expected nil for missing file") + } +} diff --git a/pkg/indexer/minified.go b/pkg/indexer/minified.go new file mode 100644 index 0000000..4838806 --- /dev/null +++ b/pkg/indexer/minified.go @@ -0,0 +1,74 @@ +package indexer + +import ( + "bytes" + "io" + "os" + "path/filepath" + "strings" +) + +// maxAvgLineLen is the average-line-length threshold above which a file +// is considered minified. Normal hand-written source rarely exceeds +// 120 characters per line; minified bundles routinely exceed 1 000. +const maxAvgLineLen = 500 + +// sampleBytes caps the amount of data we read for the density heuristic. +const sampleBytes = 256 << 10 // 256 KiB + +// minifiedSuffixes are well-known filename patterns that unambiguously +// mark a file as a minified/bundled asset. +var minifiedSuffixes = []string{ + ".min.js", ".min.css", ".min.mjs", + ".bundle.js", ".bundle.css", ".bundle.mjs", + ".packed.js", ".chunk.js", ".chunk.css", + "-min.js", "-min.css", +} + +// isMinifiedOrVendored reports whether path points to machine-generated, +// bundled, or minified code that should be skipped before tree-sitter parsing. +// +// Directory-level exclusions (vendor/, node_modules/, etc.) are already +// handled by the WalkDir filter in IndexWorkspace, so this function only +// covers two remaining cases: +// 1. Filename suffix (.min.js, .bundle.css, …) — no I/O. +// 2. Content density — reads ≤256 KiB, counts newlines. +func isMinifiedOrVendored(path string) bool { + if matchesMinifiedName(path) { + return true + } + return exceedsLineDensity(path) +} + +// matchesMinifiedName checks well-known minified filename patterns. +func matchesMinifiedName(path string) bool { + base := strings.ToLower(filepath.Base(path)) + for _, s := range minifiedSuffixes { + if strings.HasSuffix(base, s) { + return true + } + } + return false +} + +// exceedsLineDensity reads the first sampleBytes of path and returns +// true when the average line length exceeds maxAvgLineLen. +func exceedsLineDensity(path string) bool { + f, err := os.Open(path) + if err != nil { + return false + } + defer f.Close() + + buf := make([]byte, sampleBytes) + n, err := io.ReadFull(f, buf) + if n == 0 && err != nil { + return false + } + + lines := bytes.Count(buf[:n], []byte{'\n'}) + if lines == 0 { + lines = 1 + } + return n/lines > maxAvgLineLen +} diff --git a/pkg/indexer/service.go b/pkg/indexer/service.go index e19e899..d2685ec 100644 --- a/pkg/indexer/service.go +++ b/pkg/indexer/service.go @@ -10,6 +10,7 @@ import ( "os/exec" "path/filepath" "runtime" + "runtime/debug" "strings" "sync" "sync/atomic" @@ -19,6 +20,7 @@ import ( "github.com/doITmagic/rag-code-mcp/internal/logger" "github.com/doITmagic/rag-code-mcp/pkg/llm" "github.com/doITmagic/rag-code-mcp/pkg/parser" + "github.com/doITmagic/rag-code-mcp/pkg/parser/arenagc" "github.com/doITmagic/rag-code-mcp/pkg/storage" ) @@ -221,7 +223,7 @@ func (s *Service) IndexWorkspace(ctx context.Context, root string, collection st var fileErrs []string for _, path := range changedFiles { fileNum := int(doneFiles.Load()) + 1 - logger.Instance.Debug("[IDX] ws=%s lang=%s [%d/%d] %s (indexing...)", + logger.Instance.Info("[IDX] ws=%s lang=%s [%d/%d] %s (indexing...)", wsName, opts.Language, fileNum, totalFiles, filepath.Base(path)) symCount, indexErr := s.IndexFile(ctx, collection, path, state) @@ -229,13 +231,17 @@ func (s *Service) IndexWorkspace(ctx context.Context, root string, collection st logger.Instance.Warn("[IDX] ws=%s lang=%s ⚠️ %s: %v", wsName, opts.Language, filepath.Base(path), indexErr) fileErrs = append(fileErrs, fmt.Sprintf("%s: %v", path, indexErr)) } else { - logger.Instance.Debug("[IDX] ws=%s lang=%s %s → %d symbol(s)", wsName, opts.Language, filepath.Base(path), symCount) + logger.Instance.Info("[IDX] ws=%s lang=%s %s → %d symbol(s)", wsName, opts.Language, filepath.Base(path), symCount) } + // Release gotreesitter arena memory after each file so it doesn't + // accumulate across hundreds of files. + arenagc.DrainArenaPools() + // Increment after IndexFile so 100% is only reported once the last file is done. n := int(doneFiles.Add(1)) pct := n * 100 / totalFiles - logger.Instance.Debug("[IDX] ws=%s lang=%s [%d/%d] done (%d%%)", wsName, opts.Language, n, totalFiles, pct) + logger.Instance.Info("[IDX] ws=%s lang=%s [%d/%d] done (%d%%)", wsName, opts.Language, n, totalFiles, pct) if opts.Progress != nil { opts.Progress(n, totalFiles) } @@ -253,6 +259,17 @@ func (s *Service) IndexWorkspace(ctx context.Context, root string, collection st } logger.Instance.Info("[IDX] ws=%s lang=%s ✅ DONE %d file(s)", wsName, opts.Language, totalFiles) + + // Release cached tree-sitter parsers so the GC can reclaim arena memory. + // gotreesitter's nodeArena pools retain large pre-allocated slabs (up to 128MB + // for CSS files like bootstrap.css). Dropping parser references allows the GC + // to collect them. Parsers will be lazily recreated if needed later. + parser.ReleaseAllResources() + arenagc.DrainArenaPools() + runtime.GC() + debug.FreeOSMemory() + logger.Instance.Info("[IDX] ws=%s lang=%s 🧹 Released parser caches, drained arena pools, forced GC", wsName, opts.Language) + return nil } @@ -343,6 +360,13 @@ func (s *Service) IndexFile(ctx context.Context, collection, path string, state return 0, nil } + // Skip minified/vendored files — tree-sitter GLR parsing on dense + // machine-generated code can allocate 500MB+ of arena memory. + if isMinifiedOrVendored(path) { + logger.Instance.Debug("[IDX] Skipping minified/vendored file: %s", filepath.Base(path)) + return 0, nil + } + res, err := a.Analyze(ctx, path) if err != nil { logger.Instance.Error("Analyze failed for %s: %v", path, err) @@ -514,7 +538,7 @@ func (s *Service) IndexItems(ctx context.Context, collection string, symbols []p // Throttle: small pause between embeds to avoid overwhelming Ollama. // 150ms adds ~15s per 100 symbols — negligible vs total indexing time, // but prevents Ollama from freezing under sustained concurrent load. - time.Sleep(10 * time.Millisecond) + time.Sleep(150 * time.Millisecond) vector := make([]float32, len(vector64)) for i, v := range vector64 { diff --git a/pkg/parser/README.md b/pkg/parser/README.md index d3fb354..01613cd 100644 --- a/pkg/parser/README.md +++ b/pkg/parser/README.md @@ -45,9 +45,10 @@ Click on each language to see the detailed technical documentation for its speci |----------|-----------|-------------|--------| | **Go** | [`/go`](./go/README.md) | Native AST parsing with full documentation support. | ✅ Production | | **Python** | [`/python`](./python/README.md) | Robust regex & indentation analysis. | ✅ Production | -| **PHP** | [`/php`](./php/README.md) | Deep Laravel integration & Eloquent analysis. | ✅ Production | -| **HTML** | [`/html`](./html/README.md) | Semantic sectioning based on header hierarchy. | ✅ Production | -| **JavaScript** | [`/javascript`](./javascript/README.md) | React & TypeScript support. | 🚧 In Progress | +| **PHP** | [`/php`](./php/README.md) | Deep Laravel integration (Eloquent, Routes, Controllers) & WordPress. | ✅ Production | +| **HTML & CSS** | [`/html`](./html/README.md) | HTML semantic sectioning + CSS/SCSS/SASS/LESS via tree-sitter. | ✅ Production | +| **JavaScript** | [`/javascript`](./javascript/README.md) | React, Vue, & TypeScript support. | ✅ Production | +| **Docs** | [`/docs`](./docs/README.md) | Markdown, JSON, YAML, XML, TOML, reStructuredText. | ✅ Production | | **Generic** | [`/generic`](./generic/README.md) | Universal regex-based fallback for other languages. | ✅ Production | ## 🚀 Unified Symbol Model diff --git a/pkg/parser/arenagc/drain.go b/pkg/parser/arenagc/drain.go new file mode 100644 index 0000000..2ba0bf4 --- /dev/null +++ b/pkg/parser/arenagc/drain.go @@ -0,0 +1,42 @@ +// Package arenagc provides a mechanism to drain gotreesitter's global +// nodeArena pools. These pools retain large pre-allocated slabs (up to +// 128MB for CSS files) indefinitely because they are package-level +// variables that the GC cannot collect. This package uses go:linkname +// to access the unexported pool variables and clear their free lists. +package arenagc + +import ( + "sync" + _ "unsafe" // required for go:linkname + + _ "github.com/odvcencio/gotreesitter" // ensure the package is linked +) + +// nodeArenaPool mirrors the internal gotreesitter struct layout. +// Only the fields we need to access (mu + free) are declared. +type nodeArenaPool struct { + mu sync.Mutex + _ uint8 // class + _ int // maxSize + free []*struct{} // opaque; we just need to nil the slice +} + +//go:linkname incrementalPool github.com/odvcencio/gotreesitter.incrementalArenaPool +var incrementalPool nodeArenaPool + +//go:linkname fullPool github.com/odvcencio/gotreesitter.fullArenaPool +var fullPool nodeArenaPool + +// DrainArenaPools clears the free lists of both gotreesitter arena pools, +// allowing the GC to reclaim the large node slabs they retain. +// This should be called after indexing completes, followed by runtime.GC() +// and debug.FreeOSMemory(). +func DrainArenaPools() { + incrementalPool.mu.Lock() + incrementalPool.free = nil + incrementalPool.mu.Unlock() + + fullPool.mu.Lock() + fullPool.free = nil + fullPool.mu.Unlock() +} diff --git a/pkg/parser/css/analyzer.go b/pkg/parser/css/analyzer.go new file mode 100644 index 0000000..29ef370 --- /dev/null +++ b/pkg/parser/css/analyzer.go @@ -0,0 +1,157 @@ +package css + +import ( + "context" + "fmt" + "os" + "path/filepath" + "strings" + + "github.com/doITmagic/rag-code-mcp/pkg/parser" +) + +func init() { + parser.Register(NewAnalyzer()) +} + +// Analyzer implementeaza procesarea pe bucati (chunk-based) a fisierelor CSS/SCSS/LESS. +// Fara sa depinda de Tree-sitter, nu face OOM nici macar la bundle-uri gigantice. +type Analyzer struct{} + +func NewAnalyzer() *Analyzer { + return &Analyzer{} +} + +func (a *Analyzer) Name() string { + return "css" +} + +func (a *Analyzer) CanHandle(filePath string) bool { + ext := strings.ToLower(filepath.Ext(filePath)) + return ext == ".css" || ext == ".scss" || ext == ".less" || ext == ".sass" +} + +func (a *Analyzer) Analyze(ctx context.Context, path string) (*parser.Result, error) { + content, err := os.ReadFile(path) + if err != nil { + return nil, fmt.Errorf("read css: %w", err) + } + + text := string(content) + if strings.TrimSpace(text) == "" { + return &parser.Result{Language: "css"}, nil + } + + var symbols []parser.Symbol + + var selectorBuilder strings.Builder + var contentBuilder strings.Builder + braceDepth := 0 + startLine := 1 + currentLine := 1 + inComment := false + + baseName := filepath.Base(path) + + for i := 0; i < len(text); i++ { + c := text[i] + + if c == '\n' { + currentLine++ + } + + // Skip over slash-star comments + if !inComment && c == '/' && i+1 < len(text) && text[i+1] == '*' { + inComment = true + i++ // skip '*' + continue + } + if inComment && c == '*' && i+1 < len(text) && text[i+1] == '/' { + inComment = false + i++ // skip '/' + continue + } + if inComment { + continue + } + + if c == '{' { + if braceDepth == 0 { + contentBuilder.WriteByte(c) + } else { + contentBuilder.WriteByte(c) + } + braceDepth++ + continue + } + + if c == '}' { + braceDepth-- + contentBuilder.WriteByte(c) + + if braceDepth == 0 { + selector := strings.TrimSpace(selectorBuilder.String()) + if len(selector) > 200 { + selector = selector[:197] + "..." + } + + blockContent := strings.TrimSpace(contentBuilder.String()) + if len(blockContent) > 8192 { + blockContent = blockContent[:8192] + "\n...[TRUNCATED]" + } + + if selector != "" { + symbols = append(symbols, parser.Symbol{ + Name: baseName, + Type: "style_rule", + FilePath: path, + Language: "css", + Content: selector + " " + blockContent, + Signature: selector, + StartLine: startLine, + EndLine: currentLine, + IsPublic: true, + Metadata: map[string]interface{}{ + "selector": selector, + }, + }) + } + + selectorBuilder.Reset() + contentBuilder.Reset() + startLine = currentLine + } + continue + } + + if braceDepth == 0 { + selectorBuilder.WriteByte(c) + } else { + contentBuilder.WriteByte(c) + } + } + + // Any leftover content (like variables at the root without braces) + leftover := strings.TrimSpace(selectorBuilder.String()) + if leftover != "" && len(symbols) == 0 { + if len(leftover) > 8192 { + leftover = leftover[:8192] + "\n...[TRUNCATED]" + } + symbols = append(symbols, parser.Symbol{ + Name: baseName, + Type: "style_rule", + FilePath: path, + Language: "css", + Content: leftover, + Signature: "global", + StartLine: startLine, + EndLine: currentLine, + IsPublic: true, + }) + } + + return &parser.Result{ + Symbols: symbols, + Language: "css", + }, nil +} diff --git a/pkg/parser/docs/analyzer.go b/pkg/parser/docs/analyzer.go index b36b4eb..393f930 100644 --- a/pkg/parser/docs/analyzer.go +++ b/pkg/parser/docs/analyzer.go @@ -26,6 +26,13 @@ func NewAnalyzer() *Analyzer { } } +// ReleaseResources drops cached tree-sitter parsers so the GC can reclaim arena memory. +func (a *Analyzer) ReleaseResources() { + if a.tsParser != nil { + a.tsParser.ReleaseResources() + } +} + func (a *Analyzer) Name() string { return "docs" } @@ -36,8 +43,8 @@ func (a *Analyzer) CanHandle(path string) bool { // Markdown case ".md", ".markdown": return true - // Tree-sitter supported structured / config / markup / scripts - case ".yaml", ".yml", ".json", ".xml", ".toml", ".rst", ".css", ".scss", ".sql", ".sh", ".svelte": + // Tree-sitter supported structured / config / markup + case ".yaml", ".yml", ".json", ".xml", ".toml", ".rst": return true default: return false diff --git a/pkg/parser/docs/analyzer_test.go b/pkg/parser/docs/analyzer_test.go index 149df74..f3e7403 100644 --- a/pkg/parser/docs/analyzer_test.go +++ b/pkg/parser/docs/analyzer_test.go @@ -13,8 +13,8 @@ import ( func TestAnalyzer_CanHandle(t *testing.T) { analyzer := NewAnalyzer() - validExts := []string{"test.md", "README.markdown", "config.yaml", "data.json", "conf.toml", "index.xml", "doc.rst", "style.css", "main.scss", "query.sql", "script.sh"} - invalidExts := []string{"main.go", "script.js", "style.less", "data.csv"} + validExts := []string{"test.md", "README.markdown", "config.yaml", "data.json", "conf.toml", "index.xml", "doc.rst"} + invalidExts := []string{"main.go", "script.js", "style.css", "main.scss", "style.less", "query.sql", "script.sh", "app.svelte", "data.csv"} for _, ext := range validExts { assert.True(t, analyzer.CanHandle(ext), "Should handle %s", ext) @@ -126,25 +126,11 @@ func TestAnalyzer_TreesitterParsing_JSON(t *testing.T) { assert.Greater(t, len(result.Symbols), 0, "Should extract symbols from json via treesitter") } -func TestAnalyzer_TreesitterParsing_CSS(t *testing.T) { +func TestAnalyzer_DoesNotHandle_CSS(t *testing.T) { analyzer := NewAnalyzer() - - tmpDir := t.TempDir() - cssFile := filepath.Join(tmpDir, "style.css") - - cssContent := ` -body { - background-color: red; -} -.header { - font-size: 24px; -} -` - err := os.WriteFile(cssFile, []byte(cssContent), 0644) - require.NoError(t, err) - - result, err := analyzer.Analyze(context.Background(), cssFile) - require.NoError(t, err) - require.NotNil(t, result) - assert.Greater(t, len(result.Symbols), 0, "Should extract symbols from css via treesitter") + assert.False(t, analyzer.CanHandle("style.css"), "CSS should NOT be handled by docs parser") + assert.False(t, analyzer.CanHandle("main.scss"), "SCSS should NOT be handled by docs parser") + assert.False(t, analyzer.CanHandle("query.sql"), "SQL should NOT be handled by docs parser") + assert.False(t, analyzer.CanHandle("script.sh"), "Shell should NOT be handled by docs parser") + assert.False(t, analyzer.CanHandle("app.svelte"), "Svelte should NOT be handled by docs parser") } diff --git a/pkg/parser/docs/treesitter.go b/pkg/parser/docs/treesitter.go index 574918e..4c58934 100644 --- a/pkg/parser/docs/treesitter.go +++ b/pkg/parser/docs/treesitter.go @@ -4,16 +4,42 @@ import ( "fmt" "path/filepath" "strings" + "sync" "github.com/doITmagic/rag-code-mcp/pkg/parser" "github.com/odvcencio/gotreesitter" "github.com/odvcencio/gotreesitter/grammars" ) -type TreeSitterParser struct{} +// TreeSitterParser parses documentation files using tree-sitter. +// Caches Parser instances per language to avoid re-allocating expensive lookup tables. +type TreeSitterParser struct { + mu sync.Mutex + parsers map[string]*gotreesitter.Parser +} func NewTreeSitterParser() *TreeSitterParser { - return &TreeSitterParser{} + return &TreeSitterParser{ + parsers: make(map[string]*gotreesitter.Parser), + } +} + +func (p *TreeSitterParser) getOrCreateParser(lang *grammars.LangEntry) *gotreesitter.Parser { + p.mu.Lock() + defer p.mu.Unlock() + if cached, ok := p.parsers[lang.Name]; ok { + return cached + } + tsParser := gotreesitter.NewParser(lang.Language()) + p.parsers[lang.Name] = tsParser + return tsParser +} + +// ReleaseResources drops cached tree-sitter parsers so the GC can reclaim arena memory. +func (p *TreeSitterParser) ReleaseResources() { + p.mu.Lock() + defer p.mu.Unlock() + p.parsers = make(map[string]*gotreesitter.Parser) } func (p *TreeSitterParser) Parse(source []byte, filePath string, ext string) ([]parser.Symbol, error) { @@ -23,11 +49,12 @@ func (p *TreeSitterParser) Parse(source []byte, filePath string, ext string) ([] } langObj := langInfo.Language() - parserTs := gotreesitter.NewParser(langObj) + parserTs := p.getOrCreateParser(langInfo) tree, err := parserTs.Parse(source) if err != nil { return nil, fmt.Errorf("ts parse error: %w", err) } + defer tree.Release() langName := langInfo.Name var symbols []parser.Symbol @@ -35,9 +62,10 @@ func (p *TreeSitterParser) Parse(source []byte, filePath string, ext string) ([] var walk func(node *gotreesitter.Node, parentSig string) walk = func(node *gotreesitter.Node, parentSig string) { - text := strings.TrimSpace(node.Text(source)) - if len(text) < 10 { - // Skip too small block, but we must recurse its children just in case they hold valid stuff + nodeLen := int(node.EndByte() - node.StartByte()) + + if nodeLen < 5 { + // Skip too small block naturally, but we must recurse its children just in case they hold valid stuff for i := 0; i < node.ChildCount(); i++ { walk(node.Child(i), parentSig) } @@ -45,7 +73,27 @@ func (p *TreeSitterParser) Parse(source []byte, filePath string, ext string) ([] } // A leaf or a reasonably sized chunk (~1500 chars) -> make it a valid symbol chunk - if node.ChildCount() == 0 || len(text) <= 1500 { + if node.ChildCount() == 0 || nodeLen <= 1500 { + // Prevent massive leaf nodes (e.g. 50MB SQL INSERT values) from + // allocating the full string — slice the underlying bytes directly. + var text string + if nodeLen > 8192 { + end := int(node.StartByte()) + 8192 + if end > len(source) { + end = len(source) + } + text = strings.TrimSpace(string(source[node.StartByte():end])) + "\n...[TRUNCATED]" + } else { + text = strings.TrimSpace(node.Text(source)) + } + + if len(text) < 10 { + for i := 0; i < node.ChildCount(); i++ { + walk(node.Child(i), parentSig) + } + return + } + startLine := int(node.StartPoint().Row) + 1 endLine := int(node.EndPoint().Row) + 1 diff --git a/pkg/parser/go/analyzer.go b/pkg/parser/go/analyzer.go index f1055f3..5fe3c93 100644 --- a/pkg/parser/go/analyzer.go +++ b/pkg/parser/go/analyzer.go @@ -139,6 +139,15 @@ func (ca *CodeAnalyzer) AnalyzePackage(dir string) (*PackageInfo, error) { info.Types[typeIdx].Methods = append(info.Types[typeIdx].Methods, ca.convertFunctionToMethodInfo(methodInfo, typ.Name)) } + + // Process constructor/factory functions associated with this type. + // go/doc automatically moves functions like NewX(), LoadX() that return *T + // from the top-level Funcs list into typ.Funcs. Without this loop they + // were silently dropped from the index (BUG-003). + for _, fn := range typ.Funcs { + fnInfo := ca.analyzeFunctionDecl(fset, fn, astFuncMap) + info.Functions = append(info.Functions, fnInfo) + } } // Consts and vars for _, c := range docPkg.Consts { constInfo := ca.analyzeConstantDecl(fset, c) diff --git a/pkg/parser/go/analyzer_test.go b/pkg/parser/go/analyzer_test.go index 82da07d..cb1f99a 100644 --- a/pkg/parser/go/analyzer_test.go +++ b/pkg/parser/go/analyzer_test.go @@ -4,6 +4,7 @@ import ( "context" "os" "path/filepath" + "strings" "testing" pkgParser "github.com/doITmagic/rag-code-mcp/pkg/parser" @@ -11,9 +12,198 @@ import ( "github.com/stretchr/testify/require" ) +// realIndexerDir points to the actual pkg/indexer package in the project. +// Tests that use this directory verify Go parser behaviour against real +// production code, covering edge cases like constructor functions (BUG-003) +// that go/doc moves from docPkg.Funcs into docPkg.Types[T].Funcs. +func realIndexerDir(t *testing.T) string { + t.Helper() + // Walk up from the test file's directory to the repo root. + dir, err := filepath.Abs("../../indexer") + require.NoError(t, err) + _, err = os.Stat(dir) + require.NoError(t, err, "pkg/indexer must exist; run tests from the repo root") + return dir +} + +// --------------------------------------------------------------------------- +// Tests against pkg/indexer — real production code as parser fixture +// --------------------------------------------------------------------------- + +// TestRealPackage_IndexerKnownSymbols verifies that the Go parser correctly +// extracts all exported symbols from pkg/indexer, including constructor/loader +// functions that go/doc associates with their return type (BUG-003 regression). +// +// Symbols expected to be parsed (exported, capitalized): +// +// CountAllFiles, FileState, GetFileState, IndexFile, IndexItems, +// IndexStatus, IndexWorkspace, IsChanged, LangStatus, Options, +// RemoveFile, Save, SaveIndexStatus, Service, State, UpdateFile +// +// Symbols that were MISSING before BUG-003 fix (typ.Funcs not iterated): +// +// LoadIndexStatus, NewService, NewState, LoadState +func TestRealPackage_IndexerKnownSymbols(t *testing.T) { + dir := realIndexerDir(t) + ca := NewCodeAnalyzer() + + res, err := ca.Analyze(context.Background(), dir) + require.NoError(t, err) + require.NotEmpty(t, res.Symbols, "pkg/indexer must produce symbols") + + indexed := make(map[string]pkgParser.Symbol) + for _, s := range res.Symbols { + indexed[s.Name] = s + } + + // ── Symbols confirmed IN Qdrant before the fix ────────────────────────── + knownPresent := []string{ + "CountAllFiles", "FileState", "GetFileState", "IndexFile", "IndexItems", + "IndexStatus", "IndexWorkspace", "IsChanged", "LangStatus", "Options", + "RemoveFile", "Save", "SaveIndexStatus", "Service", "State", "UpdateFile", + } + for _, name := range knownPresent { + _, ok := indexed[name] + assert.True(t, ok, "symbol %q was present in Qdrant DB and must still be parsed", name) + } + + // ── Symbols MISSING before fix (BUG-003 regression check) ──────────────── + // go/doc places these in typ.Funcs, not docPkg.Funcs. + // Before the fix (adding the typ.Funcs loop) none of these appeared in index. + bug003Fixed := []string{"LoadIndexStatus", "NewService", "NewState", "LoadState"} + for _, name := range bug003Fixed { + sym, ok := indexed[name] + assert.True(t, ok, + "BUG-003 regression: %q must now be indexed (go/doc puts it in typ.Funcs)", name) + if ok { + assert.True(t, sym.IsPublic, "%q must be IsPublic=true", name) + assert.Equal(t, "indexer", sym.Package, "%q must have package=indexer", name) + assert.NotEmpty(t, sym.FilePath, "%q must have FilePath set", name) + assert.Greater(t, sym.StartLine, 0, "%q must have StartLine > 0", name) + assert.Contains(t, sym.Signature, name, "%q signature must contain function name", name) + } + } +} + +// TestRealPackage_IndexerPublicPrivate verifies IsPublic correctness against +// the real pkg/indexer package. The Qdrant snapshot showed all 16 public +// symbols had is_public=true and 9 private symbols had is_public=false. +func TestRealPackage_IndexerPublicPrivate(t *testing.T) { + dir := realIndexerDir(t) + ca := NewCodeAnalyzer() + res, err := ca.Analyze(context.Background(), dir) + require.NoError(t, err) + + indexed := make(map[string]pkgParser.Symbol) + for _, s := range res.Symbols { + indexed[s.Name] = s + } + + // Confirmed PUBLIC in Qdrant (is_public=true) + publicSymbols := []string{ + "CountAllFiles", "FileState", "GetFileState", "IndexFile", "IndexItems", + "IndexStatus", "IndexWorkspace", "IsChanged", "LangStatus", "Options", + "RemoveFile", "Save", "SaveIndexStatus", "Service", "State", "UpdateFile", + // Fixed by BUG-003 patch — should now also be public + "LoadIndexStatus", "NewService", "NewState", "LoadState", + } + for _, name := range publicSymbols { + if sym, ok := indexed[name]; ok { + assert.True(t, sym.IsPublic, "%q must have IsPublic=true", name) + } + } + + // Confirmed PRIVATE in Qdrant (is_public=false) + privateSymbols := []string{ + "attemptOllamaRestart", "circuitBreakerThreshold", "deleteCollectionForRecreate", + "deleteCollectionMaxWait", "deleteCollectionTimeout", "ensureOllamaAlive", + "indexStatusFile", "symbolToMap", "unwrapOllamaProvider", + } + for _, name := range privateSymbols { + if sym, ok := indexed[name]; ok { + assert.False(t, sym.IsPublic, "%q must have IsPublic=false", name) + } + } +} + +// TestRealPackage_IndexerSignatures spot-checks that Go signatures are +// correctly extracted from the real pkg/indexer files. +// Expectations derived from Qdrant payload "signature" field (DB snapshot). +func TestRealPackage_IndexerSignatures(t *testing.T) { + dir := realIndexerDir(t) + ca := NewCodeAnalyzer() + res, err := ca.Analyze(context.Background(), dir) + require.NoError(t, err) + + indexed := make(map[string]pkgParser.Symbol) + for _, s := range res.Symbols { + indexed[s.Name] = s + } + + cases := []struct { + name string + wantParts []string // all must appear in Signature + }{ + // From Qdrant payload "signature" field: + {"SaveIndexStatus", []string{"SaveIndexStatus", "workspaceRoot", "IndexStatus"}}, + {"IndexWorkspace", []string{"IndexWorkspace", "root", "collection", "Options", "error"}}, + {"IndexFile", []string{"IndexFile", "collection", "path", "State", "int", "error"}}, + {"IndexItems", []string{"IndexItems", "collection", "error"}}, + {"CountAllFiles", []string{"CountAllFiles", "root", "excludePatterns", "map"}}, + // BUG-003 fixed — check constructor signatures too + {"NewService", []string{"NewService", "Service"}}, + {"LoadState", []string{"LoadState", "path", "State", "error"}}, + {"LoadIndexStatus", []string{"LoadIndexStatus", "workspaceRoot", "IndexStatus"}}, + {"NewState", []string{"NewState", "State"}}, + } + + for _, tc := range cases { + sym, ok := indexed[tc.name] + if !ok { + t.Errorf("symbol %q not found in parsed output", tc.name) + continue + } + for _, part := range tc.wantParts { + assert.True(t, strings.Contains(sym.Signature, part), + "signature of %q should contain %q; got: %q", tc.name, part, sym.Signature) + } + } +} + +// TestRealPackage_IndexerLineCoverage verifies that start/end lines are +// plausible for real functions in pkg/indexer/index_status.go. +// We check that lines are > 0 and roughly in the expected region of the file, +// rather than hardcoding exact numbers that break whenever the file is edited. +func TestRealPackage_IndexerLineCoverage(t *testing.T) { + dir := realIndexerDir(t) + ca := NewCodeAnalyzer() + res, err := ca.Analyze(context.Background(), dir) + require.NoError(t, err) + + indexed := make(map[string]pkgParser.Symbol) + for _, s := range res.Symbols { + indexed[s.Name] = s + } + + for _, name := range []string{"SaveIndexStatus", "LoadIndexStatus"} { + sym, ok := indexed[name] + require.True(t, ok, "%q must be indexed", name) + assert.Greater(t, sym.StartLine, 0, + "%q StartLine must be > 0", name) + assert.True(t, strings.HasSuffix(sym.FilePath, "index_status.go"), + "%q FilePath should end in index_status.go, got %q", name, sym.FilePath) + } +} + +// --------------------------------------------------------------------------- +// Interface / basic coverage tests (kept but updated to use real fixtures) +// --------------------------------------------------------------------------- + func TestCodeAnalyzer_Complete(t *testing.T) { tmpDir := t.TempDir() + // This code mirrors the structure of pkg/indexer (types + constructors + methods) + // to keep the synthetic fixture representative of real-world patterns. code := `package testpkg import "fmt" @@ -21,30 +211,37 @@ import "fmt" const Version = "1.0.0" var Debug = true -// Calculator provides mathematical operations +// Calculator provides mathematical operations. type Calculator struct { precision int } -// Add adds two numbers +// Add adds two numbers. func Add(a, b int) int { return a + b } -// Multiply multiplies two numbers +// Multiply multiplies two numbers. func (c *Calculator) Multiply(a, b int) int { return a * b } -// Subtractor is an interface for subtraction +// NewCalculator creates a new Calculator. +// go/doc will place this in Types["Calculator"].Funcs — BUG-003 pattern. +func NewCalculator(precision int) *Calculator { + return &Calculator{precision: precision} +} + +// Subtractor is an interface for subtraction. type Subtractor interface { Subtract(a, b int) int } + +var _ = fmt.Sprintf ` testFile := filepath.Join(tmpDir, "test.go") err := os.WriteFile(testFile, []byte(code), 0644) require.NoError(t, err) - ca := NewCodeAnalyzer() t.Run("Interface implementation", func(t *testing.T) { @@ -65,7 +262,6 @@ type Subtractor interface { symbols[s.Name+"_"+string(s.Type)] = s } - // Verify symbols assert.Contains(t, symbols, "Add_function") assert.Contains(t, symbols, "Calculator_type") assert.Contains(t, symbols, "Multiply_method") @@ -73,27 +269,41 @@ type Subtractor interface { assert.Contains(t, symbols, "Version_const") assert.Contains(t, symbols, "Debug_var") - // Verify function details + // BUG-003 regression check in synthetic fixture + assert.Contains(t, symbols, "NewCalculator_function", + "NewCalculator must be indexed (go/doc puts it in typ.Funcs)") + addFunc := symbols["Add_function"] assert.Equal(t, "testpkg", addFunc.Package) assert.Equal(t, "func Add(a int, b int) int", addFunc.Signature) - assert.Equal(t, "Add adds two numbers", addFunc.Docstring) + assert.Equal(t, "Add adds two numbers.", addFunc.Docstring) assert.Contains(t, addFunc.Content, "return a + b") - // Verify interface methods subtractor := symbols["Subtractor_type"] assert.Equal(t, "interface", subtractor.Metadata["kind"]) methods := subtractor.Metadata["methods"].([]MethodInfo) require.Len(t, methods, 1) assert.Equal(t, "Subtract", methods[0].Name) + + ctor := symbols["NewCalculator_function"] + assert.True(t, ctor.IsPublic) + assert.Equal(t, "testpkg", ctor.Package) + assert.Contains(t, ctor.Signature, "NewCalculator") }) - t.Run("AnalyzePackage", func(t *testing.T) { + t.Run("AnalyzePackage includes constructor functions", func(t *testing.T) { pkg, err := ca.AnalyzePackage(tmpDir) require.NoError(t, err) assert.Equal(t, "testpkg", pkg.Name) - assert.Contains(t, pkg.Imports, "fmt") - assert.Len(t, pkg.Functions, 2) // Add and Multiply (Multiply is method but also in Functions slice) + + funcNames := make(map[string]bool) + for _, fn := range pkg.Functions { + funcNames[fn.Name] = true + } + assert.True(t, funcNames["Add"]) + assert.True(t, funcNames["Multiply"]) + assert.True(t, funcNames["NewCalculator"], + "NewCalculator must appear — BUG-003 regression") assert.Len(t, pkg.Types, 2) }) @@ -118,9 +328,8 @@ func TestCodeAnalyzer_EdgeCases(t *testing.T) { badCode := `package bad; func {` err := os.WriteFile(filepath.Join(tmpDir, "bad.go"), []byte(badCode), 0644) require.NoError(t, err) - res, err := ca.Analyze(context.Background(), tmpDir) - assert.NoError(t, err) // Should skip bad file and return empty result + assert.NoError(t, err) assert.Empty(t, res.Symbols) }) diff --git a/pkg/parser/html/analyzer.go b/pkg/parser/html/analyzer.go index 2f73bd5..a9b5794 100644 --- a/pkg/parser/html/analyzer.go +++ b/pkg/parser/html/analyzer.go @@ -8,7 +8,6 @@ import ( "os" "path/filepath" "strings" - "github.com/PuerkitoBio/goquery" pkgParser "github.com/doITmagic/rag-code-mcp/pkg/parser" ) @@ -34,14 +33,20 @@ func (a *Analyzer) Name() string { return "html" } -// CanHandle returns true for .html and .htm files. +// CanHandle returns true for .html files. func (a *Analyzer) CanHandle(filePath string) bool { ext := strings.ToLower(filepath.Ext(filePath)) - return ext == ".html" || ext == ".htm" + switch ext { + case ".html", ".htm": + return true + default: + return false + } } -// Analyze extracts symbols (sections) from a file or directory. +// Analyze extracts symbols (sections) from an HTML file. func (a *Analyzer) Analyze(ctx context.Context, path string) (*pkgParser.Result, error) { + // HTML files: use goquery chunks, err := a.ca.AnalyzePaths([]string{path}) if err != nil { return nil, err @@ -67,6 +72,8 @@ func (a *Analyzer) Analyze(ctx context.Context, path string) (*pkgParser.Result, }, nil } + + // CodeAnalyzer handles the heavy lifting of HTML analysis. type CodeAnalyzer struct{} @@ -245,7 +252,12 @@ func (ca *CodeAnalyzer) shouldSkipDir(path, root string) bool { func (ca *CodeAnalyzer) isHTMLFile(name string) bool { lower := strings.ToLower(name) - return strings.HasSuffix(lower, ".html") || strings.HasSuffix(lower, ".htm") + for _, ext := range []string{".html", ".htm"} { + if strings.HasSuffix(lower, ext) { + return true + } + } + return false } func headingLevel(tag string) int { diff --git a/pkg/parser/javascript/analyzer.go b/pkg/parser/javascript/analyzer.go index aa58e3e..b6317c6 100644 --- a/pkg/parser/javascript/analyzer.go +++ b/pkg/parser/javascript/analyzer.go @@ -17,11 +17,22 @@ func init() { } // CodeAnalyzer implements parser.Analyzer for JavaScript/TypeScript -type CodeAnalyzer struct{} +type CodeAnalyzer struct { + tsParser *TreeSitterParser +} // NewCodeAnalyzer creates a new JS/TS code analyzer func NewCodeAnalyzer() *CodeAnalyzer { - return &CodeAnalyzer{} + return &CodeAnalyzer{ + tsParser: NewTreeSitterParser(), + } +} + +// ReleaseResources drops cached tree-sitter parsers so the GC can reclaim arena memory. +func (ca *CodeAnalyzer) ReleaseResources() { + if ca.tsParser != nil { + ca.tsParser.ReleaseResources() + } } // Name returns the analyzer name @@ -134,9 +145,8 @@ func (ca *CodeAnalyzer) analyzeFile(filePath string) (*fileAnalysis, error) { return ca.analyzeVueFile(filePath, content) } - // Try tree-sitter first (accurate AST parsing) - tsParser := NewTreeSitterParser() - fa, err := tsParser.ParseFile(content, filePath) + // Tree-sitter (accurate AST parsing) — use the cached parser instance + fa, err := ca.tsParser.ParseFile(content, filePath) if err == nil && fa != nil && (len(fa.Functions) > 0 || len(fa.Classes) > 0 || len(fa.Interfaces) > 0 || len(fa.Types) > 0 || len(fa.Enums) > 0) { // Tree-sitter succeeded — also extract imports/exports if not already done diff --git a/pkg/parser/javascript/node/analyzer.go b/pkg/parser/javascript/node/analyzer.go index 3f805fe..d676409 100644 --- a/pkg/parser/javascript/node/analyzer.go +++ b/pkg/parser/javascript/node/analyzer.go @@ -35,11 +35,15 @@ var ( ) // Analyzer detects Node.js/Express-specific patterns -type Analyzer struct{} +type Analyzer struct { + tsAnalyzer *TreeSitterAnalyzer +} // NewAnalyzer creates a new Node.js analyzer func NewAnalyzer() *Analyzer { - return &Analyzer{} + return &Analyzer{ + tsAnalyzer: NewTreeSitterAnalyzer(), + } } // IsNodeProject checks if a file looks like a Node.js project file @@ -53,8 +57,7 @@ func IsNodeProject(source string) bool { // Uses tree-sitter AST as primary engine, with regex fallback func (a *Analyzer) Analyze(source string, filePath string) *NodeInfo { // Try tree-sitter first - tsAnalyzer := NewTreeSitterAnalyzer() - info := tsAnalyzer.Analyze([]byte(source), filePath) + info := a.tsAnalyzer.Analyze([]byte(source), filePath) if info != nil && (len(info.Routes) > 0 || len(info.Requires) > 0 || len(info.ModuleExports) > 0) { return info } diff --git a/pkg/parser/javascript/node/treesitter.go b/pkg/parser/javascript/node/treesitter.go index 87246da..b530034 100644 --- a/pkg/parser/javascript/node/treesitter.go +++ b/pkg/parser/javascript/node/treesitter.go @@ -2,17 +2,35 @@ package node import ( "strings" + "sync" "github.com/odvcencio/gotreesitter" "github.com/odvcencio/gotreesitter/grammars" ) -// TreeSitterAnalyzer uses tree-sitter AST for Node.js/Express pattern detection -type TreeSitterAnalyzer struct{} +// TreeSitterAnalyzer uses tree-sitter AST for Node.js/Express pattern detection. +// Caches Parser instances per language to avoid re-allocating expensive lookup tables. +type TreeSitterAnalyzer struct { + mu sync.Mutex + parsers map[string]*gotreesitter.Parser +} // NewTreeSitterAnalyzer creates a new tree-sitter based Node.js analyzer func NewTreeSitterAnalyzer() *TreeSitterAnalyzer { - return &TreeSitterAnalyzer{} + return &TreeSitterAnalyzer{ + parsers: make(map[string]*gotreesitter.Parser), + } +} + +func (t *TreeSitterAnalyzer) getOrCreateParser(lang *grammars.LangEntry) *gotreesitter.Parser { + t.mu.Lock() + defer t.mu.Unlock() + if cached, ok := t.parsers[lang.Name]; ok { + return cached + } + p := gotreesitter.NewParser(lang.Language()) + t.parsers[lang.Name] = p + return p } // Analyze parses source with tree-sitter and extracts Node.js/Express patterns @@ -22,11 +40,12 @@ func (t *TreeSitterAnalyzer) Analyze(source []byte, filePath string) *NodeInfo { return nil } - parser := gotreesitter.NewParser(lang.Language()) + parser := t.getOrCreateParser(lang) tree, err := parser.Parse(source) if err != nil { return nil } + defer tree.Release() root := tree.RootNode() langObj := lang.Language() diff --git a/pkg/parser/javascript/react/analyzer.go b/pkg/parser/javascript/react/analyzer.go index 167f5e5..f096e25 100644 --- a/pkg/parser/javascript/react/analyzer.go +++ b/pkg/parser/javascript/react/analyzer.go @@ -59,11 +59,15 @@ var ( ) // Analyzer detects React-specific patterns -type Analyzer struct{} +type Analyzer struct { + tsAnalyzer *TreeSitterAnalyzer +} // NewAnalyzer creates a new React analyzer func NewAnalyzer() *Analyzer { - return &Analyzer{} + return &Analyzer{ + tsAnalyzer: NewTreeSitterAnalyzer(), + } } // IsReactFile checks if the file imports React or contains JSX @@ -80,8 +84,7 @@ func IsReactNativeFile(source string) bool { // Uses tree-sitter AST as primary engine, with regex fallback func (a *Analyzer) Analyze(source string, filePath string) *ReactInfo { // Try tree-sitter first (accurate AST-based detection) - tsAnalyzer := NewTreeSitterAnalyzer() - info := tsAnalyzer.Analyze([]byte(source), filePath) + info := a.tsAnalyzer.Analyze([]byte(source), filePath) if info != nil && (len(info.Components) > 0 || len(info.Hooks) > 0 || len(info.Contexts) > 0) { return info } diff --git a/pkg/parser/javascript/react/treesitter.go b/pkg/parser/javascript/react/treesitter.go index 6704d43..b9ff134 100644 --- a/pkg/parser/javascript/react/treesitter.go +++ b/pkg/parser/javascript/react/treesitter.go @@ -2,17 +2,42 @@ package react import ( "strings" + "sync" "github.com/odvcencio/gotreesitter" "github.com/odvcencio/gotreesitter/grammars" ) -// TreeSitterAnalyzer uses tree-sitter AST for accurate React/RN pattern detection -type TreeSitterAnalyzer struct{} +// TreeSitterAnalyzer uses tree-sitter AST for accurate React/RN pattern detection. +// Caches Parser instances per language to avoid re-allocating expensive lookup tables. +type TreeSitterAnalyzer struct { + mu sync.Mutex + parsers map[string]*gotreesitter.Parser +} // NewTreeSitterAnalyzer creates a new tree-sitter based React analyzer func NewTreeSitterAnalyzer() *TreeSitterAnalyzer { - return &TreeSitterAnalyzer{} + return &TreeSitterAnalyzer{ + parsers: make(map[string]*gotreesitter.Parser), + } +} + +func (t *TreeSitterAnalyzer) getOrCreateParser(lang *grammars.LangEntry) *gotreesitter.Parser { + t.mu.Lock() + defer t.mu.Unlock() + if cached, ok := t.parsers[lang.Name]; ok { + return cached + } + p := gotreesitter.NewParser(lang.Language()) + t.parsers[lang.Name] = p + return p +} + +// ReleaseResources drops cached tree-sitter parsers so the GC can reclaim arena memory. +func (t *TreeSitterAnalyzer) ReleaseResources() { + t.mu.Lock() + defer t.mu.Unlock() + t.parsers = make(map[string]*gotreesitter.Parser) } // Analyze parses source with tree-sitter and extracts React/RN patterns from the AST @@ -22,11 +47,12 @@ func (t *TreeSitterAnalyzer) Analyze(source []byte, filePath string) *ReactInfo return nil } - parser := gotreesitter.NewParser(lang.Language()) + parser := t.getOrCreateParser(lang) tree, err := parser.Parse(source) if err != nil { return nil } + defer tree.Release() root := tree.RootNode() langObj := lang.Language() diff --git a/pkg/parser/javascript/treesitter.go b/pkg/parser/javascript/treesitter.go index 602fb33..797d885 100644 --- a/pkg/parser/javascript/treesitter.go +++ b/pkg/parser/javascript/treesitter.go @@ -2,17 +2,45 @@ package javascript import ( "strings" + "sync" "github.com/odvcencio/gotreesitter" "github.com/odvcencio/gotreesitter/grammars" ) -// TreeSitterParser uses gotreesitter (pure Go, zero CGO) for accurate JS/TS AST parsing -type TreeSitterParser struct{} +// TreeSitterParser uses gotreesitter (pure Go, zero CGO) for accurate JS/TS AST parsing. +// It caches Parser instances per language to avoid re-allocating expensive lookup tables +// (~700KB per parser for JS/TS grammars) on every file. +type TreeSitterParser struct { + mu sync.Mutex + parsers map[string]*gotreesitter.Parser +} // NewTreeSitterParser creates a new tree-sitter based parser func NewTreeSitterParser() *TreeSitterParser { - return &TreeSitterParser{} + return &TreeSitterParser{ + parsers: make(map[string]*gotreesitter.Parser), + } +} + +// getOrCreateParser returns a cached parser for the given language, creating one if needed. +// gotreesitter.Parser is NOT safe for concurrent use, but indexing is sequential per language. +func (p *TreeSitterParser) getOrCreateParser(lang *grammars.LangEntry) *gotreesitter.Parser { + p.mu.Lock() + defer p.mu.Unlock() + if cached, ok := p.parsers[lang.Name]; ok { + return cached + } + parser := gotreesitter.NewParser(lang.Language()) + p.parsers[lang.Name] = parser + return parser +} + +// ReleaseResources drops cached tree-sitter parsers so the GC can reclaim arena memory. +func (p *TreeSitterParser) ReleaseResources() { + p.mu.Lock() + defer p.mu.Unlock() + p.parsers = make(map[string]*gotreesitter.Parser) } // ParseFile parses a JS/TS file using tree-sitter and returns extracted info @@ -22,11 +50,12 @@ func (p *TreeSitterParser) ParseFile(source []byte, filePath string) (*fileAnaly return nil, nil // unsupported extension } - parser := gotreesitter.NewParser(lang.Language()) + parser := p.getOrCreateParser(lang) tree, err := parser.Parse(source) if err != nil { return nil, err } + defer tree.Release() root := tree.RootNode() langObj := lang.Language() diff --git a/pkg/parser/javascript/typescript/analyzer.go b/pkg/parser/javascript/typescript/analyzer.go index 2f45ff8..26f2373 100644 --- a/pkg/parser/javascript/typescript/analyzer.go +++ b/pkg/parser/javascript/typescript/analyzer.go @@ -2,6 +2,7 @@ package typescript import ( "strings" + "sync" "github.com/odvcencio/gotreesitter" "github.com/odvcencio/gotreesitter/grammars" @@ -17,12 +18,29 @@ var utilityTypes = map[string]bool{ "ThisParameterType": true, "OmitThisParameter": true, } -// Analyzer detects TypeScript-specific patterns using tree-sitter -type Analyzer struct{} +// Analyzer detects TypeScript-specific patterns using tree-sitter. +// Caches Parser instances per language to avoid re-allocating expensive lookup tables. +type Analyzer struct { + mu sync.Mutex + parsers map[string]*gotreesitter.Parser +} // NewAnalyzer creates a new TypeScript analyzer func NewAnalyzer() *Analyzer { - return &Analyzer{} + return &Analyzer{ + parsers: make(map[string]*gotreesitter.Parser), + } +} + +func (a *Analyzer) getOrCreateParser(lang *grammars.LangEntry) *gotreesitter.Parser { + a.mu.Lock() + defer a.mu.Unlock() + if cached, ok := a.parsers[lang.Name]; ok { + return cached + } + p := gotreesitter.NewParser(lang.Language()) + a.parsers[lang.Name] = p + return p } // IsTypeScriptFile checks if a file is TypeScript @@ -44,11 +62,12 @@ func (a *Analyzer) Analyze(source string, filePath string) *TypeScriptInfo { } src := []byte(source) - parser := gotreesitter.NewParser(lang.Language()) + parser := a.getOrCreateParser(lang) tree, err := parser.Parse(src) if err != nil { return &TypeScriptInfo{} } + defer tree.Release() root := tree.RootNode() langObj := lang.Language() diff --git a/pkg/parser/javascript/vue/analyzer.go b/pkg/parser/javascript/vue/analyzer.go index 606a711..15aa4b9 100644 --- a/pkg/parser/javascript/vue/analyzer.go +++ b/pkg/parser/javascript/vue/analyzer.go @@ -69,11 +69,15 @@ var ( ) // Analyzer detects Vue.js-specific patterns -type Analyzer struct{} +type Analyzer struct { + tsAnalyzer *TreeSitterAnalyzer +} // NewAnalyzer creates a new Vue.js analyzer func NewAnalyzer() *Analyzer { - return &Analyzer{} + return &Analyzer{ + tsAnalyzer: NewTreeSitterAnalyzer(), + } } // IsVueFile checks if a file is a Vue SFC or contains Vue patterns @@ -109,8 +113,7 @@ func (a *Analyzer) Analyze(source string, filePath string) *VueInfo { } // Try tree-sitter first for script content parsing - tsAnalyzer := NewTreeSitterAnalyzer() - tsInfo := tsAnalyzer.AnalyzeScript([]byte(scriptContent), filePath, isSetup) + tsInfo := a.tsAnalyzer.AnalyzeScript([]byte(scriptContent), filePath, isSetup) if tsInfo != nil && (len(tsInfo.Composables) > 0 || len(tsInfo.Components) > 0 || tsInfo.Store != nil) { // Merge tree-sitter results with SFC info info.IsVue3 = tsInfo.IsVue3 diff --git a/pkg/parser/javascript/vue/treesitter.go b/pkg/parser/javascript/vue/treesitter.go index cae06a9..e124ae7 100644 --- a/pkg/parser/javascript/vue/treesitter.go +++ b/pkg/parser/javascript/vue/treesitter.go @@ -3,17 +3,35 @@ package vue import ( "regexp" "strings" + "sync" "github.com/odvcencio/gotreesitter" "github.com/odvcencio/gotreesitter/grammars" ) -// TreeSitterAnalyzer uses tree-sitter AST for Vue.js script content parsing -type TreeSitterAnalyzer struct{} +// TreeSitterAnalyzer uses tree-sitter AST for Vue.js script content parsing. +// Caches Parser instances per language to avoid re-allocating expensive lookup tables. +type TreeSitterAnalyzer struct { + mu sync.Mutex + parsers map[string]*gotreesitter.Parser +} // NewTreeSitterAnalyzer creates a new tree-sitter based Vue analyzer func NewTreeSitterAnalyzer() *TreeSitterAnalyzer { - return &TreeSitterAnalyzer{} + return &TreeSitterAnalyzer{ + parsers: make(map[string]*gotreesitter.Parser), + } +} + +func (t *TreeSitterAnalyzer) getOrCreateParser(lang *grammars.LangEntry) *gotreesitter.Parser { + t.mu.Lock() + defer t.mu.Unlock() + if cached, ok := t.parsers[lang.Name]; ok { + return cached + } + p := gotreesitter.NewParser(lang.Language()) + t.parsers[lang.Name] = p + return p } // AnalyzeScript parses the