
Commit 90419b7

Merge pull request #40 from doITmagic/refactor/indexing-progress
Refactor: Replace progressStore with file-based IndexStatus

**Key Changes & Impact:**

- **Persistent Indexing Progress**: Replaced the in-memory `progressStore` with a decoupled, file-based `index_status.json` approach. This allows external clients/agents to track workspace indexing progress reliably.
- **Atomic & Optimized Disk I/O**: Implemented "write-to-temp-then-rename" atomic updates for the status file. Added throttling (writing every 10 files) to eliminate race conditions and avoid flooding the disk with unnecessary writes.
- **Incremental Indexing Stability**: Modifying a single file now cleanly updates the status in the background without wiping the pre-existing language statistics (the `Languages` state is preserved).
- **Python Parser (Tree-Sitter) Workarounds**: Fixed extraction failures caused by `gotreesitter`'s inability to parse `except X as e`, patching around it while strictly preserving byte offsets. Made `func`/`class` dependency and call extraction more robust.
- **Code Health & Testing**:
  - Relaxed overly strict, brittle line-number assertions in the test suite.
  - Added new test coverage for the `treesitter` robustness logic (`patchExceptAs`, `extractCallsFromNode`).
  - Addressed `golangci-lint` warnings across the board (e.g., arena GC drain, unused HTML parser fields).
  - Explicitly drained `gotreesitter` memory arenas to prevent long-term memory leaks.
2 parents 522a68f + b3cd229

78 files changed

Lines changed: 3621 additions & 2382 deletions

Some content is hidden: large commits have some content hidden by default.

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 0 additions & 20 deletions
This file was deleted.

.github/copilot-instructions.md

Lines changed: 0 additions & 33 deletions
This file was deleted.

.gitignore

Lines changed: 3 additions & 0 deletions
```diff
@@ -54,6 +54,9 @@ Thumbs.db
 .env
 .env.local
 
+# Local project config (not for VCS)
+.trello.json
+
 # Temporary files
 tmp/
 temp/
```

BUGS.md

Lines changed: 312 additions & 0 deletions
# RagCode MCP Bug Tracker

This file documents confirmed bugs in the RagCode MCP server, with concrete reproduction examples and expected behavior.

---

## BUG-001: `rag_list_package_exports` falsely returns "No exported symbols found" for indexed Go packages

**Status:** ✅ Fixed (2026-03-09)
**Date confirmed:** 2026-03-09
**Affected tool:** `mcp_ragcode_rag_list_package_exports`
**Severity:** Medium (produced incorrect responses that could mislead AI consumers)
**Fixed in:** `internal/service/tools/list_package_exports.go`

### Description

The `rag_list_package_exports` tool reports that a Go package contains no exported symbols, even though the source files contain public structs, functions, and variables (capitalized identifiers).

### Steps to reproduce

**Tool call input:**
```json
{
  "file_path": "/home/razvan/go/src/github.com/doITmagic/rag-code-mcp/pkg/indexer/service.go",
  "package": "github.com/doITmagic/rag-code-mcp/pkg/indexer"
}
```

**Response received (incorrect):**
```json
{
  "status": "success",
  "message": "No exported symbols found in package 'github.com/doITmagic/rag-code-mcp/pkg/indexer'",
  "context": {
    "workspace_root": "/home/razvan/go/src/github.com/doITmagic/rag-code-mcp",
    "detection_source": "file_path",
    "indexing_progress": {
      "started_at": "2026-03-09T07:33:28Z",
      "languages": {
        "go": {
          "on_disk": 232,
          "changed": 0,
          "processed": 0
        }
      }
    }
  }
}
```

### Actual exported symbols in `pkg/indexer/` (verified with grep)

Verified by running `grep -rnE '^(func|type|var|const)\s+[A-Z]' pkg/indexer/` (the `-E` flag is needed for the `(...|...)` alternation; plain `grep` treats the parentheses and `|` literally):

**`service.go`:**
```go
type Options struct { ... } // line 31
type Service struct { ... } // line 40
func NewService(embedder llm.Provider, store storage.VectorStore) *Service // line 47
```

**`state.go`:**
```go
type FileState struct { ... } // line 12
type State struct { ... } // line 21
func NewState() *State // line 27
func LoadState(path string) (*State, error) // line 34
```

**`index_status.go`:**
```go
type IndexStatus struct { ... } // line 16
type LangStatus struct { ... } // line 25
func SaveIndexStatus(workspaceRoot string, status *IndexStatus) // line 32
func LoadIndexStatus(workspaceRoot string) *IndexStatus // line 54
```

### Root cause (confirmed)

Verified by querying the vector database directly: **the data IS indexed**. A `rag_search` for `LangStatus`, `IndexStatus`, and `SaveIndexStatus` returns results with scores of 0.86–0.94, sourced from `_source: "both"` (semantic + exact match).

The real bug is a **package name mismatch** in `internal/service/tools/list_package_exports.go`:

```go
// The tool builds an exact-match filter using the full Go import path:
filter := map[string]interface{}{
	"package": packageName, // e.g. "github.com/doITmagic/rag-code-mcp/pkg/indexer"
}
allResults, err := t.engine.ExactSearchPolyglot(ctx, wctx.ID, filter, 1000)
```

However, the vector index stores the short package name, not the full import path:
```json
{ "name": "LangStatus", "package": "indexer", ... }
```

The filter `"package": "github.com/doITmagic/rag-code-mcp/pkg/indexer"` never matches `"package": "indexer"`, so `allResults` is always empty and the tool returns `"No exported symbols found"`.

### Applied fix

**File:** `internal/service/tools/list_package_exports.go`

```diff
-	filter := map[string]interface{}{
-		"package": packageName,
-	}
+	// The index stores the short package name (e.g. "indexer"), not the full Go
+	// import path (e.g. "github.com/doITmagic/rag-code-mcp/pkg/indexer").
+	// Normalize by taking the last path segment so both forms work.
+	filterPackage := packageName
+	if idx := strings.LastIndex(packageName, "/"); idx >= 0 {
+		filterPackage = packageName[idx+1:]
+	}
+	filter := map[string]interface{}{
+		"package": filterPackage,
+	}
```

This fix is backward-compatible: if the caller passes only the short name (e.g. `"indexer"`), `strings.LastIndex` returns `-1` and `filterPackage` is unchanged.
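
Pulled out into a standalone helper (the name `shortPackageName` is hypothetical), the normalization is easy to unit-test against both input forms. One caveat worth checking: the last import-path segment is not always the Go package name. Module paths with a major-version suffix (`.../v2`) or paths like `gopkg.in/yaml.v3` (whose package is `yaml`) yield a segment that would not match the stored short name.

```go
package main

import "strings"

// shortPackageName applies the same normalization as the BUG-001 fix: keep
// only the text after the last "/". Inputs without a "/" pass through.
func shortPackageName(pkg string) string {
	if idx := strings.LastIndex(pkg, "/"); idx >= 0 {
		return pkg[idx+1:]
	}
	return pkg
}
```

If those edge cases matter, reading the real package name (for example from the `package` clause of a file in that directory) would be more reliable than slicing the import path.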

---

## BUG-002: `indexing_progress.changed` reports `0` even when files exist on disk

**Status:** Confirmed (related to BUG-001)
**Date confirmed:** 2026-03-09
**Affected tools:** All MCP tools that include `indexing_progress` in their response context
**Severity:** Low (incorrect diagnostic information; does not directly affect search results)

### Description

The `indexing_progress.languages.<lang>.changed` field may report `0` even though files are present on disk and may have been modified since the last full indexing run. This is because the metric reflects how many files were processed in the **current** indexing session, not how many differ from the last indexed state.

### Example

```jsonc
"go": {
  "on_disk": 232,  // 232 Go files present on disk
  "changed": 0,    // no changes detected (misleading)
  "processed": 0   // nothing processed in this session
}
```

In reality, the index may be completely stale (all 232 files could be unindexed), yet `changed` and `processed` both report `0` because no indexing session was triggered.

### Expected behavior

`changed` should reflect the number of files that differ from the last indexed snapshot (by `mtime` or content hash), not just the files processed in the current in-flight session.
149+
150+
---
151+
152+
*Last updated: 2026-03-09 — BUG-001 fixed*
153+
154+
---
155+
156+
## BUG-003: Top-level Go functions with no AST relations are missing from the vector index
157+
158+
**Status:** ✅ Fixed (2026-03-10, PR #40)
159+
**Date confirmed:** 2026-03-09
160+
**Affected component:** Go parser / indexer (`pkg/indexer`, `internal/parser`)
161+
**Severity:** Medium — `rag_list_package_exports` and `rag_search` silently omit exported constructor/loader functions
162+
163+
### Description
164+
165+
Some exported top-level Go functions are never written to the vector database by the indexer. They exist in source on disk, they are syntactically exported (capitalized name), but searching the vector store for them returns no dedicated entry — they appear only embedded inside the body content of *other* functions that call them.
166+
167+
### Affected symbols (confirmed via direct vector DB search)
168+
169+
All from `pkg/indexer/`:
170+
171+
| Symbol | File | Indexed? | Notes |
172+
|---|---|---|---|
173+
| `SaveIndexStatus` | `index_status.go:32` | ✅ yes | 6 AST relations |
174+
| `LoadIndexStatus` | `index_status.go:54` |**no** | 0 dedicated index entry |
175+
| `NewService` | `service.go:47` |**no** | `rag_find_usages` explicitly returned "No usages found" |
176+
| `NewState` | `state.go:27` |**no** | No dedicated index entry |
177+
| `LoadState` | `state.go:34` |**no** | No dedicated index entry |
178+
179+
### Diagnostic evidence
180+
181+
1. `rag_list_package_exports` for `pkg/indexer` returns 16 symbols — none of the 4 missing functions appear.
182+
2. `rag_find_usages("NewService")` returns: `"No usages found for symbol 'NewService' based on Code Graph relations."` — the symbol has **zero AST relation entries** in Qdrant.
183+
3. `rag_search` for `"func LoadIndexStatus"` only returns entries where `LoadIndexStatus` appears **in the body** of other functions (e.g. `engine.GetIndexStatus`, `engine.StartIndexingAsync`), never as a standalone symbol.
184+
4. `SaveIndexStatus` (same file, same pattern) **is** indexed with 6 relations — confirming the issue is not file-level but symbol-level.
185+
186+
### Root cause (confirmed via direct Qdrant query)
187+
188+
**Direct Qdrant scroll on the collection reveals 25 points for package `indexer`.** Full list sorted by name confirms:
189+
190+
- `LangStatus``rel_count: 0`, **IS indexed**
191+
- `circuitBreakerThreshold` (private const) → `rel_count: 0`, **IS indexed**
192+
- `deleteCollectionTimeout` (private const) → `rel_count: 0`, **IS indexed**
193+
194+
This **disproves** the relation-count-as-threshold hypothesis. Symbols with zero relations *are* indexed — the missing functions are simply absent.
195+
196+
**The pattern that distinguishes missing vs present functions:**
197+
198+
| Symbol | Indexed? | Called from outside the package? |
199+
|---|---|---|
200+
| `SaveIndexStatus` || Yes — called from `engine.go` (different package) |
201+
| `LoadIndexStatus` || Only called from within `pkg/indexer/` itself |
202+
| `NewService` || Not tracked (0 AST relations despite being called from `engine.go`) |
203+
| `NewState` || Only called from within `pkg/indexer/service.go` |
204+
| `LoadState` || Only called from within `pkg/indexer/service.go` |
205+
206+
**Exact root cause found in `pkg/parser/go/analyzer.go`:**
207+
208+
The `go/doc` package automatically associates constructor/loader functions with the type they return:
209+
- `NewService() *Service` → placed in `docPkg.Types["Service"].Funcs` by `go/doc`
210+
- `LoadState() *State` → placed in `docPkg.Types["State"].Funcs` by `go/doc`
211+
- `NewState() *State` → placed in `docPkg.Types["State"].Funcs` by `go/doc`
212+
- `LoadIndexStatus() *IndexStatus` → placed in `docPkg.Types["IndexStatus"].Funcs` by `go/doc`
213+
214+
These functions **never appear** in `docPkg.Funcs` (top-level functions list).
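
This placement can be confirmed with the standard library alone. The sketch below feeds `go/doc` a trimmed-down, hypothetical file whose signatures mimic the affected ones (stub bodies, not the real sources) and reports which list each function lands in; `placement` and `sampleSrc` are illustrative names.

```go
package main

import (
	"go/ast"
	"go/doc"
	"go/parser"
	"go/token"
)

// sampleSrc is a hypothetical, trimmed-down file whose signatures mimic the
// affected pkg/indexer functions; the bodies are stubs.
const sampleSrc = `package indexer

type Service struct{}
type State struct{}

func NewService() *Service { return &Service{} }
func NewState() *State { return &State{} }
func LoadState(path string) (*State, error) { return &State{}, nil }
func SaveIndexStatus(workspaceRoot string) {}
`

// placement parses one Go file and reports where go/doc files each function:
// in the package-level Funcs list or under a type's Funcs list.
func placement(filename, src string) (topLevel []string, byType map[string][]string, err error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.ParseComments)
	if err != nil {
		return nil, nil, err
	}
	// NewFromFiles requires the file name recorded in fset to end in ".go".
	pkg, err := doc.NewFromFiles(fset, []*ast.File{file}, "example.com/indexer")
	if err != nil {
		return nil, nil, err
	}
	for _, f := range pkg.Funcs {
		topLevel = append(topLevel, f.Name)
	}
	byType = make(map[string][]string)
	for _, t := range pkg.Types {
		for _, f := range t.Funcs {
			byType[t.Name] = append(byType[t.Name], f.Name)
		}
	}
	return topLevel, byType, nil
}
```

Only `SaveIndexStatus`, which has no result type to attach to, stays in the package-level `Funcs` list.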
215+
216+
In `AnalyzePackage` (lines 126–141), the type-processing loop iterates `typ.Methods` but **never `typ.Funcs`**:
217+
218+
```go
219+
// pkg/parser/go/analyzer.go lines 126-141
220+
for _, typ := range docPkg.Types {
221+
typeInfo := ca.analyzeTypeDecl(fset, typ, astFuncMap)
222+
typeIdx := len(info.Types)
223+
info.Types = append(info.Types, typeInfo)
224+
225+
// ✅ Methods are processed
226+
for _, method := range typ.Methods {
227+
methodInfo := ca.analyzeFunctionDecl(fset, method, astFuncMap, typ.Name)
228+
info.Functions = append(info.Functions, methodInfo)
229+
}
230+
// ❌ typ.Funcs (constructors like NewService, LoadState) are NEVER processed!
231+
}
232+
```
233+
234+
`SaveIndexStatus` works because it returns `void` (no associated type), so `go/doc` places it in `docPkg.Funcs` — the only list that IS iterated at line 120.
235+
236+
### Fix (exact, minimal)
237+
238+
In `AnalyzePackage` in `pkg/parser/go/analyzer.go`, add iteration over `typ.Funcs` inside the type loop:
239+
240+
```diff
241+
for _, typ := range docPkg.Types {
242+
typeInfo := ca.analyzeTypeDecl(fset, typ, astFuncMap)
243+
typeIdx := len(info.Types)
244+
info.Types = append(info.Types, typeInfo)
245+
246+
for _, method := range typ.Methods {
247+
methodInfo := ca.analyzeFunctionDecl(fset, method, astFuncMap, typ.Name)
248+
methodInfo.IsMethod = true
249+
methodInfo.Receiver = typ.Name
250+
info.Functions = append(info.Functions, methodInfo)
251+
info.Types[typeIdx].Methods = append(info.Types[typeIdx].Methods,
252+
ca.convertFunctionToMethodInfo(methodInfo, typ.Name))
253+
}
254+
+
255+
+ // Process constructor/factory functions associated with this type
256+
+ // (go/doc moves New*, Load*, etc. here from the top-level Funcs list)
257+
+ for _, fn := range typ.Funcs {
258+
+ fnInfo := ca.analyzeFunctionDecl(fset, fn, astFuncMap)
259+
+ info.Functions = append(info.Functions, fnInfo)
260+
+ }
261+
}
262+
```
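
To exercise the fix outside the project, the three loops (top-level `Funcs`, `typ.Funcs`, `typ.Methods`) can be run against a small hypothetical source file. In this sketch `allFunctionNames` and `indexerSrc` are illustrative names, not project code; removing the `typ.Funcs` loop makes `LoadIndexStatus` and `NewService` vanish from the result, which is the BUG-003 symptom.

```go
package main

import (
	"go/ast"
	"go/doc"
	"go/parser"
	"go/token"
	"sort"
)

// indexerSrc is a hypothetical stub file; only the signatures matter here.
const indexerSrc = `package indexer

type IndexStatus struct{}
type Service struct{}

func NewService() *Service { return &Service{} }
func (s *Service) Run() error { return nil }
func LoadIndexStatus(workspaceRoot string) *IndexStatus { return nil }
func SaveIndexStatus(workspaceRoot string, status *IndexStatus) {}
`

// allFunctionNames collects every function go/doc reports: top-level
// functions, type-associated factory functions (typ.Funcs, the list the fix
// adds), and methods (typ.Methods, recorded as "Type.Method").
func allFunctionNames(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	pkg, err := doc.NewFromFiles(fset, []*ast.File{file}, "example.com/indexer")
	if err != nil {
		return nil, err
	}
	var names []string
	for _, f := range pkg.Funcs {
		names = append(names, f.Name)
	}
	for _, t := range pkg.Types {
		for _, f := range t.Funcs { // the loop that was previously missing
			names = append(names, f.Name)
		}
		for _, m := range t.Methods {
			names = append(names, t.Name+"."+m.Name)
		}
	}
	sort.Strings(names)
	return names, nil
}
```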

### Note on tree-sitter

**tree-sitter is NOT needed** for this fix. The existing `go/ast` + `go/doc` approach is correct and more accurate than tree-sitter for Go: it is part of the standard library and ships with the Go toolchain. The only problem is the missing `typ.Funcs` loop, which takes just a few lines to add.

---

*Last updated: 2026-03-09 (BUG-001 fixed, BUG-003 added)*

---

## BUG-004: AST Fallback Search and Indexer do not exclude unconfigured directories like `inspirations/`

**Status:** Open
**Date confirmed:** 2026-03-09
**Affected component:** `FallbackDirectSearch` (`internal/service/engine/engine_fallback_search.go`) and `IndexWorkspace` (`pkg/indexer/service.go`)
**Severity:** Medium (causes irrelevant, old, or draft code to pollute semantic and fallback search results)

### Description

When a search falls back to the AST (e.g. while `go` files are still at `processed: 0`), RagCode can return results from the `inspirations/` directory or other directories that should logically be ignored. This happens because the `filepath.WalkDir` callback excludes directories only by the `excludePatterns` list loaded from `config.Workspace.ExcludePatterns`, plus a hardcoded check for dot-prefixed names, `vendor`, and `node_modules`.

### Example

Searching via `rag_search` for the code that processes `state.json` returned a fallback result pointing to `/home/razvan/go/src/github.com/doITmagic/rag-code-mcp/inspirations/rag-code-mcp/internal/workspace/state.go` instead of the actual code in `pkg/indexer/state.go`.

### Root Cause

In both `internal/service/engine/engine_fallback_search.go` (lines 88-103) and `pkg/indexer/service.go` (lines 72-88), the exclusion logic is implemented manually:

```go
if d.IsDir() {
	name := d.Name()
	if strings.HasPrefix(name, ".") || name == "vendor" || name == "node_modules" {
		return filepath.SkipDir
	}
	for _, p := range excludePatterns {
		if name == p {
			return filepath.SkipDir
		}
	}
	return nil
}
```

If `inspirations` or other custom draft folders are not explicitly listed in the YAML config's `exclude_patterns`, both the fallback module and the indexer scan them. Because patterns are compared against the bare directory name with `==`, glob-style entries such as `draft*` never match. The system **does not automatically parse `.ragcodeignore` or `.gitignore`**, nor does it have a default ignore list for common draft/backup directories like `inspirations`.

### Proposed Fix

1. Parse `.gitignore` and/or `.ragcodeignore` files and respect them during the `filepath.WalkDir` traversal.
2. Consider adding `inspirations` and `drafts` to the default hardcoded exclusions if they are common in this repo, or automatically merge `.gitignore` rules into the `excludePatterns` list at startup.

README.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -84,11 +84,11 @@ RagCode V2 isn't just a vector database wrapper. It features deep language under
 ## Supported Languages
 
 - **Go**: Complete native AST support
-- **PHP**: Vanilla PHP, Laravel, WordPress (Hooks, Widgets, WooCommerce, Oxygen Builder)
+- **PHP**: Vanilla PHP, Laravel (Eloquent, Routes, Controllers), WordPress (Hooks, Widgets, WooCommerce, Oxygen Builder)
 - **JavaScript & TypeScript**: Vanilla JS/TS, Node.js, React, React Native, Next.js, Vue
 - **Python**: Complete native AST support
-- **HTML & Markdown**: Structural documentation mappings
-- **Generic Support**: CSS, JSON, YAML, Shell scripts, SQL
+- **HTML & CSS**: HTML structural mappings, CSS/SCSS/SASS/LESS via tree-sitter
+- **Documentation**: Markdown, JSON, YAML, XML, TOML, reStructuredText
 
 ---
 
```