refactor: Replace progressStore with file-based IndexStatus#40

Merged
doITmagic merged 28 commits into dev from refactor/indexing-progress
Mar 13, 2026

@doITmagic
Owner

Description

Replaces the old, complex progressStore indexing progress mechanism with a simple file-based IndexStatus system.

Problem

The previous system had two separate file walks (one for counting, one for indexing), an in-memory progressStore with preRegister/update/carry-over/flusher logic, and multiple intermediate structs (IndexingProgressSummary, LangProgressItem). This made progress reporting confusing, inaccurate, and hard to maintain.

Solution

  • Single source of truth: {workspaceRoot}/.ragcode/index_status.json — a simple JSON file written by the indexer during processing
  • Direct write from indexer: The Progress callback in pkg/indexer/service.go writes OnDisk, Changed, Processed counts per language directly to the status file
  • Direct read by tools: MCP tools read the status file via indexer.LoadIndexStatus() — no intermediate transformations
  • Lifecycle: starting → running (first file processed) → completed/failed
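
A minimal sketch of what such a status file and its load/save pair could look like. The field names OnDisk, Changed, Processed, EndedAt and Error come from this PR; the JSON tags, the StartedAt field, the function signatures and the sample values are assumptions made for illustration. The lifecycle keyword is left out because a later commit in this PR removes the State field.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// LangStatus holds per-language counters. JSON tags are assumed.
type LangStatus struct {
	OnDisk    int `json:"on_disk"`
	Changed   int `json:"changed"`
	Processed int `json:"processed"`
}

// IndexStatus is the document stored in .ragcode/index_status.json.
// The exact field set is an assumption for this sketch.
type IndexStatus struct {
	StartedAt string                `json:"started_at,omitempty"`
	EndedAt   string                `json:"ended_at,omitempty"`
	Error     string                `json:"error,omitempty"`
	Languages map[string]LangStatus `json:"languages"`
}

func statusPath(root string) string {
	return filepath.Join(root, ".ragcode", "index_status.json")
}

// SaveIndexStatus writes the status file, creating .ragcode if needed.
func SaveIndexStatus(root string, s *IndexStatus) error {
	if err := os.MkdirAll(filepath.Dir(statusPath(root)), 0o755); err != nil {
		return err
	}
	data, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(statusPath(root), data, 0o644)
}

// LoadIndexStatus returns (nil, nil) when no status file exists yet.
func LoadIndexStatus(root string) (*IndexStatus, error) {
	data, err := os.ReadFile(statusPath(root))
	if errors.Is(err, fs.ErrNotExist) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	var s IndexStatus
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	return &s, nil
}

// roundTrip saves a sample status into a temporary workspace and loads it back.
func roundTrip() (*IndexStatus, error) {
	root, err := os.MkdirTemp("", "ragcode-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	in := &IndexStatus{
		StartedAt: "2026-03-08T23:04:00Z", // sample value only
		Languages: map[string]LangStatus{"go": {OnDisk: 232, Changed: 1, Processed: 1}},
	}
	if err := SaveIndexStatus(root, in); err != nil {
		return nil, err
	}
	return LoadIndexStatus(root)
}

func main() {
	s, err := roundTrip()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("go: %+v\n", s.Languages["go"])
}
```

Returning nil for a missing file lets callers tell "never indexed" apart from "indexed, here are the counts" without a sentinel error.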

What was removed

  • progressStore (preRegister, update, carry-over, flusher goroutine)
  • IndexingProgressSummary, LangProgressItem, BuildIndexingProgress, formatAge, buildIndexingMessage
  • Auto-resume logic from SearchCode/HybridSearchCode (redundant with DetectContext which already triggers re-indexing on first tool call)
  • resumeAttempts field from Engine struct

What was added

  • pkg/indexer/index_status.go: IndexStatus, LangStatus, SaveIndexStatus, LoadIndexStatus
  • pkg/indexer/index_status_test.go — round-trip and missing file tests
  • Progress callback wiring in engine.IndexWorkspace

Files changed (22 files, +208 / -1182 lines)

  • Moved: engine/index_progress.go → pkg/indexer/index_status.go
  • Modified: engine.go, response.go, smart_search.go, smart_search_pipeline.go, index_workspace.go, list_package_exports.go, find_usages.go, call_hierarchy.go, evaluate_ragcode.go, skills.go, read_file_context.go
  • Tests updated: engine_searchcode_test.go, engine_*_test.go, health_metrics_test.go, detector_test.go

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Checklist:

  • I have performed a self-review of my own code
  • I have formatted my code with go fmt ./...
  • I have run tests go test ./... and they pass
  • I have verified integration with Ollama/Qdrant (if applicable)
  • I have updated the documentation accordingly

- Remove progressStore (preRegister, update, carry-over, flusher)
- Remove IndexingProgressSummary, BuildIndexingProgress, formatAge, buildIndexingMessage
- Remove auto-resume from SearchCode/HybridSearchCode (redundant with DetectContext)
- Remove resumeAttempts field from Engine
- Add IndexStatus/LangStatus/SaveIndexStatus/LoadIndexStatus in pkg/indexer/
- Indexer writes OnDisk/Changed/Processed via Progress callback
- Tools read status directly from .ragcode/index_status.json
- Fix TestDetectNoMarkers with AllowedRoots isolation
Copilot AI review requested due to automatic review settings March 8, 2026 23:04
@doITmagic doITmagic self-assigned this Mar 8, 2026

Copilot AI left a comment

Pull request overview

Replaces the in-memory progressStore indexing progress mechanism (with preRegister, flusher goroutine, carry-over logic) with a simpler file-based IndexStatus system in pkg/indexer/index_status.go. MCP tools now read status directly from {workspaceRoot}/.ragcode/index_status.json via indexer.LoadIndexStatus(). This is a significant simplification that removes ~1000 lines of complex concurrency code.

Changes:

  • New pkg/indexer/index_status.go with IndexStatus/LangStatus structs and SaveIndexStatus/LoadIndexStatus functions, replacing the old engine/index_progress.go with its progressStore, flusher goroutine, and deep-copy logic
  • All MCP tools updated to use the new IndexingStatus field (backed by indexer.IndexStatus) instead of IndexingProgress (backed by IndexingProgressSummary), with the JSON tag preserved as "indexing_progress" for backward compatibility
  • Removed auto-resume logic from SearchCode/HybridSearchCode and the resumeAttempts throttle, as workspace re-indexing is now triggered via DetectContext

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| pkg/indexer/index_status.go | New file: IndexStatus/LangStatus types, SaveIndexStatus, LoadIndexStatus |
| pkg/indexer/index_status_test.go | New file: round-trip and missing-file tests |
| internal/service/engine/engine.go | Remove progressStore/resumeAttempts, add GetIndexStatus, wire Progress callback with file-based status |
| internal/service/engine/index_progress.go | Deleted: old progressStore and all related types/functions |
| internal/service/engine/index_progress_test.go | Deleted: tests for removed progressStore |
| internal/service/tools/response.go | Replace IndexingProgressSummary with IndexingStatus, rename helper to ContextFromWorkspaceWithStatus |
| internal/service/tools/smart_search_pipeline.go | Use LoadIndexStatus, replace dynamic fallback note with static string |
| internal/service/tools/smart_search.go | Simplify indexing error messages, remove progress attachment |
| internal/service/tools/find_usages.go | Switch to GetIndexStatus/ContextFromWorkspaceWithStatus |
| internal/service/tools/call_hierarchy.go | Switch to GetIndexStatus/ContextFromWorkspaceWithStatus |
| internal/service/tools/list_package_exports.go | Switch to ContextFromWorkspaceWithStatus, remove idx-based status override |
| internal/service/tools/index_workspace.go | Use LoadIndexStatus directly |
| internal/service/tools/skills.go | Set IndexingStatus: nil |
| internal/service/tools/evaluate_ragcode.go | Remove unused wctxID/wctxRoot, set IndexingStatus: nil |
| internal/service/tools/read_file_context.go | Set IndexingStatus: nil |
| internal/service/tools/tests/health_metrics_test.go | Remove tests for deleted types/functions |
| internal/service/engine/engine_searchcode_test.go | Remove auto-resume test |
| internal/service/engine/engine_nonblocking_search_test.go | Remove progress.stop() cleanup |
| internal/service/engine/engine_fallback_search_test.go | Remove progress.stop() cleanup |
| internal/service/engine/engine_sticky_test.go | Remove progress.stop() cleanup |
| pkg/workspace/detector/detector_test.go | Isolate test to avoid picking up .ragcode markers from parent dirs |
| cmd/rag-code-mcp/main.go | Version bump to 2.1.63 |

Comment thread internal/service/tools/smart_search_pipeline.go Outdated
Comment thread pkg/indexer/index_status.go Outdated
Comment thread internal/service/tools/call_hierarchy.go Outdated
Comment thread internal/service/engine/engine.go
Comment thread internal/service/engine/engine.go Outdated
Comment thread internal/service/tools/response.go Outdated
Comment thread internal/service/tools/response.go Outdated
Comment thread internal/service/engine/engine.go Outdated
Comment thread internal/service/tools/find_usages.go Outdated
The Progress callback received totalFiles = len(changedFiles), which only
counts modified files needing re-indexing. This was incorrectly assigned
to OnDisk, causing on_disk: 1 when only 1 file changed — despite 232 Go
files and 655 docs on disk.
Fix:
- Call CountAllFiles() once before the language loop for real disk totals
- Pre-populate index_status.json with on_disk counts at indexing start
- Use diskTotal (pre-counted) for OnDisk, totalFiles for Changed
- Languages with 0 changed files now correctly show their disk totals
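
The fix described in this commit can be sketched as follows, using the file counts quoted in the message. The helper names seedStatus and onProgress are hypothetical; only the OnDisk/Changed/Processed split is taken from the commit.

```go
package main

import "fmt"

// LangStatus mirrors the three counters named in the PR.
type LangStatus struct {
	OnDisk    int
	Changed   int
	Processed int
}

// seedStatus pre-populates one entry per language from the disk totals, so
// languages with zero changed files still report how many files exist.
func seedStatus(diskTotals map[string]int) map[string]LangStatus {
	out := make(map[string]LangStatus, len(diskTotals))
	for lang, n := range diskTotals {
		out[lang] = LangStatus{OnDisk: n}
	}
	return out
}

// onProgress stands in for the corrected callback: totalFiles counts only
// changed files, so it feeds Changed, while OnDisk keeps its pre-counted value.
func onProgress(status map[string]LangStatus, lang string, processed, totalFiles int) {
	ls := status[lang]
	ls.Changed = totalFiles
	ls.Processed = processed
	status[lang] = ls
}

func main() {
	status := seedStatus(map[string]int{"go": 232, "docs": 655})
	onProgress(status, "go", 1, 1) // one changed Go file, already processed
	fmt.Printf("go=%+v docs=%+v\n", status["go"], status["docs"])
}
```
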
@doITmagic doITmagic requested a review from Copilot March 8, 2026 23:20

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

razvan added 2 commits March 9, 2026 08:56
- Fix nil panic in ContextFromWorkspaceWithStatus when wctx is nil (#7)
- Fix indentation in smart_search_pipeline.go (#1)
- Use loaded idx instead of nil in call_hierarchy.go and find_usages.go (#3, #9)
- Add backward-compat comment on JSON tag mismatch (#6)
- Create fresh IndexStatus when LoadIndexStatus returns nil (#8)
- Populate Elapsed field at completed/failed transitions (#2)
- Throttle progress I/O writes to every 10 files (#4)
- Fix test cleanup for .ragcode dir in TempDir
- Removed the 'State' field ('starting', 'running', 'completed', 'failed') from IndexStatus entirely.
- This state was misleading for AI consumers, especially during incremental re-indexing (which reset state to 'starting' even if the index was 99% complete), causing AI agents to prematurely abandon tools.
- Simplified engine.go progress callbacks and terminal states to only log timestamps and errors, rather than a potentially confusing overall state keyword.
- Updated related tests to match the simplified struct.
Copilot AI review requested due to automatic review settings March 9, 2026 07:17

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 9 comments.

Comments suppressed due to low confidence (2)

internal/service/tools/call_hierarchy.go:136

  • The check if idx != nil at line 133 is now semantically incorrect. With the old in-memory progressStore, GetIndexProgress returned nil when no indexing was active. With the new file-based system, GetIndexStatus reads from disk and returns non-nil even for a completed previous run (the file persists). This means the response will always be indexing_in_progress when no collections exist but a previous run left an index_status.json file — even if indexing completed hours ago.

This needs to check the actual state (e.g., EndedAt is empty and Error is empty) to determine if indexing is truly in progress. Without a State field on IndexStatus, you'd need something like: if idx != nil && idx.EndedAt == "" && idx.Error == "".

			if idx != nil {
				resp.Status = "indexing_in_progress"
				resp.Data = map[string]any{"indexing": idx}
			}

internal/service/tools/find_usages.go:105

  • Same issue as in call_hierarchy.go: if idx != nil at line 103 will now be true even for a completed previous indexing run (the file persists on disk), incorrectly changing the status from indexing_required to indexing_in_progress. This check needs to verify that indexing is actually ongoing (e.g., idx.EndedAt == "").
			if idx != nil {
				resp.Status = "indexing_in_progress"
			}
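
Both suppressed comments point at the same remedy: derive "in progress" from the status fields rather than from the file's existence. A small sketch of that check, assuming EndedAt and Error are strings as the review's comparison against "" implies; the helper name indexingActive is hypothetical.

```go
package main

import "fmt"

// IndexStatus carries only the fields this check needs.
type IndexStatus struct {
	EndedAt string
	Error   string
}

// indexingActive reports whether the persisted status describes a run that
// has neither finished nor failed. A nil status means no file was found.
func indexingActive(s *IndexStatus) bool {
	return s != nil && s.EndedAt == "" && s.Error == ""
}

func main() {
	fmt.Println(indexingActive(nil))                                           // no status file
	fmt.Println(indexingActive(&IndexStatus{}))                                // run still open
	fmt.Println(indexingActive(&IndexStatus{EndedAt: "2026-03-09T07:00:00Z"})) // finished run
}
```

A status file left behind by a process that exited mid-run would still look active under this check, which may be why later reviews also suggest consulting ActiveIndexingJobs().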

Comment thread internal/service/engine/engine.go
Comment thread internal/service/engine/engine.go Outdated
Comment thread internal/service/tools/list_package_exports.go
Comment thread internal/service/tools/smart_search.go
Comment thread internal/service/tools/skills.go Outdated
Comment thread internal/service/engine/engine_nonblocking_search_test.go Outdated
Comment thread internal/service/engine/engine_fallback_search_test.go Outdated
Comment thread pkg/indexer/index_status.go
Comment thread internal/service/engine/engine.go Outdated
razvan added 2 commits March 9, 2026 11:52
BUG-001 (list_package_exports): normalize full import path to short
package name before querying Qdrant. The index stores 'indexer', not
'github.com/doITmagic/rag-code-mcp/pkg/indexer'.

BUG-003 (Go parser): go/doc automatically moves constructor/loader
functions (NewX, LoadX) that return *T from docPkg.Funcs into
docPkg.Types[T].Funcs. The parser only iterated typ.Methods, so these
functions were silently dropped and never written to the vector index.

Fix: add a typ.Funcs loop in AnalyzePackage() after the methods loop.
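
The go/doc behavior behind BUG-003 can be reproduced with the standard library alone. This is a standalone sketch, not the project's AnalyzePackage(); the sample source and the import path example.com/indexer are made up for the demonstration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/doc"
	"go/parser"
	"go/token"
)

const src = `package indexer

type Service struct{}

// NewService returns *Service, so go/doc files it under the Service type.
func NewService() *Service { return &Service{} }

// Run is a method and appears in the type's Methods.
func (s *Service) Run() {}

// Version returns a builtin type and stays in the package-level Funcs.
func Version() string { return "dev" }
`

// collect returns the function names found at package level, under a type's
// Funcs, and under a type's Methods.
func collect(source string) (pkgFuncs, typeFuncs, methods []string, err error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "service.go", source, parser.ParseComments)
	if err != nil {
		return nil, nil, nil, err
	}
	pkg, err := doc.NewFromFiles(fset, []*ast.File{file}, "example.com/indexer")
	if err != nil {
		return nil, nil, nil, err
	}
	for _, f := range pkg.Funcs {
		pkgFuncs = append(pkgFuncs, f.Name)
	}
	for _, typ := range pkg.Types {
		for _, f := range typ.Funcs { // the loop that was missing
			typeFuncs = append(typeFuncs, f.Name)
		}
		for _, m := range typ.Methods {
			methods = append(methods, m.Name)
		}
	}
	return pkgFuncs, typeFuncs, methods, nil
}

func main() {
	pkgFuncs, typeFuncs, methods, err := collect(src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("package funcs:", pkgFuncs)
	fmt.Println("type funcs:   ", typeFuncs)
	fmt.Println("methods:      ", methods)
}
```

A walker that reads only pkg.Funcs and typ.Methods never sees NewService, which matches the symptom described above.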

Affected symbols confirmed missing from Qdrant before fix:
  LoadIndexStatus, NewService, NewState, LoadState (pkg/indexer)

Tests: expanded analyzer_test.go to use real pkg/indexer code as
fixture with expectations anchored to the Qdrant DB snapshot
(25 points, 2026-03-09). Added regression tests for BUG-003,
IsPublic correctness, signature accuracy, and line coverage.
Engine (BUG-004):
- StartIndexingAsync now queues recreate=true as pendingOverflow when a job
  is already running, instead of silently dropping the request
- Fix all flaky engine test cleanups: properly wait for background goroutines
  from BOTH engine instances with time.Sleep before TempDir removal
- Add tests: TestStartIndexingAsyncRecreateQueues/StartsImmediately
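
One way to picture the BUG-004 change is a small job registry that remembers a recreate request instead of dropping it. Everything here (type names, the string results, what happens to non-recreate requests) is a hypothetical illustration; the commit itself only states that recreate=true is queued as pendingOverflow while a job is running.

```go
package main

import (
	"fmt"
	"sync"
)

// jobRegistry is a stand-in for the engine's job tracking.
type jobRegistry struct {
	mu      sync.Mutex
	running map[string]bool
	pending map[string]bool // workspaces with a queued recreate request
}

func newJobRegistry() *jobRegistry {
	return &jobRegistry{running: map[string]bool{}, pending: map[string]bool{}}
}

// request returns "started", "queued" or "skipped" for a workspace ID.
func (r *jobRegistry) request(id string, recreate bool) string {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.running[id] {
		r.running[id] = true
		return "started"
	}
	if recreate {
		r.pending[id] = true // remember it instead of dropping it
		return "queued"
	}
	return "skipped" // assumption: plain re-index requests are deduplicated
}

// finish marks the current job done and reports whether a queued recreate
// was promoted to a new running job.
func (r *jobRegistry) finish(id string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.running, id)
	if r.pending[id] {
		delete(r.pending, id)
		r.running[id] = true
		return true
	}
	return false
}

func main() {
	reg := newJobRegistry()
	fmt.Println(reg.request("ws-1", false)) // first request starts a job
	fmt.Println(reg.request("ws-1", true))  // recreate arrives while running
	fmt.Println(reg.finish("ws-1"))         // queued recreate takes over
}
```
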

Python parser (treesitter.go):
- Add patchExceptAs workaround for gotreesitter v0.6.0 broken AST on except-as
- Extract module-level variables/constants (extractAssignment/extractAssignmentDirect)
- Extract class variables from class body blocks
- Extract function/method calls for Code Graph relations (rag_find_usages)
- Detect generators via nodeContainsType(yield)
- Parse metaclass= keyword arguments in class bases
- Refactor docstring extraction with stripDocstringQuotes helper
- Handle gotreesitter putting string nodes directly in blocks (no wrapper)

Python parser (extract.go):
- Refactor getIndentation to use tagged switch
Copilot AI review requested due to automatic review settings March 9, 2026 13:31

Copilot AI left a comment

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 8 comments.

Comments suppressed due to low confidence (2)

internal/service/tools/call_hierarchy.go:136

  • Same bug as in find_usages.go: idx is loaded from the persisted index_status.json file. Once any indexing run has completed, the file exists and idx != nil is always true — causing the response status to be incorrectly set to "indexing_in_progress" even when no indexing job is active.

The condition on line 133 should check whether indexing is actually in progress (e.g., via ActiveIndexingJobs() or checking idx.EndedAt == "") rather than just checking file existence.

	idx := t.engine.GetIndexStatus(wctx.Root)

	visited := make(map[string]bool)

	rootNode := &CallNode{Name: symbolName}

	// Try to find root symbol info
	rootRes := t.findSymbolInfo(ctx, wctx.ID, symbolName)
	if rootRes != nil {
		rootNode.Type, _ = rootRes.Point.Payload["type"].(string)
		rootNode.FilePath, _ = rootRes.Point.Payload["file_path"].(string)
		rootNode.Package, _ = rootRes.Point.Payload["package"].(string)
	} else {
		// If nothing is indexed yet, ExactSearchPolyglot will return ErrNoCollectionsFound.
		// Signal indexing status instead of returning an empty hierarchy.
		_, sErr := t.engine.ExactSearchPolyglot(ctx, wctx.ID, map[string]interface{}{"name": symbolName}, 1)
		var noCollections *engine.ErrNoCollectionsFound
		if errors.As(sErr, &noCollections) {
			resp := ToolResponse{
				Status:  "indexing_required",
				Message: fmt.Sprintf("⏳ Workspace '%s' is not indexed yet. Indexing is required for complete call hierarchy results.", wctx.Root),
				Context: ContextFromWorkspaceWithStatus(wctx, t.engine),
			}
			if idx != nil {
				resp.Status = "indexing_in_progress"
				resp.Data = map[string]any{"indexing": idx}
			}

internal/service/tools/find_usages.go:105

  • Bug: idx is now loaded from the persisted index_status.json file via GetIndexStatus(). Unlike the old GetIndexProgress() which returned non-nil only when an active in-memory job was running, the file persists after indexing completes. This means idx != nil will always be true once any indexing run has occurred, causing the status to incorrectly change from "indexing_required" to "indexing_in_progress" even when indexing finished long ago.

To fix: either check if an indexing job is currently active (via ActiveIndexingJobs() or indexingJobs.Load(wctx.ID)), or check a specific field on the status (e.g., s.EndedAt == "") before treating it as "in progress".

	idx := t.engine.GetIndexStatus(wctx.Root)
	allResults, err := t.engine.ExactSearchPolyglot(ctx, wctx.ID, filter, 100)
	if err != nil {
		var noCollections *engine.ErrNoCollectionsFound
		if errors.As(err, &noCollections) {
			resp := ToolResponse{
				Status:  "indexing_required",
				Message: fmt.Sprintf("⏳ Workspace '%s' is not indexed yet. Indexing is required for complete results.", wctx.Root),
				Context: ContextFromWorkspaceWithStatus(wctx, t.engine),
			}
			if idx != nil {
				resp.Status = "indexing_in_progress"
			}

Comment thread internal/service/engine/engine.go
Comment thread pkg/parser/python/treesitter.go Outdated
Comment thread BUGS.md Outdated
Comment thread SUGGESTIONS.md Outdated
Comment thread pkg/parser/python/treesitter.go
Comment thread pkg/parser/go/analyzer_test.go Outdated
Comment thread internal/service/tools/evaluate_ragcode.go Outdated
Comment thread pkg/parser/python/extract.go
doITmagic and others added 4 commits March 10, 2026 11:51
This addresses issues where indexing large files (e.g., barou.sql) caused the host system to freeze due to CPU/GPU starvation and excessive GC pressure.

- Fix Ollama throttling bug in indexer service by correctly using a 150ms delay instead of 10ms.

- Prevent GC thrashing in treesitter parser by evaluating byte sizes instead of allocating strings for every AST node.

- Truncate massive leaf nodes (>8KB) to prevent crashing the Ollama embedding API.
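
The last two points combine naturally: compare byte lengths without building a string, and cap oversized leaves. A sketch under the assumption that node content is available as a byte slice; the function name truncateLeaf is hypothetical, and the 8KB limit is the one stated above.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const maxLeafBytes = 8 * 1024 // the 8KB limit named in the commit message

// truncateLeaf returns at most maxLeafBytes of content without converting it
// to a string, backing up so a multi-byte UTF-8 rune is never split.
func truncateLeaf(content []byte) []byte {
	if len(content) <= maxLeafBytes {
		return content
	}
	cut := maxLeafBytes
	for cut > 0 && !utf8.RuneStart(content[cut]) {
		cut--
	}
	return content[:cut]
}

// sampleLeaf builds a payload of at least n bytes: one ASCII byte followed by
// two-byte runes, so the 8KB boundary falls in the middle of a rune.
func sampleLeaf(n int) []byte {
	b := []byte("a")
	for len(b) < n {
		b = append(b, "é"...)
	}
	return b
}

func main() {
	small := []byte("SELECT 1;")
	big := sampleLeaf(9000)
	fmt.Println(len(truncateLeaf(small)), len(truncateLeaf(big)))
}
```

Returning a sub-slice keeps the truncation allocation-free; the caller converts to a string only for the chunk that is actually sent for embedding.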
Export IsInvalidRoot from the watch package and apply it as a
safety check at the very start of StartIndexingAsync, before any
job registration or SaveIndexStatus call.

This prevents accidental indexing of dangerous paths such as the
user home directory (~), filesystem root (/), or /tmp — which would
cause .ragcode/index_status.json to be written outside any real
workspace.

- pkg/workspace/watch: isInvalidRoot → IsInvalidRoot (exported + docstring)
- internal/service/engine: guard added as first check in StartIndexingAsync
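
A rough sketch of what such a guard might check, written for Unix-style paths. It is not the watch package's implementation: the split into a pure helper plus a wrapper, and the exact set of rejected paths, are assumptions based on the three examples in the commit message.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// invalidRootFor is the pure core of the check: it rejects the filesystem
// root, the home directory and the temp directory themselves.
func invalidRootFor(root, home, tmp string) bool {
	clean := filepath.Clean(root)
	if clean == string(filepath.Separator) {
		return true
	}
	for _, bad := range []string{home, tmp} {
		if bad != "" && clean == filepath.Clean(bad) {
			return true
		}
	}
	return false
}

// IsInvalidRoot resolves the home directory with os.UserHomeDir, because
// filepath.Clean does not expand "~".
func IsInvalidRoot(root string) bool {
	home, err := os.UserHomeDir()
	if err != nil {
		home = "" // unknown home: skip that comparison
	}
	return invalidRootFor(root, home, os.TempDir())
}

func main() {
	fmt.Println(IsInvalidRoot("/"))
	fmt.Println(invalidRootFor("/home/u/project", "/home/u", "/tmp"))
}
```

Running the check before any job registration means a rejected path never gets a .ragcode directory created under it.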
Copilot AI review requested due to automatic review settings March 10, 2026 19:21

Copilot AI left a comment

Pull request overview

Copilot reviewed 34 out of 34 changed files in this pull request and generated 9 comments.

Comment thread internal/service/tools/evaluate_ragcode.go Outdated
Comment thread internal/service/tools/read_file_context.go Outdated
Comment thread internal/service/engine/engine.go Outdated
Comment thread pkg/workspace/watch/watcher.go
Comment thread pkg/parser/go/analyzer_test.go
Comment thread pkg/parser/go/analyzer_test.go Outdated
Comment thread internal/service/tools/skills.go Outdated
Comment thread pkg/indexer/index_status.go Outdated
Comment thread pkg/parser/python/extract.go
razvan added 2 commits March 10, 2026 22:15
Critical fixes:
- Populate IndexingStatus in tool responses (was nil) for ListSkillsTool,
  InstallSkillTool, EvaluateRagCodeTool, ReadFileContextTool, SmartSearchTool,
  ListPackageExportsTool — use ContextFromWorkspaceWithStatus consistently
- fix(engine): preserve Languages map during incremental indexing in
  StartIndexingAsync (was overwriting with empty object)
- fix(engine): extract finalizeIndexStatus helper to eliminate duplicated
  EndedAt/Elapsed/Error finalization logic in success and error branches
- fix(engine): Progress callback — eliminate LoadIndexStatus (disk read +
  JSON unmarshal) on every tick; keep single *IndexStatus in-memory and
  only call SaveIndexStatus (atomic write) for disk flush every 10 files
- fix(indexer): SaveIndexStatus uses atomic write-to-temp-then-rename
  to prevent concurrent readers seeing partial JSON
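
The last two fixes can be sketched together: keep the counters in memory, flush every 10 files, and make each flush a write-to-temp-then-rename. The tracker type, the JSON payload and the helper names are hypothetical; only the interval of 10 and the temp-then-rename approach come from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

const flushEvery = 10 // flush interval named in the commit message

// writeFileAtomic writes data to a temp file in the target directory and
// renames it over path, so a reader sees the old or the new file in full.
func writeFileAtomic(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".index_status-*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // cleans up on failure; harmless after rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// tracker keeps progress in memory and touches the disk only on flush.
type tracker struct {
	path      string
	processed int
	flushes   int
}

func (t *tracker) flush() error {
	data, err := json.Marshal(map[string]int{"processed": t.processed})
	if err != nil {
		return err
	}
	t.flushes++
	return writeFileAtomic(t.path, data)
}

// tick records one processed file and flushes on every tenth call.
func (t *tracker) tick() error {
	t.processed++
	if t.processed%flushEvery != 0 {
		return nil
	}
	return t.flush()
}

// runDemo processes n files in a temp dir and returns the number of flushes.
func runDemo(n int) (int, error) {
	dir, err := os.MkdirTemp("", "ragcode-progress")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	t := &tracker{path: filepath.Join(dir, "index_status.json")}
	for i := 0; i < n; i++ {
		if err := t.tick(); err != nil {
			return t.flushes, err
		}
	}
	if err := t.flush(); err != nil { // final flush so terminal counts reach disk
		return t.flushes, err
	}
	return t.flushes, nil
}

func main() {
	flushes, err := runDemo(25)
	fmt.Println(flushes, err)
}
```

Creating the temp file in the same directory as the target keeps the rename on one filesystem, which is what makes it atomic on POSIX systems.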

Hidden from AI consumers:
- LangStatus.Changed field now json:"-" — AI sees only on_disk and processed

Cleanup:
- smart_search_pipeline.go: fix extra blank lines and restore missing
  return statement after buildResponseMeta refactor
- treesitter.go: replace invalid issues/TBD link with descriptive comment
- watcher.go: clarify IsInvalidRoot doc comment (~ is not expanded by
  filepath.Clean; rejection is via os.UserHomeDir())
- BUGS.md: mark BUG-003 as Fixed (PR #40)
- SUGGESTIONS.md: translate to English, update with current State-field status
- analyzer_test.go: remove stale Qdrant DB snapshot references from comments
- extract.go: fix getIndentation break → return to exit for-loop

Tests:
- analyzer_test.go: relax exact line number assertions to > 0
- treesitter_test.go: add 7 new tests for patchExceptAs, call extraction
  (Code Graph), module-level vars/constants, class vars, IsGenerator
- treesitter.go: fix extractClassVarsFromBlock to handle assignment nodes
  placed directly in block without expression_statement wrapper
Copilot AI review requested due to automatic review settings March 10, 2026 20:18

Copilot AI left a comment

Pull request overview

Copilot reviewed 35 out of 36 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (2)

internal/service/tools/find_usages.go:105

  • In the ErrNoCollectionsFound branch, resp.Status is switched to "indexing_in_progress" whenever idx != nil. With the new file-based IndexStatus, idx will be non-nil for any workspace that has ever written index_status.json (even if indexing already completed), so this can misreport "indexing_in_progress". Consider keying this on a real in-progress signal (e.g., idx.EndedAt == "" / idx.Error == "" or checking Engine.ActiveIndexingJobs for wctx.ID) instead of mere file existence.
	idx := t.engine.GetIndexStatus(wctx.Root)
	allResults, err := t.engine.ExactSearchPolyglot(ctx, wctx.ID, filter, 100)
	if err != nil {
		var noCollections *engine.ErrNoCollectionsFound
		if errors.As(err, &noCollections) {
			resp := ToolResponse{
				Status:  "indexing_required",
				Message: fmt.Sprintf("⏳ Workspace '%s' is not indexed yet. Indexing is required for complete results.", wctx.Root),
				Context: ContextFromWorkspaceWithStatus(wctx, t.engine),
			}
			if idx != nil {
				resp.Status = "indexing_in_progress"
			}

internal/service/tools/call_hierarchy.go:136

  • The "indexing_in_progress" status is currently set whenever idx != nil, but IndexStatus will be non-nil for any workspace that has an index_status.json from a previous run. This can incorrectly report indexing as in progress when indexing is actually completed (or stale). Consider switching this condition to something that reflects an active run (e.g., idx.EndedAt == "" and idx.Error == "" / checking Engine.ActiveIndexingJobs for wctx.ID).
	idx := t.engine.GetIndexStatus(wctx.Root)

	visited := make(map[string]bool)

	rootNode := &CallNode{Name: symbolName}

	// Try to find root symbol info
	rootRes := t.findSymbolInfo(ctx, wctx.ID, symbolName)
	if rootRes != nil {
		rootNode.Type, _ = rootRes.Point.Payload["type"].(string)
		rootNode.FilePath, _ = rootRes.Point.Payload["file_path"].(string)
		rootNode.Package, _ = rootRes.Point.Payload["package"].(string)
	} else {
		// If nothing is indexed yet, ExactSearchPolyglot will return ErrNoCollectionsFound.
		// Signal indexing status instead of returning an empty hierarchy.
		_, sErr := t.engine.ExactSearchPolyglot(ctx, wctx.ID, map[string]interface{}{"name": symbolName}, 1)
		var noCollections *engine.ErrNoCollectionsFound
		if errors.As(sErr, &noCollections) {
			resp := ToolResponse{
				Status:  "indexing_required",
				Message: fmt.Sprintf("⏳ Workspace '%s' is not indexed yet. Indexing is required for complete call hierarchy results.", wctx.Root),
				Context: ContextFromWorkspaceWithStatus(wctx, t.engine),
			}
			if idx != nil {
				resp.Status = "indexing_in_progress"
				resp.Data = map[string]any{"indexing": idx}
			}

Comment thread pkg/parser/docs/treesitter.go Outdated
Comment thread pkg/indexer/index_status.go Outdated
Comment thread internal/service/engine/engine.go
Copilot AI review requested due to automatic review settings March 11, 2026 08:39

Copilot AI left a comment

Pull request overview

Copilot reviewed 36 out of 37 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (2)

internal/service/tools/find_usages.go:106

  • idx != nil is treated as “indexing_in_progress”, but IndexStatus is loaded from disk and will remain non-nil even after indexing completed (EndedAt set) or failed. This can incorrectly label the workspace as indexing when ErrNoCollectionsFound occurs for other reasons (e.g., collections deleted). Consider deriving “in progress” from the status fields (e.g., StartedAt set AND EndedAt empty), and only then switch to indexing_in_progress / attach indexing data.
	idx := t.engine.GetIndexStatus(wctx.Root)
	allResults, err := t.engine.ExactSearchPolyglot(ctx, wctx.ID, filter, 100)
	if err != nil {
		var noCollections *engine.ErrNoCollectionsFound
		if errors.As(err, &noCollections) {
			resp := ToolResponse{
				Status:  "indexing_required",
				Message: fmt.Sprintf("⏳ Workspace '%s' is not indexed yet. Indexing is required for complete results.", wctx.Root),
				Context: ContextFromWorkspaceWithStatus(wctx, t.engine),
			}
			if idx != nil {
				resp.Status = "indexing_in_progress"
			}
			return resp.JSON()

internal/service/tools/call_hierarchy.go:137

  • The tool sets status=indexing_in_progress whenever an IndexStatus file exists (idx != nil), but IndexStatus persists after completion. In the ErrNoCollectionsFound branch this can misreport state if collections are missing for other reasons. Prefer a deterministic “in progress” check (e.g., StartedAt present and EndedAt empty) before reporting indexing_in_progress and returning indexing data.
	idx := t.engine.GetIndexStatus(wctx.Root)

	visited := make(map[string]bool)

	rootNode := &CallNode{Name: symbolName}

	// Try to find root symbol info
	rootRes := t.findSymbolInfo(ctx, wctx.ID, symbolName)
	if rootRes != nil {
		rootNode.Type, _ = rootRes.Point.Payload["type"].(string)
		rootNode.FilePath, _ = rootRes.Point.Payload["file_path"].(string)
		rootNode.Package, _ = rootRes.Point.Payload["package"].(string)
	} else {
		// If nothing is indexed yet, ExactSearchPolyglot will return ErrNoCollectionsFound.
		// Signal indexing status instead of returning an empty hierarchy.
		_, sErr := t.engine.ExactSearchPolyglot(ctx, wctx.ID, map[string]interface{}{"name": symbolName}, 1)
		var noCollections *engine.ErrNoCollectionsFound
		if errors.As(sErr, &noCollections) {
			resp := ToolResponse{
				Status:  "indexing_required",
				Message: fmt.Sprintf("⏳ Workspace '%s' is not indexed yet. Indexing is required for complete call hierarchy results.", wctx.Root),
				Context: ContextFromWorkspaceWithStatus(wctx, t.engine),
			}
			if idx != nil {
				resp.Status = "indexing_in_progress"
				resp.Data = map[string]any{"indexing": idx}
			}
			return resp.JSON()

Comment thread pkg/indexer/service.go
Comment thread pkg/indexer/index_status.go
Comment thread pkg/parser/python/treesitter.go
ResumeIndexingOnConnect and DetectContext auto-trigger could both call
StartIndexingAsync for the same workspace simultaneously, bypassing the
LoadOrStore dedup guard via TOCTOU race window.

Changes:
- ResumeIndexingOnConnect now marks connectTriggered before StartIndexingAsync
- Removed redundant indexingJobs.Load check from DetectContext (TOCTOU)
- Changed 'go e.StartIndexingAsync(...)' to direct call (goroutine created internally)

Fixes system freeze when indexing large workspaces (~5000+ files).
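
The race described in this commit is the classic check-then-act pattern: a separate Load followed by a later Store lets two callers both conclude that no job exists. A sketch of the single-step alternative with sync.Map.LoadOrStore; the engine type, the counter and the workspace ID are illustrative, and only the indexingJobs name is borrowed from the review text.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// engine is a minimal stand-in with just the dedup map and a launch counter.
type engine struct {
	indexingJobs sync.Map // workspace ID -> struct{}{}
	started      int32    // number of background jobs actually launched
	jobs         sync.WaitGroup
}

// startIndexingAsync claims the workspace with one atomic LoadOrStore and
// spawns the goroutine itself, so callers invoke it directly.
func (e *engine) startIndexingAsync(id string) bool {
	if _, loaded := e.indexingJobs.LoadOrStore(id, struct{}{}); loaded {
		return false // another caller already owns this workspace
	}
	e.jobs.Add(1)
	go func() {
		defer e.jobs.Done()
		atomic.AddInt32(&e.started, 1) // placeholder for the real indexing work
	}()
	return true
}

// race fires many concurrent start requests for one workspace and returns
// how many jobs were launched.
func race(callers int) int32 {
	e := &engine{}
	var wg sync.WaitGroup
	for i := 0; i < callers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			e.startIndexingAsync("ws-1")
		}()
	}
	wg.Wait()     // all requests have been made
	e.jobs.Wait() // all launched jobs have finished
	return atomic.LoadInt32(&e.started)
}

func main() {
	fmt.Println(race(50))
}
```

Because the claim and the check are a single operation, any extra pre-check in a caller adds nothing and only reopens the window the guard is meant to close.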
Copilot AI review requested due to automatic review settings March 11, 2026 17:18

Copilot AI left a comment

Pull request overview

Copilot reviewed 36 out of 37 changed files in this pull request and generated 2 comments.


Comment thread internal/service/engine/engine_searchcode_test.go
Comment thread pkg/parser/python/treesitter.go
razvan added 4 commits March 11, 2026 21:37
The VKCOM PHP parser AST already includes $ in Identifier.Value
(e.g. "$role" not "role"), so adding another $ prefix resulted
in $$role in method signatures and $$table in property signatures.

- buildMethodSignature: remove explicit "$" + prefix (line 663)
- convertToChunks: remove "$" from property Signature format (line 944)

Verified: all php parser tests pass, manual test on Laravel project
confirms single $ in all signatures.
Three PHP parser improvements:

1. uses_type relations: PHP 'use' import statements now generate
   uses_type relations on class chunks. This enables find_usages to
   discover all classes importing a given type (e.g. find_usages('Lawyer')
   finds all controllers with 'use App\Lawyer').

2. Route file extraction: PHP files in routes/ directories that yield
   0 symbols from standard AST analysis now fall back to regex-based
   Route::get/post/resource extraction. routes/web.php goes from
   0 to 39 symbols.

3. Fix $$ double dollar: Remove extra $ prefix from parameter and
   property signatures since VKCOM AST already includes $ in
   Identifier.Value.
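
For the route fallback in item 2, a regex pass might look like the sketch below. The pattern, the verb list and the sample routes file are assumptions for illustration, not the expression used by the parser.

```go
package main

import (
	"fmt"
	"regexp"
)

// routeRe matches calls such as Route::get('/users', ...).
// The verb list is an assumption; extend it as needed.
var routeRe = regexp.MustCompile(`Route::(get|post|put|patch|delete|resource)\(\s*['"]([^'"]+)['"]`)

type route struct {
	Verb string
	Path string
}

// extractRoutes returns one entry per Route:: call found in src.
func extractRoutes(src string) []route {
	var out []route
	for _, m := range routeRe.FindAllStringSubmatch(src, -1) {
		out = append(out, route{Verb: m[1], Path: m[2]})
	}
	return out
}

// sample is a made-up routes file used only for this demonstration.
const sample = `<?php
Route::get('/lawyers', 'LawyerController@index');
Route::post("/lawyers", 'LawyerController@store');
Route::resource('cases', 'CaseController');
`

func main() {
	for _, r := range extractRoutes(sample) {
		fmt.Printf("%s %s\n", r.Verb, r.Path)
	}
}
```

A regex fallback misses routes built dynamically or split across lines, so it suits the described case where AST analysis yields no symbols at all.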
These file types are not documentation - they are code that was
incorrectly classified as docs. Removing them from the docs parser:

- SQL: query language
- SH: shell scripts
- Svelte: frontend framework components

This reduces docs from 551 to ~49 files on the barou Laravel project,
so the language sort now puts PHP/JS first and documentation
indexing time drops dramatically.

Updated tests to verify these extensions are no longer handled by docs.
Copilot AI review requested due to automatic review settings March 11, 2026 22:24

Copilot AI left a comment

Pull request overview

Copilot reviewed 44 out of 45 changed files in this pull request and generated 4 comments.


Comment thread pkg/parser/php/php_analyzer.go
Comment thread pkg/indexer/index_status.go
Comment thread pkg/indexer/index_status.go
Comment thread pkg/parser/php/analyzer.go
- Replace Unix socket and .pid lock files with TCP port binding (localhost:39000) for singleton enforcement.
- Update IsDaemonRunning, StartDaemon and StopDaemon to fetch process ID via HTTP /health.
- Remove tracking logic around pidfile and sockets.
- Recreate adapter and lifecycle tests to connect over loopback TCP instead of sockets.
- Update the rag-code-install graceful stop procedure to pull the daemon PID from the health endpoint.
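
A minimal sketch of the port-binding idea behind this change, assuming nothing about the real daemon code: the operating system lets only one process listen on a given TCP address, so a failed bind can be read as "another instance is already running". `acquireSingleton` is a hypothetical helper, and the demo uses port 0 (any free port) rather than 39000 so that it can run anywhere.

```go
package main

import (
	"fmt"
	"net"
)

// acquireSingleton tries to bind addr. A bind failure is treated as a sign
// that another instance holds the address (hypothetical helper for this sketch).
func acquireSingleton(addr string) (net.Listener, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, fmt.Errorf("another instance may already be running on %s: %w", addr, err)
	}
	return ln, nil
}

func main() {
	// Port 0 asks the OS for any free port; a real daemon would use a fixed port.
	first, err := acquireSingleton("127.0.0.1:0")
	if err != nil {
		fmt.Println("unexpected:", err)
		return
	}
	defer first.Close()

	// A second bind to the same address fails while the first listener is open.
	if _, err := acquireSingleton(first.Addr().String()); err != nil {
		fmt.Println("second bind refused:", err)
	}
}
```

Unlike a PID file, the bind is released by the kernel when the process exits, so no stale lock needs cleaning up after a crash.
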
- Introduce FrameworkEnricher interface in core PHP analyzer
- Isolate Laravel and WordPress specific analysis into enricher.go
- Resolve plugin overhead with blank imports on run/test files
- Maintain lazy-loading decoupled structure to prevent import cycles
Copilot AI review requested due to automatic review settings March 12, 2026 15:06
Copilot AI left a comment

Pull request overview

Copilot reviewed 59 out of 62 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (2)

internal/service/tools/find_usages.go:105

  • In the ErrNoCollectionsFound branch, the tool sets status="indexing_in_progress" whenever GetIndexStatus returns non-nil. With the new file-based IndexStatus, a non-nil status file can exist even when indexing is completed/failed, so this can incorrectly report “in progress”. Consider checking something like idx.EndedAt == "" && idx.StartedAt != "" (and/or idx.Error=="") before switching the status.

internal/service/tools/call_hierarchy.go:136

  • Same issue as in FindUsagesTool: idx := GetIndexStatus(...) is used as a boolean to decide status="indexing_in_progress". Since IndexStatus persists after completion, this can mislabel a workspace as “in progress” even when it’s done. Gate this on an “active” condition (e.g., EndedAt == "").
	idx := t.engine.GetIndexStatus(wctx.Root)

	visited := make(map[string]bool)

	rootNode := &CallNode{Name: symbolName}

	// Try to find root symbol info
	rootRes := t.findSymbolInfo(ctx, wctx.ID, symbolName)
	if rootRes != nil {
		rootNode.Type, _ = rootRes.Point.Payload["type"].(string)
		rootNode.FilePath, _ = rootRes.Point.Payload["file_path"].(string)
		rootNode.Package, _ = rootRes.Point.Payload["package"].(string)
	} else {
		// If nothing is indexed yet, ExactSearchPolyglot will return ErrNoCollectionsFound.
		// Signal indexing status instead of returning an empty hierarchy.
		_, sErr := t.engine.ExactSearchPolyglot(ctx, wctx.ID, map[string]interface{}{"name": symbolName}, 1)
		var noCollections *engine.ErrNoCollectionsFound
		if errors.As(sErr, &noCollections) {
			resp := ToolResponse{
				Status:  "indexing_required",
				Message: fmt.Sprintf("⏳ Workspace '%s' is not indexed yet. Indexing is required for complete call hierarchy results.", wctx.Root),
				Context: ContextFromWorkspaceWithStatus(wctx, t.engine),
			}
			if idx != nil {
				resp.Status = "indexing_in_progress"
				resp.Data = map[string]any{"indexing": idx}
			}
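
The gating suggested in the two suppressed comments could be sketched as follows. The `IndexStatus` struct here is a trimmed, hypothetical stand-in with only the string fields the comments mention (`StartedAt`, `EndedAt`, `Error`); `isIndexingActive` and `statusFor` are illustrative names, not functions from the PR.

```go
package main

import "fmt"

// IndexStatus is a trimmed, hypothetical stand-in for the persisted status file.
// Field names follow the review comments; the real struct may differ.
type IndexStatus struct {
	StartedAt string
	EndedAt   string
	Error     string
}

// isIndexingActive reports whether a run has started but not yet finished or failed.
func isIndexingActive(idx *IndexStatus) bool {
	return idx != nil && idx.StartedAt != "" && idx.EndedAt == "" && idx.Error == ""
}

// statusFor picks the tool response status for a workspace with no collections.
func statusFor(idx *IndexStatus) string {
	if isIndexingActive(idx) {
		return "indexing_in_progress"
	}
	return "indexing_required"
}

func main() {
	running := &IndexStatus{StartedAt: "2026-03-12T15:00:00Z"}
	done := &IndexStatus{StartedAt: "2026-03-12T15:00:00Z", EndedAt: "2026-03-12T15:05:00Z"}
	fmt.Println(statusFor(nil), statusFor(running), statusFor(done))
}
```

With a check like this, a status file left over from a completed or failed run no longer flips the response to "in progress".
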

Comment thread pkg/parser/html/analyzer.go Outdated
razvan and others added 3 commits March 13, 2026 11:01
…emon

- Parsers: Introduced gotreesitter parser caching & explicit 'arenagc' draining. Arena memory is now freed after each file, fixing a severe memory leak.
- HTML/CSS: Dropped CSS/SCSS tracking in the HTML parser to avoid Tree-Sitter GLR explosions and extreme slowdowns during embedding.
- Indexer: Added strict ignoring of minified/vendored files (.min.js, .bundle.css, etc.) to skip massive auto-generated files.
- Indexer: Added watchdog and auto-recovery for Ollama embedded deadlocks.
- Daemon: Reverted to simple and stable 'Setsid' background daemon spawn pattern in lifecycle.go.
- Main: Removed unnecessary --fork-exec flag logic and bumped version.
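
As an illustration of the minified/vendored file filter mentioned above, a suffix check might look like this. Only `.min.js` and `.bundle.css` come from the commit message; the other suffixes and the names `skippedSuffixes` and `shouldSkip` are assumptions made for the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// skippedSuffixes lists file-name endings treated as auto-generated.
// ".min.css" and ".bundle.js" are assumed here; the commit names only
// ".min.js" and ".bundle.css" explicitly.
var skippedSuffixes = []string{".min.js", ".min.css", ".bundle.js", ".bundle.css"}

// shouldSkip reports whether path ends in one of the skipped suffixes,
// comparing case-insensitively.
func shouldSkip(path string) bool {
	lower := strings.ToLower(path)
	for _, s := range skippedSuffixes {
		if strings.HasSuffix(lower, s) {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []string{"public/js/app.min.js", "resources/js/app.js", "dist/site.bundle.css"} {
		fmt.Printf("%s skip=%v\n", p, shouldSkip(p))
	}
}
```
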
…tter memory explosions

- Extracted CSS parsing from html/analyzer.go to a dedicated css/analyzer.go
- Replaced GLR AST generation with linear bracket-depth text scanning
- Capped huge CSS rule chunks at 8KB to prevent vector DB overload
- Removed the old, unused css_regex.go implementation
- Registered the new generic CSS parser globally in the daemon

Resolves Trello Task 1, Task 2, Task 3
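
The bracket-depth scanning described in this commit can be sketched as a single pass that counts braces and emits one chunk per top-level rule. This is a simplified illustration, not the code in css/analyzer.go: `splitCSSRules` and `maxChunkBytes` are hypothetical names, and the sketch ignores braces inside strings and comments.

```go
package main

import (
	"fmt"
	"strings"
)

// maxChunkBytes mirrors the 8KB cap from the commit message.
const maxChunkBytes = 8 * 1024

// splitCSSRules walks src once, tracking brace depth, and emits one chunk
// each time the depth returns to zero (that is, per top-level rule or at-rule).
// Oversized chunks are cut at maxChunkBytes; the byte-level cut may split a
// multi-byte character, which a real implementation would need to handle.
func splitCSSRules(src string) []string {
	var chunks []string
	depth, start := 0, 0
	for i := 0; i < len(src); i++ {
		switch src[i] {
		case '{':
			depth++
		case '}':
			if depth > 0 {
				depth--
			}
			if depth == 0 {
				chunk := strings.TrimSpace(src[start : i+1])
				if len(chunk) > maxChunkBytes {
					chunk = chunk[:maxChunkBytes]
				}
				if chunk != "" {
					chunks = append(chunks, chunk)
				}
				start = i + 1
			}
		}
	}
	return chunks
}

func main() {
	css := "body { color: #fff; }\n@media (min-width: 1px) { a { b: c; } }"
	for i, c := range splitCSSRules(css) {
		fmt.Printf("%d: %s\n", i, c)
	}
}
```

Because the scan is linear in the input size and keeps no tree, its memory use stays proportional to the chunks it emits, which is the property the commit relies on to avoid the tree-sitter blowups.
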
Copilot AI review requested due to automatic review settings March 13, 2026 21:35
Copilot AI left a comment

Pull request overview

Copilot reviewed 75 out of 78 changed files in this pull request and generated 14 comments.


Comment thread README.md
- **Python**: Complete native AST support
- **HTML & Markdown**: Structural documentation mappings
- **Generic Support**: CSS, JSON, YAML, Shell scripts, SQL
- **HTML & CSS**: HTML structural mappings, CSS/SCSS/SASS/LESS via tree-sitter
Comment thread test.css
Comment on lines +1 to +5

body {
color: #fff;
}

No newline at end of file
Comment thread internal/daemon/server.go
Comment on lines 16 to 27
// ListenConfig configures the daemon's network listeners and lifecycle.
type ListenConfig struct {
SocketPath string // Unix domain socket path (required)
PIDPath string // PID file path (required)
Version string // Server version string
HTTPPort int // TCP port for optional HTTP listener (0 = disabled)
Handler http.Handler // MCP handler (must handle /mcp)
OnReady func() // Called when daemon is ready to accept connections (optional)
Port int // TCP port for localhost listener
Version string // Server version string
Handler http.Handler // MCP handler (must handle /mcp)
OnReady func() // Called when daemon is ready to accept connections (optional)
}

// ListenAndServe starts the daemon listeners and blocks until ctx is cancelled
// or SIGTERM/SIGINT is received. Cleans up socket and PID file on exit.
//
// It sets up two listeners:
// 1. Unix domain socket at SocketPath (primary, for stdio adapters)
// 2. TCP HTTP on HTTPPort (optional, for curl/debug/external agents, localhost only)
//
// Both serve the same handler mux with /health and the provided MCP handler.
// or SIGTERM/SIGINT is received. It binds exclusively to a local TCP port to
// guarantee it is a singleton, avoiding file locking issues.
func ListenAndServe(ctx context.Context, cfg ListenConfig) error {
Comment thread pkg/parser/README.md
Comment on lines +48 to +51
| **PHP** | [`/php`](./php/README.md) | Deep Laravel integration (Eloquent, Routes, Controllers) & WordPress. | ✅ Production |
| **HTML & CSS** | [`/html`](./html/README.md) | HTML semantic sectioning + CSS/SCSS/SASS/LESS via tree-sitter. | ✅ Production |
| **JavaScript** | [`/javascript`](./javascript/README.md) | React, Vue, & TypeScript support. | ✅ Production |
| **Docs** | [`/docs`](./docs/README.md) | Markdown, JSON, YAML, XML, TOML, reStructuredText. | ✅ Production |
Comment thread pkg/indexer/service.go
Comment on lines 223 to +234
var fileErrs []string
for _, path := range changedFiles {
fileNum := int(doneFiles.Load()) + 1
logger.Instance.Debug("[IDX] ws=%s lang=%s [%d/%d] %s (indexing...)",
logger.Instance.Info("[IDX] ws=%s lang=%s [%d/%d] %s (indexing...)",
wsName, opts.Language, fileNum, totalFiles, filepath.Base(path))

symCount, indexErr := s.IndexFile(ctx, collection, path, state)
if indexErr != nil {
logger.Instance.Warn("[IDX] ws=%s lang=%s ⚠️ %s: %v", wsName, opts.Language, filepath.Base(path), indexErr)
fileErrs = append(fileErrs, fmt.Sprintf("%s: %v", path, indexErr))
} else {
logger.Instance.Debug("[IDX] ws=%s lang=%s %s → %d symbol(s)", wsName, opts.Language, filepath.Base(path), symCount)
logger.Instance.Info("[IDX] ws=%s lang=%s %s → %d symbol(s)", wsName, opts.Language, filepath.Base(path), symCount)
Comment on lines +17 to +19
// Analyzer implements chunk-based processing of CSS/SCSS/LESS files.
// Because it does not depend on Tree-sitter, it does not run out of memory (OOM) even on gigantic bundles.
type Analyzer struct{}
Comment thread internal/daemon/run.go
Comment on lines +218 to +223
// Profiling endpoints
mcpMux.HandleFunc("/debug/pprof/", pprof.Index)
mcpMux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
mcpMux.HandleFunc("/debug/pprof/profile", pprof.Profile)
mcpMux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
mcpMux.HandleFunc("/debug/pprof/trace", pprof.Trace)
Comment thread heap
@@ -0,0 +1 @@
404 page not found
Comment on lines +202 to +208
atIdx := strings.Index(argsStr, "@")
if atIdx > 0 {
// Find the quoted string containing @
for _, q := range []byte{'\'', '"'} {
startQ := strings.IndexByte(argsStr[strings.Index(argsStr, string(q))+1:], q)
_ = startQ
}
Comment on lines +22 to +33
func getFreePort() (int, error) {
addr, err := net.ResolveTCPAddr("tcp", "localhost:0")
if err != nil {
return 0, err
}
l, err := net.ListenTCP("tcp", addr)
if err != nil {
return 0, err
}
defer l.Close()
return l.Addr().(*net.TCPAddr).Port, nil
}
@doITmagic doITmagic merged commit 90419b7 into dev Mar 13, 2026
7 checks passed
@doITmagic doITmagic deleted the refactor/indexing-progress branch March 13, 2026 21:42