- **MCP Server** (`code-rag` or `code-rag-mcp`): Exposes search to AI assistants (Claude, etc.)
- **Embedding Server** (`code-rag-server`): Shared model server for multiple MCP instances

### 6. Shared Embedding Server
When running multiple MCP instances (e.g., multiple VS Code windows), each would normally load its own transformer model (~300MB+ RAM each). The **shared embedding server** solves this:

- **Auto-spawns** on the first client request if not already running
- **Auto-terminates** when no clients remain (after an idle timeout)
- Uses a **heartbeat** mechanism for client lifecycle tracking
- **Lock file** prevents duplicate server instances
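
The heartbeat and idle-timeout behaviour described above could be tracked roughly as follows. This is a minimal sketch, not the project's implementation: the `ClientTracker` class, its method names, and the default timeout values are all hypothetical and chosen only for illustration.

```python
import time


class ClientTracker:
    """Hypothetical helper: tracks client heartbeats and idle time."""

    def __init__(self, heartbeat_ttl=30.0, idle_timeout=60.0, clock=time.monotonic):
        self._ttl = heartbeat_ttl          # seconds a heartbeat stays valid (assumed value)
        self._idle_timeout = idle_timeout  # seconds without clients before shutdown (assumed value)
        self._clock = clock
        self._last_seen = {}               # client_id -> time of last heartbeat
        self._idle_since = clock()

    def heartbeat(self, client_id):
        """Record that a client is still alive."""
        self._last_seen[client_id] = self._clock()

    def active_clients(self):
        """Return clients whose last heartbeat is within the TTL."""
        now = self._clock()
        self._last_seen = {
            cid: seen for cid, seen in self._last_seen.items() if now - seen <= self._ttl
        }
        return sorted(self._last_seen)

    def should_shutdown(self):
        """True once no client has been active for the whole idle timeout."""
        now = self._clock()
        if self.active_clients():
            self._idle_since = now  # still in use, so restart the idle window
            return False
        return now - self._idle_since >= self._idle_timeout
```

A server loop would call `should_shutdown()` periodically and exit when it returns `True`; the clock is injectable so the logic can be exercised without real waiting.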

Configuration (via environment):

- `CODE_RAG_SHARED_SERVER=true` (enabled by default)
- `CODE_RAG_SHARED_SERVER_PORT=8199`
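
As an illustration of how these two variables might be consumed, here is a small sketch. The function name `load_shared_server_config` is hypothetical, the set of accepted "true" spellings is an assumption, and treating `8199` as the fallback port is inferred from the value shown above rather than confirmed by the source.

```python
import os


def load_shared_server_config(env=None):
    """Hypothetical helper: read the shared-server settings from the environment."""
    env = os.environ if env is None else env

    # Enabled by default; the accepted truthy spellings are an assumption.
    raw_enabled = env.get("CODE_RAG_SHARED_SERVER", "true")
    enabled = raw_enabled.strip().lower() in {"1", "true", "yes", "on"}

    # Assumes 8199 is the default port when the variable is unset.
    port = int(env.get("CODE_RAG_SHARED_SERVER_PORT", "8199"))
    return enabled, port
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check in isolation.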

Files:

- `src/code_rag/embedding_server.py` - FastAPI server
- `src/code_rag/embeddings/http_embedding.py` - HTTP client for embedding
- `src/code_rag/reranker/http_reranker.py` - HTTP client for reranking

### 1. API Layer (`src/code_rag/api.py`)
- **What**: The central hub for all Code-RAG operations.
- **How**: Integrates embedding, database, reranking, and indexing logic. Used by both CLI and MCP.
- **Features**: Session tracking, auto-generated collection names, and unified indexing flow.

### 2. File Processor & Chunker
- **What**: Discovers source files and breaks them into logical chunks.
- **How**: Uses `SyntaxChunker` (tree-sitter based) for code-aware splitting, falling back to line-based splitting.
- **Output**: Text chunks with rich metadata (file path, line numbers, symbol names).

### 3. Metadata Index (`src/code_rag/index/metadata_index.py`)
- **What**: Tracks the state of indexed files for incremental updates.
- **How**: Stores each file's `mtime`, `size`, and `sha256` hash.
- **Benefit**: Only re-indexes modified files, significantly speeding up subsequent runs.
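
The change check this enables can be sketched as below. This is an assumption about how `mtime`, `size`, and `sha256` might be combined (cheap stat comparison first, hashing only when the stat data differs), not the code in `metadata_index.py`; `FileRecord`, `snapshot`, and `needs_reindex` are hypothetical names.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical stored state for one indexed file."""
    mtime: float
    size: int
    sha256: str


def sha256_of(path):
    """Hash a file in blocks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def snapshot(path):
    """Capture the current state of a file."""
    stat = os.stat(path)
    return FileRecord(stat.st_mtime, stat.st_size, sha256_of(path))


def needs_reindex(path, previous):
    """Return True if the file is new or its content has changed."""
    if previous is None:
        return True  # never indexed before
    stat = os.stat(path)
    if stat.st_mtime == previous.mtime and stat.st_size == previous.size:
        return False  # cheap check: stat data unchanged, skip hashing
    # Stat data differs (e.g. the file was touched); compare content hashes.
    return sha256_of(path) != previous.sha256
```

Comparing the hash only after the stat data differs keeps unchanged files cheap to skip, while a file that was merely touched is still recognised as unmodified.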

### 4. Hybrid Search & Query Analyzer
- **What**: Improves search relevance by combining vector search with exact identifier matching.
- **How**: `QueryAnalyzer` detects code identifiers (CamelCase, snake_case) in queries and boosts results containing those identifiers.
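
A rough sketch of the identifier detection and boosting idea follows. The regular expressions, the `extract_identifiers` and `boost` functions, and the `bonus` value are illustrative assumptions; the actual `QueryAnalyzer` may detect and weight identifiers differently.

```python
import re

# Assumed patterns: a lowercase-to-uppercase "hump" marks CamelCase,
# and lowercase words joined by underscores mark snake_case.
CAMEL_CASE = re.compile(r"\b[A-Za-z]+[a-z0-9][A-Z][A-Za-z0-9]*\b")
SNAKE_CASE = re.compile(r"\b[a-z][a-z0-9]*(?:_[a-z0-9]+)+\b")


def extract_identifiers(query):
    """Return code-like identifiers found in a natural-language query."""
    found = CAMEL_CASE.findall(query) + SNAKE_CASE.findall(query)
    return list(dict.fromkeys(found))  # drop duplicates, keep first-seen order


def boost(results, identifiers, bonus=0.2):
    """Add a fixed bonus per matched identifier and re-sort by score.

    `results` is a list of (text, vector_score) pairs; `bonus` is an assumed weight.
    """
    rescored = []
    for text, score in results:
        hits = sum(1 for ident in identifiers if ident in text)
        rescored.append((text, score + bonus * hits))
    return sorted(rescored, key=lambda item: item[1], reverse=True)
```

For a query such as "where does QueryAnalyzer call load_config", both `QueryAnalyzer` and `load_config` would be extracted, and chunks containing either string would move up the ranking relative to their raw vector score.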