This guide takes you from zero to searching your codebase in five steps. Make sure you have completed the Installation steps first.
graph LR
A["1. Init"] --> B["2. Index"]
B --> C["3. Search"]
C --> D["4. MCP Server"]
D --> E["5. Viewer"]
Navigate to your project root and run:
cd /path/to/your/project
coderag init

What happens:
- CodeRAG scans your directory to auto-detect programming languages (TypeScript, Python, Go, Rust, Java, C#, C/C++, Ruby, PHP)
- Creates a `.coderag.yaml` configuration file with sensible defaults
- Creates the `.coderag/` storage directory for the index data
- Checks that Ollama is running and reachable
Expected output:
Scanning for project languages...
Detected languages: python, typescript
Created /path/to/your/project/.coderag.yaml
Created /path/to/your/project/.coderag
✔ Ollama is running at http://localhost:11434
CodeRAG initialized successfully!
Run "coderag index" to index your codebase.
Available options:
| Flag | Description |
|---|---|
| `--languages <langs>` | Override auto-detection with a comma-separated list (e.g., `--languages typescript,python`) |
| `--force` | Overwrite an existing `.coderag.yaml` |
| `--multi` | Generate a multi-repo configuration with a `repos` array |
Tip: If you want to index multiple repositories together, use `coderag init --multi` to generate the multi-repo config scaffold. See Configuration for details.
coderag index

What happens (the full ingestion pipeline):
- Scan --- Discovers source files, respecting `.gitignore` and configured exclusions
- Parse --- Builds an AST for each file using Tree-sitter
- Chunk --- Splits code into semantically meaningful chunks (functions, classes, methods) based on the AST
- Enrich --- Generates natural language summaries for each chunk using Ollama (`qwen2.5-coder:7b`)
- Embed --- Creates vector embeddings using `nomic-embed-text` (768 dimensions)
- Store --- Saves embeddings to LanceDB, builds a BM25 keyword index, and constructs a dependency graph
Expected output:
◐ Loading configuration...
◐ Scanning files...
◐ Scanned 142 files
◐ Initializing parser...
◐ Parsing 142 files...
◐ Parsed 142 files, created 387 chunks
◐ Enriching 387 chunks with NL summaries...
◐ Embedding 387 chunks...
◐ Storing embeddings in LanceDB...
◐ Building BM25 index...
◐ Building dependency graph...
◐ Saving index state...
✔ Indexing complete!
Summary:
Files processed: 142
Chunks created: 387
Time elapsed: 45.2s
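To make the Chunk step of the pipeline more concrete, here is a minimal sketch of AST-based chunking. It is not CodeRAG's implementation: CodeRAG uses Tree-sitter and supports many languages, while this sketch assumes Python's standard `ast` module as a stand-in and handles Python source only. The `Chunk` class and `chunk_source` function are hypothetical names introduced for illustration.

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    name: str
    kind: str          # "function", "class", or "method"
    start_line: int    # 1-based, inclusive
    end_line: int      # 1-based, inclusive
    text: str


def chunk_source(source: str) -> list[Chunk]:
    """Split Python source into function, class, and method chunks.

    Illustrative only: uses the stdlib `ast` module, not Tree-sitter.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks: list[Chunk] = []

    def add(node, kind: str, name: str) -> None:
        # Slice the original text using the node's line span.
        text = "\n".join(lines[node.lineno - 1 : node.end_lineno])
        chunks.append(Chunk(name, kind, node.lineno, node.end_lineno, text))

    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    for node in tree.body:
        if isinstance(node, func_types):
            add(node, "function", node.name)
        elif isinstance(node, ast.ClassDef):
            add(node, "class", node.name)
            # Methods become their own chunks, qualified by class name.
            for child in node.body:
                if isinstance(child, func_types):
                    add(child, "method", f"{node.name}.{child.name}")
    return chunks


SAMPLE = '''\
def login(user):
    return user.token

class Session:
    def close(self):
        pass
'''

for c in chunk_source(SAMPLE):
    print(c.kind, c.name, f"L{c.start_line}-{c.end_line}")
```

Chunking along syntactic boundaries keeps each chunk a complete unit (a whole function or class), which is what makes the later summary and embedding steps meaningful.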
Available options:
| Flag | Description |
|---|---|
| `--full` | Force a complete re-index, ignoring incremental state |
Note: Subsequent runs of `coderag index` are incremental --- only changed files are re-processed. Use `--full` if you want to rebuild the entire index from scratch.
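One common way to implement incremental indexing is to store a content digest per file and re-process only the files whose digest has changed. The sketch below illustrates that idea. How CodeRAG actually tracks its incremental state is not described here, so treat the hashing approach and the names `file_digest` and `plan_incremental` as assumptions.

```python
import hashlib


def file_digest(content: bytes) -> str:
    """Return a stable fingerprint of a file's contents."""
    return hashlib.sha256(content).hexdigest()


def plan_incremental(previous: dict[str, str], current_files: dict[str, bytes]):
    """Compare stored digests with current file contents.

    Returns (changed, removed, new_state):
      changed   - paths that are new or whose content differs
      removed   - paths present in the old state but missing now
      new_state - digests to persist for the next run
    """
    new_state = {path: file_digest(data) for path, data in current_files.items()}
    changed = sorted(p for p, d in new_state.items() if previous.get(p) != d)
    removed = sorted(p for p in previous if p not in new_state)
    return changed, removed, new_state


# State saved by a hypothetical earlier run.
old_state = {
    "a.py": file_digest(b"x = 1\n"),
    "b.py": file_digest(b"y = 2\n"),
}
# Files on disk now: a.py is unchanged, b.py was deleted, c.py is new.
files_now = {"a.py": b"x = 1\n", "c.py": b"z = 3\n"}

changed, removed, new_state = plan_incremental(old_state, files_now)
print("re-index:", changed)
print("drop from index:", removed)
```

A full rebuild, as with `--full`, corresponds to starting from an empty previous state so that every file counts as changed.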
coderag search "how does authentication work"

CodeRAG performs a hybrid search combining:
- Vector search (semantic similarity, weighted 0.7 by default)
- BM25 search (keyword matching, weighted 0.3 by default)
- Results are fused using Reciprocal Rank Fusion (RRF)
Expected output:
Found 10 result(s) for "how does authentication work":
[1] src/auth/middleware.ts L12-45 function score: 0.8234
Middleware that validates JWT tokens and attaches user context to the request
[2] src/auth/provider.ts L8-32 class score: 0.7891
Authentication provider that supports OIDC and SAML protocols
[3] src/routes/login.ts L15-67 function score: 0.7456
POST /login handler that authenticates users and issues tokens
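The fusion step can be sketched as follows. In Reciprocal Rank Fusion, each result receives `1 / (k + rank)` from every ranked list it appears in, and the contributions are summed. This sketch assumes the 0.7 and 0.3 weights scale each list's contribution and that `k = 60`, a commonly used constant. Both are assumptions, CodeRAG may combine weights and ranks differently, and the scores here are not on the same scale as those in the output above. The function name `weighted_rrf` and the chunk identifiers are illustrative.

```python
from collections import defaultdict


def weighted_rrf(
    rankings: dict[str, list[str]],
    weights: dict[str, float],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse several ranked lists with weighted Reciprocal Rank Fusion.

    rankings maps a retriever name to its result ids, best first.
    Each id gains weight / (k + rank) per list, with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for source, ranked_ids in rankings.items():
        weight = weights[source]
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            scores[chunk_id] += weight / (k + rank)
    # Highest score first; ties broken by id for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


rankings = {
    "vector": ["auth/middleware.ts", "auth/provider.ts", "routes/login.ts"],
    "bm25": ["routes/login.ts", "auth/middleware.ts"],
}
weights = {"vector": 0.7, "bm25": 0.3}

fused = weighted_rrf(rankings, weights)
for chunk_id, score in fused:
    print(f"{score:.5f}  {chunk_id}")
```

Because RRF works on ranks rather than raw scores, it can merge vector similarities and BM25 scores without normalizing them to a common scale.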
Available options:
| Flag | Description |
|---|---|
| `--language <lang>` | Filter results by programming language |
| `--type <chunkType>` | Filter by chunk type (function, class, method, etc.) |
| `--file <path>` | Filter by file path substring |
| `--top-k <n>` | Maximum number of results (default: 10) |
Examples:
# Search only in TypeScript files
coderag search "error handling" --language typescript
# Find only class definitions
coderag search "database connection" --type class
# Search within a specific directory
coderag search "validation" --file src/utils
# Get more results
coderag search "configuration" --top-k 20

The MCP (Model Context Protocol) server exposes CodeRAG's capabilities as tools that AI coding agents can call directly.
# stdio transport (for direct agent integration)
coderag serve
# SSE transport (for network access)
coderag serve --port 3000

Expected output (SSE mode):
[coderag] Starting MCP server (SSE transport on port 3000)...
[coderag] MCP server running on http://localhost:3000/sse
Available MCP tools:
| Tool | Description |
|---|---|
| `coderag_search` | Semantic + keyword hybrid search |
| `coderag_context` | Assemble context within a token budget |
| `coderag_status` | Check index health and statistics |
| `coderag_explain` | Explain a code symbol with full context |
| `coderag_docs` | Search indexed documentation |
| `coderag_backlog` | Query project backlog items |
Tip: stdio transport is the default and is used when an AI agent spawns the MCP server as a subprocess. Use SSE transport with `--port` when you want to connect from a remote client or share the server across multiple agents.
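The `coderag_context` tool is described above as assembling context within a token budget. One simple way to do that is to take search results in score order and keep each one that still fits. The sketch below shows that greedy approach. It is an assumption about the general technique, not a description of how CodeRAG implements the tool, and the `Result` class, the `estimate_tokens` heuristic (about 4 characters per token), and `assemble_context` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    path: str
    text: str
    score: float


def estimate_tokens(text: str) -> int:
    # Rough heuristic of about 4 characters per token; a real tokenizer will differ.
    return max(1, len(text) // 4)


def assemble_context(results: list[Result], budget: int) -> list[Result]:
    """Greedily keep the highest-scoring results that fit within the budget."""
    selected: list[Result] = []
    used = 0
    for result in sorted(results, key=lambda r: r.score, reverse=True):
        cost = estimate_tokens(result.text)
        if used + cost <= budget:
            selected.append(result)
            used += cost
        # Results that do not fit are skipped; smaller ones may still fit later.
    return selected


results = [
    Result("src/auth/middleware.ts", "x" * 400, 0.90),  # about 100 tokens
    Result("src/auth/provider.ts", "y" * 800, 0.80),    # about 200 tokens
    Result("src/routes/login.ts", "z" * 200, 0.70),     # about 50 tokens
]

context = assemble_context(results, budget=160)
print([r.path for r in context])
```

Skipping an oversized result instead of stopping at it lets lower-ranked but smaller chunks use the remaining budget.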
Add the following to your Claude Desktop MCP configuration (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
  "mcpServers": {
    "coderag": {
      "command": "npx",
      "args": ["coderag", "serve"],
      "cwd": "/path/to/your/project"
    }
  }
}

CodeRAG includes a web-based viewer for exploring your indexed codebase visually.
coderag viewer

Expected output:
[coderag] API server initialized
[coderag] Viewer running at http://localhost:3333
The viewer opens automatically in your default browser.
Available options:
| Flag | Description |
|---|---|
| `-p, --port <port>` | Port number (default: 3333) |
| `--no-open` | Do not open the browser automatically |
Viewer features:
- Dashboard --- Overview of index statistics and health
- Chunk Browser --- Browse all indexed code chunks with metadata
- Search Playground --- Interactive search with result visualization
- Dependency Graph --- Interactive graph of code dependencies
- Embedding Explorer --- UMAP 2D/3D projection of embedding space
Note: The viewer requires a built viewer package. If you see an error about the viewer not being built, run:
pnpm --filter @code-rag/viewer build
At any time, you can check the health of your CodeRAG index:
coderag status

Expected output:
CodeRAG Status
Health: ok
Total chunks: 387
Model: nomic-embed-text
Dimensions: 768
Languages: typescript, python
Storage: /path/to/your/project/.coderag
Use --json for machine-readable output:
coderag status --json

flowchart TD
subgraph Setup["One-time Setup"]
I["coderag init"] --> IDX["coderag index"]
end
subgraph Usage["Daily Usage"]
IDX --> S["coderag search 'query'"]
IDX --> MCP["coderag serve"]
IDX --> V["coderag viewer"]
IDX --> ST["coderag status"]
MCP --> AGENT["AI Agent uses MCP tools"]
end
subgraph Auto["Automatic"]
CHANGE["Code changes"] --> INCR["coderag index<br/>(incremental)"]
INCR --> IDX
end
- Configuration --- Fine-tune `.coderag.yaml` for your project
- CLI --- Full CLI command reference
- Embedding Providers --- Switch between Ollama, Voyage, and OpenAI embeddings
- Multi Repo --- Index and search across multiple repositories