Technical reference for what codebase-context ships today. For the user-facing overview, see README.md.
10 MCP tools + 1 optional resource (codebase://context).
| Tool | Input | Output |
|---|---|---|
search_codebase |
query, optional intent, limit, filters, includeSnippets |
Ranked results (file, summary, score, type, trend, patternWarning, relationships, hints) + searchQuality + decision card (ready, nextAction, patterns, bestExample, impact, whatWouldHelp) when intent="edit". Hints capped at 3 per category. |
get_team_patterns |
optional category |
Pattern frequencies, trends, golden files, conflicts |
get_symbol_references |
symbol, optional limit |
Concrete symbol usage evidence: usageCount + top usage snippets + confidence ("syntactic") + isComplete boolean |
remember |
type, category, memory, reason |
Persists to .codebase-context/memory.json |
get_memory |
optional category, type, query, limit |
Memories with confidence decay scoring |
| Tool | Purpose |
|---|---|
get_codebase_metadata |
Framework, dependencies, project stats |
get_style_guide |
Style rules from project documentation |
detect_circular_dependencies |
Import cycles in the file graph |
refresh_index |
Full or incremental re-index + git memory extraction |
get_indexing_status |
Index state, progress, last stats |
Ordered by execution:
- Intent classification — EXACT_NAME (for symbols), CONCEPTUAL, FLOW, CONFIG, WIRING. Sets keyword/semantic weight ratio.
- Query expansion — bounded domain term expansion for conceptual queries.
- Dual retrieval — keyword (Fuse.js) + semantic (local embeddings or OpenAI).
- RRF fusion — Reciprocal Rank Fusion (k=60) across all retrieval channels.
- Definition-first boost — for EXACT_NAME intent, results matching the symbol name get +15% score boost (e.g., defining file ranks above using files).
- Structure-aware boosting — import centrality, composition root boost, path overlap, definition demotion for action queries.
- Contamination control — test file filtering for non-test queries.
- File deduplication — best chunk per file.
- Symbol-level deduplication — within each
symbolPathgroup, keep only the highest-scoring chunk (prevents duplicate methods from same class clogging results). - Stage-2 reranking — cross-encoder (
Xenova/ms-marco-MiniLM-L-6-v2) triggers when the score between the top files are very close. CPU-only, top-10 bounded. - Result enrichment — compact type (
componentType:layer), pattern momentum (trendRising/Declining only, Stable omitted),patternWarning, condensed relationships (importedByCount/hasTests), structured hints (capped callers/consumers/tests ranked by frequency), scope header for symbol-aware snippets (// ClassName.methodName), related memories (capped to 3), search quality assessment withhintwhen low confidence.
- Chunk size: 50 lines, 0 overlap
- Reranker trigger: activates when top-3 results are within 0.08 score of each other
- Embedding model: Granite (
ibm-granite/granite-embedding-30m-english, 8192 token context) via@huggingface/transformersv3 - Vector DB: LanceDB with cosine distance
Returned as preflight when search intent is edit, refactor, or migrate.
Output shape:
{
ready: boolean;
nextAction?: string; // Only when ready=false; what to search for next
warnings?: string[]; // Failure memories (capped at 3)
patterns?: {
do: string[]; // Top 3 preferred patterns with adoption %
avoid: string[]; // Top 3 declining patterns
};
bestExample?: string; // Top 1 golden file (path format)
impact?: {
coverage: string; // "X/Y callers in results"
files: string[]; // Top 3 impact candidates (files importing results)
};
whatWouldHelp?: string[]; // Concrete next steps (max 4) when ready=false
}Fields explained:
ready: boolean, whether evidence is sufficient to proceednextAction: actionable reason whyready=false(e.g., "2 of 5 callers missing")warnings: failure memories from team (auto-surfaces past mistakes)patterns.do: patterns the team is adopting, ranked by adoption %patterns.avoid: declining patterns, ranked by % (useful for migrations)bestExample: exemplar file for the area under editimpact.coverage: shows caller visibility ("3/5 callers in results" means 2 callers weren't searched yet)impact.files: which files import the results (helps find blind spots)whatWouldHelp: specific next searches, tool calls, or files to check that would close evidence gaps
- Evidence triangulation — scores code match (45%), pattern alignment (30%), and memory support (25%). Needs combined score ≥ 40 to pass.
- Epistemic stress check — if pattern conflicts, stale memories, thin evidence, or low caller coverage are detected,
readyis set to false. - Search quality gate — if
searchQuality.statusislow_confidence,readyis forced to false regardless of evidence scores. This prevents the "confidently wrong" problem.
- Risk level from circular deps, impact breadth, and failure memories
- Preferred/avoid patterns from team pattern analysis
- Golden files ranked by pattern density
- Caller coverage from import graph (X of Y callers appearing in results)
- Pattern conflicts when two patterns in the same category are both > 20% adoption
- Confidence decay of related memories
- 4 types:
convention,decision,gotcha,failure - Confidence decay: conventions never decay, decisions 180-day half-life, gotchas/failures 90-day half-life
- Stale threshold: memories below 30% confidence are flagged
- Git auto-extraction: conventional commits from last 90 days
- Surface locations:
search_codebaseresults (asrelatedMemories),get_team_patternsresponses, preflight analysis
- Initial: full scan → chunking (50 lines, 0 overlap) → embedding → vector DB (LanceDB) + keyword index (Fuse.js)
- Incremental: SHA-256 manifest diffing, selective embed/delete, full intelligence regeneration
- Version gating:
index-meta.jsontracks format version; mismatches trigger automatic rebuild - Crash-safe rebuilds: full rebuilds write to
.staging/and swap atomically only on success - Auto-heal: corrupted index triggers automatic full re-index on next search
- Relationships sidecar:
relationships.jsoncontains file import graph and symbol export index - Storage:
.codebase-context/directory (memory.json + generated files)
- Angular: signals, standalone components, control flow syntax, lifecycle hooks, DI patterns, component metadata
- Generic: 30+ languages — TypeScript, JavaScript, Python, Java, Kotlin, C/C++, C#, Go, Rust, PHP, Ruby, Swift, Scala, Shell, config/markup formats
Notes:
- Language detection covers common extensions including
.pyi,.kt/.kts,.cc/.cxx, and config formats like.toml/.xml. - When Tree-sitter grammars are present, the Generic analyzer can derive symbol components from Tree-sitter extraction (with fallbacks).
Reproducible evaluation is shipped as a CLI entrypoint backed by shared scoring/reporting code.
- Command:
npm run eval -- <codebaseA> <codebaseB>(builds first, then runsscripts/run-eval.mjs) - Shared implementation:
src/eval/harness.ts+src/eval/types.ts(tests and CLI use the same scoring) - Frozen fixtures:
tests/fixtures/eval-angular-spotify.json(real-world)tests/fixtures/eval-controlled.json+tests/fixtures/codebases/eval-controlled/(offline controlled)
- Reported metrics: Top-1 accuracy, Top-3 recall, spec contamination rate, and a gate pass/fail