Capabilities Reference

Technical reference for what codebase-context ships today. For the user-facing overview, see README.md.

CLI Reference

All 10 MCP tools are exposed as CLI subcommands. Set CODEBASE_ROOT or run from the project directory.

Command	Flags	Maps to
`search --query <q>`	`--intent explore\|edit\|refactor\|migrate`, `--limit <n>`, `--lang <l>`, `--framework <f>`, `--layer <l>`	`search_codebase`
`metadata`	—	`get_codebase_metadata`
`status`	—	`get_indexing_status`
`reindex`	`--incremental`, `--reason <r>`	`refresh_index`
`style-guide`	`--query <q>`, `--category <c>`	`get_style_guide`
`patterns`	`--category all\|di\|state\|testing\|libraries`	`get_team_patterns`
`refs --symbol <name>`	`--limit <n>`	`get_symbol_references`
`cycles`	`--scope <path>`	`detect_circular_dependencies`
`memory list`	`--category`, `--type`, `--query`, `--json`	—
`memory add`	`--type`, `--category`, `--memory`, `--reason`	`remember`
`memory remove <id>`	—	—

All commands accept --json for raw JSON output. Errors go to stderr with exit code 1.

# Quick examples
npx codebase-context status
npx codebase-context search --query "auth middleware" --intent edit
npx codebase-context refs --symbol "UserService" --limit 10
npx codebase-context cycles --scope src/features
npx codebase-context reindex --incremental

Tool Surface

10 MCP tools + 1 optional resource (codebase://context). Migration: get_component_usage was removed; use get_symbol_references for symbol usage evidence.

Core Tools

Tool	Input	Output
`search_codebase`	`query`, optional `intent`, `limit`, `filters`, `includeSnippets`	Ranked results (`file`, `summary`, `score`, `type`, `trend`, `patternWarning`, `relationships`, `hints`) + `searchQuality` + decision card (`ready`, `nextAction`, `patterns`, `bestExample`, `impact`, `whatWouldHelp`) when `intent="edit"`. Hints capped at 3 per category.
`get_team_patterns`	optional `category`	Pattern frequencies, trends, golden files, conflicts
`get_symbol_references`	`symbol`, optional `limit`	Concrete symbol usage evidence: `usageCount` + top usage snippets + `confidence` + `isComplete`. `confidence: "syntactic"` means static/source-based only (no runtime or dynamic dispatch). Replaces the removed `get_component_usage`.
`remember`	`type`, `category`, `memory`, `reason`	Persists to `.codebase-context/memory.json`
`get_memory`	optional `category`, `type`, `query`, `limit`	Memories with confidence decay scoring

Utility Tools

Tool	Purpose
`get_codebase_metadata`	Framework, dependencies, project stats
`get_style_guide`	Style rules from project documentation
`detect_circular_dependencies`	Import cycles in the file graph
`refresh_index`	Full or incremental re-index + git memory extraction
`get_indexing_status`	Index state, progress, last stats

Retrieval Pipeline

Ordered by execution:

Intent classification — EXACT_NAME (for symbols), CONCEPTUAL, FLOW, CONFIG, WIRING. Sets keyword/semantic weight ratio.
Query expansion — bounded domain term expansion for conceptual queries.
Dual retrieval — keyword (Fuse.js) + semantic (local embeddings or OpenAI).
RRF fusion — Reciprocal Rank Fusion (k=60) across all retrieval channels.
Definition-first boost — for EXACT_NAME intent, results matching the symbol name get +15% score boost (e.g., defining file ranks above using files).
Structure-aware boosting — import centrality, composition root boost, path overlap, definition demotion for action queries.
Contamination control — test file filtering for non-test queries.
File deduplication — best chunk per file.
Symbol-level deduplication — within each symbolPath group, keep only the highest-scoring chunk (prevents duplicate methods from same class clogging results).
Stage-2 reranking — cross-encoder (Xenova/ms-marco-MiniLM-L-6-v2) triggers when the scores of the top files are very close (by default, top-3 within 0.08). CPU-only, top-10 bounded.
Result enrichment — compact type (componentType:layer), pattern momentum (trend Rising/Declining only, Stable omitted), patternWarning, condensed relationships (importedByCount/hasTests), structured hints (capped callers/consumers/tests ranked by frequency), scope header for symbol-aware snippets (// ClassName.methodName), related memories (capped to 3), search quality assessment with hint when low confidence.
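
The RRF fusion step can be illustrated with a minimal sketch. It assumes each retrieval channel returns an ordered list of file identifiers and uses the standard RRF formula, score = sum of 1 / (k + rank), with k=60 as stated above. The function name `fuseRankings` and the sample file names are hypothetical and not taken from the codebase.

```typescript
// Sketch of Reciprocal Rank Fusion: score(d) = sum over channels of 1 / (k + rank).
// `fuseRankings` and its input shape are illustrative assumptions, not the project's API.
type Ranking = string[]; // file ids, best match first

function fuseRankings(channels: Ranking[], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of channels) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // RRF ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  const fused: Array<{ id: string; score: number }> = [];
  scores.forEach((score, id) => fused.push({ id, score }));
  // Highest fused score first.
  return fused.sort((a, b) => b.score - a.score);
}

// Hypothetical results from the keyword and semantic channels.
const keywordHits = ["auth.guard.ts", "auth.service.ts", "login.component.ts"];
const semanticHits = ["auth.service.ts", "token.interceptor.ts", "auth.guard.ts"];
const fused = fuseRankings([keywordHits, semanticHits]);
```

Files that rank well in both channels rise to the top, while a file seen by only one channel keeps a smaller but non-zero score.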

Defaults

Chunk size: 50 lines, 0 overlap
Reranker trigger: activates when top-3 results are within 0.08 score of each other
Embedding model: Granite (ibm-granite/granite-embedding-30m-english, 8192 token context) via @huggingface/transformers v3
Vector DB: LanceDB with cosine distance
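
As a rough illustration of the reranker trigger, the sketch below checks whether the top-3 scores fall within a 0.08 spread. It assumes scores are already sorted in descending order and that "within 0.08" is inclusive; `shouldRerank` is a hypothetical name, and the real trigger may be computed differently.

```typescript
// Sketch of the reranker trigger: rerank only when the leading scores are nearly tied.
// Assumptions: scores are sorted descending, and the 0.08 bound is inclusive.
function shouldRerank(scoresDescending: number[], window = 3, threshold = 0.08): boolean {
  const top = scoresDescending.slice(0, window);
  if (top.length < 2) return false; // a single result has nothing to disambiguate
  const spread = Math.max(...top) - Math.min(...top);
  return spread <= threshold;
}

const ambiguous = shouldRerank([0.91, 0.88, 0.86, 0.4]); // top-3 spread is about 0.05
const clearWinner = shouldRerank([0.91, 0.7, 0.65]); // top-3 spread is about 0.26
```

Gating the cross-encoder this way keeps the CPU cost out of queries that already have a clear winner.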

Decision Card (Edit Intent)

Returned as preflight when search intent is edit, refactor, or migrate.

Output shape:

{
  ready: boolean;
  nextAction?: string;        // Only when ready=false; what to search for next
  warnings?: string[];        // Failure memories (capped at 3)
  patterns?: {
    do: string[];             // Top 3 preferred patterns with adoption %
    avoid: string[];          // Top 3 declining patterns
  };
  bestExample?: string;       // Top 1 golden file (path format)
  impact?: {
    coverage: string;         // "X/Y callers in results"
    files: string[];          // Top 3 impact candidates (files importing results)
  };
  whatWouldHelp?: string[];   // Concrete next steps (max 4) when ready=false
}

Fields explained:

ready: boolean, whether evidence is sufficient to proceed
nextAction: actionable reason why ready=false (e.g., "2 of 5 callers missing")
warnings: failure memories from team (auto-surfaces past mistakes)
patterns.do: patterns the team is adopting, ranked by adoption %
patterns.avoid: declining patterns, ranked by % (useful for migrations)
bestExample: exemplar file for the area under edit
impact.coverage: shows caller visibility ("3/5 callers in results" means 2 known callers do not appear in the results yet)
impact.files: which files import the results (helps find blind spots)
whatWouldHelp: specific next searches, tool calls, or files to check that would close evidence gaps
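
A consumer of the decision card might act on these fields roughly as follows. This is a sketch only: the interface mirrors the output shape above, while `missingCallers`, `nextSteps`, and the sample card values are hypothetical helpers and data introduced for illustration.

```typescript
// Shape copied from the output description above.
interface DecisionCard {
  ready: boolean;
  nextAction?: string;
  warnings?: string[];
  patterns?: { do: string[]; avoid: string[] };
  bestExample?: string;
  impact?: { coverage: string; files: string[] };
  whatWouldHelp?: string[];
}

// Hypothetical helper: parse "3/5 callers in results" into the number of unseen callers.
function missingCallers(coverage: string): number | undefined {
  const match = /^(\d+)\/(\d+)/.exec(coverage);
  if (!match) return undefined;
  return Number(match[2]) - Number(match[1]);
}

// Hypothetical helper: turn a card into an ordered to-do list for the caller.
function nextSteps(card: DecisionCard): string[] {
  if (!card.ready) {
    // Not enough evidence yet: follow the card's own suggestions before editing.
    return [card.nextAction, ...(card.whatWouldHelp ?? [])].filter(
      (step): step is string => typeof step === "string"
    );
  }
  const steps: string[] = [];
  for (const warning of card.warnings ?? []) steps.push(`Review warning: ${warning}`);
  if (card.bestExample) steps.push(`Follow the example in ${card.bestExample}`);
  steps.push("Proceed with the edit");
  return steps;
}

// Illustrative card; the values are made up for this example.
const card: DecisionCard = {
  ready: false,
  nextAction: "2 of 5 callers missing",
  impact: { coverage: "3/5 callers in results", files: ["src/app/auth/auth.guard.ts"] },
  whatWouldHelp: ["Search for remaining callers of UserService"],
};
const steps = nextSteps(card);
```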

How `ready` is determined

Evidence triangulation — scores code match (45%), pattern alignment (30%), and memory support (25%). Needs combined score ≥ 40 to pass.
Epistemic stress check — if pattern conflicts, stale memories, thin evidence, or low caller coverage are detected, ready is set to false.
Search quality gate — if searchQuality.status is low_confidence, ready is forced to false regardless of evidence scores. This prevents the "confidently wrong" problem.
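
The three checks above can be combined as in the sketch below. It assumes each evidence component is scored on a 0-100 scale and that the stress check is reduced to a single boolean; the names `EvidenceScores`, `combinedScore`, and `isReady` are hypothetical, and the actual implementation may order or weight the checks differently.

```typescript
// Sketch of the `ready` decision. Assumption: each component is scored 0-100.
interface EvidenceScores {
  codeMatch: number;
  patternAlignment: number;
  memorySupport: number;
}

interface Signals {
  stressDetected: boolean; // pattern conflicts, stale memories, thin evidence, low caller coverage
  searchQualityStatus: string; // only "low_confidence" is named in this document
}

function combinedScore(s: EvidenceScores): number {
  // Weights from the evidence triangulation description: 45% / 30% / 25%.
  return 0.45 * s.codeMatch + 0.3 * s.patternAlignment + 0.25 * s.memorySupport;
}

function isReady(scores: EvidenceScores, signals: Signals): boolean {
  if (signals.searchQualityStatus === "low_confidence") return false; // search quality gate
  if (signals.stressDetected) return false; // epistemic stress check
  return combinedScore(scores) >= 40; // evidence triangulation threshold
}

const strong: EvidenceScores = { codeMatch: 80, patternAlignment: 40, memorySupport: 0 };
const weak: EvidenceScores = { codeMatch: 50, patternAlignment: 30, memorySupport: 20 };
const calm: Signals = { stressDetected: false, searchQualityStatus: "high_confidence" }; // placeholder status
```

With these inputs, `strong` combines to about 48 and passes, while `weak` combines to about 36.5 and fails even without any stress signal.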

Internal signals (not in output, feed `ready` computation)

Risk level from circular deps, impact breadth, and failure memories
Preferred/avoid patterns from team pattern analysis
Golden files ranked by pattern density
Caller coverage from import graph (X of Y callers appearing in results)
Pattern conflicts when two patterns in the same category are both > 20% adoption
Confidence decay of related memories
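
The pattern-conflict rule (two patterns in the same category both above 20% adoption) could be checked as sketched here. The `PatternUsage` shape, the `findConflicts` name, and the sample adoption figures are assumptions made for this example.

```typescript
// Sketch of pattern-conflict detection. The input shape and sample data are illustrative.
interface PatternUsage {
  category: string;
  name: string;
  adoptionPercent: number; // 0-100
}

function findConflicts(patterns: PatternUsage[], thresholdPercent = 20): Record<string, string[]> {
  const aboveThreshold = new Map<string, string[]>();
  for (const p of patterns) {
    if (p.adoptionPercent <= thresholdPercent) continue; // must be strictly above 20%
    const names = aboveThreshold.get(p.category) ?? [];
    names.push(p.name);
    aboveThreshold.set(p.category, names);
  }
  const conflicts: Record<string, string[]> = {};
  aboveThreshold.forEach((names, category) => {
    // A conflict needs at least two competing patterns in the same category.
    if (names.length >= 2) conflicts[category] = names;
  });
  return conflicts;
}

const conflicts = findConflicts([
  { category: "di", name: "constructor injection", adoptionPercent: 62 },
  { category: "di", name: "inject() function", adoptionPercent: 38 },
  { category: "state", name: "signals", adoptionPercent: 85 },
  { category: "state", name: "BehaviorSubject", adoptionPercent: 15 },
]);
```

Here only `di` is reported, because the second `state` pattern sits below the 20% threshold.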

Memory System

4 types: convention, decision, gotcha, failure
Confidence decay: conventions never decay, decisions 180-day half-life, gotchas/failures 90-day half-life
Stale threshold: memories below 30% confidence are flagged
Git auto-extraction: conventional commits from last 90 days
Surface locations: search_codebase results (as relatedMemories), get_team_patterns responses, preflight analysis
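
The decay rules can be sketched with a standard exponential half-life formula, confidence = 0.5^(age / halfLife). This assumes confidence starts at 1.0 and that decay is purely age-based; the function names are hypothetical and the real scoring may include other factors.

```typescript
// Sketch of confidence decay. Assumptions: initial confidence is 1.0, decay is exponential.
type MemoryType = "convention" | "decision" | "gotcha" | "failure";

// Half-lives in days, from the list above; null means the memory never decays.
const HALF_LIFE_DAYS: Record<MemoryType, number | null> = {
  convention: null,
  decision: 180,
  gotcha: 90,
  failure: 90,
};

const STALE_THRESHOLD = 0.3; // memories below 30% confidence are flagged

function memoryConfidence(type: MemoryType, ageDays: number): number {
  const halfLife = HALF_LIFE_DAYS[type];
  if (halfLife === null) return 1;
  return Math.pow(0.5, ageDays / halfLife);
}

function isStale(type: MemoryType, ageDays: number): boolean {
  return memoryConfidence(type, ageDays) < STALE_THRESHOLD;
}

const decisionAtHalfLife = memoryConfidence("decision", 180); // one half-life elapsed
const oldGotchaIsStale = isStale("gotcha", 180); // two half-lives elapsed
```

Under this model a gotcha or failure drops below the stale threshold after roughly 156 days, and a decision after roughly 313 days.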

Indexing

Initial: full scan → chunking (50 lines, 0 overlap) → embedding → vector DB (LanceDB) + keyword index (Fuse.js)
Incremental: SHA-256 manifest diffing, selective embed/delete, full intelligence regeneration
Version gating: index-meta.json tracks format version; mismatches trigger automatic rebuild
Crash-safe rebuilds: full rebuilds write to .staging/ and swap atomically only on success
Auto-heal: corrupted index triggers automatic full re-index on next search
Relationships sidecar: relationships.json contains file import graph and symbol export index
Storage: .codebase-context/ directory (memory.json + generated files)
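
Manifest diffing for incremental indexing can be pictured as below. The sketch assumes a manifest is a map from file path to its SHA-256 hex digest, computed elsewhere; the `Manifest` type, `diffManifests` name, and the shortened placeholder digests are illustrative and do not reflect the on-disk format.

```typescript
// Sketch of manifest diffing. Assumption: a manifest maps file path -> SHA-256 hex digest.
type Manifest = Record<string, string>;

interface ManifestDiff {
  added: string[]; // new files: embed
  changed: string[]; // digest differs: delete old chunks, then re-embed
  deleted: string[]; // gone from disk: delete chunks
}

function diffManifests(previous: Manifest, current: Manifest): ManifestDiff {
  const diff: ManifestDiff = { added: [], changed: [], deleted: [] };
  for (const path of Object.keys(current)) {
    if (!(path in previous)) diff.added.push(path);
    else if (previous[path] !== current[path]) diff.changed.push(path);
  }
  for (const path of Object.keys(previous)) {
    if (!(path in current)) diff.deleted.push(path);
  }
  return diff;
}

// Placeholder digests, shortened for readability (real SHA-256 digests are 64 hex characters).
const previousManifest: Manifest = { "src/a.ts": "aa11", "src/b.ts": "bb22", "src/c.ts": "cc33" };
const currentManifest: Manifest = { "src/a.ts": "aa11", "src/b.ts": "bb99", "src/d.ts": "dd44" };
const manifestDiff = diffManifests(previousManifest, currentManifest);
```

Unchanged files such as `src/a.ts` are skipped entirely, which is what keeps incremental runs cheap.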

Analyzers

Angular: signals, standalone components, control flow syntax, lifecycle hooks, DI patterns, component metadata
Generic: 30+ languages have indexing/retrieval coverage, including PHP, Ruby, Swift, Scala, Shell, and config/markup formats. 10 languages have full symbol extraction via Tree-sitter: TypeScript, JavaScript, Python, Java, Kotlin, C, C++, C#, Go, Rust.

Notes:

Language detection covers common extensions including .pyi, .kt/.kts, .cc/.cxx, and config formats like .toml/.xml.
When Tree-sitter grammars are present, the Generic analyzer uses AST-aligned chunking and scope-aware prefixes for symbol-aware snippets (with fallbacks).

Evaluation Harness

Reproducible evaluation is shipped as a CLI entrypoint backed by shared scoring/reporting code.

Command: npm run eval -- <codebaseA> <codebaseB> (builds first, then runs scripts/run-eval.mjs)
Shared implementation: src/eval/harness.ts + src/eval/types.ts (tests and CLI use the same scoring)
Frozen fixtures:
- tests/fixtures/eval-angular-spotify.json (real-world)
- tests/fixtures/eval-controlled.json + tests/fixtures/codebases/eval-controlled/ (offline controlled)
Reported metrics: Top-1 accuracy, Top-3 recall, spec contamination rate, and a gate pass/fail
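
To make the reported metrics concrete, the sketch below computes them under assumed definitions: Top-1 accuracy is the share of queries whose first result is an expected file, Top-3 recall is the share with an expected file in the first three results, and spec contamination rate is the share whose first three results include a `.spec` file. These definitions, the `EvalCase` shape, and the `summarize` function are assumptions for illustration; the harness in src/eval/harness.ts is the authoritative source.

```typescript
// Sketch of the reported metrics under assumed definitions (see the lead-in).
interface EvalCase {
  expected: string[]; // acceptable files for the query
  results: string[]; // ranked result files, best first
}

interface EvalSummary {
  top1Accuracy: number;
  top3Recall: number;
  specContaminationRate: number;
}

function summarize(cases: EvalCase[]): EvalSummary {
  let top1 = 0;
  let top3 = 0;
  let contaminated = 0;
  for (const c of cases) {
    const top = c.results.slice(0, 3);
    if (top.length > 0 && c.expected.includes(top[0])) top1++;
    if (top.some((file) => c.expected.includes(file))) top3++;
    if (top.some((file) => /\.spec\.[jt]sx?$/.test(file))) contaminated++;
  }
  const total = cases.length || 1; // avoid division by zero on an empty fixture
  return {
    top1Accuracy: top1 / total,
    top3Recall: top3 / total,
    specContaminationRate: contaminated / total,
  };
}

// Made-up cases for illustration only; these are not taken from the frozen fixtures.
const summary = summarize([
  { expected: ["a.ts"], results: ["a.ts", "b.ts", "c.ts"] },
  { expected: ["d.ts"], results: ["d.ts", "e.ts"] },
  { expected: ["f.ts"], results: ["g.ts", "f.spec.ts", "f.ts"] },
  { expected: ["h.ts"], results: ["i.ts", "j.ts", "k.ts"] },
]);
```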
