Local-first MCP toolkit for fast code search, dependency-aware module discovery, visual code atlas pages, and DeepWiki-style repository documentation.
codebase-mcp turns a local repository into a persistent MCP code intelligence service. It keeps tree-sitter indexed source data, symbols, references, dependencies, graph metadata, lexical indexes, and vector search data under the target repo's .codedb-mcp directory.
Warm MCP calls are designed to be millisecond-level inside a persistent server process. See Benchmark Snapshot, MCP vs rg, and Warm Tool Validation for measured latency and accuracy checks.
| Area | What It Provides |
|---|---|
| Fast MCP tools | Indexed exact/regex search, hybrid lexical/vector search, outlines, definitions, callers, dependencies, fuzzy file lookup, query pipelines, and 100-call bundles. |
| Module discovery | Dependency-connected file components plus dependency-weighted label propagation, with terms and paths used as explainable labels and evidence. |
| Code Module Atlas | A packaged meet-blog-style 3D viewer with one star per source file, module/file lists, dependency edges, and file focus/details. |
| DeepWiki | Local repository documentation generated from MCP evidence and the active agent's reasoning, with business-module-first pages and cited source files. |
| Local deployment | Explicit .codedb-mcp/codedb-mcp.toml, project-local storage, bundled skills, and no hidden environment-variable behavior. |
The server keeps a tree-sitter indexed, project-local code database under .codedb-mcp and exposes tools for:
- fast exact/regex and hybrid lexical/vector search;
- symbol outlines and definition lookup;
- LSP-like callers anchored to a definition path and line;
- direct and reverse file dependencies, including transitive walks;
- fuzzy file lookup, path globbing, compact query pipelines, and 100-call bundles;
- graph summaries, lazy Louvain communities, module planning, atlas export, and DeepWiki evidence gathering.
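The difference between a direct and a transitive reverse-dependency lookup can be sketched as a breadth-first walk over inverted edges. This is an illustrative Python sketch, not the server's Rust implementation; the function name `imported_by`, the `deps` mapping, and the sample file names are assumptions made for the example.

```python
from collections import deque


def imported_by(deps, target, transitive=False):
    """Return files that import `target`, directly or through a transitive walk.

    `deps` maps each file to the files it depends on (forward edges).
    """
    # Invert the forward edges so we can look up importers of a file.
    reverse = {}
    for source, targets in deps.items():
        for dependency in targets:
            reverse.setdefault(dependency, set()).add(source)

    if not transitive:
        return sorted(reverse.get(target, set()))

    # Breadth-first walk over importers of importers.
    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for importer in reverse.get(current, ()):
            if importer != target and importer not in seen:
                seen.add(importer)
                queue.append(importer)
    return sorted(seen)


# Hypothetical four-file dependency map.
deps = {
    "a.cs": ["b.cs"],
    "b.cs": ["c.cs"],
    "d.cs": ["c.cs"],
    "c.cs": [],
}
direct = imported_by(deps, "c.cs")
everything = imported_by(deps, "c.cs", transitive=True)
```

Here `direct` contains only the files that import `c.cs` themselves, while `everything` also picks up `a.cs`, which reaches `c.cs` through `b.cs`.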
The atlas page is generated by the skills/code-module-atlas skill. It calls the local MCP module-atlas export, converts the result into the bundled meet-blog-style 3D viewer dataset, and shows one star node per source file.
Module boundaries are computed from the dependency-connected file graph first. Inside each connected component, the Rust module planner uses dependency-weighted label propagation; paths and distinctive terms are used for names, evidence, and oversized-component splitting, not as the primary grouping rule. The page then provides a module list, a file list for the selected module, file-to-file dependency edges, and file focus/details.
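The two-stage grouping described above (connected components first, then dependency-weighted label propagation inside each component) can be sketched as follows. This is a simplified Python illustration, not the Rust module planner: the edge weights, the fixed round limit, and the tie-breaking rule (smallest label wins) are assumptions, and naming, evidence, and oversized-component splitting are left out.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Split files into dependency-connected components (edges treated as undirected)."""
    adjacency = defaultdict(set)
    for a, b, _weight in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            current = stack.pop()
            component.append(current)
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


def propagate_labels(component, edges, rounds=10):
    """Each file repeatedly adopts the label with the highest total edge weight among its neighbors."""
    members = set(component)
    weights = defaultdict(dict)
    for a, b, weight in edges:
        if a in members and b in members:
            weights[a][b] = weights[a].get(b, 0) + weight
            weights[b][a] = weights[b].get(a, 0) + weight
    labels = {node: node for node in component}  # every file starts as its own module
    for _ in range(rounds):
        changed = False
        for node in component:  # sorted order keeps the result deterministic
            scores = defaultdict(float)
            for neighbor, weight in weights[node].items():
                scores[labels[neighbor]] += weight
            if not scores:
                continue
            best = min(scores, key=lambda label: (-scores[label], label))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Hypothetical graph: two tightly coupled triangles joined by one weak edge, plus an isolated file.
nodes = ["a.cs", "b.cs", "c.cs", "d.cs", "e.cs", "f.cs", "g.cs"]
edges = [
    ("a.cs", "b.cs", 5), ("b.cs", "c.cs", 5), ("a.cs", "c.cs", 5),
    ("c.cs", "d.cs", 1),
    ("d.cs", "e.cs", 5), ("e.cs", "f.cs", 5), ("d.cs", "f.cs", 5),
]
modules = {}
for component in connected_components(nodes, edges):
    modules.update(propagate_labels(component, edges))
```

In this example the weak `c.cs`/`d.cs` edge keeps all six files in one connected component, but label propagation still separates the two triangles into different modules, which is the behavior the dependency weighting is meant to produce.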
To build the atlas dataset and run the viewer locally:

node skills\code-module-atlas\scripts\build-module-atlas.mjs u3dclient
cd skills\code-module-atlas\assets\viewer
npm run dev -- --port 5174 --strictPort

The skills/deepwiki skill builds local DeepWiki-style documentation from MCP evidence and the active agent's reasoning. It starts from dependency-aware module candidates, then writes business-module-first pages with cited files, entry points, flows, dependencies, and risk notes. It does not require a separate model API.
The intended distribution model is setup-guide first: give an agent setup-for-agent.md, let it create .codedb-mcp, use the default HuggingFace cache when it already exists, fall back to a second-drive cache when it does not, and then ask the human whether this specific agent should register the MCP server. The codedb-mcp skill is for using the tools after setup, not for installing them.
Benchmark target: u3dclient.
Current index status with the Unity C# benchmark config:
- Indexed files: 19,030.
- Chunks: 129,790.
- Symbols: 277,008.
- Graph: 296,941 nodes and 691,419 edges.
- Vector index: Vicinity HNSW over Model2Vec minishlab/potion-code-16M.
- Storage: u3dclient\.codedb-mcp.
Index timings on this machine:
| Scenario | Cache | Time | Notes |
|---|---|---|---|
| Cold tree-sitter .codedb-mcp build | miss | 66.061s wall | scan, tree-sitter declaration parse, embeddings, graph, BM25, HNSW, cache save |
| Reopen with unchanged files/config | hit | 30.8s wall | reuses parsed files, chunks, semantic units, embeddings; rebuilds runtime graph/BM25/HNSW |
| One-shot codedb_status CLI | hit | 31.0s wall | includes process startup and index load; persistent MCP is the intended mode |
| One-shot Rust codedb_module_atlas export | hit | 42.057s wall | includes cache-hit index load plus atlas JSON export |
| Warm Rust module atlas generation | ready | 9.746s internal | 1,374 modules and 16,361 plotted files from dependency-connected file graph |
Java smoke benchmark on gameserver:
| Scenario | Files | Chunks | Symbols | Time |
|---|---|---|---|---|
| Cold tree-sitter build | 6,940 | 55,057 | 245,238 | 16.919s |
| Reopen with unchanged files/config | 6,940 | 55,057 | 245,238 | 11.527s |
Multi-language smoke coverage includes C#, Java, Rust, Python, Lua, TypeScript, C, and C++ parser paths.
Rust smoke check on this repository: 20 indexed files including 17 .rs files, 341 chunks, 604 symbols; codedb_outline, codedb_search, and codedb_deps all returned Rust results.
Warm persistent MCP tool timings below do not include server startup or index load.
For exact text and regex search, codedb_search regex=true and rg can both answer the query. The rg baseline used --no-ignore because this Unity project intentionally includes Library/PackageCache.
| Scenario | MCP tool | MCP hits | MCP time | rg baseline | rg hits | rg time |
|---|---|---|---|---|---|---|
| Exact PoolManager | codedb_search regex=true | 154 | 0.2234s | rg --no-ignore -n -i -F | 154 | 1.7201s |
| Exact Joystick | codedb_search regex=true | 938 | 0.2343s | rg --no-ignore -n -i -F | 938 | 1.9419s |
| Exact NetworkListenerManager | codedb_search regex=true | 14 | 0.1973s | rg --no-ignore -n -i -F | 14 | 1.7486s |
| Exact GameObjectPoolMgr | codedb_search regex=true | 8 | 0.2210s | rg --no-ignore -n -i -F | 8 | 2.1606s |
| Exact AllianceManager | codedb_search regex=true | 16 | 0.2190s | rg --no-ignore -n -i -F | 16 | 1.7719s |
| Scoped Joystick in Joystick Pack | codedb_search regex=true path_glob=... | 46 | 0.0063s | scoped rg --no-ignore -n -i -F | 46 | 0.0415s |
| Scoped NetworkListenerManager in UnityNativeTools | codedb_search regex=true path_glob=... | 14 | 0.0064s | scoped rg --no-ignore -n -i -F | 14 | 0.0414s |
| Alliance UI/proto regex in Assets/Scripts | codedb_search regex=true path_glob=... | 409 | 0.0635s | scoped rg --no-ignore -n -i | 409 | 0.4137s |
| Alliance UI .cs file glob | codedb_glob | 52 | 0.0044s | rg --files --no-ignore -g | 52 | 0.5748s |
Feature comparison:
| Capability | codedb-mcp | rg |
|---|---|---|
| Raw exact grep | yes, indexed via codedb_search regex=true | yes |
| Regex line search | yes, indexed source corpus | yes, direct filesystem scan |
| Scoped file/path filtering | yes, path_glob, codedb_find, codedb_query | yes, -g, shell paths |
| Fuzzy file lookup | yes, codedb_find | no direct fuzzy ranking |
| Hybrid lexical + vector search | yes, BM25 + Model2Vec + Vicinity | no |
| Symbol outline | yes, codedb_outline from precomputed tree-sitter symbols | no |
| Definition-anchored callers | yes, codedb_callers | no semantic anchor |
| File dependency graph | yes, codedb_deps | no |
| Code graph export/analysis | yes, codedb_graph, codedb_analyze, codedb_export | no |
| Batch calls in one MCP round trip | yes, batch params and codedb_bundle | no MCP batching |
| Arbitrary non-indexed binary/text corpus | no, source extensions from config | yes |
MCP-only measured features:
| Scenario | MCP tool | Results | Time | rg equivalent |
|---|---|---|---|---|
| Hybrid search for PoolManager related chunks | codedb_search | 20 | 0.0198s | none |
| Hybrid search for Joystick related chunks | codedb_search | 20 | 0.0666s | none |
| Hybrid search for NetworkListenerManager related chunks | codedb_search | 20 | 0.0271s | none |
| Business semantic search: alliance member ranking donation gift under Assets/Scripts | codedb_search path_glob=... | 20 | 0.0358s | none |
| Definition-anchored references for PoolManager at PoolManager.cs:26 | codedb_callers | 7 | 0.0045s | none |
| Definition-anchored references for Joystick at Joystick.cs:8 | codedb_callers | 7 | 0.0069s | none |
rg remains the better tool for ad hoc raw filesystem grep across arbitrary file types. codedb-mcp is for repeated code-aware work inside a configured source corpus.
These calls were measured through one already-started MCP process on u3dclient; exact regex searches were checked against rg --no-ignore on the same scoped corpus.
| Scenario | Tool | Accuracy Check | avg | p95 |
|---|---|---|---|---|
| Scoped exact PoolManager | codedb_search regex=true | MCP 52 = rg 52 | 5.813ms | 5.953ms |
| Scoped exact Joystick | codedb_search regex=true | MCP 46 = rg 46 | 6.371ms | 6.853ms |
| Scoped exact NetworkListenerManager | codedb_search regex=true | MCP 14 = rg 14 | 6.486ms | 6.707ms |
| Hybrid PoolManager | codedb_search | expected text present | 20.826ms | 21.723ms |
| Hybrid Joystick | codedb_search | expected text present | 84.755ms | 84.621ms |
| Business phrase alliance member ranking donation gift | codedb_search | Alliance results present | 39.849ms | 41.138ms |
| Definition refs for PoolManager | codedb_callers | 7 refs | 4.518ms | 5.464ms |
| Definition refs for Joystick | codedb_callers | 7 refs | 7.726ms | 8.692ms |
| GameObjectPoolMgr.cs depends_on | codedb_deps | 7 files, expected deps present | 0.244ms | 0.318ms |
| NetworkListenerManager.cs imported_by | codedb_deps | 3 files, expected importers present | 0.193ms | 0.212ms |
| NetworkListenerManager.cs transitive imported_by | codedb_deps | 16 files | 0.192ms | 0.230ms |
| NetworkListenerManager.cs path lookup | codedb_find | top1 correct | 20.259ms | 21.108ms |
| Joystick Pack Base Joystick path lookup | codedb_find | top1 correct | 17.710ms | 18.054ms |
| ResTypDef typo-ish lookup | codedb_find | target rank 3 | 19.109ms | 20.027ms |
| find NetworkListenerManager -> outline | codedb_query | expected outline present | 20.173ms | 20.505ms |
| filter Joystick Pack -> limit 3 -> outline | codedb_query | expected outlines present | 8.017ms | 9.206ms |
| filter UnityNativeTools -> search NetworkListenerManager | codedb_query | expected results present | 9.650ms | 10.755ms |
| find GameObjectPoolMgr -> search PoolManager | codedb_query | expected results present | 22.019ms | 23.469ms |
Additional tool timings:
| Tool / Scenario | Result | Time |
|---|---|---|
| codedb_deps GameObjectPoolMgr.cs depends_on | 7 files | 0.2ms |
| codedb_deps NetworkListenerManager.cs imported_by | 3 files | 0.2ms |
| codedb_deps AndroidPlatform.cs depends_on | 3 files | 0.2ms |
| codedb_outline NetworkListenerManager.cs | 1 symbol | 0.3ms |
| codedb_outline Joystick.cs | 17 symbols | 0.3ms |
| codedb_outline PoolManager.cs | 32 symbols | 0.2ms |
| codedb_outline NEON_AArch64.cs | 2,211 symbols | 1.4ms |
| 100 codedb_outline compact=true calls | p95 | 0.3ms |
| codedb_analyze on u3dclient | graph analysis | about 0.93s |
codedb_bundle runs up to 100 inner operations in one MCP request. Requests above 100 execute the first 100 and include a truncation notice.
| Scenario | Inner ops requested | Repeats | Inner ops executed | Time |
|---|---|---|---|---|
| Fast mixed metadata/deps/outline/read bundle | 100 | 1 | 100 | 0.0895s |
| Overflow bundle | 120 | 1 | 100 + truncation notice | 0.0924s |
| Repeated fast bundle | 100 | 50 | 5,000 total | avg 0.0913s, p95 0.1084s |
| Mixed search/callers/deps/outline bundle | 100 | 1 | 100 | 2.3174s |
| Heavy regex search bundle | 100 | 1 | 100 | 26.0085s |
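The overflow row above follows from the 100-operation cap. A minimal Python sketch of that truncation rule, assuming a hypothetical response shape (the `results` and `notice` field names and the `run_bundle` helper are illustrative, not the server's actual schema):

```python
MAX_INNER_OPS = 100  # cap stated for codedb_bundle


def run_bundle(ops, execute):
    """Execute at most MAX_INNER_OPS operations and flag any overflow."""
    accepted = ops[:MAX_INNER_OPS]
    response = {"results": [execute(op) for op in accepted]}
    if len(ops) > MAX_INNER_OPS:
        # Hypothetical notice text; the real wording is not documented here.
        response["notice"] = (
            f"truncated: executed first {MAX_INNER_OPS} of {len(ops)} operations"
        )
    return response


# 120 requested operations, mirroring the overflow benchmark row.
requested = [{"tool": "codedb_outline", "id": i} for i in range(120)]
response = run_bundle(requested, lambda op: op["id"])
```

A caller that needs all 120 results would split the request into two bundles rather than rely on the truncated response.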
- Give the target agent setup-for-agent.md.
- The agent creates <repo-root>\.codedb-mcp and <repo-root>\.codedb-mcp\models.
- On Windows, the agent checks the default HuggingFace hub cache first. If minishlab/potion-code-16M already has a valid snapshot there, config points to that snapshot. If the hub cache exists but the model is missing, the agent downloads to C:\Users\<user>\.cache\huggingface\hub\codedb-mcp\models\potion-code-16M. If the default hub cache does not exist, it uses the second available drive, such as D:\codedb-mcp-cache\models\potion-code-16M.
- The agent writes <repo-root>\.codedb-mcp\codedb-mcp.toml from the demo config, writes the model as an absolute path, and shows the human which languages are configured.
- The human can edit extensions, include_paths, skip_dirs, and the model path before first indexing.
- The agent runs an index check.
- The agent asks whether this specific agent should register MCP. If yes, it uses its own MCP mechanism.
- Restart or reload the agent MCP session and check /mcp.
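The model cache selection in the steps above can be sketched as a three-way decision. This Python sketch is illustrative only: `choose_model_path` is a hypothetical helper, the "valid snapshot" test is simplified to "a non-empty snapshot directory", and the `models--minishlab--potion-code-16M/snapshots/<revision>` layout is the standard HuggingFace hub cache layout rather than something this guide specifies.

```python
import tempfile
from pathlib import Path

HUB_REPO_DIR = "models--minishlab--potion-code-16M"  # standard hub cache folder name
MODEL_DIR = "potion-code-16M"


def choose_model_path(hub_cache, fallback_cache):
    """Return (action, path): reuse a cached snapshot or pick a download target."""
    hub_cache = Path(hub_cache)
    if not hub_cache.is_dir():
        # No default hub cache: fall back to the second-drive cache root.
        return "download", Path(fallback_cache) / "models" / MODEL_DIR
    snapshots = hub_cache / HUB_REPO_DIR / "snapshots"
    if snapshots.is_dir():
        for snapshot in sorted(snapshots.iterdir()):
            # Simplified validity check (assumption): the snapshot has files.
            if snapshot.is_dir() and any(snapshot.iterdir()):
                return "reuse", snapshot
    # Hub cache exists but the model is missing: download beside it.
    return "download", hub_cache / "codedb-mcp" / "models" / MODEL_DIR


# Walk through the three cases in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    hub = Path(tmp) / "hub"
    fallback = Path(tmp) / "codedb-mcp-cache"
    no_hub = choose_model_path(hub, fallback)

    hub.mkdir()
    empty_hub = choose_model_path(hub, fallback)

    snapshot = hub / HUB_REPO_DIR / "snapshots" / "rev1"
    snapshot.mkdir(parents=True)
    (snapshot / "model.safetensors").write_bytes(b"")
    cached = choose_model_path(hub, fallback)
```

Whichever branch is taken, the resulting path is what gets written into codedb-mcp.toml as the absolute model path.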
The MCP command shape is:
<package-root>\skills\codedb-mcp\assets\codebase-mcp.exe --config <repo-root>\.codedb-mcp\codedb-mcp.toml mcp <repo-root>
This project intentionally keeps installation explicit: setup prepares local project files, while the agent/user chooses when and where to register MCP.
- Exposes local MCP tools for code search, outlines, symbols, typed callers, dependencies, file discovery, graph analysis, DeepWiki module planning, module atlas export, batching, and exports.
- Indexes configured source languages through one explicit config file: <repo-root>/.codedb-mcp/codedb-mcp.toml.
- Stores generated data inside the target repo under .codedb-mcp. Delete that directory to remove local cache and generated wiki/index data.
- Uses a unified tree-sitter parser layer, not Roslyn/JDT. C#, Java, Rust, Python, Lua, JavaScript, TypeScript/TSX, C, and C++ all emit the same FileEntry/Symbol model. C#/Java typed callers and dependencies remain the strongest path because their namespace/package import rules are implemented on top of that shared AST output.
- Uses Minish ecosystem pieces: model2vec-rs with explicit-path minishlab/potion-code-16M, file-level semantic units, BM25 lexical ranking, exact identifier indexes, and Vicinity HNSW vectors.
- Builds a graphify-style code graph, computes Louvain communities lazily for codedb_communities, and exposes Rust-native codedb_module_map/codedb_module_atlas outputs from a dependency-connected file graph with label propagation, dependency cohesion, cross-folder evidence, semantic-neighbor probes, key symbols, and c-TF-IDF-like labels.
- Watches configured source extensions in MCP mode and rebuilds after a debounce.
- Explicit project-local config: all behavior comes from .codedb-mcp/codedb-mcp.toml. There are no environment-variable switches for indexing behavior.
- Project-local storage: cache payloads, manifests, Louvain caches, and DeepWiki output live under .codedb-mcp. Deleting that directory removes all generated data for the repo.
- Scanner: walks the repo with explicit extensions, max file size, project .gitignore behavior, skip dirs, and include paths. Nested Git worktrees/submodules under the target root are scanned as normal source directories. Unity Library/PackageCache can be included while the rest of Library is skipped.
- Unified language layer: extension dispatch selects a tree-sitter grammar for C#, Java, Rust, Python, Lua, JavaScript, TypeScript/TSX, C, or C++. The parser emits the same FileEntry/Symbol model for every language and visits declarations without descending into large method bodies.
- Code-aware references: C#/Java namespace/package imports, qualified names, aliases, static using, annotations, and attribute suffixes feed typed callers and dependency edges. Rust and the other non C#/Java languages currently provide indexed search, outlines, imports/includes/use declarations, Lua require() imports, and graph nodes, but not Roslyn/JDT-level semantic binding.
- Search indexes: builds chunks, exact identifier hits, symbol-definition chunk hits, dependency references, BM25 lexical search, Model2Vec file embeddings, and a Vicinity HNSW vector index.
- Graph layer: builds a graphify-style graph with file, namespace/package, symbol, dependency, and reference edges. Louvain communities and subcommunities are computed lazily on first request and cached under .codedb-mcp.
- Module atlas layer: codedb_module_map and codedb_module_atlas run in Rust. They first split files by dependency-connected components, then do dependency-weighted label propagation inside each component. Path and token terms are used for naming, evidence, and oversized-component splitting, not as the primary clustering basis. codedb_module_atlas exports Embedding Atlas-ready JSON.
- MCP runtime: implemented with the Rust rmcp SDK over stdio. Tools operate against a warm in-process index, and batch-capable tools plus codedb_bundle reduce MCP round trips.
- Setup guide and skills package: setup-for-agent.md owns installation guidance. skills/codedb-mcp is standalone for tool usage and includes the executable, config template, MCP reference, and tool guidance. skills/deepwiki builds local DeepWiki-style docs from MCP evidence plus the active agent's reasoning. skills/code-module-atlas calls codedb_module_atlas and packages the local meet-blog-style module/file graph webpage.
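The scanner rules listed above (explicit extensions, a size limit, skipped directories, and include paths that re-admit a subtree such as Library/PackageCache) can be sketched as a single predicate. This Python sketch rests on assumptions: `should_index` is a hypothetical helper, the `skip_dirs` values are examples, include paths are assumed to take precedence over skipped parents, and .gitignore handling is omitted.

```python
def should_index(rel_path, size, config):
    """Decide whether one repo-relative, forward-slash path would be indexed."""
    parts = rel_path.split("/")
    file_name = parts[-1]
    extension = file_name.rsplit(".", 1)[-1].lower() if "." in file_name else ""
    if extension not in config["extensions"]:
        return False
    if size > config["max_file_bytes"]:
        return False
    # Assumption: an include path re-admits its subtree even under a skipped parent.
    for included in config["include_paths"]:
        if rel_path.startswith(included.rstrip("/") + "/"):
            return True
    # Otherwise reject the file if any parent directory is in skip_dirs.
    return not any(part in config["skip_dirs"] for part in parts[:-1])


# Example values; skip_dirs entries are hypothetical.
config = {
    "extensions": {"cs", "java", "rs"},
    "max_file_bytes": 50000000,
    "skip_dirs": {"Library", "Temp"},
    "include_paths": ["Library/PackageCache"],
}

checks = {
    "source": should_index("Assets/Scripts/Foo.cs", 1200, config),
    "package_cache": should_index("Library/PackageCache/com.example/Bar.cs", 800, config),
    "library_other": should_index("Library/ScriptAssemblies/Baz.cs", 800, config),
    "wrong_extension": should_index("Assets/readme.md", 100, config),
    "too_large": should_index("Assets/Huge.cs", 60000000, config),
}
```

Only the first two entries in `checks` come out true, matching the described behavior of indexing Library/PackageCache while skipping the rest of Library.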
Default config path:
<repo-root>/.codedb-mcp/codedb-mcp.toml
The repo includes a working example at .codedb-mcp/codedb-mcp.toml and a distributable template at skills/codedb-mcp/assets/codedb-mcp.toml.template.
Important defaults:
[scan]
extensions = ["cs", "java", "rs", "py", "pyw", "lua", "js", "jsx", "mjs", "cjs", "ts", "tsx", "c", "h", "cc", "cpp", "cxx", "hpp", "hh", "hxx"]
max_file_bytes = 50000000
respect_gitignore = true
include_paths = ["Library/PackageCache"]
[embedding]
model = "C:/Users/<user>/.cache/huggingface/hub/codedb-mcp/models/potion-code-16M"
[storage]
enabled = true
dir = ".codedb-mcp"

There are no environment-variable toggles. Edit the config file explicitly. respect_gitignore=true reads project .gitignore files, but nested Git worktrees/submodules inside the target root are still indexed unless excluded by skip_dirs or file extension rules. The model path is explicit and absolute; on Windows the setup guide uses the default HuggingFace cache when present, otherwise it falls back to the second available drive.
Build:
cargo build --release

Run MCP directly:

target\release\codebase-mcp.exe --config u3dclient\.codedb-mcp\codedb-mcp.toml mcp u3dclient

Quick CLI checks:

target\release\codebase-mcp.exe --config u3dclient\.codedb-mcp\codedb-mcp.toml index u3dclient
target\release\codebase-mcp.exe --config u3dclient\.codedb-mcp\codedb-mcp.toml search "network listener manager" u3dclient -k 5
target\release\codebase-mcp.exe --config u3dclient\.codedb-mcp\codedb-mcp.toml --root u3dclient tool codedb_status "{}"

MCP mode answers the protocol handshake before the initial index finishes, then builds the default project index in the background. Early tool calls may wait for that first build. It also watches indexed extensions by default; when a configured source file changes, the server debounces events, rebuilds the project index in the background, and swaps in the new index after it is ready. Use --no-watch for static benchmark runs.
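The debounce step mentioned above can be illustrated with a small trailing-edge model: a burst of file events triggers one rebuild once the events go quiet. This Python sketch is an assumption-laden illustration; the quiet window length is not documented here, and `rebuild_times` is a hypothetical helper rather than the server's watcher code.

```python
def rebuild_times(event_times_ms, quiet_ms):
    """Return when rebuilds would start, given file-change timestamps in milliseconds.

    A rebuild is scheduled `quiet_ms` after the last event of each burst.
    """
    rebuilds = []
    last_event = None
    for timestamp in sorted(event_times_ms):
        if last_event is not None and timestamp - last_event >= quiet_ms:
            # The previous burst went quiet long enough to trigger a rebuild.
            rebuilds.append(last_event + quiet_ms)
        last_event = timestamp
    if last_event is not None:
        rebuilds.append(last_event + quiet_ms)
    return rebuilds


# Three rapid saves, then one isolated save; 500 ms is a hypothetical window.
schedule = rebuild_times([0, 100, 200, 5000], quiet_ms=500)
```

In this example four file events produce two rebuilds instead of four, and the old index keeps serving requests until each new one is ready to be swapped in.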
codedb_search accepts queries:
{
"max_results": 3,
"queries": [
"PoolManager",
{
"query": "Joystick",
"path_glob": "Assets/Plugins/3rdPlugins/Joystick Pack/**"
},
{
"query": "NetworkListenerManager",
"regex": true,
"compact": true
}
]
}

codedb_callers accepts targets:
{
"max_results": 10,
"targets": [
{
"name": "PoolManager",
"definition_path": "Assets/Scripts/HotFix/3rdExtend/Runtime/PoolManager/PoolManager.cs",
"definition_line": 26
},
{
"name": "Joystick",
"definition_path": "Assets/Plugins/3rdPlugins/Joystick Pack/Scripts/Runtime/Base/Joystick.cs",
"definition_line": 8
}
]
}

codedb_communities uses lazy Louvain clustering:
target\release\codebase-mcp.exe --config u3dclient\.codedb-mcp\codedb-mcp.toml --root u3dclient tool codedb_communities "{`"community_limit`":10}"
target\release\codebase-mcp.exe --config u3dclient\.codedb-mcp\codedb-mcp.toml --root u3dclient tool codedb_communities "{`"community_id`":0,`"children`":true,`"community_limit`":20}"

Overview calls return community IDs, labels, member counts, and cohesion. Add children=true or subcommunities=true with a community_id to split only that community's subgraph; child clusters are cached in .codedb-mcp/louvain-subcommunities.bin.
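The backtick-escaped quotes in the commands above are PowerShell quoting for the JSON argument. When the CLI is driven from a script, one way to avoid shell escaping is to build the argument list programmatically. The sketch below only constructs the argument list using the command shape shown above; `tool_argv` is a hypothetical helper and nothing is executed.

```python
import json


def tool_argv(exe, config, root, tool, params):
    """Build the argument list for a one-shot `tool` invocation."""
    payload = json.dumps(params, separators=(",", ":"))  # compact JSON, no shell escaping needed
    return [exe, "--config", config, "--root", root, "tool", tool, payload]


argv = tool_argv(
    r"target\release\codebase-mcp.exe",
    r"u3dclient\.codedb-mcp\codedb-mcp.toml",
    "u3dclient",
    "codedb_communities",
    {"community_id": 0, "children": True, "community_limit": 20},
)
```

Passing `argv` as a list to `subprocess.run` hands the JSON to the process as a single argument, so no backtick or backslash escaping is required. One-shot calls still pay the index load cost shown in the benchmark tables; the persistent MCP server remains the intended mode.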
codedb_module_map is the preferred DeepWiki planning call. It uses the Rust dependency-connected module graph, then adds dependency cohesion, cross-folder roots, semantic-neighbor probes, entry points, key symbols, and c-TF-IDF-like labels:
target\release\codebase-mcp.exe --config u3dclient\.codedb-mcp\codedb-mcp.toml --root u3dclient tool codedb_module_map "{`"path_prefix`":`"Assets/Scripts`",`"limit`":40,`"min_files`":2,`"semantic_neighbors`":5}"

The skills/ directory is intended to be copied as a standalone package.

- setup-for-agent.md: installation guide for agents. It reuses the default HuggingFace cache when present, falls back to the second Windows drive when absent, and writes project-local config with an absolute model path.
- skills/codedb-mcp: includes assets/codebase-mcp.exe, a config template, MCP registration reference, and tool guidance. It does not own setup.
- skills/deepwiki: creates DeepWiki-style local documentation using local codedb_* tools plus the active agent's reasoning. It emphasizes business module boundaries over folder-only or community-only grouping.
- skills/code-module-atlas: creates a local 3D module/file atlas webpage by calling codedb_module_atlas, then adapting the bundled meet-blog-style viewer. Generated repo-specific JSON stays ignored.
- meet-blog.buyixiao.xyz inspired the Code Module Atlas visual style and viewer experience.
- justrach/codedb inspired the original MCP tool interface direction.
