Commit 5755ef1

Add get_architecture, manage_adr tools; rename Leiden to Louvain
New tools:
- get_architecture: codebase architecture overview with 12 selectable aspects (languages, packages, entry_points, routes, hotspots, boundaries, services, layers, clusters, file_tree, adr)
- manage_adr: CRUD for Architecture Decision Records with 6 fixed sections, section filtering, validation, and discovery of existing architecture docs

Architecture analysis fixes:
- Fix qnToPackage extracting segment[2] for meaningful sub-packages instead of segment[1] which collapsed everything to top-level dirs
- Filter test functions from entry_points, hotspots, and routes using both is_test property (Module/File nodes) and file_path pattern matching
- Filter boundaries to Function/Method/Class nodes only
- Add FindArchitectureDocs for discovering existing architecture documentation

Naming corrections:
- Rename Leiden to Louvain throughout (algorithm is simplified Louvain, not actual Leiden which requires CPM-based refinement)
- leiden.go -> louvain.go with all symbols renamed
- Remove unused ClusterInfo.Modularity field (always zero)

Other changes:
- Case-insensitive search by default for search_graph and search_code
- Remove read_file and list_directory tools (handled by coding agents natively)
- Update tool descriptions and README for 8000 char ADR limit
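
To illustrate the qnToPackage fix described in the commit message, here is a minimal sketch of why the segment index matters. It is not the repository's implementation: `segmentAt`, the dot-separated qualified-name layout, and the example names are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// segmentAt returns the idx-th dot-separated segment of a qualified name,
// or "" when the name has too few segments. Hypothetical helper: the dot
// separator and the empty-string fallback are assumptions, not repo code.
func segmentAt(qualifiedName string, idx int) string {
	segs := strings.Split(qualifiedName, ".")
	if idx < 0 || idx >= len(segs) {
		return ""
	}
	return segs[idx]
}

func main() {
	// Assumed layout: project.topdir.subpackage.file.Symbol
	names := []string{
		"myproject.internal.store.nodes.InsertNode",
		"myproject.internal.pipeline.communities.passCommunities",
	}
	for _, qn := range names {
		// segment[1] is the same top-level directory for both names,
		// while segment[2] keeps the sub-packages apart.
		fmt.Printf("segment[1]=%s segment[2]=%s\n", segmentAt(qn, 1), segmentAt(qn, 2))
	}
}
```

Under this assumed layout, both names share `internal` at index 1 and only differ at index 2 (`store` vs `pipeline`), which matches the "collapsed everything to top-level dirs" symptom the commit describes.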
1 parent a0674cb commit 5755ef1

16 files changed

Lines changed: 2999 additions & 298 deletions

README.md

Lines changed: 77 additions & 13 deletions
@@ -4,13 +4,17 @@

 Single Go binary. No Docker, no external databases, no API keys. One command to install, say *"Index this project"* — done.

-Parses source code with [tree-sitter](https://tree-sitter.github.io/tree-sitter/), extracts functions, classes, modules, call relationships, and cross-service HTTP links. Exposes the graph through 14 MCP tools for use with Claude Code, Codex CLI, Cursor, Windsurf, or any MCP-compatible client. Also includes a **CLI mode** for direct tool invocation from the shell — no MCP client needed.
+Parses source code with [tree-sitter](https://tree-sitter.github.io/tree-sitter/), extracts functions, classes, modules, call relationships, and cross-service HTTP links. Exposes the graph through 12 MCP tools for use with Claude Code, Codex CLI, Cursor, Windsurf, or any MCP-compatible client. Also includes a **CLI mode** for direct tool invocation from the shell — no MCP client needed.

 ## Features

 - **35 languages**: Python, Go, JavaScript, TypeScript, TSX, Rust, Java, C++, C#, C, PHP, Lua, Scala, Kotlin, Ruby, Bash, Zig, Elixir, Haskell, OCaml, Objective-C, Swift, Dart, Perl, Groovy, Erlang, R, HTML, CSS, SCSS, YAML, TOML, HCL, SQL, Dockerfile
+- **Architecture overview**: `get_architecture` returns languages, packages, entry points, routes, hotspots, boundaries, layers, and clusters in a single call — instant codebase orientation
+- **Architecture Decision Records**: `manage_adr` persists architectural decisions (PURPOSE, STACK, ARCHITECTURE, PATTERNS, TRADEOFFS, PHILOSOPHY) across sessions with section filtering and validation
+- **Louvain community detection**: Discovers hidden functional modules across packages by clustering CALLS, HTTP_CALLS, and ASYNC_CALLS edges
 - **Git diff impact mapping**: `detect_changes` maps uncommitted changes to affected graph symbols + blast radius with risk classification (CRITICAL/HIGH/MEDIUM/LOW)
 - **Risk-classified tracing**: `trace_call_path` with `risk_labels=true` adds impact classification to every node in the call chain
+- **Case-insensitive search**: `search_graph` and `search_code` are case-insensitive by default — set `case_sensitive=true` for exact matching
 - **One-command install**: `codebase-memory-mcp install` auto-detects Claude Code, Codex CLI, Cursor, and Windsurf, registers the MCP server, and installs task-specific skills
 - **Self-update**: `codebase-memory-mcp update` downloads the latest release, verifies checksums, and atomically swaps the binary
 - **Task-specific skills**: 4 skills (exploring, tracing, quality, reference) that prescribe exact tool sequences — Claude Code automatically uses graph tools instead of defaulting to grep
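
The case-insensitive default listed in the Features hunk above could be implemented in several ways. One common approach in Go is to prepend the `(?i)` flag to the user's pattern unless exact matching is requested. The sketch below assumes that approach; `compileNamePattern` is a hypothetical helper, not a function from this repository, and unanchored matching is also an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileNamePattern compiles a user-supplied regex. Unless caseSensitive
// is set, it prepends Go's (?i) flag so matching ignores case.
// Hypothetical helper: the real tool may normalise patterns differently.
func compileNamePattern(pattern string, caseSensitive bool) (*regexp.Regexp, error) {
	if !caseSensitive {
		pattern = "(?i)" + pattern
	}
	return regexp.Compile(pattern)
}

func main() {
	insensitive, err := compileNamePattern(".*handler", false)
	if err != nil {
		panic(err)
	}
	sensitive, err := compileNamePattern(".*Handler", true)
	if err != nil {
		panic(err)
	}
	for _, name := range []string{"RequestHandler", "HANDLER", "requesthandler"} {
		fmt.Printf("%-16s default=%v case_sensitive=%v\n",
			name, insensitive.MatchString(name), sensitive.MatchString(name))
	}
}
```

With the default, all three names match; with `caseSensitive` set, only `RequestHandler` does.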
@@ -239,7 +243,7 @@ Add the MCP server to your project's `.mcp.json` (per-project, recommended) or `
 }
 ```

-Restart Claude Code after adding the config. Verify with `/mcp` — you should see `codebase-memory-mcp` listed with 14 tools.
+Restart Claude Code after adding the config. Verify with `/mcp` — you should see `codebase-memory-mcp` listed with 12 tools.

 </details>

@@ -331,22 +335,22 @@ The CLI uses the same SQLite database as the MCP server (`~/.cache/codebase-memo

 | Tool | Key Parameters | Description |
 |------|---------------|-------------|
-| `search_graph` | `label`, `name_pattern`, `project`, `file_pattern`, `relationship`, `direction`, `min_degree`, `max_degree`, `exclude_entry_points`, `limit` (default 100), `offset` | Structured search with filters. Use `project` to scope to a single repo when multiple are indexed. Supports pagination via `limit`/`offset` — response includes `has_more` and `total`. |
+| `search_graph` | `label`, `name_pattern`, `project`, `file_pattern`, `relationship`, `direction`, `min_degree`, `max_degree`, `exclude_entry_points`, `case_sensitive`, `limit` (default 100), `offset` | Structured search with filters. **Case-insensitive by default** (set `case_sensitive=true` for exact case). Use `project` to scope to a single repo when multiple are indexed. Supports pagination via `limit`/`offset` — response includes `has_more` and `total`. |
 | `trace_call_path` | `function_name` (required), `direction` (inbound/outbound/both), `depth` (1-5, default 3), `risk_labels` (boolean) | BFS traversal from/to a function (exact name match). Returns call chains with signatures, constants, and edge types. Capped at 200 nodes. With `risk_labels=true`, adds CRITICAL/HIGH/MEDIUM/LOW classification and `impact_summary`. |
 | `detect_changes` | `scope` (unstaged/staged/all/branch), `base_branch`, `depth` (1-5, default 3) | Map git diff to affected graph symbols + blast radius. Returns changed files, changed symbols, and impacted callers with risk classification. Requires git in PATH. |
-| `query_graph` | `query` (required) | Execute Cypher-like graph queries (read-only). See [Supported Cypher Subset](#supported-cypher-subset) for what's supported. |
+| `query_graph` | `query` (required) | Execute Cypher-like graph queries (read-only). String matching in WHERE is case-sensitive by default — use `(?i)` flag for case-insensitive regex. See [Supported Cypher Subset](#supported-cypher-subset). |
 | `get_graph_schema` || Node/edge counts, relationship patterns, sample names. Run this first to understand what's in the graph. |
 | `get_code_snippet` | `qualified_name` (required) | Read source code for a function by its qualified name (reads from disk). See [Qualified Names](#qualified-names) for the format. |
+| `get_architecture` | `aspects` (array, default `["all"]`), `project` | Codebase architecture overview computed from the code graph. Aspects: `languages`, `packages`, `entry_points`, `routes`, `hotspots`, `boundaries`, `services`, `layers` (heuristic), `clusters` (Louvain community detection), `file_tree`, `adr` (stored Architecture Decision Record). Call with `["all"]` for full orientation. |
+| `manage_adr` | `mode` (required: `get`/`store`/`update`/`delete`), `project`, `content`, `sections` | CRUD for Architecture Decision Records. `get`: retrieve ADR with parsed sections. `store`: create/replace full ADR (max 8000 chars). `update`: patch specific sections (unmentioned preserved). `delete`: remove ADR. Fixed sections: PURPOSE, STACK, ARCHITECTURE, PATTERNS, TRADEOFFS, PHILOSOPHY. |

 ### File Access

-> **Note**: These tools require at least one indexed project. They resolve relative paths against indexed project roots. Index a project first with `index_repository`.
+> **Note**: File reading and directory listing are handled natively by your coding agent (Claude Code `Read` tool, Codex CLI `cat`/`ls`, etc.). The tools below provide text search within indexed project files.

 | Tool | Key Parameters | Description |
 |------|---------------|-------------|
-| `search_code` | `pattern` (required), `file_pattern`, `regex`, `max_results` (default 100), `offset` | Grep-like text search within indexed project files. Supports pagination via `max_results`/`offset`. |
-| `read_file` | `path`, `start_line`, `end_line` | Read any file from an indexed project. Path can be absolute or relative to project root. |
-| `list_directory` | `path`, `pattern` | List files/directories with optional glob filtering (e.g. `*.go`, `*.py`). |
+| `search_code` | `pattern` (required), `file_pattern`, `regex`, `case_sensitive`, `max_results` (default 100), `offset` | Grep-like text search within indexed project files. **Case-insensitive by default** (set `case_sensitive=true` for exact case). Supports pagination via `max_results`/`offset`. |

 ## Usage Examples

@@ -356,10 +360,64 @@ The CLI uses the same SQLite database as the MCP server (`~/.cache/codebase-memo
 index_repository(repo_path="/path/to/your/project")
 ```

+### Get codebase architecture overview
+
+```
+get_architecture(aspects=["all"])
+# → languages, packages, entry points, routes, hotspots, boundaries, services, layers, clusters, file tree
+
+get_architecture(aspects=["languages", "packages"])
+# → quick orientation — just language breakdown and top packages
+
+get_architecture(aspects=["hotspots", "boundaries", "clusters"])
+# → dependency analysis — most-called functions, cross-package calls, community detection
+```
+
+### Manage Architecture Decision Records (ADR)
+
+```
+# Store a new ADR
+manage_adr(mode="store", content="## PURPOSE\nOrder processing service\n\n## STACK\n- Go: speed\n- SQLite: embedded storage")
+
+# Update specific sections (others preserved)
+manage_adr(mode="update", sections={"PATTERNS": "- Pipeline pattern\n- Repository pattern"})
+
+# Retrieve the full ADR with parsed sections
+manage_adr(mode="get")
+
+# View ADR via architecture overview
+get_architecture(aspects=["adr"])
+
+# Delete the ADR
+manage_adr(mode="delete")
+```
+
 ### Find all functions matching a pattern

+Search is **case-insensitive by default** — no need for `(?i)`:
+
 ```
-search_graph(label="Function", name_pattern=".*Handler")
+search_graph(label="Function", name_pattern=".*handler")
+# → matches "Handler", "handler", "HANDLER", "RequestHandler", etc.
+
+# Use regex alternatives for broad matching:
+search_graph(name_pattern="auth|authenticate|authorization")
+
+# Opt in to exact case matching when needed:
+search_graph(name_pattern=".*Handler", case_sensitive=true)
+```
+
+### Search code (text search)
+
+```
+search_code(pattern="TODO")
+# → case-insensitive by default, matches "TODO", "Todo", "todo"
+
+search_code(pattern="TODO|FIXME|HACK", regex=true)
+# → find all issue markers
+
+search_code(pattern="TODO", case_sensitive=true)
+# → exact case match only
 ```

 ### Trace what a function calls
@@ -418,6 +476,11 @@ search_graph(label="Route")
 query_graph(query="MATCH (f:Function)-[:CALLS]->(g:Function) WHERE f.name = 'main' RETURN g.name, g.qualified_name LIMIT 20")
 ```

+```
+# Case-insensitive regex in Cypher (use (?i) flag):
+query_graph(query="MATCH (f:Function) WHERE f.name =~ '(?i).*handler.*' RETURN f.name LIMIT 20")
+```
+
 ```
 query_graph(query="MATCH (a)-[r:HTTP_CALLS]->(b) RETURN a.name, b.name, r.url_path, r.confidence LIMIT 10")
 ```
@@ -589,7 +652,7 @@ make install # go install
 |---------|-------|-----|
 | `/mcp` doesn't show the server | Config not loaded or binary not found | Check `.mcp.json` path is absolute and correct. Restart Claude Code. Verify binary runs: `/path/to/codebase-memory-mcp` should output JSON. |
 | `index_repository` fails | Missing `repo_path` or path doesn't exist | Pass an absolute path: `index_repository(repo_path="/absolute/path")` |
-| `read_file` / `list_directory` returns error | No project indexed yet | Run `index_repository` first. These tools resolve paths against indexed project roots. |
+| `get_architecture` returns empty sections | No project indexed or project has few nodes | Run `index_repository` first. Some aspects (routes, hotspots, clusters) require enough graph data to produce meaningful results. |
 | `get_code_snippet` returns "node not found" | Wrong qualified name format | Use `search_graph` first to find the exact `qualified_name`, then pass it to `get_code_snippet`. See [Qualified Names](#qualified-names). |
 | `trace_call_path` returns 0 results | Exact name match — no fuzzy matching | Use `search_graph(name_pattern=".*PartialName.*")` to discover the exact function name first. |
 | Queries return results from wrong project | Multiple projects indexed, no filter | Add `project="your-project-name"` to `search_graph`. Use `list_projects` to see indexed project names. |
@@ -617,17 +680,18 @@ See [`BENCHMARK.md`](BENCHMARK.md) for the full 35-language benchmark with per-q
 ```
 cmd/codebase-memory-mcp/ Entry point (MCP stdio server + CLI mode + install/update commands)
 internal/
-store/ SQLite graph storage (nodes, edges, traversal, search)
+store/ SQLite graph storage (nodes, edges, traversal, search, architecture, Louvain clustering)
 lang/ Language specs (35 languages, tree-sitter node types)
 parser/ Tree-sitter grammar loading and AST parsing
-pipeline/ 4-pass indexing (structure -> definitions -> calls -> HTTP links)
+pipeline/ Multi-pass indexing (structure → definitions → calls → HTTP links → communities → tests)
 httplink/ Cross-service HTTP route/call-site matching
 cypher/ Cypher query lexer, parser, planner, executor
 selfupdate/ GitHub release checking, version comparison, asset download
-tools/ MCP tool handlers (14 tools) + CLI dispatch
+tools/ MCP tool handlers (12 tools) + CLI dispatch
 watcher/ Background auto-sync (mtime+size polling, adaptive intervals)
 discover/ File discovery with .cgrignore support
 fqn/ Qualified name computation
+traces/ OpenTelemetry trace ingestion for HTTP_CALLS validation
 ```

 ## License
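
The `manage_adr` behaviour documented in the README hunks above (six fixed sections, an 8000-character limit on `store`, and `update` preserving unmentioned sections) can be sketched roughly as follows. This is an illustration under assumptions, not the repository's code: `parseADR`, `updateADR`, and `isADRSection` are hypothetical names, sections are assumed to start with `## NAME` headings as in the usage example, and the limit is checked in bytes because the README does not say whether it counts bytes or runes.

```go
package main

import (
	"fmt"
	"strings"
)

// maxADRChars mirrors the 8000-character limit stated in the README.
// Checked against len(), i.e. bytes (an assumption).
const maxADRChars = 8000

// adrSections lists the six fixed section names from the README.
var adrSections = []string{"PURPOSE", "STACK", "ARCHITECTURE", "PATTERNS", "TRADEOFFS", "PHILOSOPHY"}

func isADRSection(name string) bool {
	for _, s := range adrSections {
		if s == name {
			return true
		}
	}
	return false
}

// parseADR splits content on "## NAME" headings and rejects unknown names.
// Text before the first heading is ignored in this sketch.
func parseADR(content string) (map[string]string, error) {
	if len(content) > maxADRChars {
		return nil, fmt.Errorf("ADR is %d bytes, limit is %d", len(content), maxADRChars)
	}
	sections := map[string]string{}
	current := ""
	var body []string
	flush := func() {
		if current != "" {
			sections[current] = strings.TrimSpace(strings.Join(body, "\n"))
		}
		body = nil
	}
	for _, line := range strings.Split(content, "\n") {
		if strings.HasPrefix(line, "## ") {
			flush()
			current = strings.TrimSpace(strings.TrimPrefix(line, "## "))
			if !isADRSection(current) {
				return nil, fmt.Errorf("unknown ADR section %q", current)
			}
			continue
		}
		body = append(body, line)
	}
	flush()
	return sections, nil
}

// updateADR overlays patch onto existing; sections not in patch are preserved.
func updateADR(existing, patch map[string]string) (map[string]string, error) {
	merged := map[string]string{}
	for k, v := range existing {
		merged[k] = v
	}
	for k, v := range patch {
		if !isADRSection(k) {
			return nil, fmt.Errorf("unknown ADR section %q", k)
		}
		merged[k] = v
	}
	return merged, nil
}

func main() {
	adr, err := parseADR("## PURPOSE\nOrder processing service\n\n## STACK\n- Go: speed\n- SQLite: embedded storage")
	if err != nil {
		panic(err)
	}
	adr, err = updateADR(adr, map[string]string{"PATTERNS": "- Pipeline pattern\n- Repository pattern"})
	if err != nil {
		panic(err)
	}
	// Render in the fixed section order, skipping sections that are absent.
	for _, name := range adrSections {
		if text, ok := adr[name]; ok {
			fmt.Printf("## %s\n%s\n\n", name, text)
		}
	}
}
```

Keeping the section names in an ordered slice rather than a set lets the same list drive both validation and a stable rendering order.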

internal/pipeline/communities.go

Lines changed: 2 additions & 2 deletions
@@ -9,7 +9,7 @@ import (
 "github.com/DeusData/codebase-memory-mcp/internal/store"
 )

-// passCommunities runs Leiden-style community detection on the CALLS graph
+// passCommunities runs Louvain community detection on the CALLS graph
 // and creates Community nodes + MEMBER_OF edges.
 func (p *Pipeline) passCommunities() {
 slog.Info("pass.communities")
@@ -37,7 +37,7 @@ func (p *Pipeline) passCommunities() {
 adj[e.TargetID][e.SourceID] = true
 }

-// Run Louvain/Leiden community detection
+// Run Louvain community detection
 communities := louvainCommunities(adj, allNodes)

 // Create Community nodes + MEMBER_OF edges
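
For readers unfamiliar with the algorithm named in this hunk, the following is a minimal sketch of the local-moving phase of Louvain on an unweighted, undirected graph. It is not the repository's `louvainCommunities`: `localMove` and `addEdge` are hypothetical names, and only the adjacency shape (`map[id]map[id]bool`) is mirrored from the diff. The sketch omits Louvain's aggregation phase and assumes there are no self-loops and that every neighbour appears in `nodes`.

```go
package main

import (
	"fmt"
	"sort"
)

// localMove greedily moves each node into the neighbouring community with the
// best modularity gain, repeating until no node moves. The gain is evaluated
// up to a constant factor as: links(n, C) - tot(C) * deg(n) / (2m).
func localMove(adj map[int64]map[int64]bool, nodes []int64) map[int64]int64 {
	order := append([]int64(nil), nodes...)
	sort.Slice(order, func(i, j int) bool { return order[i] < order[j] })

	comm := make(map[int64]int64, len(order))  // node -> community label
	deg := make(map[int64]float64, len(order)) // node -> degree
	tot := make(map[int64]float64, len(order)) // community -> sum of member degrees
	var m2 float64                             // 2m: sum of all degrees
	for _, n := range order {
		comm[n] = n
		deg[n] = float64(len(adj[n]))
		tot[n] = deg[n]
		m2 += deg[n]
	}
	if m2 == 0 {
		return comm // no edges: every node stays alone
	}

	const eps = 1e-12 // guards against moves caused by rounding noise
	for moved := true; moved; {
		moved = false
		for _, n := range order {
			// Count edges from n into each neighbouring community.
			links := map[int64]float64{}
			for nb := range adj[n] {
				links[comm[nb]]++
			}
			old := comm[n]
			tot[old] -= deg[n] // take n out of its community before scoring
			best, bestGain := old, links[old]-tot[old]*deg[n]/m2

			// Visit candidates in sorted order so the result is deterministic.
			cands := make([]int64, 0, len(links))
			for c := range links {
				cands = append(cands, c)
			}
			sort.Slice(cands, func(i, j int) bool { return cands[i] < cands[j] })
			for _, c := range cands {
				if gain := links[c] - tot[c]*deg[n]/m2; gain > bestGain+eps {
					best, bestGain = c, gain
				}
			}
			comm[n] = best
			tot[best] += deg[n]
			if best != old {
				moved = true
			}
		}
	}
	return comm
}

// addEdge records an undirected edge in both directions.
func addEdge(adj map[int64]map[int64]bool, a, b int64) {
	for _, p := range [][2]int64{{a, b}, {b, a}} {
		if adj[p[0]] == nil {
			adj[p[0]] = map[int64]bool{}
		}
		adj[p[0]][p[1]] = true
	}
}

func main() {
	// Two triangles (1-2-3 and 4-5-6) joined by a single bridge edge 3-4.
	adj := map[int64]map[int64]bool{}
	for _, e := range [][2]int64{{1, 2}, {2, 3}, {1, 3}, {4, 5}, {5, 6}, {4, 6}, {3, 4}} {
		addEdge(adj, e[0], e[1])
	}
	nodes := []int64{1, 2, 3, 4, 5, 6}
	comm := localMove(adj, nodes)
	for _, n := range nodes {
		fmt.Printf("node %d -> community %d\n", n, comm[n])
	}
}
```

On this toy graph the two triangles end up in separate communities. A full Louvain run would then collapse each community into a single node and repeat the pass on the smaller graph.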
