Skip to content

Commit f70fe5d

Browse files
committed
Fix Magma import/call extraction, resolve file-path imports in linker
- Add field('path', ...) to Magma load_statement grammar rule so parse_generic_imports() finds the import path via field lookup instead of the broken text fallback (which only extracted 1 per file) - Fix passImports() to resolve file-path imports (e.g. "utils.mag", "lib/helpers.h") via fqn.ModuleQN() when raw path doesn't match any node QN — general fix benefiting any file-path-based import - Add TestMagmaImport_Regression and TestMagmaCall_Regression - Update language count 59 → 63 in README, docs/index.html, marketing
1 parent ef4ad0c commit f70fe5d

25 files changed

Lines changed: 1667 additions & 159 deletions

BENCHMARK.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22

33
## Methodology
44

5-
- **35 languages** (27 programming + 8 config/markup), 12 questions each (4 for config languages)
5+
- **63 languages** (27 programming + 8 config/markup), 12 questions each (4 for config languages)
66
- **Up to 5 attempts** per question with escalating retry strategies
77
- **Real open-source repos** (medium to large: 78--49K nodes)
88
- **Grading**: PASS (1.0) / PARTIAL (0.5) / FAIL (0.0), N/A excluded from denominator
@@ -847,7 +847,7 @@ The tool handles 20K-node kernel subsystems without timeouts. Deep traces on wel
847847
## Cross-Cutting Findings
848848

849849
### Strengths
850-
1. **Zero indexing failures** across 35 repos of all sizes (78 to 49K nodes)
850+
1. **Zero indexing failures** across 63 repos of all sizes (78 to 49K nodes)
851851
2. **100% on 17 languages** -- half the languages achieved perfect scores
852852
3. **No language below 62%** -- even the weakest performs core operations
853853
4. **Massive codebases handled**: Django (49K nodes), Laravel (38K nodes), neovim (24K nodes) -- no performance issues

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -37,7 +37,7 @@ golangci-lint run ./...
3737
```
3838
cmd/codebase-memory-mcp/ Entry point (MCP server + CLI + install/update)
3939
internal/
40-
lang/ Language specs (35 languages, tree-sitter node types)
40+
lang/ Language specs (63 languages, tree-sitter node types)
4141
parser/ Tree-sitter grammar loading
4242
pipeline/ Multi-pass indexing pipeline
4343
httplink/ Cross-service HTTP route matching

README.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@ Parses source code with [tree-sitter](https://tree-sitter.github.io/tree-sitter/
88

99
## Features
1010

11-
- **59 languages**: Python, Go, JavaScript, TypeScript, TSX, Rust, Java, C++, C#, C, PHP, Lua, Scala, Kotlin, Ruby, Bash, Zig, Elixir, Haskell, OCaml, Objective-C, Swift, Dart, Perl, Groovy, Erlang, R, Clojure, F#, Julia, Vim Script, Nix, Common Lisp, Elm, Fortran, CUDA, COBOL, Verilog, Emacs Lisp, HTML, CSS, SCSS, YAML, TOML, HCL, SQL, Dockerfile, JSON, XML, Markdown, Makefile, CMake, Protobuf, GraphQL, Vue, Svelte, Meson, GLSL, INI
11+
- **63 languages**: Python, Go, JavaScript, TypeScript, TSX, Rust, Java, C++, C#, C, PHP, Lua, Scala, Kotlin, Ruby, Bash, Zig, Elixir, Haskell, OCaml, Objective-C, Swift, Dart, Perl, Groovy, Erlang, R, Clojure, F#, Julia, Vim Script, Nix, Common Lisp, Elm, Fortran, CUDA, COBOL, Verilog, Emacs Lisp, MATLAB, Lean 4, FORM, Magma, HTML, CSS, SCSS, YAML, TOML, HCL, SQL, Dockerfile, JSON, XML, Markdown, Makefile, CMake, Protobuf, GraphQL, Vue, Svelte, Meson, GLSL, INI
1212
- **Architecture overview**: `get_architecture` returns languages, packages, entry points, routes, hotspots, boundaries, layers, and clusters in a single call — instant codebase orientation
1313
- **Architecture Decision Records**: `manage_adr` persists architectural decisions (PURPOSE, STACK, ARCHITECTURE, PATTERNS, TRADEOFFS, PHILOSOPHY) across sessions with section filtering and validation
1414
- **Louvain community detection**: Discovers hidden functional modules across packages by clustering CALLS, HTTP_CALLS, and ASYNC_CALLS edges
@@ -57,7 +57,7 @@ Claude Code formats and explains the results.
5757

5858
**Why no built-in LLM?** Other code graph tools embed an LLM to translate natural language into graph queries. This means extra API keys, extra cost per query, and another model to configure. With MCP, the AI assistant you're already talking to *is* the query translator — no duplication needed.
5959

60-
**Token efficiency**: Compared to having an AI agent grep through your codebase file by file, graph queries return precise results in a single tool call. In benchmarks across 59 real-world repos (78 to 49K nodes), five structural queries consumed ~3,400 tokens via codebase-memory-mcp versus ~412,000 tokens via file-by-file exploration — a **99.2% reduction**. All 59 supported languages use the same efficient graph backend.
60+
**Token efficiency**: Compared to having an AI agent grep through your codebase file by file, graph queries return precise results in a single tool call. In benchmarks across 63 real-world repos (78 to 49K nodes), five structural queries consumed ~3,400 tokens via codebase-memory-mcp versus ~412,000 tokens via file-by-file exploration — a **99.2% reduction**. All 63 supported languages use the same efficient graph backend.
6161

6262
## Performance
6363

@@ -663,7 +663,7 @@ make install # go install
663663

664664
## Language Benchmark
665665

666-
59 languages supported. Benchmarked against 59 real open-source repositories (78 to 49K nodes). 12 standardized questions per language. Grading: HIGH (1.0) / MEDIUM (0.5) / LOW (0.1). Overall: **76%** average MCP score across all languages (97% for explorer-based agents).
666+
63 languages supported. Benchmarked against 63 real open-source repositories (78 to 49K nodes). 12 standardized questions per language. Grading: HIGH (1.0) / MEDIUM (0.5) / LOW (0.1). Overall: **76%** average MCP score across all languages (97% for explorer-based agents).
667667

668668
| Tier | Score | Languages |
669669
|------|-------|-----------|
@@ -682,8 +682,8 @@ See [`BENCHMARK.md`](BENCHMARK.md) for the full 35-language benchmark with per-q
682682
cmd/codebase-memory-mcp/ Entry point (MCP stdio server + CLI mode + install/update commands)
683683
internal/
684684
store/ SQLite graph storage (nodes, edges, traversal, search, architecture, Louvain clustering)
685-
lang/ Language specs (59 languages, tree-sitter node types)
686-
cbm/ Vendored tree-sitter C grammars (59 languages) and AST extraction engine
685+
lang/ Language specs (63 languages, tree-sitter node types)
686+
cbm/ Vendored tree-sitter C grammars (63 languages) and AST extraction engine
687687
pipeline/ Multi-pass indexing (structure → definitions → calls → HTTP links → config links → communities → tests)
688688
httplink/ Cross-service HTTP route/call-site matching
689689
cypher/ Cypher query lexer, parser, planner, executor

docs/index.html

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -4,19 +4,19 @@
44
<meta charset="UTF-8">
55
<meta name="viewport" content="width=device-width, initial-scale=1.0">
66
<title>codebase-memory-mcp — Code Knowledge Graph for AI Assistants</title>
7-
<meta name="description" content="MCP server that indexes codebases into a persistent knowledge graph. 59 languages, 120x fewer tokens, single Go binary. Works with Claude Code, Codex CLI, Cursor, Windsurf, Gemini CLI, VS Code, Zed.">
7+
<meta name="description" content="MCP server that indexes codebases into a persistent knowledge graph. 63 languages, 120x fewer tokens, single Go binary. Works with Claude Code, Codex CLI, Cursor, Windsurf, Gemini CLI, VS Code, Zed.">
88
<meta name="keywords" content="MCP server, code analysis, knowledge graph, tree-sitter, Claude Code, Codex, Cursor, Windsurf, Gemini CLI, VS Code, Zed, developer tools, code exploration, token reduction">
99

1010
<!-- Open Graph -->
1111
<meta property="og:title" content="codebase-memory-mcp — Code Knowledge Graph for AI Assistants">
12-
<meta property="og:description" content="120x fewer tokens for AI code exploration. 59 languages, sub-ms queries, single Go binary.">
12+
<meta property="og:description" content="120x fewer tokens for AI code exploration. 63 languages, sub-ms queries, single Go binary.">
1313
<meta property="og:type" content="website">
1414
<meta property="og:url" content="https://deusdata.github.io/codebase-memory-mcp">
1515

1616
<!-- Twitter Card -->
1717
<meta name="twitter:card" content="summary_large_image">
1818
<meta name="twitter:title" content="codebase-memory-mcp — Code Knowledge Graph for AI Assistants">
19-
<meta name="twitter:description" content="120x fewer tokens for AI code exploration. 59 languages, sub-ms queries, single Go binary.">
19+
<meta name="twitter:description" content="120x fewer tokens for AI code exploration. 63 languages, sub-ms queries, single Go binary.">
2020

2121
<link rel="canonical" href="https://deusdata.github.io/codebase-memory-mcp">
2222

@@ -210,7 +210,7 @@ <h1>codebase-memory-mcp</h1>
210210
<div class="label">fewer tokens</div>
211211
</div>
212212
<div class="stat">
213-
<div class="number">59</div>
213+
<div class="number">63</div>
214214
<div class="label">languages</div>
215215
</div>
216216
<div class="stat">
@@ -322,8 +322,8 @@ <h2>Benchmark results</h2>
322322
<h2>Features</h2>
323323
<div class="features">
324324
<div class="feature">
325-
<h3>59 Languages</h3>
326-
<p>Python, Go, JS, TS, TSX, Rust, Java, C++, C#, C, PHP, Ruby, Kotlin, Scala, Zig, Elixir, Haskell, OCaml, Swift, Dart, and 39 more via vendored tree-sitter grammars.</p>
325+
<h3>63 Languages</h3>
326+
<p>Python, Go, JS, TS, TSX, Rust, Java, C++, C#, C, PHP, Ruby, Kotlin, Scala, Zig, Elixir, Haskell, OCaml, Swift, Dart, MATLAB, Lean 4, and 41 more via vendored tree-sitter grammars.</p>
327327
</div>
328328
<div class="feature">
329329
<h3>Call Graph Tracing</h3>
@@ -371,7 +371,7 @@ <h2>How it compares</h2>
371371
</tr>
372372
</thead>
373373
<tbody>
374-
<tr><td>Languages</td><td class="win">59</td><td>8-11</td></tr>
374+
<tr><td>Languages</td><td class="win">63</td><td>8-11</td></tr>
375375
<tr><td>Runtime</td><td class="win">Single Go binary</td><td>Node.js (npx)</td></tr>
376376
<tr><td>Runtime dependency</td><td class="win">None</td><td>Node.js</td></tr>
377377
<tr><td>Embedded LLM</td><td class="win">No (uses your MCP client)</td><td>Yes (extra API key + cost)</td></tr>

internal/cbm/regression_test.go

Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1031,3 +1031,34 @@ func TestMagmaParse_Regression(t *testing.T) {
10311031
t.Fatal(err)
10321032
}
10331033
}
1034+
1035+
func TestMagmaImport_Regression(t *testing.T) {
1036+
src := []byte("load \"utils.mag\";\nload \"lib/helpers.mag\";\n")
1037+
r, err := ExtractFile(src, lang.Magma, "t", "main.mag")
1038+
if err != nil {
1039+
t.Fatal(err)
1040+
}
1041+
if len(r.Imports) < 2 {
1042+
t.Fatalf("expected at least 2 imports, got %d", len(r.Imports))
1043+
}
1044+
}
1045+
1046+
func TestMagmaCall_Regression(t *testing.T) {
1047+
src := []byte("function Foo(x)\n y := Bar(x);\n return y;\nend function;\n")
1048+
r, err := ExtractFile(src, lang.Magma, "t", "calls.mag")
1049+
if err != nil {
1050+
t.Fatal(err)
1051+
}
1052+
if len(r.Calls) == 0 {
1053+
t.Fatal("expected calls to be extracted, got 0")
1054+
}
1055+
found := false
1056+
for _, c := range r.Calls {
1057+
if c.CalleeName == "Bar" {
1058+
found = true
1059+
}
1060+
}
1061+
if !found {
1062+
t.Errorf("expected call to 'Bar', got calls: %v", r.Calls)
1063+
}
1064+
}

0 commit comments

Comments
 (0)