Codemesh

Intelligent code knowledge graph for AI coding agents

71% cheaper, 72% faster, 82% fewer tool calls vs baseline Grep+Read
on 6 real-world repos (Sonnet 4.6), from a single codemesh index.

Benchmarks · Quick Start · Integrations · Write-Back · How It Works · API Reference · Full Results


The Problem

AI coding agents waste 40-80% of their tokens on discovery: grepping through files, reading irrelevant code, and rebuilding context they've already seen in previous sessions.

On a 600-file codebase, a typical exploration task involves 10+ file reads before the agent even knows what's relevant.

Before:  Agent → Grep → 50 matches → Read 10 files → Understand → Work
After:   Agent → codemesh_explore → 3 relevant files → codemesh_trace → full path → Work

Codemesh is an MCP server that gives agents a persistent, queryable knowledge graph. The graph gets smarter over time: agents write back what they learn, so the next session starts informed.


Benchmarks

Benchmarked on 6 real-world codebases (Alamofire, Excalidraw, VS Code, Swift Compiler, pydantic-validators, pydantic-basemodel) with Claude Sonnet 4.6. Both Codemesh modes are shown next to a Grep+Read baseline and Codegraph, another graph-based tool, for context.

Full methodology, per-repo breakdowns, and pairwise comparisons: docs/benchmark-results.md | Early pydantic evals

Cost

| Mode | Alamofire | Excalidraw | VS Code | Swift Compiler [1] | pydantic-validators | pydantic-basemodel | Avg |
|---|---|---|---|---|---|---|---|
| Baseline | $0.54 | $0.89 | $0.21 | $0.83 | $1.32 | $0.78 | $0.76 |
| Codemesh MCP | $0.25 | $0.21 | $0.16 | $0.23 | $0.33 | $0.13 | $0.22 |
| Codemesh CLI | $0.67 | $0.51 | $0.16 | $0.83 | $1.00 | $0.18 | $0.56 |
| Codegraph | $0.37 | $0.56 | $0.57 | $0.74 | $0.29 | $0.19 | $0.45 |

Time

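| Mode | Alamofire | Excalidraw | VS Code | Swift Compiler [1] | pydantic-validators | pydantic-basemodel | Avg |
|---|---|---|---|---|---|---|---|
| Baseline | 180s | 191s | 87s | 199s | 352s | 232s | 207s |
| Codemesh MCP | 78s | 45s | 35s | 87s | 72s | 32s | 58s |
| Codemesh CLI | 226s | 177s | 62s | 227s | 235s | 51s | 163s |
| Codegraph | 134s | 180s | 192s | 199s | 75s | 60s | 140s |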

Tool calls (agent turns)

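| Mode | Alamofire | Excalidraw | VS Code | Swift Compiler [1] | pydantic-validators | pydantic-basemodel | Avg |
|---|---|---|---|---|---|---|---|
| Baseline | 31 | 48 | 12 | 29 | 84 | 65 | 45 |
| Codemesh MCP | 9 | 5 | 3 | 14 | 14 | 3 | 8 |
| Codemesh CLI | 30 | 32 | 12 | 56 | 64 | 9 | 34 |
| Codegraph | 31 | 35 | 44 | 44 | 20 | 12 | 31 |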

Quality (1–10, LLM-as-judge)

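| Mode | Alamofire [2] | Excalidraw | VS Code | Swift Compiler | pydantic-validators | pydantic-basemodel | Avg |
|---|---|---|---|---|---|---|---|
| Baseline | n/a | 9 | 8 | 7 | 2 | 9 | 7.0 |
| Codemesh MCP | 9 | 9 | 7 | 8 | 7 | 7.8 | 7.9 |
| Codemesh CLI | 9 | 7 | 7 | 9 | 1 | 8.4 | 6.9 |
| Codegraph | 8 | 9 | 8.7 | 8 | 8 | 9 | 8.4 |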

Cost savings: Codemesh MCP vs Baseline

| Repo | Baseline | Codemesh MCP | Cost change | Time change |
|---|---|---|---|---|
| Alamofire | $0.54 | $0.25 | -54% | -57% (180s → 78s) |
| Excalidraw | $0.89 | $0.21 | -76% | -76% (191s → 45s) |
| VS Code | $0.21 | $0.16 | -24% | -60% (87s → 35s) |
| Swift Compiler [1] | $0.83 | $0.23 | -72% | -56% (199s → 87s) |
| pydantic-validators | $1.32 | $0.33 | -75% | -79% (352s → 72s) |
| pydantic-basemodel | $0.78 | $0.13 | -83% | -86% (232s → 32s) |
| Average | $0.76 | $0.22 | -71% | -72% |

Note

Codemesh MCP achieves the lowest cost and fastest time of any mode tested: 71% cheaper and 72% faster than baseline on average across 6 repos, using 82% fewer tool calls (8 vs 45). Quality is comparable to baseline (7.9 vs 7.0); Codegraph edges Codemesh on quality (8.4) but at roughly double the cost ($0.45 vs $0.22). Every repo shows cost and time savings, including the comprehension-heavy queries (Excalidraw, pydantic-basemodel) that regressed in prior builds of codemesh.
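
The headline percentages follow directly from the table averages. A minimal sketch of that arithmetic, assuming each figure is the relative change against baseline rounded to a whole percent (the function name `percentChange` is only illustrative):

```typescript
// Relative change of `value` against `baseline`, as a whole-number percentage.
// Negative results mean the measured value is lower (cheaper, faster, fewer calls).
function percentChange(baseline: number, value: number): number {
  return Math.round(((value - baseline) / baseline) * 100);
}

// Averages copied from the Cost, Time, and Tool calls tables.
const averages = {
  costUsd: { baseline: 0.76, codemeshMcp: 0.22 },
  timeSeconds: { baseline: 207, codemeshMcp: 58 },
  toolCalls: { baseline: 45, codemeshMcp: 8 },
};

const costChange = percentChange(averages.costUsd.baseline, averages.costUsd.codemeshMcp);
const timeChange = percentChange(averages.timeSeconds.baseline, averages.timeSeconds.codemeshMcp);
const callChange = percentChange(averages.toolCalls.baseline, averages.toolCalls.codemeshMcp);

console.log(costChange, timeChange, callChange); // -71 -72 -82
```

The same function applied to a single row, for example Alamofire's $0.54 and $0.25, gives -54.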


Quick Start

1. Install

npm install -g @pyalwin/codemesh

Or run directly without installing:

npx -y @pyalwin/codemesh --help

Build from source:

git clone https://github.com/pyalwin/codemesh.git
cd codemesh
npm install && npm run build
npm link

Verify the install: codemesh --version should print the package version.

2. Index your project

cd /your/project
codemesh index --with-embeddings
Indexed 656 files
  Symbols found:  16733
  Edges created:  33266
  Duration:       10009ms
  PageRank:       13843 nodes scored
  Embeddings:     13187 symbols embedded

3. Choose your mode

Codemesh offers two ways to integrate with AI agents:

Option A: MCP Server (structured tool calls)

Add to your Claude Code MCP config (~/.claude/mcp-servers.json or project .mcp.json):

{
  "mcpServers": {
    "codemesh": {
      "command": "npx",
      "args": ["-y", "@pyalwin/codemesh"],
      "env": {
        "CODEMESH_PROJECT_ROOT": "/path/to/your/project"
      }
    }
  }
}

The agent gets native MCP tools:

  • codemesh_answer: one-call question answering (PRIMARY)
  • codemesh_explore: search, context (multi-target), impact
  • codemesh_trace: follow call chains
  • codemesh_enrich / codemesh_workflow: write back
  • codemesh_status: health check

Best for: Opus, structured workflows, enrichment/write-back

Option B: CLI Mode (via Bash, zero MCP overhead)

No MCP config needed. The agent calls codemesh directly via Bash:

export CODEMESH_PROJECT_ROOT=/path/to/your/project

# Primary: one-call question answering
codemesh explore answer "How does request handling work?"

# Follow-up commands:
codemesh explore search "request flow"
codemesh explore context Source/Core/Session.swift Source/Core/Request.swift
codemesh explore trace Session.request --depth 5
codemesh explore semantic "network request handling"  # requires --with-embeddings

All commands return JSON to stdout. No MCP server process, no protocol overhead.
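
Because every command prints JSON, a thin wrapper is enough to call the CLI from a script. The sketch below is an assumption-heavy illustration: `Runner`, `exploreArgs`, and `explore` are hypothetical names, the runner is injected so the example does not depend on a specific process API, and the JSON result is left as `unknown` because its shape is not documented here.

```typescript
// Runs a command and returns its stdout. Injected so the sketch stays testable;
// in Node it could wrap child_process.execFileSync.
type Runner = (command: string, args: string[]) => string;

type ExploreAction = "answer" | "search" | "context" | "trace" | "semantic" | "impact";

// Builds the argument list for `codemesh explore <action> ...`.
function exploreArgs(action: ExploreAction, targets: string[], depth?: number): string[] {
  const args: string[] = ["explore", action, ...targets];
  if (action === "trace" && depth !== undefined) {
    args.push("--depth", String(depth));
  }
  return args;
}

// Calls the CLI through the runner and parses stdout as JSON.
function explore(run: Runner, action: ExploreAction, targets: string[], depth?: number): unknown {
  const stdout = run("codemesh", exploreArgs(action, targets, depth));
  return JSON.parse(stdout);
}

// Example with a stub runner, so no codemesh binary is needed to try it.
const stubRunner: Runner = (_command, args) => JSON.stringify({ received: args });
console.log(explore(stubRunner, "trace", ["Session.request"], 5));
```

Swapping the stub for a real process runner is the only change needed to hit an indexed project, provided CODEMESH_PROJECT_ROOT is set in the environment.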

Best for: Sonnet/Haiku, speed-sensitive workflows, simpler setup

Which mode should I use?

| | MCP Server | CLI Mode |
|---|---|---|
| Setup | MCP config file | Just export CODEMESH_PROJECT_ROOT |
| Overhead | MCP protocol per call | Zero (direct subprocess) |
| Enrichment | Native codemesh_enrich tool | Via Bash("codemesh enrich ...") |
| Best model | Opus (follows MCP well) | Sonnet (26% cheaper, 21% faster than baseline on average in the benchmarks above) |
| Recommended | Complex codebases | Default choice |

4. Use it

The agent now has 6 new tools. Query the graph before reading code:

You: "Find how pydantic handles validation"

Agent calls: codemesh_answer({ question: "How does pydantic handle validation?" })
       gets: 9 relevant files ranked by PageRank, call chains, 
             git hotspots, co-change relationships, 5 suggested reads

Agent calls: Read("pydantic/functional_validators.py", lines 1-50)
       reads: only the specific lines suggested by the answer tool

Agent calls: codemesh_enrich({ path: "pydantic/functional_validators.py",
               summary: "Primary V2 validator API..." })
       saves: summary for next session

Client Integrations

Codemesh speaks the Model Context Protocol, so any MCP-compatible client can use it. Paste one of the snippets below, restart the client, and the six codemesh_* tools show up in the agent's toolbox.

Claude Code (CLI)

Add to ~/.claude/mcp-servers.json (user-wide) or .mcp.json (project-local):

{
  "mcpServers": {
    "codemesh": {
      "command": "npx",
      "args": ["-y", "@pyalwin/codemesh"],
      "env": {
        "CODEMESH_PROJECT_ROOT": "/absolute/path/to/your/project"
      }
    }
  }
}

Claude Desktop (macOS / Windows app)

Edit claude_desktop_config.json:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "codemesh": {
      "command": "npx",
      "args": ["-y", "@pyalwin/codemesh"],
      "env": {
        "CODEMESH_PROJECT_ROOT": "/absolute/path/to/your/project"
      }
    }
  }
}

Restart Claude Desktop. Codemesh's tools will appear in the tool picker (hammer icon).

Cursor: stop the agent from wandering your codebase

Cursor reads .cursor/mcp.json per project (or ~/.cursor/mcp.json for all projects):

{
  "mcpServers": {
    "codemesh": {
      "command": "npx",
      "args": ["-y", "@pyalwin/codemesh"],
      "env": {
        "CODEMESH_PROJECT_ROOT": "${workspaceFolder}"
      }
    }
  }
}

Open Settings → MCP, confirm codemesh is green, then mention it in a prompt (@codemesh how does auth work?) to nudge the agent toward graph queries instead of recursive Grep.
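
The Claude Code, Claude Desktop, and Cursor snippets above share one shape and differ only in the project root. If you maintain several projects, a small generator can keep them consistent. This is a sketch under that assumption; `codemeshMcpConfig` and the two interfaces are illustrative names, not part of Codemesh.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Builds the `mcpServers` document used by the snippets above for one project root.
function codemeshMcpConfig(projectRoot: string): McpConfig {
  return {
    mcpServers: {
      codemesh: {
        command: "npx",
        args: ["-y", "@pyalwin/codemesh"],
        env: { CODEMESH_PROJECT_ROOT: projectRoot },
      },
    },
  };
}

// Pretty-printed JSON, ready to write to .mcp.json or .cursor/mcp.json.
const configJson = JSON.stringify(codemeshMcpConfig("/absolute/path/to/your/project"), null, 2);
console.log(configJson);
```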

Windsurf / VS Code (Continue)

Add to ~/.continue/config.json under experimental.modelContextProtocolServers:

{
  "experimental": {
    "modelContextProtocolServers": [
      {
        "transport": {
          "type": "stdio",
          "command": "npx",
          "args": ["-y", "@pyalwin/codemesh"],
          "env": {
            "CODEMESH_PROJECT_ROOT": "/absolute/path/to/your/project"
          }
        }
      }
    ]
  }
}

Agent Write-Back: the graph that gets smarter

Every other code-intelligence tool indexes your repo once and hands the agent a read-only view. Codemesh lets the agent teach the graph as it works: summaries, workflows, and cross-concept links persist across sessions and survive re-indexing.

// Session 1: agent reads unfamiliar code, then writes back what it learned.
codemesh_enrich({
  path: "pydantic/functional_validators.py",
  summary: "Primary V2 validator API. `@field_validator` wraps "
         + "`_decorators.FieldValidatorDecoratorInfo`; `mode='before'|'after'` "
         + "toggles pre/post-coercion execution. Extends BaseValidator.",
  concepts: ["validation", "decorators", "v2-api"]
})

// Session 1: agent traces a multi-file flow, records the path.
codemesh_workflow({
  name: "pydantic field validation",
  description: "Request → BaseModel.__init__ → SchemaValidator → field_validator",
  files: [
    "pydantic/main.py",
    "pydantic/_internal/_model_construction.py",
    "pydantic/functional_validators.py"
  ]
})

// Session 2 (days later): same question, different agent instance.
codemesh_answer({ question: "How does pydantic validate fields?" })
// → returns the enriched summary AND the 3-file workflow from Session 1
//   before the agent reads a single line. Zero rediscovery cost.

The graph now knows things no static analyzer could infer: why a file matters, which files move together, what a maintainer called a concept. Re-indexing rebuilds the structural layer (files, symbols, imports, calls) but preserves every enrichment; entries only go stale when their referenced files change.
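
One way to picture the staleness rule is a content-hash comparison at re-index time. The sketch below assumes each enrichment remembers the hash of the file it described; the `Enrichment` shape and `refreshStaleness` are hypothetical and may not match how Codemesh stores or checks this.

```typescript
// Hypothetical record of an agent-written summary and the file hash it was based on.
interface Enrichment {
  path: string;
  summary: string;
  fileHashAtWrite: string;
  stale: boolean;
}

// Marks enrichments stale when the file's current hash differs from the one
// recorded at write time, or when the file no longer exists. Nothing is deleted.
function refreshStaleness(
  enrichments: Enrichment[],
  currentHashes: Map<string, string>,
): Enrichment[] {
  return enrichments.map((entry) => {
    const current = currentHashes.get(entry.path);
    return { ...entry, stale: current !== entry.fileHashAtWrite };
  });
}

// Made-up hashes: main.py is unchanged, functional_validators.py was edited.
const storedEnrichments: Enrichment[] = [
  { path: "pydantic/main.py", summary: "...", fileHashAtWrite: "h1", stale: false },
  { path: "pydantic/functional_validators.py", summary: "...", fileHashAtWrite: "h2", stale: false },
];
const hashesAfterReindex = new Map<string, string>([
  ["pydantic/main.py", "h1"],
  ["pydantic/functional_validators.py", "h9"],
]);

const refreshed = refreshStaleness(storedEnrichments, hashesAfterReindex);
console.log(refreshed.map((entry) => `${entry.path}: stale=${entry.stale}`));
```

Keeping stale entries instead of deleting them lets an agent still see the old summary while knowing it needs to be re-checked.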

See codemesh_enrich and codemesh_workflow under MCP Tools.


How It Works

┌─────────────────────────────────────┐
│           Knowledge Graph           │
│                                     │
│  ┌──────────┐  ┌─────────────────┐  │
│  │Structural│  │    Semantic     │  │
│  │  (auto)  │  │    (agents)     │  │
│  │          │  │                 │  │
│  │ files    │  │ summaries       │  │
│  │ symbols  │  │ workflows       │  │
│  │ imports  │  │ concepts        │  │
│  │ calls    │  │ enrichments     │  │
│  └──────────┘  └─────────────────┘  │
│                                     │
│  ┌──────────┐  ┌─────────────────┐  │
│  │   Git    │  │     Search      │  │
│  │  Intel   │  │                 │  │
│  │          │  │ FTS5 (exact)    │  │
│  │ hotspots │  │ Trigram (fuzzy) │  │
│  │ co-change│  │ LanceDB (sem)   │  │
│  │ churn    │  │ PageRank        │  │
│  └──────────┘  └─────────────────┘  │
│                                     │
│          SQLite + LanceDB           │
└──────────────────┬──────────────────┘
                   │
┌──────────────────┴──────────────────┐
│     MCP Server / CLI (6 tools)      │
│                                     │
│  answer · explore · trace           │
│  enrich · workflow · status         │
└─────────────────────────────────────┘

Structural layer (automatic): Tree-sitter parses your code into files, symbols (functions, classes, methods), and relationships (imports, calls, extends). Rebuilt on each index.

Semantic layer (agent-built): As agents work with your code, they write back summaries and workflow paths. These survive re-indexing and accumulate across sessions. An entry is invalidated when the files it references change.
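
PageRank appears among the ranking signals in the diagram. As a rough illustration of the idea, and not Codemesh's actual scoring code, here is a textbook power-iteration PageRank over `calls` edges. The damping factor, iteration count, and all names are assumptions.

```typescript
// Directed edge, for example a `calls` edge from one symbol to another.
type CallEdge = { from: string; to: string };

// Textbook PageRank by power iteration. Assumes every edge endpoint is listed in `nodes`.
function pageRank(
  nodes: string[],
  edges: CallEdge[],
  damping = 0.85,
  iterations = 50,
): Map<string, number> {
  const n = nodes.length;
  const outgoing = new Map<string, string[]>();
  for (const node of nodes) outgoing.set(node, []);
  for (const edge of edges) outgoing.get(edge.from)?.push(edge.to);

  let rank = new Map<string, number>();
  for (const node of nodes) rank.set(node, 1 / n);

  for (let i = 0; i < iterations; i++) {
    const next = new Map<string, number>();
    for (const node of nodes) next.set(node, (1 - damping) / n);

    let danglingMass = 0;
    for (const node of nodes) {
      const score = rank.get(node) ?? 0;
      const targets = outgoing.get(node) ?? [];
      if (targets.length === 0) {
        danglingMass += score; // no outgoing edges: redistributed evenly below
        continue;
      }
      const share = (damping * score) / targets.length;
      for (const target of targets) {
        next.set(target, (next.get(target) ?? 0) + share);
      }
    }
    for (const node of nodes) {
      next.set(node, (next.get(node) ?? 0) + (damping * danglingMass) / n);
    }
    rank = next;
  }
  return rank;
}

// Hypothetical call graph: `hash` is called by everything else, so it ranks highest.
const demoRanks = pageRank(
  ["login", "validate", "hash", "log"],
  [
    { from: "login", to: "validate" },
    { from: "login", to: "hash" },
    { from: "validate", to: "hash" },
    { from: "log", to: "hash" },
  ],
);
console.log(demoRanks);
```

Symbols that many others depend on float to the top, which is why a ranked answer can point an agent at a few central files first.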


MCP Tools

| Tool | Purpose | Example |
|---|---|---|
| codemesh_answer | One-call context assembly: returns all relevant files, call chains, hotspots, suggested reads | codemesh_answer({ question: "How does auth work?" }) |
| codemesh_explore | Search, context (multi-target), impact analysis | codemesh_explore({ action: "search", query: "auth" }) |
| codemesh_trace | Follow call chains with source code | codemesh_trace({ symbol: "login", depth: 5 }) |
| codemesh_enrich | Write back what you learned for future sessions | codemesh_enrich({ path: "src/auth.py", summary: "..." }) |
| codemesh_workflow | Record multi-file workflow paths | codemesh_workflow({ name: "login flow", files: [...] }) |
| codemesh_status | Graph health check | codemesh_status() |
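
For client code that builds these calls, a typed union can catch misspelled tool names and missing fields early. The shapes below are inferred only from the example calls in this README; the server's real zod schemas may accept more fields or different ones, so treat `ToolCall` as a hypothetical approximation.

```typescript
// Input shapes inferred from the README examples, not from the server's schemas.
type ToolCall =
  | { tool: "codemesh_answer"; input: { question: string } }
  | { tool: "codemesh_explore"; input: { action: "search"; query: string } }
  | { tool: "codemesh_trace"; input: { symbol: string; depth?: number } }
  | { tool: "codemesh_enrich"; input: { path: string; summary: string; concepts?: string[] } }
  | { tool: "codemesh_workflow"; input: { name: string; description?: string; files: string[] } }
  | { tool: "codemesh_status"; input: Record<string, never> };

// True for the two tools that write to the semantic layer.
function isWriteBack(call: ToolCall): boolean {
  return call.tool === "codemesh_enrich" || call.tool === "codemesh_workflow";
}

// Renders a call in roughly the notation used in the table.
function render(call: ToolCall): string {
  return `${call.tool}(${JSON.stringify(call.input)})`;
}

const exampleCalls: ToolCall[] = [
  { tool: "codemesh_answer", input: { question: "How does auth work?" } },
  { tool: "codemesh_trace", input: { symbol: "login", depth: 5 } },
  { tool: "codemesh_enrich", input: { path: "src/auth.py", summary: "..." } },
];
console.log(exampleCalls.map(render));
console.log(exampleCalls.filter(isWriteBack).length); // 1
```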

CLI

codemesh index                          # structural + git intel + pagerank
codemesh index --with-embeddings        # + semantic vectors (~80MB model, zero API cost)
codemesh status                         # graph statistics
codemesh rebuild                        # purge and re-index

codemesh explore answer "question"      # one-call context assembly (PRIMARY)
codemesh explore search "query"         # FTS5 + trigram + semantic search
codemesh explore context file1 file2    # multi-target context
codemesh explore trace symbol --depth 5 # follow call chains
codemesh explore semantic "query"       # vector similarity (needs embeddings)
codemesh explore impact file            # reverse dependencies
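
The search command lists trigram matching as its fuzzy layer. To show why trigrams tolerate typos, here is a generic trigram-similarity sketch. It illustrates the general technique only and is not a description of how Codemesh or SQLite scores matches; the padding scheme and function names are assumptions.

```typescript
// Splits a string into overlapping 3-character windows after lowercasing and padding.
function trigrams(text: string): Set<string> {
  const padded = `  ${text.toLowerCase()} `;
  const result = new Set<string>();
  for (let i = 0; i + 3 <= padded.length; i++) {
    result.add(padded.slice(i, i + 3));
  }
  return result;
}

// Jaccard similarity of the two trigram sets: 1 for identical strings, 0 for no overlap.
function trigramSimilarity(a: string, b: string): number {
  const left = trigrams(a);
  const right = trigrams(b);
  let shared = 0;
  left.forEach((gram) => {
    if (right.has(gram)) shared++;
  });
  const union = left.size + right.size - shared;
  return union === 0 ? 0 : shared / union;
}

// A misspelled query still overlaps heavily with the intended symbol name.
console.log(trigramSimilarity("validater", "validator"));
console.log(trigramSimilarity("session", "validator"));
```

The misspelling shares most of its trigrams with the correct name, while an unrelated word shares none, so a fuzzy search can rank the intended symbol first.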

Optional: Hooks & Skills

Skill: teaches agents the graph-first workflow

Copy skills/codemesh.md to ~/.claude/skills/ or your project's .claude/skills/.

# Install the skill so Claude Code loads the workflow automatically
cp /path/to/codemesh/skills/codemesh.md /your/project/.claude/skills/

The skill instructs agents to query the graph before using Grep/Read, and to write back via codemesh_enrich after reading code.

Hooks: automatic pre-read context injection

Add to .claude/settings.json:

{
  "hooks": {
    "pre_tool_use": [{
      "matcher": "Read",
      "command": "/path/to/codemesh/hooks/pre-read.sh"
    }],
    "post_tool_use": [{
      "matcher": "Read",
      "command": "/path/to/codemesh/hooks/post-read.sh"
    }]
  }
}

  • Pre-read: Injects cached summaries before file reads
  • Post-read: Nudges the agent to enrich after reading unfamiliar files

Supported Languages

TypeScript, JavaScript, Python, Go, Rust, Java, C#, Ruby, PHP, C, C++, Swift, Kotlin, Dart

Any language with a tree-sitter grammar can be added.


Graph Data Model

Nodes

| Type | Source | Key Fields |
|---|---|---|
| file | Static (tree-sitter) | path, hash, last_indexed_at |
| symbol | Static (tree-sitter) | name, kind, file_path, line_start, line_end, signature |
| concept | Agent-written | summary, last_updated_by, stale |
| workflow | Agent-written | description, file_sequence, last_walked_at |

Edges

| Type | Direction | Source |
|---|---|---|
| contains | file → symbol | Static |
| imports | file → file | Static |
| calls | symbol → symbol | Static |
| extends | symbol → symbol | Static |
| describes | concept → file/symbol | Agent |
| related_to | concept → concept | Agent |
| traverses | workflow → file | Agent |
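
With edges typed this way, an impact query reduces to walking `imports` edges backwards. A minimal sketch, assuming edges are plain records keyed by file path; `GraphEdge` and `reverseDependencies` are illustrative names, and the real `codemesh explore impact` may also weigh calls or other edge types.

```typescript
type EdgeType =
  | "contains" | "imports" | "calls" | "extends"
  | "describes" | "related_to" | "traverses";

interface GraphEdge {
  type: EdgeType;
  from: string;
  to: string;
}

// Files that directly or transitively import `target`, found by walking
// `imports` edges backwards (breadth-first).
function reverseDependencies(edges: GraphEdge[], target: string): string[] {
  const importers = new Map<string, string[]>();
  for (const edge of edges) {
    if (edge.type !== "imports") continue;
    const list = importers.get(edge.to) ?? [];
    list.push(edge.from);
    importers.set(edge.to, list);
  }

  const seen = new Set<string>([target]);
  const queue: string[] = [target];
  const result: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const importer of importers.get(current) ?? []) {
      if (seen.has(importer)) continue;
      seen.add(importer);
      result.push(importer);
      queue.push(importer);
    }
  }
  return result;
}

// Hypothetical files: c.ts imports b.ts, which imports a.ts. The calls edge is ignored.
const demoEdges: GraphEdge[] = [
  { type: "imports", from: "b.ts", to: "a.ts" },
  { type: "imports", from: "c.ts", to: "b.ts" },
  { type: "calls", from: "d.ts#run", to: "a.ts#load" },
];
console.log(reverseDependencies(demoEdges, "a.ts")); // b.ts first, then c.ts
```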

Architecture

codemesh/
├── src/
│   ├── index.ts              # MCP server entry (stdio transport)
│   ├── server.ts             # Tool registration (zod schemas)
│   ├── graph/
│   │   ├── types.ts          # Node/edge type definitions
│   │   ├── storage.ts        # StorageBackend interface (swappable)
│   │   └── sqlite.ts         # SQLite + FTS5 implementation
│   ├── indexer/
│   │   ├── indexer.ts        # File walking, hashing, incremental indexing
│   │   ├── parser.ts         # Tree-sitter AST extraction
│   │   └── languages.ts      # Language registry (ext → grammar)
│   ├── tools/                # 6 MCP tool handlers
│   └── cli.ts                # CLI entry point
├── skills/codemesh.md        # Agent education skill
├── hooks/                    # Pre/post read hooks
└── eval/                     # Eval framework (5 tasks, 3 models)

Storage is backend-agnostic. The StorageBackend interface abstracts all persistence. v1 uses SQLite with FTS5 for zero-dependency local operation. The interface supports swapping to Memgraph, Neo4j, or other graph databases.
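
To make the swappable-backend idea concrete, here is a deliberately small sketch. `MiniStorageBackend` is a hypothetical, much-reduced stand-in for the real StorageBackend interface in src/graph/storage.ts, whose methods are not shown in this README; the in-memory class only demonstrates that callers depend on the interface and not on SQLite.

```typescript
interface StoredNode {
  id: string;
  type: "file" | "symbol" | "concept" | "workflow";
  fields: Record<string, string>;
}

// Hypothetical minimal persistence contract; the real interface is likely larger.
interface MiniStorageBackend {
  upsertNode(node: StoredNode): void;
  getNode(id: string): StoredNode | undefined;
  nodesOfType(type: StoredNode["type"]): StoredNode[];
}

// Map-backed implementation, useful for tests or as a template for another adapter.
class InMemoryBackend implements MiniStorageBackend {
  private nodes = new Map<string, StoredNode>();

  upsertNode(node: StoredNode): void {
    this.nodes.set(node.id, node);
  }

  getNode(id: string): StoredNode | undefined {
    return this.nodes.get(id);
  }

  nodesOfType(type: StoredNode["type"]): StoredNode[] {
    const matches: StoredNode[] = [];
    this.nodes.forEach((node) => {
      if (node.type === type) matches.push(node);
    });
    return matches;
  }
}

// Code written against the interface does not care which backend it receives.
function countConcepts(backend: MiniStorageBackend): number {
  return backend.nodesOfType("concept").length;
}

const demoBackend = new InMemoryBackend();
demoBackend.upsertNode({ id: "file:src/auth.py", type: "file", fields: { path: "src/auth.py" } });
demoBackend.upsertNode({ id: "concept:auth", type: "concept", fields: { summary: "..." } });
console.log(countConcepts(demoBackend)); // 1
```

A graph-database adapter would implement the same methods with queries, leaving callers such as `countConcepts` unchanged.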


Eval Framework

Reproducible evaluation harness with LLM-as-judge scoring:

# Setup
npm install -g @pyalwin/codemesh
git clone --depth 1 https://github.com/Alamofire/Alamofire.git /tmp/alamofire
# ... clone other repos ...

# Index
CODEMESH_PROJECT_ROOT=/tmp/alamofire codemesh index

# Run benchmarks
python3 eval/head_to_head.py --model sonnet alamofire excalidraw vscode swift-compiler

See docs/benchmark-results.md for full methodology and results. Early pydantic evals are archived in docs/experiments/.


vs. Existing Tools

| Feature | CodeGraph | Graphify | Axon | Codemesh |
|---|---|---|---|---|
| Structural indexing | Yes | Yes | Yes | Yes |
| FTS search | Yes | - | Yes | Yes |
| Agent write-back | - | - | - | Yes |
| Workflow memory | - | - | - | Yes |
| Hook interception | - | - | - | Yes |
| Backend-swappable | - | - | - | Yes |
| Eval framework | - | - | - | Yes |
| Published benchmarks | - | - | - | Yes |

Development

bun install          # Install dependencies
bun run build        # Compile TypeScript
bun run test         # Run 102 tests
bun run dev          # Watch mode
bun run lint         # Type check

Contributing

Contributions welcome. Areas for improvement:

  • More languages: Add tree-sitter grammars and language-specific extractors
  • AST-diff invalidation: Function-level instead of file-level staleness detection
  • Graph backends: Memgraph/Neo4j adapters for StorageBackend
  • Semantic search: Embedding columns alongside FTS5
  • Agent adoption: Better patterns for agents to prefer graph tools naturally

License

MIT

Footnotes

  1. Swift Compiler's codemesh index failed to complete (indexer regression on 30k+ file codebases; see known issues). The codemesh numbers above reflect agent behavior with an empty retrieval graph, falling back to Read + LSP. That is still ahead of baseline, but unrepresentative of codemesh's capability on a properly-indexed Swift repo.

  2. Baseline for Alamofire hit a judge error (score recorded as 0 but not meaningful); excluded from the Baseline average.
