A graph-based RAG system that parses multi-language codebases with Tree-sitter, builds knowledge graphs in Memgraph, and enables natural language querying, editing, and optimization.
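The snippet below is a rough illustration of the kind of graph being built. It is not code-graph-rag's implementation. It replaces Tree-sitter with Python's built-in `ast` module and handles only Python source. The `extract_call_graph` helper and the `CALLS` relationship name are hypothetical; only the `Function` label matches a label used later in this README.

```python
import ast


def extract_call_graph(source: str) -> tuple[list[dict], list[tuple[str, str, str]]]:
    """Return (nodes, relationships) for every function defined in `source`.

    Hypothetical helper: uses the stdlib `ast` module instead of Tree-sitter and
    only records calls made by plain name (e.g. `helper()`), not method calls.
    """
    tree = ast.parse(source)
    nodes: list[dict] = []
    rels: list[tuple[str, str, str]] = []

    for fn in ast.walk(tree):
        if not isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # One node per function definition, labelled like the graph's Function nodes.
        nodes.append({"label": "Function", "name": fn.name, "line": fn.lineno})
        # One CALLS edge per distinct name called anywhere inside this definition.
        called = {
            call.func.id
            for call in ast.walk(fn)
            if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
        }
        for callee in sorted(called):
            rels.append((fn.name, "CALLS", callee))

    return nodes, rels


if __name__ == "__main__":
    src = "def helper(x):\n    return x * 2\n\ndef main():\n    return helper(21)\n"
    nodes, rels = extract_call_graph(src)
    print([n["name"] for n in nodes])  # ['helper', 'main']
    print(rels)                        # [('main', 'CALLS', 'helper')]
```

Because it is a sketch, calls to builtins such as `print` also show up as `CALLS` edges. A real indexer has to resolve names across modules and languages, which is where Tree-sitter grammars come in.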
```bash
pip install code-graph-rag
```

With all Tree-sitter grammars (Python, JS, TS, Rust, Go, Java, Scala, C++, Lua):

```bash
pip install 'code-graph-rag[treesitter-full]'
```

With semantic code search (UniXcoder embeddings):

```bash
pip install 'code-graph-rag[semantic]'
```

Requirements:

- Python 3.12+
- Docker (for Memgraph)
- `cmake` (for building `pymgclient`)
- `ripgrep` (`rg`) (for shell command text searching)
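After installation, the package's own setup check is `cgr doctor`. To confirm the prerequisites before you install, you can run something like the sketch below. `check_prerequisites` is a hypothetical helper. It assumes the tools are invoked as `docker`, `cmake` and `rg`, and it only checks that each one is on `PATH`. It does not check tool versions.

```python
import shutil
import sys


def check_prerequisites() -> dict[str, bool]:
    """Report whether the listed prerequisites appear to be available.

    Hypothetical helper, not the `cgr doctor` implementation: it checks the
    interpreter version and whether each tool's executable is on PATH.
    """
    results = {"python>=3.12": sys.version_info >= (3, 12)}
    # Assumed executable names: docker (Memgraph), cmake (pymgclient), rg (ripgrep).
    for tool in ("docker", "cmake", "rg"):
        results[tool] = shutil.which(tool) is not None
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'ok' if ok else 'MISSING':8} {name}")
```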
The package installs a cgr command.
Start Memgraph, parse a repo, and query it:
```bash
docker compose up -d                       # start Memgraph
cgr start --repo-path ./my-project \
  --update-graph --clean                   # parse & launch interactive chat
```

Index to protobuf for offline use:

```bash
cgr index -o ./index-output --repo-path ./my-project
```

Export knowledge graph to JSON:

```bash
cgr export -o graph.json
```

AI-guided optimization:

```bash
cgr optimize python --repo-path ./my-project
```

Run as an MCP server (for Claude Code):

```bash
cgr mcp-server
```

Check your setup:

```bash
cgr doctor
```

The `cgr` package provides short imports for programmatic use.
Load an exported graph and inspect it:

```python
from cgr import load_graph

graph = load_graph("graph.json")  # e.g. the file written by `cgr export -o graph.json`
print(graph.summary())

# Print the first five functions with their relationship counts
functions = graph.find_nodes_by_label("Function")
for fn in functions[:5]:
    rels = graph.get_relationships_for_node(fn.node_id)
    print(f"{fn.properties['name']}: {len(rels)} relationships")
```

Query Memgraph directly with Cypher:

```python
from cgr import MemgraphIngestor

with MemgraphIngestor(host="localhost", port=7687) as db:
    rows = db.fetch_all("MATCH (f:Function) RETURN f.name LIMIT 10")
    for row in rows:
        print(row)
```

Generate Cypher from a natural-language question:

```python
import asyncio

from cgr import CypherGenerator


async def main():
    gen = CypherGenerator()
    # generate() is a coroutine, so it must be awaited inside an event loop
    cypher = await gen.generate("Find all classes that inherit from BaseModel")
    print(cypher)


asyncio.run(main())
```

Compute a code embedding (requires the `semantic` extra):

```python
from cgr import embed_code

embedding = embed_code("def authenticate(user, password): ...")
print(f"Embedding dimension: {len(embedding)}")
```

Configure LLM providers in code:

```python
from cgr import settings

settings.set_orchestrator("openai", "gpt-4o", api_key="sk-...")
settings.set_cypher("google", "gemini-2.5-flash", api_key="your-key")
```

Or configure via a `.env` file or environment variables:
| Variable | Default | Description |
|---|---|---|
| `MEMGRAPH_HOST` | `localhost` | Memgraph hostname |
| `MEMGRAPH_PORT` | `7687` | Memgraph port |
| `ORCHESTRATOR_PROVIDER` | | Provider: `google`, `openai`, `ollama` |
| `ORCHESTRATOR_MODEL` | | Model ID (e.g. `gpt-4o`, `gemini-2.5-pro`) |
| `ORCHESTRATOR_API_KEY` | | API key for the provider (not needed for `ollama`) |
| `CYPHER_PROVIDER` | | Provider for Cypher generation |
| `CYPHER_MODEL` | | Model ID for Cypher generation (e.g. `codellama`, `gpt-4o-mini`) |
| `CYPHER_API_KEY` | | API key for the Cypher provider (not needed for `ollama`) |
| `TARGET_REPO_PATH` | `.` | Default repository path |
Full documentation, architecture details, and contribution guide: docs.code-graph-rag.com
License: MIT