CoreGraph builds an in-memory code symbol graph for multi-language and monorepo codebases. It combines tree-sitter (symbol extraction) with stack-graphs (cross-file name resolution), serves the graph from a background daemon, and answers questions a single-file search cannot — who calls this, what breaks if I change it, what is dead, and where do two languages disagree about the same value.
Ask "who calls compute_impact?" and you get the callers, not text matches:
$ coregraph query compute_impact --direction incoming --edge-kind calls --hop-limit 1
── query: compute_impact ──────────────────────────────────
✓ compute_impact [crates/query/src/impact.rs:27]
kind: Function | package: query (cargo)
Incoming (14):
├── calls ← run [Function] @ crates/cli/src/commands/diff.rs [0.85] ✓
├── calls ← run [Function] @ crates/cli/src/commands/impact.rs [0.85] ✓
├── calls ← cached_impact [Function] @ crates/cli/src/dispatch.rs [0.85] ✓
├── calls ← api_impact [Function] @ crates/server/src/handlers.rs [0.85] ✓
└── ... (14 total)
✓ trust: all paths verified
── page 1/1 | 14 edges total | budget: 506/5600 tokens ──
Each edge carries a confidence score ([0.85]) and a trust mark (✓), so an
LLM — or a human — knows how much weight to give the relationship. That trust
tagging is the core idea; everything else builds on it.
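The query above bounds traversal with --hop-limit and reports a token budget in its footer. The following is a minimal Python sketch of that kind of hop-limited, budget-capped lookup. The `Edge` type, the `incoming` and `estimate_tokens` helpers, and the token estimate itself are illustrative assumptions, not CoreGraph's implementation.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str           # symbol where the edge starts (the caller, for "calls")
    dst: str           # symbol where the edge ends (the callee)
    kind: str          # e.g. "calls"
    confidence: float


def estimate_tokens(edge):
    # Hypothetical estimate: roughly one token per 4 characters of output.
    text = f"{edge.kind} {edge.src} {edge.dst}"
    return max(1, len(text) // 4)


def incoming(edges, target, edge_kind, hop_limit, token_budget):
    """Collect incoming edges of one kind, breadth-first, up to hop_limit hops.

    Stops as soon as the next edge would exceed token_budget.
    Returns (selected_edges, tokens_used).
    """
    by_dst = {}
    for e in edges:
        if e.kind == edge_kind:
            by_dst.setdefault(e.dst, []).append(e)

    selected, used = [], 0
    seen = {target}
    queue = deque([(target, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == hop_limit:
            continue  # do not expand past the hop limit
        for e in by_dst.get(node, []):
            cost = estimate_tokens(e)
            if used + cost > token_budget:
                return selected, used
            selected.append(e)
            used += cost
            if e.src not in seen:
                seen.add(e.src)
                queue.append((e.src, hops + 1))
    return selected, used


edges = [
    Edge("run", "compute_impact", "calls", 0.85),
    Edge("cached_impact", "compute_impact", "calls", 0.85),
    Edge("main", "run", "calls", 0.85),
    Edge("impact.md", "compute_impact", "documents", 1.0),
]
one_hop, used = incoming(edges, "compute_impact", "calls", hop_limit=1, token_budget=5600)
print([e.src for e in one_hop], used)
```

Raising `hop_limit` to 2 in this toy graph also pulls in `main`, the caller of `run`; lowering `token_budget` truncates the result instead of failing.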
Existing tools each miss something:
| Tool | Limitation |
|---|---|
| grep / regex | No syntax awareness. Can't follow renames, re-exports, or dynamic dispatch; matches text, not symbols. |
| ctags | Indexes definition locations only. No reference edges, no value-level tracking. |
| Single-language LSP | Sees one language at a time. Can't connect cross-language references or wire config to code. |
CoreGraph pairs tree-sitter parsing with stack-graphs name resolution to build a project-wide graph that spans languages, then exposes queries tuned for feeding context to an LLM.
CoreGraph is deliberately not a compiler.
- A compiler's consumer is a machine, and a 0.01% error fails the build.
- CoreGraph's consumer is an LLM (or a person), which reasons usefully from partial information.
- Dropping a real relationship because it couldn't be proven is more harmful than surfacing it with a lower confidence score.
So CoreGraph prefers trust-tagging plus fast answers over compiler-grade certainty. Every edge is annotated with how it was derived and how much to trust it (see Confidence), instead of being silently discarded.
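To make the trade-off concrete, here is a small Python sketch of edges that carry their own derivation and confidence, and of a consumer that ranks them instead of dropping the uncertain ones. The `TaggedEdge` fields, the derivation labels, and `rank_for_consumer` are assumptions made for illustration; CoreGraph's actual edge model may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedEdge:
    src: str
    dst: str
    kind: str
    derivation: str    # how the edge was found (labels here are hypothetical)
    confidence: float  # 0.0 to 1.0
    verified: bool     # shown as the trust mark in query output


def rank_for_consumer(edges, min_confidence=0.0):
    """Keep every edge at or above min_confidence, strongest first.

    The default threshold of 0.0 drops nothing: low-confidence edges are
    sorted to the end, not discarded.
    """
    kept = [e for e in edges if e.confidence >= min_confidence]
    return sorted(kept, key=lambda e: (-e.confidence, e.src))


tagged = [
    TaggedEdge("plugin_hook", "compute_impact", "calls", "heuristic-name-match", 0.40, False),
    TaggedEdge("run", "compute_impact", "calls", "name-resolution", 0.85, True),
]
for e in rank_for_consumer(tagged):
    mark = "verified" if e.verified else "unverified"
    print(f"{e.kind} <- {e.src} [{e.confidence:.2f}] {mark} ({e.derivation})")
```

A caller that does want compiler-like strictness can still pass a higher `min_confidence`; the point of the design is that this is the consumer's choice, made with the weaker edges visible.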
| Capability | What it does | Try |
|---|---|---|
| Dead-code detection | Lists public symbols with no semantic edges in either direction (no callers and no callees) — accounting for renames, re-exports, and dynamic dispatch. Structural edges (Contains/BelongsTo/Documents/DescribedIn) are excluded from this count. | coregraph orphans |
| Cross-language linking | Connects code across language boundaries through dedicated mediators (Spring DI/config, React Router, Docker Compose, Go DI) and shared API paths (ApiPathMatch edges). | coregraph query <symbol> |
| Config ↔ code consistency | Relates config keys (application.yml, docker-compose.yml, …) to the code that reads them. | coregraph inconsistencies --category config-key |
| Enum / value mismatches | Flags the same logical value spelled differently across languages (e.g. a Java enum constant vs. a TypeScript string). | coregraph inconsistencies --category enum-mismatch |
| Impact analysis | Computes the transitive set a change can reach, with optional risk scoring and affected tests. | coregraph impact <symbol> --risk |
| LLM context shaping | Extracts an N-hop subgraph around a symbol and serializes it to fit a token budget. | coregraph query <symbol> --depth 2 |
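Two of the rows above, dead-code detection and impact analysis, reduce to simple graph operations. The Python sketch below shows one plausible reading of each on a toy edge list. The tuple layout, the function names, and the assumption that impact follows semantic edges backwards from the changed symbol to its dependents are all illustrative, not taken from CoreGraph's source.

```python
from collections import deque

# Structural edge kinds named in the table; they do not count as usage.
STRUCTURAL = {"Contains", "BelongsTo", "Documents", "DescribedIn"}


def orphans(public_symbols, edges):
    """Public symbols with no semantic edge in either direction."""
    connected = set()
    for src, kind, dst in edges:
        if kind not in STRUCTURAL:
            connected.add(src)
            connected.add(dst)
    return sorted(s for s in public_symbols if s not in connected)


def impact(symbol, edges):
    """Symbols that transitively depend on `symbol` through semantic edges."""
    dependents = {}
    for src, kind, dst in edges:
        if kind not in STRUCTURAL:
            dependents.setdefault(dst, set()).add(src)
    reached, queue = set(), deque([symbol])
    while queue:
        for dep in dependents.get(queue.popleft(), ()):
            if dep not in reached and dep != symbol:
                reached.add(dep)
                queue.append(dep)
    return sorted(reached)


# Edges are (source, kind, target) tuples in this sketch.
graph_edges = [
    ("module_a", "Contains", "helper"),
    ("module_a", "Contains", "old_util"),
    ("handler", "calls", "service"),
    ("service", "calls", "helper"),
]
public = ["handler", "service", "helper", "old_util"]

print(orphans(public, graph_edges))   # old_util has only a structural edge
print(impact("helper", graph_edges))  # service calls helper, handler calls service
```

Note that `old_util` is reported even though `module_a` contains it: a Contains edge says where a symbol lives, not that anything uses it.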
CoreGraph also tracks documentation: /// / /** */ doc comments and Markdown
sections become nodes, so inconsistencies --category doc-drift can catch a
@param that names an argument the signature no longer has. Doc-drift detection
currently covers JS/TS/Java/Python @param/:param conventions only; Rust
rustdoc and Go doc comments are not yet checked.
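As an illustration of the doc-drift idea, the following Python sketch extracts parameter names from @param and :param tags and reports any that are missing from the current signature. The regular expression and the `doc_drift` helper are assumptions for this example only. They handle the optional JSDoc `{type}` prefix but not typed Sphinx tags such as `:param int name:`.

```python
import re

# Matches "@param name", "@param {type} name" (JSDoc/Javadoc style)
# and ":param name:" (Python/Sphinx style).
PARAM_TAG = re.compile(
    r"(?:@param\s+(?:\{[^}]*\}\s*)?|:param\s+)([A-Za-z_$][\w$]*)"
)


def doc_drift(doc_comment, signature_params):
    """Return documented parameter names that the signature no longer has."""
    documented = PARAM_TAG.findall(doc_comment)
    current = set(signature_params)
    return [name for name in documented if name not in current]


js_doc = """/**
 * @param {string} userId the user to look up
 * @param limit maximum number of rows
 */"""
print(doc_drift(js_doc, ["userId", "maxRows"]))

py_doc = """Fetch rows.

:param user_id: the user to look up
:param timeout: seconds to wait
"""
print(doc_drift(py_doc, ["user_id", "timeout"]))
```

The first call flags `limit`, which the signature has since renamed to `maxRows`; the second returns an empty list because the docstring and signature agree.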
All seven code languages — Rust, Java, Kotlin, TypeScript, JavaScript, Go, Python — have stack-graphs name-resolution rules (Java/TS/JS/Python from upstream, Go/Rust/Kotlin hand-authored here), so cross-file resolution works the same way across the stack.
- Evidence: the source file that justifies an edge. A calls edge's evidence is the file containing the call site. When that file changes, the edge becomes stale.
- Stale: a symbol or edge is stale when its source file changed and the extracted data is no longer current.
- Healing: re-parsing stale files to refresh the graph. CoreGraph heals on demand at query time. Before answering, it re-extracts every content-hash-changed file known to the project graph (within a time budget), not only the files the query touches. Pass --no-heal to skip it and read the graph as-is.
- Epoch: a monotonic version number bumped after each invalidate-and-heal cycle. It keys the query cache and signals staleness; it is not an atomically swapped immutable graph version. The graph itself sits behind an RwLock: queries take the read lock, while healing and invalidation take the write lock and mutate the graph in place. See Architecture for the concurrency model.
- Server (daemon): a single background process that holds the in-memory graph for one or more projects. It serves the IPC socket and the HTTP API, and backs the LSP and MCP stdio bridges (which reuse its in-memory graph when running).
- Client (thin client): the CLI. It forwards queries to the daemon over an IPC socket and auto-starts the daemon on first use (or runs in-process with --no-auto-start).
- ACTIVE / LOADING: a project's load state in server status. ACTIVE means its graph is in memory; LOADING means it is being loaded. When a project sits idle, its graph is snapshotted to disk (if dirty) and dropped from memory, and the entry is removed from the daemon entirely, so an idle project simply disappears from the listing rather than appearing as a separate state. See Architecture.
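The interplay of stale, healing, and epoch can be sketched in a few lines of Python. This toy model is an assumption-laden illustration: the `ProjectGraph` class, the SHA-256 hashing, and the string answers are invented for the example, and the time budget, file deletion, and RwLock described above are left out.

```python
import hashlib


class ProjectGraph:
    """Toy model: a content hash per file, an epoch, and a query cache."""

    def __init__(self):
        self.hashes = {}  # path -> content hash recorded at last extraction
        self.epoch = 0
        self.cache = {}   # (epoch, query text) -> cached answer

    @staticmethod
    def _digest(content):
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def stale_files(self, files):
        """Paths whose current content no longer matches the recorded hash."""
        return sorted(
            path for path, content in files.items()
            if self.hashes.get(path) != self._digest(content)
        )

    def heal(self, files):
        """Re-extract every stale file, then bump the epoch once."""
        stale = self.stale_files(files)
        for path in stale:
            # Stand-in for re-parsing the file and updating its edges.
            self.hashes[path] = self._digest(files[path])
        if stale:
            self.epoch += 1
        return stale

    def query(self, text, files, heal=True):
        if heal:  # heal=False plays the role of --no-heal
            self.heal(files)
        key = (self.epoch, text)
        if key not in self.cache:
            self.cache[key] = f"answer to {text!r} at epoch {self.epoch}"
        return self.cache[key]


files = {"a.rs": "fn a() {}", "b.rs": "fn b() {}"}
graph = ProjectGraph()
first = graph.query("who calls a", files)                  # heals both files
files["a.rs"] = "fn a() { b() }"                           # edit one file
unhealed = graph.query("who calls a", files, heal=False)   # reads graph as-is
healed = graph.query("who calls a", files)                 # heals a.rs only
print(first, unhealed, healed, sep="\n")
```

Because the epoch is part of the cache key, the answer computed before the edit is never served after a heal, while the unhealed query still hits the old entry.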