How CoreGraph runs: a background daemon holds an in-memory symbol graph per project. Thin clients (CLI, IDE via LSP, LLM agent via MCP, scripts via HTTP) talk to it locally, over an IPC socket or, for HTTP clients, over TCP. This page is for users and contributors who want to understand the runtime without reading the source.
```
Clients (thin)               Daemon (one process)                 On disk
─────────────────            ──────────────────────────           ─────────────────
coregraph <cmd>  ──IPC──▶    Project Manager (LRU cache)          .coregraph/
coregraph lsp    ──IPC──▶     └─ per-project SymbolGraph            snapshot.bin
coregraph mcp    ──IPC──▶        (petgraph + indexes)               config.toml
HTTP client      ──TCP──▶    Edge evaluator (trust/confidence)
                             File watcher → invalidate → heal
```
- You run a command such as `coregraph query UserController`.
- The CLI tries to connect to the daemon's IPC socket. If nothing is listening (and auto-start is on), it spawns the daemon in the background. It then waits for the socket to come up and forwards the request.
- The daemon loads the project's graph and keeps it in memory. It loads from `.coregraph/snapshot.bin` if that file is present and still current; otherwise it indexes the source tree. Then it answers.
- Subsequent commands reuse the in-memory graph, so they skip graph construction and return quickly.
- After a configurable idle period, the daemon saves a snapshot (if the graph changed), frees the graph, and eventually self-terminates. The next command restarts it.
Passing --no-auto-start (or setting COREGRAPH_NO_AUTO_START=1) prevents the
CLI from auto-spawning a daemon when none is listening; in that case the command
builds the graph in-process. If a daemon is already running, the command still
routes through it.
CoreGraph is a Cargo workspace. The crates, top to bottom:
| Crate | Role |
|---|---|
| `core` | Shared domain types: `SymbolNode`, `DirectEdge`, `SymbolId`, `SymbolKind`, `EdgeKind`, span/file-state types. Pure types, no I/O. |
| `extractor` | Tree-sitter symbol extraction per language, plus config/doc/markdown extractors. Holds the `.scm` query files. |
| `stack` | stack-graphs integration: cross-file name resolution. Bundles hand-authored `.tsg` rules for Go, Rust, and Kotlin. |
| `manifest` | Manifest/dependency parsers (npm, Cargo, Gradle, Maven, Go modules, Python, Vite) used to scope packages. |
| `graph` | The in-memory symbol graph engine: the `SymbolGraph` itself, indexes, bloom filters, the edge evaluator (trust + confidence), epoch versioning, invalidation/healing, mediators, snapshot serialization, and risk scoring. |
| `query` | Query and serialization: subgraph extraction, token budget, pagination, orphans, inconsistencies, impact, and the human/llm/json output writers. |
| `server` | The HTTP API only (axum): routes and handlers over a shared `SymbolGraph`. |
| `watcher` | File watching, debouncing, and git-aware diff (`watch`, `diff`). |
| `cli` | The binary: thin-client logic, the daemon process, IPC, the project manager, the LSP/MCP bridges, and all subcommand handlers. |
A note on where things live, because it is easy to assume otherwise: the
daemon, IPC socket, project manager, and the LSP/MCP bridges all live in the
cli crate (crates/cli/src/{daemon.rs, ipc.rs, project_manager.rs, commands/lsp.rs, commands/mcp.rs}). The server crate is only the HTTP API.
Symbol extraction lives in extractor, not a separate parser crate.
```
crates/
├── core/src/        symbol.rs, edge.rs, span.rs, file_state.rs, graph.rs
├── extractor/src/   {rust,go,java,kotlin,python}_extractor.rs, typescript.rs,
│   │                javascript.rs, config_extractor.rs, doc_comment.rs,
│   │                markdown.rs, drift.rs, string_literal_extractor.rs
│   └── queries/     <lang>.scm (symbols) + <lang>-refs.scm (references)
├── stack/src/       resolver.rs, backend.rs
│   └── rules/       go.tsg, rust.tsg, kotlin.tsg (hand-authored)
├── manifest/src/    npm.rs, cargo_parser.rs, gradle.rs, maven.rs, go.rs,
│                    python.rs, vite.rs, detector.rs, filter.rs
├── graph/src/       symbol_graph.rs, index.rs, bloom.rs, value_index.rs,
│   │                edge_evaluator.rs, epoch.rs, invalidation.rs,
│   │                healing.rs, risk.rs, snapshot.rs
│   └── mediator/    spring_di.rs, spring_config.rs, react_router.rs,
│                    docker_compose.rs, go_di.rs
├── query/src/       impact.rs, orphans.rs, inconsistencies.rs, budget.rs,
│   │                paginate.rs, library.rs
│   └── serialize/   human.rs, llm.rs, json.rs
├── server/src/      routes.rs, handlers.rs, lib.rs (HTTP only)
├── watcher/src/     lib.rs, debounce.rs, git_diff.rs
└── cli/src/         main.rs, daemon.rs, ipc.rs, dispatch.rs,
    │                project_manager.rs, graph_loader.rs
    └── commands/    index, query, inspect, stats, orphans, impact, diff,
                     review, inconsistencies, export, snapshot, config,
                     server, lsp, mcp, watch_diff, batch, plugin
```
CoreGraph splits into a long-lived daemon that owns the graph and thin clients that just send requests. This keeps the expensive part — building the graph — out of every command's hot path.
When you run a command, the CLI:
- Tries to connect to the IPC socket. On Unix this is a Unix domain socket (under the runtime dir, next to the pid file); on Windows it is a named pipe.
- On success, forwards the request and prints the response.
- If nothing is listening and auto-start is enabled, spawns the daemon and polls until the socket is ready, then forwards the request. The spawn runs `coregraph server start --foreground`, detached with `setsid` on Unix or in its own process group on Windows.
- If auto-start is suppressed (`--no-auto-start` or `COREGRAPH_NO_AUTO_START=1`) and nothing is listening, builds the graph in-process instead. This is useful for CI, one-off scripts, or debugging. (If a daemon is already running, the request still routes through it.)
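The routing decision above can be sketched in Rust. This is a minimal, Unix-only illustration; it does not cover Windows named pipes. The names `Route`, `auto_start_enabled`, `choose_route`, and `wait_for_socket` are hypothetical, not CoreGraph's API. The sketch treats only the exact value `1` of `COREGRAPH_NO_AUTO_START` as suppressing auto-start, which is an assumption. The actual spawn step is left out.

```rust
use std::os::unix::net::UnixStream;
use std::path::Path;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Where a request should go (hypothetical names for illustration).
#[derive(Debug, PartialEq)]
enum Route {
    /// A daemon answered on the socket: forward the request to it.
    Daemon,
    /// Nothing listening, auto-start on: spawn the daemon, wait, then forward.
    SpawnThenForward,
    /// Nothing listening, auto-start suppressed: build the graph in-process.
    InProcess,
}

/// Auto-start is on unless the flag is passed or the env var equals "1" (assumed).
fn auto_start_enabled(no_auto_start_flag: bool, env_value: Option<&str>) -> bool {
    !no_auto_start_flag && env_value != Some("1")
}

/// Try the socket first, so a running daemon is used even without auto-start.
fn choose_route(socket: &Path, auto_start: bool) -> Route {
    match UnixStream::connect(socket) {
        Ok(_) => Route::Daemon,
        Err(_) if auto_start => Route::SpawnThenForward,
        Err(_) => Route::InProcess,
    }
}

/// Poll until the socket accepts a connection or the timeout elapses.
fn wait_for_socket(socket: &Path, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    while Instant::now() < deadline {
        if UnixStream::connect(socket).is_ok() {
            return true;
        }
        sleep(Duration::from_millis(20));
    }
    false
}
```

Connecting before looking at the auto-start setting is what lets an already-running daemon win even under `--no-auto-start`.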
All clients speak the same small IPC protocol. The daemon dispatches six methods that back public CLI commands: `query`, `impact`, `orphans`, `inconsistencies`, `stats`, and `diff_summary` (which backs `coregraph diff`). It also handles `status` (which backs `coregraph server status`), `reindex`, and `health`.

For the LSP/MCP bridges and the VS Code extension, it exposes these bridge/extension methods:

- `inspect`
- `impact_batch`
- `diff` (a rich per-file diff for the extension)
- `cross_lang`
- the LSP definition, references, and workspace-symbol routes

The CLI and the bridges share this one dispatch path. As a result, the CLI, IDE, and LLM agent always see the same graph and get the same results.
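To make the single dispatch path concrete, here is a sketch of how method names might map onto one enum. The wire strings come from the list above. The `Method` enum, `parse_method`, and `dispatch` are hypothetical. This page does not describe the wire format (framing, payload encoding), so the sketch leaves it out. It also omits the LSP definition/references/workspace-symbol routes, because their method names aren't given here.

```rust
/// Methods named in the text. The enum itself is a hypothetical sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Method {
    // Back public CLI commands
    Query,
    Impact,
    Orphans,
    Inconsistencies,
    Stats,
    DiffSummary, // backs `coregraph diff`
    // Daemon management
    Status, // backs `coregraph server status`
    Reindex,
    Health,
    // Bridge/extension methods
    Inspect,
    ImpactBatch,
    Diff, // rich per-file diff for the extension
    CrossLang,
}

/// Map a wire method name onto the enum; unknown names are rejected.
fn parse_method(name: &str) -> Option<Method> {
    let m = match name {
        "query" => Method::Query,
        "impact" => Method::Impact,
        "orphans" => Method::Orphans,
        "inconsistencies" => Method::Inconsistencies,
        "stats" => Method::Stats,
        "diff_summary" => Method::DiffSummary,
        "status" => Method::Status,
        "reindex" => Method::Reindex,
        "health" => Method::Health,
        "inspect" => Method::Inspect,
        "impact_batch" => Method::ImpactBatch,
        "diff" => Method::Diff,
        "cross_lang" => Method::CrossLang,
        _ => return None,
    };
    Some(m)
}

/// The one entry point every client (CLI, LSP bridge, MCP bridge) goes through.
/// Real handlers are stubbed out; this only reports which method was selected.
fn dispatch(name: &str) -> Result<String, String> {
    let method = parse_method(name).ok_or_else(|| format!("unknown method: {name}"))?;
    Ok(format!("handled {method:?}"))
}
```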
One daemon serves many projects. The project manager
(crates/cli/src/project_manager.rs) keeps each project's graph behind its own
Arc<RwLock<SymbolGraph>>, so writes to one project never block reads of
another.
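Here is a minimal sketch of the per-project locking. It assumes a simplified `SymbolGraph` stand-in and a hypothetical `ProjectManager::graph` accessor. The outer map lock is held only long enough to look up or insert a handle. After that, each project's graph is guarded by its own `RwLock`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Stand-in for the real SymbolGraph (hypothetical, for illustration).
#[derive(Default)]
struct SymbolGraph {
    symbols: Vec<String>,
}

/// Each project gets its own lock, so a writer on one project
/// never blocks readers of another.
#[derive(Default)]
struct ProjectManager {
    projects: RwLock<HashMap<String, Arc<RwLock<SymbolGraph>>>>,
}

impl ProjectManager {
    /// Return the project's graph handle, creating an empty graph if absent.
    fn graph(&self, project: &str) -> Arc<RwLock<SymbolGraph>> {
        // Fast path: shared lock on the map for an existing entry.
        if let Some(g) = self.projects.read().unwrap().get(project) {
            return Arc::clone(g);
        }
        // Slow path: exclusive lock on the map only to insert the handle.
        let mut map = self.projects.write().unwrap();
        Arc::clone(map.entry(project.to_string()).or_default())
    }
}
```

A real manager would also keep lifecycle state in the map. This only shows why a writer on one project leaves other projects' readers alone.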
A project moves through three states:
```
UNLOADED ──(first query)──▶ LOADING ──(snapshot load / index)──▶ ACTIVE
    ▲                          │                                   │
    │                          │ singleflight: only the first      │
    │                          │ caller loads; others wait         │
    └────────(idle unload)─────┴──── quiesce + save snapshot ◀─────┘
```
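The diagram's main cycle can be written as a small transition function. The `State` and `Event` names are hypothetical. The sketch treats every pair not on the main cycle as invalid, so it does not model the branch that joins the return path under LOADING.

```rust
/// The three lifecycle states from the diagram (hypothetical encoding).
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Unloaded,
    Loading,
    Active,
}

#[derive(Debug, Clone, Copy)]
enum Event {
    FirstQuery, // UNLOADED -> LOADING
    Loaded,     // snapshot load / index finished: LOADING -> ACTIVE
    IdleUnload, // quiesce + save snapshot: ACTIVE -> UNLOADED
}

/// Apply one event; `None` means the pair is invalid in this sketch.
fn step(state: State, event: Event) -> Option<State> {
    match (state, event) {
        (State::Unloaded, Event::FirstQuery) => Some(State::Loading),
        (State::Loading, Event::Loaded) => Some(State::Active),
        (State::Active, Event::IdleUnload) => Some(State::Unloaded),
        _ => None,
    }
}
```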
- Singleflight loading: concurrent requests for the same project do not each rebuild it. The first caller performs the load; the rest wait on a shared gate and use its result.
- LRU eviction: when the number of loaded projects exceeds `server.max_loaded_projects` (default 5), the least-recently-used project with no in-flight queries is evicted using persist-then-free.
  - Because a victim must have `active_queries == 0`, in-flight work (including request-scoped heals) finishes first.
  - If the graph is dirty, a final snapshot is written off-lock. A revive guard then re-checks for late queries before memory is freed.
  - No watcher is stopped. The daemon runs a single file watcher for its default project only; other loaded projects are never watched. That watcher survives eviction, so a later file change reloads the evicted default project.
- Staleness check on load: if the source tree changed since the cached graph was built, the entry is evicted and rebuilt before answering. You never query a stale graph.
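Singleflight loading maps neatly onto `std::sync::OnceLock`. Its `get_or_init` runs the initializer exactly once, while concurrent callers block until the value is ready. This is a sketch of the technique, not CoreGraph's implementation. `Singleflight` and `get_or_load` are hypothetical names, and the loaded value is generic. Gates are never removed here; a real manager would drop the gate when it evicts the project.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Per-project load gates: the first caller for a key runs `load`;
/// concurrent callers for the same key wait and then share the result.
struct Singleflight<T> {
    gates: Mutex<HashMap<String, Arc<OnceLock<Arc<T>>>>>,
}

impl<T> Singleflight<T> {
    fn new() -> Self {
        Singleflight { gates: Mutex::new(HashMap::new()) }
    }

    fn get_or_load(&self, project: &str, load: impl FnOnce() -> T) -> Arc<T> {
        // Hold the map lock only long enough to find or create the gate,
        // so loading one project never blocks lookups of another.
        let gate = {
            let mut gates = self.gates.lock().unwrap();
            Arc::clone(gates.entry(project.to_string()).or_default())
        };
        // OnceLock guarantees a single initializer; other callers block here.
        Arc::clone(gate.get_or_init(|| Arc::new(load())))
    }
}
```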
These lifecycle values live in ProjectManagerConfig, seeded at daemon start
from config (project-local over global); the defaults apply when the key is unset:
| Setting | Default | Config key / flag |
|---|---|---|
| `max_loaded` | 5 | `server.max_loaded_projects` |
| `max_loaded_bytes` | 0 (off) | `server.max_loaded_bytes` |
| `idle_unload` | 10 min | `server.idle_unload_minutes` |
| `auto_stop` | 30 min | `server start --auto-stop-minutes <N>` (CLI flag wins over config) |
| graceful drain | 30 s | `server.graceful_shutdown_sec` |
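Here are the table's defaults expressed as a Rust struct. Field names follow the Setting column. The following are assumptions for illustration: the exact layout of `ProjectManagerConfig`, the `resolve` helper, and the `auto_stop_config_minutes` parameter. This page doesn't name a config key for auto-stop.

```rust
use std::time::Duration;

/// Lifecycle settings with the documented defaults (layout is hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct ProjectManagerConfig {
    max_loaded: usize,
    max_loaded_bytes: u64, // 0 = byte budget off
    idle_unload: Duration,
    auto_stop: Duration, // zero = never self-stop
    graceful_drain: Duration,
}

impl Default for ProjectManagerConfig {
    fn default() -> Self {
        Self {
            max_loaded: 5,
            max_loaded_bytes: 0,
            idle_unload: Duration::from_secs(10 * 60),
            auto_stop: Duration::from_secs(30 * 60),
            graceful_drain: Duration::from_secs(30),
        }
    }
}

/// Unset keys keep their defaults; the CLI auto-stop value wins over config.
fn resolve(
    max_loaded_projects: Option<usize>,
    idle_unload_minutes: Option<u64>,
    auto_stop_cli_minutes: Option<u64>,
    auto_stop_config_minutes: Option<u64>, // hypothetical config source
) -> ProjectManagerConfig {
    let mut cfg = ProjectManagerConfig::default();
    if let Some(n) = max_loaded_projects {
        cfg.max_loaded = n;
    }
    if let Some(m) = idle_unload_minutes {
        cfg.idle_unload = Duration::from_secs(m * 60);
    }
    if let Some(m) = auto_stop_cli_minutes.or(auto_stop_config_minutes) {
        cfg.auto_stop = Duration::from_secs(m * 60);
    }
    cfg
}
```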
The idle timer resets on IPC query requests. File-watch events do not reset it — a quiet-but-watched project still unloads on schedule.
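A sketch of the idle rule, using a hypothetical `IdleTimer`. Queries move the timestamp forward; watch events deliberately leave it alone.

```rust
use std::time::{Duration, Instant};

/// Tracks when a project last served an IPC query (hypothetical helper).
struct IdleTimer {
    last_query: Instant,
    idle_unload: Duration,
}

impl IdleTimer {
    fn new(now: Instant, idle_unload: Duration) -> Self {
        Self { last_query: now, idle_unload }
    }

    /// IPC query requests reset the timer.
    fn on_query(&mut self, now: Instant) {
        self.last_query = now;
    }

    /// File-watch events intentionally do not reset the timer.
    fn on_watch_event(&mut self, _now: Instant) {}

    fn should_unload(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_query) >= self.idle_unload
    }
}
```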
Eviction (idle unload, LRU, or byte-budget) is persist-then-free: a graph
modified since its last snapshot is written to .coregraph/snapshot.bin
(atomic temp-file rename) before its memory is released, and the snapshot
records the built_at time. The next LOADING for that project warm-loads
the snapshot — skipping tree-sitter extraction — and immediately validates it:
if any source file is newer than built_at, the snapshot is discarded and the
graph is rebuilt from source, so a warm load never serves stale data. A clean
(unmodified) graph is dropped without a redundant write.
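For the LRU case, the eviction decision can be sketched as below: which project is the victim, and whether it needs a snapshot write first. `Entry`, `Eviction`, and `plan_eviction` are hypothetical. Only the decision is modeled, not the off-lock snapshot write or the revive guard.

```rust
use std::collections::HashMap;

/// Per-project bookkeeping the decision needs (hypothetical fields).
struct Entry {
    last_used: u64, // logical clock; larger = more recent
    active_queries: usize,
    dirty: bool, // modified since the last snapshot
}

#[derive(Debug, PartialEq)]
enum Eviction {
    PersistThenFree(String), // write snapshot.bin first, then drop the graph
    Free(String),            // clean graph: drop without a redundant write
}

/// Over the limit? Pick the least-recently-used project with no in-flight queries.
fn plan_eviction(entries: &HashMap<String, Entry>, max_loaded: usize) -> Option<Eviction> {
    if entries.len() <= max_loaded {
        return None;
    }
    let (name, entry) = entries
        .iter()
        .filter(|(_, e)| e.active_queries == 0)
        .min_by_key(|(_, e)| e.last_used)?;
    Some(if entry.dirty {
        Eviction::PersistThenFree(name.clone())
    } else {
        Eviction::Free(name.clone())
    })
}
```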
When all projects are unloaded and auto_stop has elapsed, the daemon drains:
it stops accepting new project loads, waits briefly for in-flight requests to
finish, then exits. The next CLI command starts it again. Override the window
with coregraph server start --auto-stop-minutes <N> (0 disables self-stop;
default 30).
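The self-stop condition and the drain wait might look like this. Both functions are hypothetical. The sketch assumes the auto-stop window is measured as time spent with no projects loaded, which this page does not spell out.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Begin draining only when nothing is loaded and the window has elapsed.
/// `auto_stop_minutes == 0` disables self-stop.
fn should_self_stop(loaded_projects: usize, idle_for: Duration, auto_stop_minutes: u64) -> bool {
    auto_stop_minutes != 0
        && loaded_projects == 0
        && idle_for >= Duration::from_secs(auto_stop_minutes * 60)
}

/// Wait up to `grace` for in-flight requests to reach zero; true if they did.
fn drain(in_flight: &AtomicUsize, grace: Duration) -> bool {
    let deadline = Instant::now() + grace;
    loop {
        if in_flight.load(Ordering::SeqCst) == 0 {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(Duration::from_millis(10));
    }
}
```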
If you want the daemon always resident:
```sh
coregraph server install     # register a launchd (macOS) / systemd (Linux) service
coregraph server uninstall   # remove it
```

Check status with `coregraph server status`, and start/stop/restart manually with the matching `server` subcommands.
The graph lives entirely in process memory; nothing round-trips to a database. That is deliberate — the core use case is feeding fresh context to an LLM (or an IDE) with minimal latency, and an N-hop subgraph walk over an in-memory graph is far faster than the equivalent SQL recursive query. The one weakness of an in-memory design, cold start, is covered by snapshots.
The SymbolGraph (crates/graph/src/symbol_graph.rs) is a
petgraph::StableGraph<SymbolNode, DirectEdge>. StableGraph keeps node
indices valid across deletions, so incremental healing can remove and re-add
symbols without invalidating references held elsewhere. A
HashMap<SymbolId, NodeIndex> maps domain ids to petgraph indices for O(1)
lookup.
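Here is a std-only illustration of why stable indices matter. This toy arena is not petgraph. It mimics the property described above by leaving a hole on removal instead of shifting later elements, so a map from ids to indices stays valid. `StableArena` and this `NodeIndex` are hypothetical, and this sketch never reuses vacated slots.

```rust
/// Index into the arena (hypothetical; not petgraph's NodeIndex).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct NodeIndex(usize);

/// Removal leaves `None` in place, so other indices keep pointing at
/// the same values.
struct StableArena<T> {
    slots: Vec<Option<T>>,
}

impl<T> StableArena<T> {
    fn new() -> Self {
        Self { slots: Vec::new() }
    }

    fn add(&mut self, value: T) -> NodeIndex {
        self.slots.push(Some(value));
        NodeIndex(self.slots.len() - 1)
    }

    fn remove(&mut self, idx: NodeIndex) -> Option<T> {
        self.slots.get_mut(idx.0)?.take()
    }

    fn get(&self, idx: NodeIndex) -> Option<&T> {
        self.slots.get(idx.0)?.as_ref()
    }
}
```

With petgraph itself you would use `StableGraph::add_node` and `StableGraph::remove_node`, which likewise leave other node indices untouched.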
Alongside the graph, several auxiliary indexes keep queries fast:
| Index | Shape | Purpose |
|---|---|---|
| Name index | `HashMap<String, Vec<SymbolId>>` | Look up symbols by short name |
| Qualified index | `HashMap<String, Vec<SymbolId>>` | Look up by fully-qualified name |
| Value index | `HashMap<String, Vec<SymbolId>>` | Reverse-lookup string/enum values (powers cross-value inconsistency detection) |
| File blooms | per-file `SymbolBloom` | O(1) probabilistic "might this file define a symbol named X?" test: a miss is definitive, a hit may be a false positive |
| Evidence index | file → evidence set | Determines the blast radius of a file change for invalidation |
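A minimal Bloom filter over symbol names shows what the per-file bloom buys. `ToyBloom`, its bit count, and its hash count are hypothetical. This is the standard technique, not `SymbolBloom`'s actual layout.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A minimal Bloom filter over symbol names (hypothetical; not SymbolBloom).
/// `may_contain` returning false means "definitely absent";
/// true means "possibly present" (false positives are possible).
struct ToyBloom {
    bits: Vec<u64>,
    num_hashes: u64,
}

impl ToyBloom {
    fn new(num_bits: usize, num_hashes: u64) -> Self {
        let words = ((num_bits + 63) / 64).max(1);
        Self { bits: vec![0; words], num_hashes }
    }

    /// Derive `num_hashes` bit positions by hashing (seed, name).
    fn positions(&self, name: &str) -> Vec<usize> {
        let total = (self.bits.len() * 64) as u64;
        (0..self.num_hashes)
            .map(|seed| {
                let mut h = DefaultHasher::new();
                seed.hash(&mut h);
                name.hash(&mut h);
                (h.finish() % total) as usize
            })
            .collect()
    }

    fn insert(&mut self, name: &str) {
        for p in self.positions(name) {
            self.bits[p / 64] |= 1u64 << (p % 64);
        }
    }

    fn may_contain(&self, name: &str) -> bool {
        self.positions(name)
            .iter()
            .all(|&p| self.bits[p / 64] & (1u64 << (p % 64)) != 0)
    }
}
```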
Every node is a SymbolNode (id, kind, name, qualified_name, file, span,
status, visibility, is_test) and every edge a DirectEdge (from, to, kind,
origin, confidence, evidence file; its trust model is derived from the edge
kind). The edge evaluator (edge_evaluator.rs) computes each edge's confidence
from its kind and origin and applies stale-evidence decay — see
confidence.md for the math.
Each project's graph sits behind an Arc<RwLock<SymbolGraph>> (in the daemon's
project entry, and in the HTTP server's shared state). Queries take the read
lock; healing and invalidation take the write lock and mutate the graph in
place. This in-place model is the reason the graph is a StableGraph: node
indices stay valid as symbols are removed and re-added during a heal, so nothing
referencing them goes invalid.
To tell readers whether what they got is current, the graph carries a monotonic
version, GraphEpoch(u64) (crates/graph/src/epoch.rs). Each
invalidation-and-heal cycle returns epoch.next(), so the epoch increases every
time the graph changes.
The watcher reports changed files. Using the evidence index, the graph finds
exactly which symbols and edges those files affect and invalidates only those —
re-extracting and re-resolving them (a heal) rather than rebuilding the whole
graph — and bumps the epoch. Queries can include stale results with
--include-stale; by default they are healed or filtered first.
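The invalidate-then-heal step and the epoch bump can be sketched together. `Graph`, the `u32` symbol ids, and `invalidate_and_heal` are hypothetical, and re-extraction is stubbed out. `GraphEpoch::next` mirrors the `epoch.next()` call described above.

```rust
use std::collections::{HashMap, HashSet};

/// Monotonic graph version.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct GraphEpoch(u64);

impl GraphEpoch {
    fn next(self) -> Self {
        GraphEpoch(self.0 + 1)
    }
}

/// Hypothetical evidence index: which symbols each file provides evidence for.
struct Graph {
    evidence: HashMap<String, HashSet<u32>>, // file -> symbol ids
    epoch: GraphEpoch,
}

impl Graph {
    /// Collect only the symbols touched by the changed files.
    fn affected(&self, changed: &[&str]) -> HashSet<u32> {
        changed
            .iter()
            .filter_map(|f| self.evidence.get(*f))
            .flat_map(|ids| ids.iter().copied())
            .collect()
    }

    /// Invalidate the affected subset, heal it (stubbed), then bump the epoch.
    fn invalidate_and_heal(&mut self, changed: &[&str]) -> (HashSet<u32>, GraphEpoch) {
        let dirty = self.affected(changed);
        // Re-extract and re-resolve only `dirty` here (omitted in this sketch).
        self.epoch = self.epoch.next();
        (dirty, self.epoch)
    }
}
```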
File extraction is data-parallel: the extractor (crates/extractor) uses rayon
to spread per-file parsing across cores.
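CoreGraph uses rayon for this. The sketch below shows the same per-file data parallelism with `std::thread::scope`, so that it has no dependencies. `extract` is a hypothetical stand-in that counts lines instead of parsing.

```rust
use std::thread;

/// Hypothetical stand-in for per-file tree-sitter extraction.
fn extract(source: &str) -> usize {
    source.lines().count()
}

/// Split files into contiguous chunks, extract each chunk on a scoped thread,
/// and join the handles in order so results match the input order.
fn extract_all(files: &[&str], workers: usize) -> Vec<usize> {
    let workers = workers.max(1);
    let chunk = ((files.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = files
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().map(|f| extract(f)).collect::<Vec<_>>()))
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}
```

With rayon, the body would shrink to roughly `files.par_iter().map(|f| extract(f)).collect()`, and it would use work stealing instead of fixed chunks.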
A snapshot is a bincode binary blob (current schema v6) at
<project>/.coregraph/snapshot.bin. The daemon writes one when it unloads a
project that changed since its last save, recording the build time, and
warm-loads it on the next request — skipping re-indexing unless a source file
is newer than the snapshot, in which case it rebuilds from source.
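Two pieces of this are easy to show without bincode. One is the atomic temp-file-and-rename write mentioned earlier; the other is the freshness test a warm load performs. The function names and the `.bin.tmp` temp name are assumptions, and serialization is omitted.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Write to a temp file next to the target, then rename over the target,
/// so readers see either the old snapshot or the new one, never a partial
/// file. (Rename is atomic on POSIX within a single filesystem.)
fn write_snapshot_atomic(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("bin.tmp");
    fs::write(&tmp, bytes)?;
    fs::rename(&tmp, path)
}

/// A warm load is valid only if no source file is newer than `built_at`.
fn snapshot_is_fresh(built_at: SystemTime, source_mtimes: &[SystemTime]) -> bool {
    source_mtimes.iter().all(|m| *m <= built_at)
}
```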
You can drive snapshots directly:
```sh
coregraph index --snapshot path/to/snapshot.bin   # write a snapshot while indexing
coregraph snapshot save --out path/to/snapshot.bin
coregraph snapshot load path/to/snapshot.bin
```

Schema v2 removed the earlier SCIP promotion layer (later versions added the documentation layer); SCIP is no longer part of CoreGraph.
Two config files, both TOML:
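- Project: `<project>/.coregraph/config.toml`, created on first index.
- Global: `$XDG_CONFIG_HOME/coregraph/config.toml` (on macOS, `~/Library/Application Support/coregraph/config.toml`, as in the `config show` example below).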
The runtime loads exactly one config file: the --config <PATH> if given,
otherwise the global config at $XDG_CONFIG_HOME/coregraph/config.toml.
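A sketch of that lookup order. `config_path` is hypothetical and covers only the XDG layout. When `XDG_CONFIG_HOME` is unset, it falls back to `$HOME/.config`; that fallback is an assumption taken from the XDG Base Directory convention. The macOS example below uses `~/Library/Application Support` instead.

```rust
use std::path::PathBuf;

/// Resolve the single config file: an explicit --config path wins;
/// otherwise the global file under the XDG config directory.
fn config_path(
    cli_config: Option<PathBuf>,
    xdg_config_home: Option<String>,
    home: Option<String>,
) -> Option<PathBuf> {
    if let Some(p) = cli_config {
        return Some(p);
    }
    // Treat an empty XDG_CONFIG_HOME as unset (assumed XDG fallback).
    let base = match xdg_config_home.filter(|s| !s.is_empty()) {
        Some(x) => PathBuf::from(x),
        None => PathBuf::from(home?).join(".config"),
    };
    Some(base.join("coregraph").join("config.toml"))
}
```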
coregraph config show displays the merged view of the global config and the
project-local .coregraph/config.toml (project values override global), with
the source file of each key:
```
Global config: ~/Library/Application Support/coregraph/config.toml
Project config: ./.coregraph/config.toml
limits.token_budget = 8000 [project]
# Default token budget for LLM output
limits.hop_limit = 3 [project]
# Default graph traversal depth
limits.min_confidence = 0.7 [project]
# Default minimum edge confidence (matches clap default)
server.max_loaded_projects = 5 [project]
# Maximum projects held in the daemon cache (LRU eviction above this)
server.graceful_shutdown_sec = 30 [project]
# Seconds the daemon waits for in-flight queries before hard-exit on SIGTERM
```
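The override rule behind that listing can be sketched as a merge of two flattened key/value maps. `merged_view` and `Source` are hypothetical, values are kept as strings, and the test values are illustrative.

```rust
use std::collections::BTreeMap;

/// Which file a merged value came from.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Source {
    Global,
    Project,
}

/// Start from the global values, then let project values override them,
/// remembering the source of each key (sorted, as BTreeMap iterates).
fn merged_view(
    global: &BTreeMap<String, String>,
    project: &BTreeMap<String, String>,
) -> BTreeMap<String, (String, Source)> {
    let mut out: BTreeMap<String, (String, Source)> = global
        .iter()
        .map(|(k, v)| (k.clone(), (v.clone(), Source::Global)))
        .collect();
    for (k, v) in project {
        out.insert(k.clone(), (v.clone(), Source::Project));
    }
    out
}
```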
The HTTP listener has no config key — pass an address inline:
coregraph server start --http 127.0.0.1:27787 (bare --http defaults to
127.0.0.1:27787; add --allow-external to bind a non-localhost address).
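The address rules above can be captured in a hypothetical `http_bind_addr` helper. It accepts only literal IP addresses; hostnames such as `localhost` won't parse as a `SocketAddr`. It also treats any non-loopback address as external, which is an assumption about how CoreGraph classifies addresses.

```rust
use std::net::SocketAddr;

const DEFAULT_HTTP_ADDR: &str = "127.0.0.1:27787";

/// Resolve `--http [ADDR]`: a bare flag uses the default, and a non-loopback
/// address is refused unless `--allow-external` was passed.
fn http_bind_addr(arg: Option<&str>, allow_external: bool) -> Result<SocketAddr, String> {
    let addr: SocketAddr = arg
        .unwrap_or(DEFAULT_HTTP_ADDR)
        .parse()
        .map_err(|e| format!("invalid --http address: {e}"))?;
    if !addr.ip().is_loopback() && !allow_external {
        return Err(format!("{addr} is not a localhost address; pass --allow-external"));
    }
    Ok(addr)
}
```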
See also: confidence.md for the trust/confidence model and graph-model.md for symbol kinds, edge kinds, and analysis origins.