Commit 037b769

cdeust and claude committed
fix(ingest): self-healing graph memo + commit-triggered re-analyze
Make codebase-graph freshness trustable instead of periodic-and-stale.

1. Self-heal the cached graph_path memo. find_cached_graph returns a
   memoised path only when the graph still exists on disk (non-empty
   dir/file), preferring the most-recent such memo; otherwise None so the
   caller re-analyses. Previously it returned the first tag match
   unconditionally, so a memo outliving its deleted graph made
   ingest_codebase silently project an empty graph. One fix heals all
   three consumers (ensure_graph, pipeline_impact_bump, SessionStart bg
   worker). Dijkstra audit: the pointer is not the thing.

2. Make a commit the freshness signal. New PostToolUse hook
   post_commit_reindex spawns the detached background re-analyze worker
   (ingest_codebase_background --reindex) when a git commit touches
   indexable source. Closes the gap the 24h SessionStart TTL leaves open
   within a working session. Gated on changed source, detached (never
   blocks the commit), 120s coalesce cooldown (also serialises writes to
   the same graph dir). Two-speed model: facts fresh per commit,
   consolidation/clustering on the existing TTL cycles (perception
   continuous, consolidation in sleep).

Honest scope: the upstream analyzer (ai-automatised-pipeline
index_codebase, src/indexer/mod.rs) re-parses the whole tree -- it has NO
per-file incremental skip. "Incremental" here means
trigger-only-on-change, not reparse-only-the-changed-file. Full re-parse
is seconds for harness-sized repos and runs detached. Extension gate
sourced from AP src/parser/mod.rs Language::from_extension plus the .js
light-link pass.

Tests: self-heal memo (existence + recency + dead-path skip),
handler-level re-analyze on dead memo, hook gating
(non-commit/failed/docs-only/cooldown/spawn), --reindex flag threading.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent e2e774b commit 037b769
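
The gating described in point 2 of the commit message can be sketched roughly as follows. This is not the code of post_commit_reindex, which does not appear in this part of the diff. The extension set, the function names, and the way the last spawn time is passed in are all assumptions made for illustration; only the 120 s cooldown and the "changed source only" rule come from the message.

```python
import time
from pathlib import PurePosixPath

# Hypothetical extension set. The commit message says the real gate is
# sourced from the upstream parser, which is not shown here.
INDEXABLE_EXTENSIONS = {".py", ".rs", ".js", ".ts", ".go"}

# Coalesce window named in the commit message.
COOLDOWN_SECONDS = 120


def touches_indexable_source(changed_files):
    """True when at least one changed path has an indexable extension."""
    return any(
        PurePosixPath(name).suffix.lower() in INDEXABLE_EXTENSIONS
        for name in changed_files
    )


def should_reindex(changed_files, last_spawn, now=None):
    """Decide whether a commit should trigger a background re-analyze.

    ``last_spawn`` is the timestamp of the previous spawn, or None when
    there has been none. Docs-only commits and commits that land inside
    the cooldown window are skipped.
    """
    now = time.time() if now is None else now
    if not touches_indexable_source(changed_files):
        return False
    if last_spawn is not None and now - last_spawn < COOLDOWN_SECONDS:
        return False
    return True
```

Under these assumptions a commit touching only README.md is skipped, and a second source commit 50 s after a spawn is coalesced into the earlier run instead of starting a new one.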

8 files changed

Lines changed: 713 additions & 24 deletions

.claude-plugin/plugin.json

Lines changed: 5 additions & 0 deletions
@@ -62,6 +62,11 @@
             "type": "command",
             "command": "bash -c 'PY=$(command -v python3 || command -v python) && ROOT=\"${CLAUDE_PLUGIN_ROOT:-$PWD}\" && \"$PY\" \"$ROOT/scripts/launcher.py\" mcp_server.hooks.pipeline_impact_bump'",
             "timeout": 5
+          },
+          {
+            "type": "command",
+            "command": "bash -c 'PY=$(command -v python3 || command -v python) && ROOT=\"${CLAUDE_PLUGIN_ROOT:-$PWD}\" && \"$PY\" \"$ROOT/scripts/launcher.py\" mcp_server.hooks.post_commit_reindex'",
+            "timeout": 10
           }
         ]
       }

mcp_server/handlers/ingest_helpers.py

Lines changed: 85 additions & 13 deletions
@@ -36,27 +36,99 @@ def code_graph_tag(project_path: str) -> str:
     return f"{CODE_GRAPH_TAG_PREFIX}{project_key(project_path)}"
 
 
-def find_cached_graph(store, project_path: str) -> str | None:
-    """Return the cached graph_path for a project, or None if not cached.
+def graph_path_is_materialised(graph_path: str | None) -> bool:
+    """True when ``graph_path`` points at a graph that still exists on disk.
+
+    AP writes ``<output_dir>/graph`` as a LadybugDB directory; pre-3.14
+    builds wrote a single file at the same slot. Either form counts as
+    valid only when **non-empty** — an existing-but-empty directory is a
+    half-built or wiped graph, which must read as a cache miss so the
+    caller re-analyses rather than silently projecting zero symbols.
+
+    source: ingest staleness bug Jun-2026 — a memo outlived its graph
+    (the graph directory was deleted) and ``find_cached_graph`` handed the
+    dead path straight back to ``ensure_graph``, which then projected an
+    empty graph. A memo must never outlive the artefact it points at
+    (Dijkstra audit: the pointer is not the thing).
+    """
+    if not graph_path:
+        return False
+    try:
+        p = Path(graph_path).expanduser()
+        if not p.exists():
+            return False
+        if p.is_dir():
+            return any(p.iterdir())
+        return p.stat().st_size > 0
+    except OSError:
+        return False
+
+
+def _memo_tags(mem: dict) -> list:
+    """Tags of a memory row as a list (rows may store JSON-encoded tags)."""
+    raw = mem.get("tags", [])
+    if isinstance(raw, str):
+        try:
+            raw = json.loads(raw)
+        except (ValueError, TypeError):
+            return []
+    return raw if isinstance(raw, list) else []
+
 
-    Reads the most-recent memory tagged with the project's code-graph tag.
+def _memo_recency_key(mem: dict) -> str:
+    """Sortable recency key for a memo row (most-recent sorts highest).
+
+    Prefers ``created_at`` (when the path was memoised); falls back to
+    ``heat_base_set_at`` then ``last_accessed``. Datetimes are
+    ISO-normalised; a missing value sorts oldest. ISO-8601 strings sort
+    lexicographically in chronological order, so plain string comparison
+    is correct here.
+    """
+    for field in ("created_at", "heat_base_set_at", "last_accessed"):
+        val = mem.get(field)
+        if val is None or val == "":
+            continue
+        iso = getattr(val, "isoformat", None)
+        return iso() if callable(iso) else str(val)
+    return ""
+
+
+def find_cached_graph(store, project_path: str) -> str | None:
+    """Return the cached graph_path for a project, or None.
+
+    Returns the path from the MOST-RECENT memo tagged with the project's
+    code-graph tag whose graph **still exists on disk**. A memo whose
+    graph was deleted (path missing or empty) is skipped, never returned
+    — this is the self-heal: a stale memo can no longer make the caller
+    project an empty graph. When no live graph is found, returns None so
+    the caller re-analyses and re-memoises.
+
+    source: ingest staleness bug Jun-2026 (Dijkstra audit). Previously
+    this returned the first tag match unconditionally, with no existence
+    check and no recency ordering.
     """
     tag = code_graph_tag(project_path)
     try:
         mems = store.get_all_memories_for_decay()
     except Exception:
         return None
+
+    candidates: list[tuple[str, str]] = []  # (recency_key, graph_path)
     for mem in mems:
-        raw_tags = mem.get("tags", [])
-        if isinstance(raw_tags, str):
-            try:
-                raw_tags = json.loads(raw_tags)
-            except (ValueError, TypeError):
-                raw_tags = []
-        if tag in raw_tags:
-            content = mem.get("content") or ""
-            if content.startswith("graph_path="):
-                return content[len("graph_path=") :].strip()
+        if tag not in _memo_tags(mem):
+            continue
+        content = mem.get("content") or ""
+        if not content.startswith("graph_path="):
+            continue
+        path = content[len("graph_path=") :].strip()
+        if path:
+            candidates.append((_memo_recency_key(mem), path))
+
+    # Most-recent first; return the first whose graph is materialised.
+    candidates.sort(key=lambda c: c[0], reverse=True)
+    for _, path in candidates:
+        if graph_path_is_materialised(path):
+            return path
     return None
 
 
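
To see the self-heal selection in isolation, here is a minimal standalone sketch. It restates the existence check from the diff under a different name and replaces the memory store with a plain list of (recency_key, path) tuples. The helper names is_materialised, pick_live_graph and demo, the timestamps, and the directory layout are illustrative assumptions, not part of the commit.

```python
import tempfile
from pathlib import Path


def is_materialised(graph_path):
    """Simplified restatement of graph_path_is_materialised from the diff."""
    if not graph_path:
        return False
    try:
        p = Path(graph_path).expanduser()
        if not p.exists():
            return False
        if p.is_dir():
            # An existing but empty directory counts as a cache miss.
            return any(p.iterdir())
        return p.stat().st_size > 0
    except OSError:
        return False


def pick_live_graph(candidates):
    """Return the newest candidate path whose graph still exists, else None."""
    for _, path in sorted(candidates, key=lambda c: c[0], reverse=True):
        if is_materialised(path):
            return path
    return None


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        live = root / "old" / "graph"      # oldest memo, graph present
        live.mkdir(parents=True)
        (live / "data.bin").write_bytes(b"x")
        empty = root / "new" / "graph"     # newest memo, graph wiped
        empty.mkdir(parents=True)
        deleted = root / "gone" / "graph"  # memo outlived its graph
        candidates = [
            ("2026-06-01T10:00:00", str(live)),
            ("2026-06-03T10:00:00", str(empty)),
            ("2026-06-02T10:00:00", str(deleted)),
        ]
        picked = pick_live_graph(candidates)
        none_left = pick_live_graph(candidates[1:])
        return picked == str(live), none_left
```

The two newer memos point at a wiped and a deleted graph, so the oldest memo wins. Once only dead memos remain the result is None, which is the signal for the caller to re-analyse. The string sort relies on all keys sharing one ISO-8601 layout and timezone convention; mixed offsets would need parsing before comparison.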

mcp_server/hooks/ingest_codebase_background.py

Lines changed: 19 additions & 6 deletions
@@ -1,11 +1,20 @@
 """Background worker that invokes ``ingest_codebase`` for a project.
 
-Spawned by the SessionStart hook when the cached graph is stale or
-missing. Runs detached from the parent process so SessionStart returns
-immediately.
+Spawned by two triggers, both detached so the parent returns immediately:
+  * the SessionStart hook, when the cached graph is missing or older than
+    the TTL (``pipeline_graph_ttl.graph_is_stale``); and
+  * the PostToolUse ``post_commit_reindex`` hook, after a commit that
+    touched indexable source — there it passes ``--reindex`` because the
+    commit IS the change signal, so the graph must be rebuilt even though
+    a cached one exists.
 
 Invocation:
     python -m mcp_server.hooks.ingest_codebase_background /path/to/project
+    python -m mcp_server.hooks.ingest_codebase_background /path/to/project --reindex
+
+Without ``--reindex`` the handler reuses a fresh cached graph and only
+re-analyses when the cache is stale/absent (identical to interactive
+use). With ``--reindex`` it forces ``analyze_codebase`` to run.
 
 Exit code:
   * 0 on success
@@ -34,6 +43,7 @@ def main() -> None:
         sys.exit(2)
 
     project_root = sys.argv[1]
+    force_reindex = "--reindex" in sys.argv[2:]
 
     # Lazy import so Claude Code hooks can fire even if core deps are
     # still installing on first session.
@@ -43,10 +53,12 @@ def main() -> None:
         print(f"[bg-ingest] ingest_codebase import failed: {exc}", file=sys.stderr)
         sys.exit(1)
 
+    # Without --reindex: reuse a fresh cached graph, auto-reindex when
+    # stale (SessionStart trigger). With --reindex: force analyze_codebase
+    # because a commit already told us the source changed (commit trigger).
     args: dict[str, Any] = {
         "project_path": project_root,
-        # Don't force reindex — handler picks up cached graph when fresh
-        # and auto-reindexes when stale. Identical to interactive use.
+        "force_reindex": force_reindex,
     }
 
     try:
@@ -60,7 +72,8 @@ def main() -> None:
         sys.exit(1)
 
     counts = {k: v for k, v in (result or {}).items() if isinstance(v, (int, float))}
-    print(f"[bg-ingest] ingest_codebase ok: {counts}")
+    mode = "reindex" if force_reindex else "cached-or-stale"
+    print(f"[bg-ingest] ingest_codebase ok ({mode}): {counts}")
     sys.exit(0)
 
 
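
For orientation, a caller might launch this worker detached along the following lines. This is a sketch under assumptions, not the spawn code used by the SessionStart or post_commit_reindex hooks, which is not shown in this diff. The helper names build_worker_command and spawn_detached are hypothetical; the module path and the --reindex flag are taken from the docstring above.

```python
import subprocess
import sys

WORKER_MODULE = "mcp_server.hooks.ingest_codebase_background"


def build_worker_command(project_root, reindex=False):
    """Build the argv for the background worker (hypothetical helper)."""
    cmd = [sys.executable, "-m", WORKER_MODULE, project_root]
    if reindex:
        # The worker looks for the flag anywhere after the project path.
        cmd.append("--reindex")
    return cmd


def spawn_detached(cmd):
    """Start ``cmd`` without waiting for it or sharing the caller's stdio.

    ``start_new_session=True`` puts the child in its own session on POSIX,
    so it keeps running after the hook process that spawned it exits.
    """
    return subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )
```

A commit-triggered caller would pass reindex=True, while a TTL-triggered caller would omit it and let the handler decide from the cache state.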
