Commit 66ee97b

docs: gap analysis v2 — CORRECTED after reading AP source + web search
v1 (docs/program/v3.14-gap-analysis-codebase-as-core.md) recommended "pull AP into Cortex" based on inference from the Cortex-side bridge, not inspection of AP itself.

Aristotle's four-causes interrogation against direct inspection of automatised-pipeline exposed the category error: AP is not a parser; it is a 12,046-LOC Rust MCP server with LadybugDB, Tantivy BM25, 5-layer resolver (incl. LSP), Louvain+Traag clustering, Tarjan-SCC semantic diff, PRD validator, security gates, macro expansion, and stdlib indexing. Merging it into Cortex would dissolve the essential differentia: AP = immutable structural truth, Cortex = decaying behavioral belief. v2 retracts the Coase "pull in" moves.

What holds:
- Alexander's "two analyzers, no shared port" observation (Gap 1)
- Von Neumann's behavioral-overlay idea: heat on FILE/SYMBOL nodes, but on Cortex's projection (entities table), NOT on AP's store
- §4.1 violation of workflow_graph_source_ast.py (634 LOC)

Real gaps, all addressable without rewriting AP:
- Gap 1 (structural): retire in-house AST as fallback-only when AP enabled
- Gap 2 (integrative): wire AP get_impact/detect_changes → neuromodulation surprise signal; Phase 4 of ADR-0046 never shipped to the memory side
- Gap 3 (graceful degradation): ADR promises degradation, code delivers silence; add explanatory UI pill

ADR-0046 is validated, not superseded. 4-of-5 Aristotelian axes pass; the ADR's architectural intent (sibling MCP, bridge-consumed, Cortex never writes there) follows directly from the differentia and is correct.

GitNexus (github.com/abhigyanpatwari/GitNexus) web-searched: separate TypeScript product, similar structural thesis (tree-sitter → knowledge graph → MCP). Not a dependency of Cortex or AP. Validates direction: code-structure intelligence belongs as its own MCP server.

5 corrected moves, ordered by risk. v1 kept on disk as record of why reading the dependent repo matters before architecting.
1 parent fa98cad commit 66ee97b

1 file changed

Lines changed: 266 additions & 0 deletions

@@ -0,0 +1,266 @@
# Gap Analysis v2 (CORRECTED): Cortex ↔ automatised-pipeline

**Date:** 2026-04-24
**Supersedes:** `docs/program/v3.14-gap-analysis-codebase-as-core.md` (v1)
**Method evolution:** v1 dispatched Alexander / Coase / Von Neumann without
reading the AP repository. Coase's "pull AP into Cortex" recommendation
was based on *inference from the Cortex-side bridge* rather than AP's
actual source. This v2 corrects the record using direct inspection of
AP's 12k-LOC Rust codebase, a re-read of ADR-0046, and Aristotle's
four-causes interrogation.

---

## 1. What was materially wrong in v1

The v1 Coase verdict ("pull `index_codebase` / `query_graph` / `get_impact`
into Cortex; only 1200 LOC of bridge and 4-of-5 capabilities already
in-core") was wrong because the evidence was inferred, not inspected.

Direct inspection of `/Users/cdeust/Developments/anthropic/ai-automatised-pipeline`:

| Evidence | v1 assumed | Reality |
|---|---|---|
| AP's scope | "mostly tree-sitter" | **12,046 LOC Rust, 23 MCP tools, 220 tests** (`README.md:10-14`) |
| Graph store | "Kuzu-like, skippable" | **LadybugDB via `lbug 0.15`**: in-process property graph with Cypher dialect (`graph_store.rs`, 1103 LOC; 15+ typed node kinds, 7+ typed edge kinds) |
| Search | "BM25 only, specialist" | **Tantivy BM25 + TF-IDF + RRF fusion** (`src/search/`: `bm25.rs`, `rrf.rs`, `vector.rs`) |
| Resolution | "just import resolution" | **5-layer resolver, including LSP**: `resolver.rs` (882) + `resolver_layers.rs` + `lsp_resolver.rs` (444) + `lsp_client.rs` (728) |
| Clustering | "networkx Louvain" | **Rust Louvain + Traag C2 repair pass** (`clustering.rs`, 867 LOC; Blondel 2008 + Traag 2019) |
| Diff analysis | "git diff → symbols, simple" | **`git_diff.rs` (570) + `semantic_diff.rs` (673)** with Tarjan-SCC regression detection |
| Additional | not mentioned | **`prd_validator.rs` (605)**: symbol hallucination + scope check; **`security_gates.rs` (534)**: auth/unsafe/public API; **`macro_expansion/`** (Rust/Python/TS); **`stdlib_index/`** |

Verdict: Coase's "pull in" would require rewriting ~7,000 LOC of
specialized Rust functionality in Python. That is not a refactor; it is a
rewrite of a sibling project. **Retracted.**

---

## 2. The essential differentia (Aristotle)

**Genus (both):** MCP servers exposing a graph of development context to
Claude via JSON-RPC.

**Differentia:** the *truth semantics of their nodes*.

- **AP = structural-truth graph.** A `Function` node exists iff the
parser saw it in the file at index time. Re-indexing discards the
previous graph. No heat, no decay, no reconsolidation, no confidence:
only *is-this-in-the-AST-right-now*. `confidence=1.0` by
construction on every `defined_in` / `calls` / `imports` edge
(`workflow_graph_schema.py:193`).

- **Cortex = belief graph with thermodynamic decay.** A memory has heat
(`thermodynamics.py`), stage (`cascade.py`), reconsolidation history
(`reconsolidation.py`), predictive-coding gate survival
(`hierarchical_predictive_coding.py`). Same content stored twice may
merge, strengthen, or be rejected.

**Consequence:** the two stores must NOT be merged. A decaying AST is
a bug; an immutable belief is amnesia. ADR-0046's "Cortex never writes
there" (`ADR-0046:149`) follows directly from this differentia and is
correct.
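
The distinction can be made concrete with a minimal Python sketch. All names here (`StructuralNode`, `BeliefNode`, `reindex`) and the half-life decay formula are illustrative assumptions, not the actual types in AP or in Cortex's `thermodynamics.py`. The point is only that one node type is frozen and replaced wholesale, while the other carries mutable heat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuralNode:
    """AP-style node: present iff the parser saw it at index time."""
    qualified_name: str
    file_path: str
    confidence: float = 1.0  # fixed by construction; there is no decay path


@dataclass
class BeliefNode:
    """Cortex-style memory: heat decays over time and is reinforced on use."""
    content: str
    heat: float = 1.0

    def decay(self, elapsed_hours: float, half_life_hours: float = 24.0) -> None:
        # Assumed exponential half-life decay, for illustration only.
        self.heat *= 0.5 ** (elapsed_hours / half_life_hours)

    def reinforce(self, amount: float = 0.2) -> None:
        # Heat is capped at 1.0.
        self.heat = min(1.0, self.heat + amount)


def reindex(parsed_symbols):
    """Build a fresh structural graph; the previous one is simply discarded."""
    return {name: StructuralNode(name, path) for name, path in parsed_symbols}
```

Assigning to a field of a `StructuralNode` raises an error, which mirrors "a decaying AST is a bug". A `BeliefNode` whose `decay` is never called would be the "immutable belief is amnesia" case.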

---

## 3. What held from v1

Three findings from v1 survive intact:

1. **Alexander's "no strong center".** Cortex has two parallel code
analyzers: in-core tree-sitter (`core/codebase_graph.py`,
`ast_parser.py`, `ast_extractors*.py`) AND the AP bridge. No shared
port. This is REAL and unchanged; see Gap 1 below.

2. **Von Neumann's behavioral-overlay proposal.** Add `heat`,
`last_touched`, `co_edited` to FILE/SYMBOL nodes; reuse memory
thermodynamics. This is CORRECT regardless of whether AP lives
in-process or external: the overlay is on Cortex's *own*
workflow-graph projection of AP's data, not on AP's store. See
Gap 2 below.

3. **`workflow_graph_source_ast.py` is 634 LOC**, which violates the §4.1
size rule. It still needs a behavior-preserving split.

---

## 4. Corrected gaps (after reading AP's source)

### Gap 1 (structural): in-house AST in Cortex duplicates what AP owns (when AP is enabled)

**Evidence.** `mcp_server/core/codebase_graph.py` (272 LOC),
`codebase_parser.py` (147), `codebase_type_resolver.py` (137),
`codebase_extractors.py` (245), `ast_parser.py` (226),
`ast_extractors.py` (224), `ast_extractors_extra.py` (198) are all
in-house tree-sitter. `workflow_graph_builder.py` uses them. When
`CORTEX_ENABLE_AP=1`, AP's deeper AST (5-layer resolver + LSP + macro
expansion + stdlib) should be authoritative, but the in-house path
still runs.

**Root cause.** ADR-0046 implemented the schema additions
(`NodeKind.SYMBOL`, `EdgeKind.CALLS/IMPORTS/DEFINED_IN/MEMBER_OF`) and
the bridge, but never retired the in-house AST. The result is two
AST truths with no precedence.

**Smallest fix.** Mark `codebase_graph.py` + `ast_parser.py` +
`ast_extractors*.py` as explicit *fallback only when AP is disabled*.
Add a fallback-only notice at the top of each module:

```python
# core/codebase_graph.py:1
# Fallback AST used only when CORTEX_ENABLE_AP=0. When AP is
# enabled, workflow_graph_source_ast.py is the authoritative
# source. See ADR-0046.
```

Then in `workflow_graph_builder.py`, prefer `workflow_graph_source_ast.load_symbols()`
over `codebase_graph.resolve_all_imports()` when `ap_bridge.is_enabled()`.

**Note:** this REVERSES v1 move 4 (retire `ingest_codebase.py` in favor
of `codebase_analyze.py`). The correct direction is the opposite:
`codebase_analyze.py` becomes the fallback; `ingest_codebase.py` (AP
path) is primary when AP is present.
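
A minimal sketch of that precedence rule, assuming the builder can be handed the two loaders as callables. `select_symbol_source` and its parameters are hypothetical; only `ap_bridge.is_enabled()`, `workflow_graph_source_ast.load_symbols()` and `codebase_graph.resolve_all_imports()` are named in this document, and their real signatures may differ.

```python
from typing import Callable, Sequence, Tuple


def select_symbol_source(
    ap_enabled: bool,
    load_from_ap: Callable[[], Sequence[dict]],
    load_from_fallback: Callable[[], Sequence[dict]],
) -> Tuple[str, Sequence[dict]]:
    """Return (source_name, symbols), giving AP precedence when it is enabled.

    Only the chosen loader is called, so the in-house AST path does not
    run at all while AP is enabled.
    """
    if ap_enabled:
        return "ap", load_from_ap()
    return "in_house_fallback", load_from_fallback()
```

Passing the loaders as callables keeps the choice testable without AP installed. Whether an enabled but unreachable AP should fall through to the in-house path is a separate decision, related to Gap 3.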

### Gap 2 (integrative): AP's efficient-cause signals never reach Cortex's memory subsystems

**Evidence.** `ap_bridge.py` exposes `get_impact`, `detect_changes`,
`get_context`. Consumers:

| Consumer | Uses AP output? |
|---|---|
| `core/memory_ingest.py` | No |
| `core/consolidation_engine.py` | No |
| `core/spreading_activation.py` | No |
| `core/coupled_neuromodulation.py` | No |
| `core/synaptic_plasticity.py` | No |
| `handlers/consolidation/plasticity.py` | No: uses substring co-access (v1 gap G3 still holds) |
| `handlers/consolidation/memify.py` | No |
| `hooks/pipeline_impact_bump.py` | **Yes**: calls `detect_changes` + ILIKE sweep on memory content |
| `workflow_graph_source_ast.py` | Yes, but only projects into the visualization |

AP produces a rich structural signal (`get_impact` returns blast
radius across communities and processes); Cortex's neuromodulation
(`coupled_neuromodulation.py`, DA/NE/ACh/5-HT cascade) takes
"surprise" as input. A high-blast-radius edit IS surprise. The
signal exists; the wire doesn't.

**Root cause.** Phase 4 of ADR-0046 ("change-impact annotation via
post_tool_capture hook") is listed last and was never wired to the
memory side, only to the visualization pulse.

**Smallest fix.** In `hooks/post_tool_capture.py`, after a code edit:
1. Call `ap_bridge.detect_changes()` + `get_impact()`.
2. Map blast-radius score → surprise modulator in `core/coupled_neuromodulation.py`.
3. The higher surprise transiently raises DA/NE, which gives stronger
encoding of the memory the agent just wrote about the edit.

New pure module: `core/ap_impact_to_surprise.py` takes
`{affected_symbols, community_count, process_count}` and returns
`surprise ∈ [0, 1]`. No infra dependency; testable in isolation.
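
One possible shape for that module is sketched below. The input keys come from the description above; the function name, the saturating `1 - exp(-x)` curve, and the default weights are assumptions chosen for illustration and would need tuning against real `get_impact` output. `affected_symbols` is accepted either as a count or as a collection, since its exact type is not specified here.

```python
import math
from typing import Mapping


def _count(value) -> int:
    """Accept either a non-negative count or a collection of items."""
    if isinstance(value, int):
        return max(0, value)
    return len(value)


def impact_to_surprise(
    impact: Mapping[str, object],
    symbol_scale: float = 20.0,     # assumed: ~20 symbols is a "large" edit
    boundary_weight: float = 0.5,   # assumed weight per boundary crossed
) -> float:
    """Map an AP blast-radius summary to a surprise value in [0, 1]."""
    symbols = _count(impact.get("affected_symbols", 0))
    communities = _count(impact.get("community_count", 0))
    processes = _count(impact.get("process_count", 0))
    # Only communities/processes beyond the first count as a crossed boundary.
    crossings = max(0, communities - 1) + max(0, processes - 1)
    load = symbols / symbol_scale + boundary_weight * crossings
    # Saturating curve: zero impact gives 0.0, large impact approaches 1.0.
    return 1.0 - math.exp(-load)
```

The function has no I/O and no dependency on the bridge, so it can be unit-tested with plain dictionaries. An edit confined to one community and one process scores low, while the same number of symbols spread across several communities scores higher.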

**This is where Von Neumann's behavioral-overlay idea lives.** The
SYMBOL nodes get `heat` on the Cortex side (the projection), not on
AP's side (immutable). AP stays read-only. The heat is *Cortex's
memory of which symbols the agent cared about*, written into the
PostgreSQL `entities`/`relationships` tables that already carry heat.
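
A sketch of what keeping heat on the projection means in practice. `SymbolHeatOverlay`, the `id` field, and the bump rule are hypothetical, and a plain dict stands in for the PostgreSQL `entities` table. The AP-side records are only read, never written.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Mapping


@dataclass
class SymbolHeatOverlay:
    """Cortex-side heat keyed by symbol id; AP records are never mutated."""
    heat: Dict[str, float] = field(default_factory=dict)

    def bump(self, symbol_ids: Iterable[str], surprise: float) -> None:
        """Move each symbol's heat toward 1.0 in proportion to surprise."""
        surprise = min(1.0, max(0.0, surprise))
        for sid in symbol_ids:
            current = self.heat.get(sid, 0.0)
            self.heat[sid] = current + (1.0 - current) * surprise

    def project(self, ap_symbols: Iterable[Mapping]) -> List[dict]:
        """Join AP symbols with Cortex heat by copying, not by writing back."""
        return [
            {**sym, "heat": self.heat.get(sym["id"], 0.0)}
            for sym in ap_symbols
        ]
```

Because `project` builds new dictionaries, re-indexing on the AP side can replace every symbol record without touching the heat that Cortex has accumulated. Symbols that disappear from AP simply stop appearing in the projection.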

### Gap 3 (graceful degradation): ADR promises it, code delivers silence

**Evidence.** When AP is absent:
- `ap_bridge.py:165` `connect()` returns `False` silently.
- `workflow_graph_source_ast.py` returns `[]` from every loader.
- Wiki-verify button, unified search, impact pulse all no-op with no UI message.
- `ap_bridge.unavailable_reason` exists (line 158) but no caller surfaces it.

**Root cause.** Degradation was implemented as "return empty", not
"return empty + explain why".

**Smallest fix.** New handler `mcp_server/handlers/get_ap_status.py`
wrapping `APBridge().unavailable_reason`; UI renders a pill: "AST depth
unavailable: install `automatised-pipeline` to enable symbol-level
view." One handler, one pill. Respects the ADR promise.
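
A sketch of such a handler, under the assumption that `unavailable_reason` is `None` or empty when AP is reachable and a human-readable string otherwise. The return shape and the `PILL_TEXT` constant are illustrative, not the existing handler contract.

```python
from typing import Optional

# Assumed UI copy, taken from the pill text proposed above.
PILL_TEXT = (
    "AST depth unavailable: install automatised-pipeline "
    "to enable symbol-level view."
)


def get_ap_status(bridge) -> dict:
    """Report whether AP is usable and, if not, why.

    `bridge` is any object exposing an `unavailable_reason` attribute;
    a missing attribute is treated the same as "no reason recorded".
    """
    reason: Optional[str] = getattr(bridge, "unavailable_reason", None)
    if reason:
        return {"available": False, "reason": reason, "pill": PILL_TEXT}
    return {"available": True, "reason": None, "pill": None}
```

The UI can render `pill` whenever it is not `None` and show `reason` as a tooltip, so an empty graph is always accompanied by an explanation.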

---

## 5. On GitNexus (user's request, web search)

Web search surfaced [github.com/abhigyanpatwari/GitNexus](https://github.com/abhigyanpatwari/GitNexus).
It is a **separate TypeScript/JavaScript product**: a browser-based,
client-side knowledge graph creator with an MCP server
(`npx gitnexus analyze`). Architecture: tree-sitter → knowledge graph →
Graph RAG agent → MCP tools. Features include Multi-File Rename,
Git-Diff Impact Analysis, Process-Grouped Search, 360° Context, Claude
Code Hooks, Multi-Repo MCP, Zero-Config Setup, 9+ language support.

ADR-0046:106 mentions "Multi-repo / workgroup operations (GitNexus
`group_*`)", a single reference suggesting AP may borrow the
multi-repo concept. GitNexus is **NOT a dependency of Cortex or AP**;
it is a structural cousin (same genus: MCP + code graph; similar
differentia: structural-truth graph via tree-sitter).

**Implication for the gap analysis:** none directly. But GitNexus being
a successful shipped product with the same structural thesis as AP
validates the architectural direction: code-structure intelligence
lives as its own MCP server, not merged into a memory/belief system.

---
206+
207+
## 6. Corrected move sequence (5 ordered moves)
208+
209+
v1 proposed 7 moves, most of which would dissolve AP into Cortex.
210+
The corrected sequence is 5 moves, none of which require rewriting
211+
AP functionality.
212+
213+
| # | Move | Gap | File |
214+
|---|---|---|---|
215+
| 1 | Guard in-house AST: add "fallback-only" docstring + precedence logic that prefers `workflow_graph_source_ast` when `ap_bridge.is_enabled()` | G1 | `core/codebase_graph.py`, `core/ast_parser.py`, `core/workflow_graph_builder.py` |
216+
| 2 | New module `core/ap_impact_to_surprise.py` — pure, maps AP blast radius → neuromodulation surprise signal | G2 | new |
217+
| 3 | Wire Gap 2 in `hooks/post_tool_capture.py`: post-edit, call `ap_bridge.detect_changes` + `get_impact`, pipe to `core/coupled_neuromodulation.py` | G2 | `hooks/post_tool_capture.py` |
218+
| 4 | New handler `handlers/get_ap_status.py`; UI pill in `ui/unified/js/workflow_graph.js` explaining degradation | G3 | new + UI |
219+
| 5 | Split `workflow_graph_source_ast.py` (634 LOC) behavior-preservingly into facade + `_symbols.py` + `_edges.py` to satisfy §4.1 | size rule | `infrastructure/workflow_graph_source_ast.py` |
220+
221+
**Replaced by this list (from v1, now retracted):**
222+
- v1 Move 1 "Build `core/git_diff_to_symbols.py`" — AP's `detect_changes` already does this far better. Retracted.
223+
- v1 Move 3 "Rewrite `hooks/pipeline_impact_bump.py` to in-process path" — the CORRECT move is to make the existing AP call feed the memory side (G2 above), not replicate AP. Retracted.
224+
- v1 Move 6 "Add `store.get_code_edges()` + union into plasticity" — partially retained: plasticity can still consume AP edges via the bridge, but we do NOT need a new store method; use `ap_bridge` directly from within the consolidation pipeline.
225+
- v1 Move 7 "Replace `compute_impact._bfs` with `spread_activation`" — retracted for AP's impact. However, the Cortex-side SYMBOL-heat update (Gap 2 wiring) implicitly gives spreading-activation-over-code when the agent asks about a file: `recall` already walks entities via `spread_activation`; adding SYMBOL entities with heat = "how much the agent cared about this code recently" gives the same effect without new code.
226+
227+
---

## 7. ADR-0046 status

After this correction, ADR-0046 is **not superseded**. It is **validated**.
The score on Aristotle's 5 axes:

| Axis | Rating |
|---|---|
| Identifies both causes correctly | PASS |
| Names the right differentia | PARTIAL (frames as temporal; real differentia is truth-semantics) |
| Proposes integration respecting differentia | PASS |
| Avoids category error | PASS |
| Graceful degradation | PARTIAL (promise vs code; Gap 3 closes this) |

A one-line amendment to ADR-0046 should say: *The essential
differentia between Cortex and AP is the truth-semantics of their
nodes: AP is structural truth (immutable), Cortex is behavioral
belief (decaying). Any integration that treats nodes as fungible
across the two stores is a category error.*

---

## 8. Status

- v1 document superseded. Keep v1 on disk as historical record of the
inferred-not-inspected analysis; it demonstrates why reading the
actual dependent repo matters before architecting.
- v2 (this document) is the current gap analysis.
- 5 moves above are the recommended work, ordered by risk.
- GitNexus is informational only; no action item.

---

## References

- `docs/adr/ADR-0046-automatised-pipeline-integration.md`
- `/Users/cdeust/Developments/anthropic/ai-automatised-pipeline/`: `Cargo.toml`, `README.md`, `src/*`
- `docs/program/v3.14-gap-analysis-codebase-as-core.md`: v1 (superseded)
- [github.com/abhigyanpatwari/GitNexus](https://github.com/abhigyanpatwari/GitNexus): reference only
