docs: gap analysis v2 — CORRECTED after reading AP source + web search

cdeust · cdeust · commit 66ee97b68412 · 2026-04-24T17:56:17.000+02:00
v1 (docs/program/v3.14-gap-analysis-codebase-as-core.md) recommended
"pull AP into Cortex" based on inference from the Cortex-side bridge,
not inspection of AP itself. Aristotle's four-causes interrogation
against direct inspection of automatised-pipeline exposed the category
error: AP is not a parser — it is a 12,046-LOC Rust MCP server with
LadybugDB, Tantivy BM25, 5-layer resolver (incl. LSP), Louvain+Traag
clustering, Tarjan-SCC semantic diff, PRD validator, security gates,
macro expansion, and stdlib indexing. Merging it into Cortex would
dissolve the essential differentia: AP = immutable structural truth,
Cortex = decaying behavioral belief.

v2 retracts the Coase "pull in" moves. What holds:
- Alexander's "two analyzers, no shared port" observation (Gap 1)
- Von Neumann's behavioral-overlay idea — heat on FILE/SYMBOL nodes,
  but on Cortex's projection (entities table), NOT on AP's store
- §4.1 violation of workflow_graph_source_ast.py (634 LOC)

Real gaps, all addressable without rewriting AP:
- Gap 1 (structural): demote in-house AST to fallback-only when AP is enabled
- Gap 2 (integrative): wire AP get_impact/detect_changes → neuromodulation
  surprise signal; Phase 4 of ADR-0046 never shipped to the memory side
- Gap 3 (graceful degradation): ADR promises degradation, code delivers
  silence — add explanatory UI pill

ADR-0046 is validated, not superseded. 3-of-5 Aristotelian axes pass
outright and 2 pass partially; the ADR's architectural intent (sibling
MCP, bridge-consumed, Cortex never writes there) follows directly from
the differentia and is correct.

GitNexus (github.com/abhigyanpatwari/GitNexus) web-searched: separate
TypeScript product, similar structural thesis (tree-sitter → knowledge
graph → MCP). Not a dependency of Cortex or AP. Validates direction:
code-structure intelligence belongs as its own MCP server.

5 corrected moves, ordered by risk. v1 kept on disk as record of why
reading the dependent repo matters before architecting.
diff --git a/docs/program/v3.14-gap-analysis-v2-corrected.md b/docs/program/v3.14-gap-analysis-v2-corrected.md
@@ -0,0 +1,266 @@
+# Gap Analysis v2 (CORRECTED) — Cortex ↔ automatised-pipeline
+
+**Date:** 2026-04-24
+**Supersedes:** `docs/program/v3.14-gap-analysis-codebase-as-core.md` (v1)
+**Method evolution:** v1 dispatched Alexander / Coase / Von Neumann without
+reading the AP repository. Coase's "pull AP into Cortex" recommendation
+was based on *inference from the Cortex-side bridge* rather than AP's
+actual source. This v2 corrects the record using direct inspection of
+AP's 12k-LOC Rust codebase, a re-read of ADR-0046, and Aristotle's
+four-causes interrogation.
+
+---
+
+## 1. What was materially wrong in v1
+
+The v1 Coase verdict — "pull `index_codebase` / `query_graph` / `get_impact`
+into Cortex; only 1200 LOC of bridge and 4-of-5 capabilities already
+in-core" — was wrong because the evidence was inferred, not inspected.
+
+Direct inspection of `/Users/cdeust/Developments/anthropic/ai-automatised-pipeline`:
+
+| Evidence | v1 assumed | Reality |
+|---|---|---|
+| AP's scope | "mostly tree-sitter" | **12,046 LOC Rust, 23 MCP tools, 220 tests** (`README.md:10-14`) |
+| Graph store | "Kuzu-like, skippable" | **LadybugDB via `lbug 0.15`** — in-process property graph with Cypher dialect (`graph_store.rs:1103 LOC`, 15+ typed node kinds, 7+ typed edge kinds) |
+| Search | "BM25 only, specialist" | **Tantivy BM25 + TF-IDF + RRF fusion** (`src/search/`: bm25.rs, rrf.rs, vector.rs) |
+| Resolution | "just import resolution" | **5-layer resolver** — `resolver.rs` (882) + `resolver_layers.rs` + `lsp_resolver.rs` (444) + `lsp_client.rs` (728) — including LSP |
+| Clustering | "networkx Louvain" | **Rust Louvain + Traag C2 repair pass** (`clustering.rs:867 LOC`, Blondel 2008 + Traag 2019) |
+| Diff analysis | "git diff → symbols, simple" | **`git_diff.rs` (570) + `semantic_diff.rs` (673)** with Tarjan-SCC regression detection |
+| Additional | not mentioned | **`prd_validator.rs` (605)** symbol hallucination + scope check; **`security_gates.rs` (534)** auth/unsafe/public API; **`macro_expansion/`** (Rust/Python/TS); **`stdlib_index/`** |
+
+Verdict: Coase's "pull in" would require rewriting ~7,000 LOC of
+specialized Rust functionality in Python. Not a refactor — a rewrite
+of a sibling project. **Retracted.**
+
+---
+
+## 2. The essential differentia (Aristotle)
+
+**Genus (both):** MCP servers exposing a graph of development context to
+Claude via JSON-RPC.
+
+**Differentia:** the *truth semantics of their nodes*.
+
+- **AP = structural-truth graph.** A `Function` node exists iff the
+  parser saw it in the file at index time. Re-indexing discards the
+  previous graph. No heat, no decay, no reconsolidation, no confidence
+  — only *is-this-in-the-AST-right-now*. `confidence=1.0` by
+  construction on every `defined_in` / `calls` / `imports` edge
+  (`workflow_graph_schema.py:193`).
+
+- **Cortex = belief graph with thermodynamic decay.** A memory has heat
+  (`thermodynamics.py`), stage (`cascade.py`), reconsolidation history
+  (`reconsolidation.py`), predictive-coding gate survival
+  (`hierarchical_predictive_coding.py`). Same content stored twice may
+  merge, strengthen, or be rejected.
+
+**Consequence:** the two stores must NOT be merged. A decaying AST is
+a bug; an immutable belief is amnesia. ADR-0046's "Cortex never writes
+there" (`ADR-0046:149`) follows directly from this differentia and is
+correct.
+
+---
+
+## 3. What held from v1
+
+Three findings from v1 survive intact:
+
+1. **Alexander's "no strong center"** — Cortex has two parallel code
+   analyzers: in-core tree-sitter (`core/codebase_graph.py`,
+   `ast_parser.py`, `ast_extractors*.py`) AND the AP bridge. No shared
+   port. This is REAL and unchanged — see Gap 1 below.
+
+2. **Von Neumann's behavioral-overlay proposal** — add `heat`,
+   `last_touched`, `co_edited` to FILE/SYMBOL nodes; reuse memory
+   thermodynamics. This is CORRECT regardless of whether AP lives
+   in-process or external — the overlay is on Cortex's *own*
+   workflow-graph projection of AP's data, not on AP's store. See
+   Gap 2 below.
+
+3. **`workflow_graph_source_ast.py` is 634 LOC** — violates the §4.1
+   size rule. Still needs a behavior-preserving split.
+
+---
+
+## 4. Corrected gaps (after reading AP's source)
+
+### Gap 1 — Structural: in-house AST in Cortex duplicates what AP owns (when AP is enabled).
+
+**Evidence.** `mcp_server/core/codebase_graph.py` (272 LOC),
+`codebase_parser.py` (147), `codebase_type_resolver.py` (137),
+`codebase_extractors.py` (245), `ast_parser.py` (226),
+`ast_extractors.py` (224), `ast_extractors_extra.py` (198) — all
+in-house tree-sitter. `workflow_graph_builder.py` uses them. When
+`CORTEX_ENABLE_AP=1`, AP's deeper AST (5-layer resolver + LSP + macro
+expansion + stdlib) should be authoritative, but the in-house path
+still runs.
+
+**Root cause.** ADR-0046 implemented the schema additions
+(NodeKind.SYMBOL, EdgeKind.CALLS/IMPORTS/DEFINED_IN/MEMBER_OF) and
+the bridge, but never retired the in-house AST. The result is two
+AST truths with no precedence.
+
+**Smallest fix.** Mark `codebase_graph.py` + `ast_parser.py` +
+`ast_extractors*.py` as explicit *fallback only when AP is disabled*.
+Add a header comment at the top of each module:
+
+```python
+# core/codebase_graph.py:1
+# Fallback AST used only when CORTEX_ENABLE_AP=0. When AP is
+# enabled, workflow_graph_source_ast.py is the authoritative
+# source — see ADR-0046.
+```
+
+Then in `workflow_graph_builder.py`, prefer `workflow_graph_source_ast.load_symbols()`
+over `codebase_graph.resolve_all_imports()` when `ap_bridge.is_enabled()`.
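+
+A minimal sketch of that precedence rule, assuming both loaders can be
+passed in as zero-argument callables. `select_symbols`, `Loader`, and
+the parameter names are hypothetical and exist only for illustration;
+the real signatures of `load_symbols()` and `resolve_all_imports()` may
+differ and may need wrapping:
+
+```python
+from __future__ import annotations
+
+from typing import Callable, Iterable
+
+# Hypothetical alias: a loader is any zero-argument callable yielding symbol records.
+Loader = Callable[[], Iterable[dict]]
+
+
+def select_symbols(
+    ap_enabled: bool,
+    load_from_ap: Loader,
+    load_from_fallback: Loader,
+) -> tuple[str, list[dict]]:
+    """Return (source_name, symbols), with AP taking precedence when enabled."""
+    if ap_enabled:
+        # AP is authoritative: the in-house AST is not consulted at all.
+        return "ap", list(load_from_ap())
+    # AP disabled: the in-house tree-sitter path is the only source.
+    return "fallback", list(load_from_fallback())
+```
+
+Returning the source name alongside the symbols keeps the choice
+visible to callers, so a single graph never silently mixes two AST
+truths.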
+
+**Note:** this REVERSES v1 move 4 (retire `ingest_codebase.py` in favor
+of `codebase_analyze.py`). The correct direction is the opposite:
+`codebase_analyze.py` becomes the fallback; `ingest_codebase.py` (AP
+path) is primary when AP is present.
+
+### Gap 2 — Integrative: AP's efficient-cause signals never reach Cortex's memory subsystems.
+
+**Evidence.** `ap_bridge.py` exposes `get_impact`, `detect_changes`,
+`get_context`. Consumers:
+
+| Consumer | Uses AP output? |
+|---|---|
+| `core/memory_ingest.py` | No |
+| `core/consolidation_engine.py` | No |
+| `core/spreading_activation.py` | No |
+| `core/coupled_neuromodulation.py` | No |
+| `core/synaptic_plasticity.py` | No |
+| `handlers/consolidation/plasticity.py` | No — uses substring co-access (v1 gap G3 still holds) |
+| `handlers/consolidation/memify.py` | No |
+| `hooks/pipeline_impact_bump.py` | **Yes** — calls `detect_changes` + ILIKE sweep on memory content |
+| `workflow_graph_source_ast.py` | Yes — projects into visualization only |
+
+AP produces a rich structural signal (`get_impact` returns blast
+radius across communities and processes); Cortex's neuromodulation
+(`coupled_neuromodulation.py`, DA/NE/ACh/5-HT cascade) takes
+"surprise" as input. A high-blast-radius edit IS surprise. The
+signal exists; the wire doesn't.
+
+**Root cause.** Phase 4 of ADR-0046 ("change-impact annotation via
+post_tool_capture hook") is listed last and was never wired to the
+memory side — only to the visualization pulse.
+
+**Smallest fix.** In `hooks/post_tool_capture.py`, after a code edit:
+1. Call `ap_bridge.detect_changes()` + `get_impact()`.
+2. Map blast-radius score → surprise modulator in `core/coupled_neuromodulation.py`.
+3. This raises DA/NE transiently → stronger encoding of the memory
+   the agent just wrote about the edit.
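+
+The three steps can be sketched as a small orchestration function.
+This is an illustration under assumptions, not the hook's actual code:
+`on_code_edit` is a hypothetical name, and the four callables are
+stand-ins for `ap_bridge.detect_changes`, `ap_bridge.get_impact`, the
+impact-to-surprise mapping, and the neuromodulation entry point, whose
+real signatures are not shown in this document:
+
+```python
+from typing import Any, Callable, Optional
+
+
+def on_code_edit(
+    detect_changes: Callable[[], Any],
+    get_impact: Callable[[Any], Any],
+    to_surprise: Callable[[Any], float],
+    apply_surprise: Callable[[float], None],
+) -> Optional[float]:
+    """Run the post-edit steps; return the surprise applied, or None."""
+    changes = detect_changes()  # step 1: what changed structurally
+    if not changes:
+        return None  # no structural change, leave neuromodulation alone
+    impact = get_impact(changes)  # step 1: blast radius of those changes
+    # Step 2: map impact to surprise and clamp it into [0, 1].
+    surprise = min(1.0, max(0.0, to_surprise(impact)))
+    apply_surprise(surprise)  # step 3: hand the signal to neuromodulation
+    return surprise
+```
+
+Injecting the callables keeps the orchestration testable without a
+running AP server.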
+
+New pure module: `core/ap_impact_to_surprise.py` — takes
+`{affected_symbols, community_count, process_count}`, returns
+`surprise ∈ [0, 1]`. No infra dependency; testable in isolation.
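+
+One possible shape for that module, as a hedged sketch. It assumes
+`affected_symbols` is a list of symbol names and that the two counts
+are non-negative integers. The saturating `1 - exp(-load)` form and the
+three weights are illustrative choices, not something AP or Cortex
+prescribes:
+
+```python
+import math
+from typing import Any, Mapping
+
+# Hypothetical weights: calibrate against real edits before relying on them.
+W_SYMBOL = 0.05
+W_COMMUNITY = 0.5
+W_PROCESS = 0.5
+
+
+def impact_to_surprise(impact: Mapping[str, Any]) -> float:
+    """Map an AP impact summary to a surprise value in [0, 1]."""
+    n_symbols = len(impact.get("affected_symbols", ()))
+    n_communities = max(0, int(impact.get("community_count", 0)))
+    n_processes = max(0, int(impact.get("process_count", 0)))
+    # Weighted blast-radius load; zero when nothing is affected.
+    load = W_SYMBOL * n_symbols + W_COMMUNITY * n_communities + W_PROCESS * n_processes
+    # Saturating map: grows monotonically with load and never exceeds 1.
+    return 1.0 - math.exp(-load)
+```
+
+A saturating curve is used so that a very large refactor cannot push
+the modulator past 1, while small edits still register above zero.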
+
+**This is where Von Neumann's behavioral-overlay idea lives.** The
+SYMBOL nodes get `heat` on the Cortex side (the projection), not on
+AP's side (immutable). AP stays read-only. The heat is *Cortex's
+memory of which symbols the agent cared about*, written into the
+PostgreSQL `entities`/`relationships` tables that already carry heat.
+
+### Gap 3 — Graceful degradation: ADR promises it, code delivers silence.
+
+**Evidence.** When AP is absent:
+- `ap_bridge.py:165` `connect()` returns False silently.
+- `workflow_graph_source_ast.py` returns `[]` from every loader.
+- Wiki-verify button, unified search, impact pulse all no-op with no UI message.
+- `ap_bridge.unavailable_reason` exists (line 158) but no caller surfaces it.
+
+**Root cause.** Degradation was implemented as "return empty", not
+"return empty + explain why".
+
+**Smallest fix.** New handler `mcp_server/handlers/get_ap_status.py`
+wrapping `APBridge().unavailable_reason`; UI renders a pill: "AST depth
+unavailable — install `automatised-pipeline` to enable symbol-level
+view." One handler, one pill. Respects the ADR promise.
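+
+A minimal sketch of the payload such a handler could return, assuming
+`unavailable_reason` is either `None` or a short string. `APStatus`,
+`build_ap_status`, the field names, and the exact pill wording are
+hypothetical; only `unavailable_reason` comes from `ap_bridge.py`:
+
+```python
+from typing import Optional, TypedDict
+
+
+class APStatus(TypedDict):
+    available: bool
+    reason: Optional[str]
+    pill: Optional[str]
+
+
+# Assumed wording, adapted from the pill text proposed above.
+PILL_TEXT = (
+    "AST depth unavailable: install automatised-pipeline "
+    "to enable symbol-level view."
+)
+
+
+def build_ap_status(unavailable_reason: Optional[str]) -> APStatus:
+    """Turn the bridge's unavailable_reason into a UI-ready status payload."""
+    if not unavailable_reason:
+        # AP is reachable: nothing to explain, no pill to render.
+        return {"available": True, "reason": None, "pill": None}
+    # AP is absent: surface the reason and the explanatory pill.
+    return {"available": False, "reason": unavailable_reason, "pill": PILL_TEXT}
+```
+
+The handler would then reduce to something like
+`build_ap_status(APBridge().unavailable_reason)`, keeping the bridge
+call and the formatting separately testable.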
+
+---
+
+## 5. On GitNexus (user's request — web search)
+
+Web search surfaced [github.com/abhigyanpatwari/GitNexus](https://github.com/abhigyanpatwari/GitNexus).
+It is a **separate TypeScript/JavaScript product**, browser-based,
+client-side knowledge graph creator with an MCP server
+(`npx gitnexus analyze`). Architecture: tree-sitter → knowledge graph →
+Graph RAG agent → MCP tools. Features include Multi-File Rename,
+Git-Diff Impact Analysis, Process-Grouped Search, 360° Context, Claude
+Code Hooks, Multi-Repo MCP, Zero-Config Setup, 9+ language support.
+
+ADR-0046:106 mentions "Multi-repo / workgroup operations (GitNexus
+`group_*`)" — a single reference suggesting AP may borrow the
+multi-repo concept. GitNexus is **NOT a dependency of Cortex or AP**;
+it is a structural cousin (same genus: MCP + code graph; similar
+differentia: structural-truth graph via tree-sitter).
+
+**Implication for the gap analysis:** none direct. But GitNexus being
+a successful shipped product with the same structural thesis as AP
+validates the architectural direction — code-structure intelligence
+lives as its own MCP server, not merged into a memory/belief system.
+
+---
+
+## 6. Corrected move sequence (5 ordered moves)
+
+v1 proposed 7 moves, most of which would dissolve AP into Cortex.
+The corrected sequence is 5 moves, none of which require rewriting
+AP functionality.
+
+| # | Move | Gap | File |
+|---|---|---|---|
+| 1 | Guard in-house AST: add "fallback-only" docstring + precedence logic that prefers `workflow_graph_source_ast` when `ap_bridge.is_enabled()` | G1 | `core/codebase_graph.py`, `core/ast_parser.py`, `core/workflow_graph_builder.py` |
+| 2 | New module `core/ap_impact_to_surprise.py` — pure, maps AP blast radius → neuromodulation surprise signal | G2 | new |
+| 3 | Wire Gap 2 in `hooks/post_tool_capture.py`: post-edit, call `ap_bridge.detect_changes` + `get_impact`, pipe to `core/coupled_neuromodulation.py` | G2 | `hooks/post_tool_capture.py` |
+| 4 | New handler `handlers/get_ap_status.py`; UI pill in `ui/unified/js/workflow_graph.js` explaining degradation | G3 | new + UI |
+| 5 | Split `workflow_graph_source_ast.py` (634 LOC) into facade + `_symbols.py` + `_edges.py` without changing behavior, to satisfy §4.1 | size rule | `infrastructure/workflow_graph_source_ast.py` |
+
+**Replaced by this list (from v1; retracted unless noted):**
+- v1 Move 1 "Build `core/git_diff_to_symbols.py`" — AP's `detect_changes` already does this far better. Retracted.
+- v1 Move 3 "Rewrite `hooks/pipeline_impact_bump.py` to in-process path" — the CORRECT move is to make the existing AP call feed the memory side (G2 above), not replicate AP. Retracted.
+- v1 Move 6 "Add `store.get_code_edges()` + union into plasticity" — partially retained: plasticity can still consume AP edges via the bridge, but we do NOT need a new store method; use `ap_bridge` directly from within the consolidation pipeline.
+- v1 Move 7 "Replace `compute_impact._bfs` with `spread_activation`" — retracted for AP's impact. However, the Cortex-side SYMBOL-heat update (Gap 2 wiring) implicitly gives spreading-activation-over-code when the agent asks about a file: `recall` already walks entities via `spread_activation`; adding SYMBOL entities with heat = "how much the agent cared about this code recently" gives the same effect without new code.
+
+---
+
+## 7. ADR-0046 status
+
+After this correction, ADR-0046 is **not superseded**. It is **validated**.
+The score on Aristotle's 5 axes:
+
+| Axis | Rating |
+|---|---|
+| Identifies both causes correctly | PASS |
+| Names the right differentia | PARTIAL (frames as temporal; real differentia is truth-semantics) |
+| Proposes integration respecting differentia | PASS |
+| Avoids category error | PASS |
+| Graceful degradation | PARTIAL (promise vs code; Gap 3 closes this) |
+
+A short amendment to ADR-0046 should say: *The essential
+differentia between Cortex and AP is the truth-semantics of their
+nodes — AP is structural truth (immutable), Cortex is behavioral
+belief (decaying). Any integration that treats nodes as fungible
+across the two stores is a category error.*
+
+---
+
+## 8. Status
+
+- v1 document superseded. Keep v1 on disk as historical record of the
+  inferred-not-inspected analysis — it demonstrates why reading the
+  actual dependent repo matters before architecting.
+- v2 (this document) is the current gap analysis.
+- 5 moves above are the recommended work, ordered by risk.
+- GitNexus is informational only — no action item.
+
+---
+
+## References
+
+- `docs/adr/ADR-0046-automatised-pipeline-integration.md`
+- `/Users/cdeust/Developments/anthropic/ai-automatised-pipeline/` — Cargo.toml, README.md, src/*
+- `docs/program/v3.14-gap-analysis-codebase-as-core.md` — v1 (superseded)
+- [github.com/abhigyanpatwari/GitNexus](https://github.com/abhigyanpatwari/GitNexus) — reference only