# Gap Analysis v2 (CORRECTED) — Cortex ↔ automatised-pipeline

**Date:** 2026-04-24
**Supersedes:** `docs/program/v3.14-gap-analysis-codebase-as-core.md` (v1)
**Method evolution:** v1 dispatched Alexander / Coase / Von Neumann without
reading the AP repository. Coase's "pull AP into Cortex" recommendation
was based on *inference from the Cortex-side bridge* rather than AP's
actual source. This v2 corrects the record using direct inspection of
AP's 12k-LOC Rust codebase, a re-read of ADR-0046, and Aristotle's
four-causes interrogation.

---

## 1. What was materially wrong in v1

The v1 Coase verdict — "pull `index_codebase` / `query_graph` / `get_impact`
into Cortex; only 1200 LOC of bridge and 4-of-5 capabilities already
in-core" — was wrong because its evidence was inferred, not inspected.

Direct inspection of `/Users/cdeust/Developments/anthropic/ai-automatised-pipeline`
(numbers in parentheses are LOC):

| Evidence | v1 assumed | Reality |
|---|---|---|
| AP's scope | "mostly tree-sitter" | **12,046 LOC Rust, 23 MCP tools, 220 tests** (`README.md:10-14`) |
| Graph store | "Kuzu-like, skippable" | **LadybugDB via `lbug 0.15`**: in-process property graph with a Cypher dialect (`graph_store.rs`, 1,103 LOC; 15+ typed node kinds, 7+ typed edge kinds) |
| Search | "BM25 only, specialist" | **Tantivy BM25 + TF-IDF + RRF fusion** (`src/search/`: `bm25.rs`, `rrf.rs`, `vector.rs`) |
| Resolution | "just import resolution" | **5-layer resolver, including LSP**: `resolver.rs` (882) + `resolver_layers.rs` + `lsp_resolver.rs` (444) + `lsp_client.rs` (728) |
| Clustering | "networkx Louvain" | **Rust Louvain + Traag C2 repair pass** (`clustering.rs`, 867 LOC; Blondel 2008 + Traag 2019) |
| Diff analysis | "git diff → symbols, simple" | **`git_diff.rs` (570) + `semantic_diff.rs` (673)** with Tarjan-SCC regression detection |
| Additional | not mentioned | **`prd_validator.rs` (605)**: symbol-hallucination and scope checks; **`security_gates.rs` (534)**: auth / unsafe / public-API gates; **`macro_expansion/`** (Rust/Python/TS); **`stdlib_index/`** |

Verdict: Coase's "pull in" would require rewriting ~7,000 LOC of
specialized Rust functionality in Python. That is not a refactor but a rewrite
of a sibling project. **Retracted.**
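
As a rough cross-check on the ~7,000 figure, the per-file sizes stated in the evidence table can be summed. This is a back-of-envelope sketch. It counts only files whose LOC the table gives, so `resolver_layers.rs`, `src/search/`, `macro_expansion/` and `stdlib_index/` are excluded, and the real port cost is therefore higher than this total.

```python
# LOC figures copied from the evidence table; modules without a stated
# size (resolver_layers.rs, src/search/, macro_expansion/, stdlib_index/)
# are deliberately left out, so the result is a lower bound.
CITED_LOC = {
    "graph_store.rs": 1103,
    "resolver.rs": 882,
    "lsp_resolver.rs": 444,
    "lsp_client.rs": 728,
    "clustering.rs": 867,
    "git_diff.rs": 570,
    "semantic_diff.rs": 673,
    "prd_validator.rs": 605,
    "security_gates.rs": 534,
}

AP_TOTAL_LOC = 12_046  # from README.md:10-14


def cited_total(sizes: dict[str, int]) -> int:
    """Lower bound on the Rust LOC a Python port would have to replace."""
    return sum(sizes.values())


if __name__ == "__main__":
    total = cited_total(CITED_LOC)
    print(f"{total} LOC cited; {total / AP_TOTAL_LOC:.0%} of AP's 12,046 LOC")
```

Running it prints `6406 LOC cited; 53% of AP's 12,046 LOC`. The sized modules alone are about half of AP, so adding the unsized ones makes the ~7,000 LOC estimate plausible.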

---

## 2. The essential differentia (Aristotle)

**Genus (both):** MCP servers exposing a graph of development context to
Claude via JSON-RPC.

**Differentia:** the *truth semantics of their nodes*.

- **AP = structural-truth graph.** A `Function` node exists iff the
  parser saw it in the file at index time. Re-indexing discards the
  previous graph. There is no heat, no decay, no reconsolidation and no confidence,
  only *is-this-in-the-AST-right-now*. Cortex's projection reflects this:
  every `defined_in` / `calls` / `imports` edge carries `confidence=1.0`
  by construction (`workflow_graph_schema.py:193`).

- **Cortex = belief graph with thermodynamic decay.** A memory has heat
  (`thermodynamics.py`), stage (`cascade.py`), reconsolidation history
  (`reconsolidation.py`), and predictive-coding gate survival
  (`hierarchical_predictive_coding.py`). The same content stored twice may
  merge, strengthen, or be rejected.

**Consequence:** the two stores must NOT be merged. A decaying AST is
a bug; an immutable belief is amnesia. ADR-0046's "Cortex never writes
there" (`ADR-0046:149`) follows directly from this differentia and is
correct.
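
The differentia can be made concrete in types. The sketch below is illustrative and is not Cortex's or AP's actual schema. `StructuralNode`, `BeliefNode`, `reindex` and the exponential half-life decay are assumptions chosen to show the contrast. Cortex's `thermodynamics.py` may use a different decay law.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StructuralNode:
    """AP-style node (hypothetical): exists iff the parser saw it at index time."""
    qualified_name: str
    file_path: str
    # Deliberately no heat or confidence field: the only update is re-indexing.


@dataclass(frozen=True)
class BeliefNode:
    """Cortex-style memory (hypothetical): heat that fades between touches."""
    content: str
    heat: float  # assumed to lie in [0, 1]


def decay(node: BeliefNode, elapsed_hours: float, half_life_hours: float = 24.0) -> BeliefNode:
    """Exponential decay with an assumed half-life; returns a new node."""
    factor = 0.5 ** (elapsed_hours / half_life_hours)
    return replace(node, heat=node.heat * factor)


def reindex(_previous: list[StructuralNode], parsed_now: list[StructuralNode]) -> list[StructuralNode]:
    """Structural truth: the previous graph is discarded, never blended in."""
    return list(parsed_now)
```

Passing a `StructuralNode` to `decay` fails because it has no `heat` field, which is the "decaying AST is a bug" rule enforced by the type. A `BeliefNode` that never went through `decay` or any reinforcement could neither fade nor strengthen, which is the "immutable belief" failure mode.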

---

## 3. What held from v1

Three findings from v1 survive intact:

1. **Alexander's "no strong center"** — Cortex has two parallel code
   analyzers: in-core tree-sitter (`core/codebase_graph.py`,
   `ast_parser.py`, `ast_extractors*.py`) AND the AP bridge. No shared
   port. This is REAL and unchanged — see Gap 1 below.

2. **Von Neumann's behavioral-overlay proposal** — add `heat`,
   `last_touched`, `co_edited` to FILE/SYMBOL nodes; reuse memory
   thermodynamics. This is CORRECT regardless of whether AP lives
   in-process or external — the overlay is on Cortex's *own*
   workflow-graph projection of AP's data, not on AP's store. See
   Gap 2 below.

3. **`workflow_graph_source_ast.py` is 634 LOC** — violates the §4.1
   size rule. Still needs a behavior-preserving split.

---

## 4. Corrected gaps (after reading AP's source)

### Gap 1 — Structural: in-house AST in Cortex duplicates what AP owns (when AP is enabled).

**Evidence.** `mcp_server/core/codebase_graph.py` (272 LOC),
`codebase_parser.py` (147), `codebase_type_resolver.py` (137),
`codebase_extractors.py` (245), `ast_parser.py` (226),
`ast_extractors.py` (224), `ast_extractors_extra.py` (198) — all
in-house tree-sitter. `workflow_graph_builder.py` uses them. When
`CORTEX_ENABLE_AP=1`, AP's deeper AST (5-layer resolver + LSP + macro
expansion + stdlib) should be authoritative, but the in-house path
still runs.

**Root cause.** ADR-0046 implemented the schema additions
(`NodeKind.SYMBOL`, `EdgeKind.CALLS/IMPORTS/DEFINED_IN/MEMBER_OF`) and
the bridge, but never retired the in-house AST. The result is two
AST truths with no precedence.

**Smallest fix.** Mark `codebase_graph.py` + `ast_parser.py` +
`ast_extractors*.py` explicitly as *fallback only when AP is disabled*,
starting with a module docstring that states the contract:

```python
# core/codebase_graph.py (top of module)
"""Fallback AST, used only when CORTEX_ENABLE_AP=0.

When AP is enabled, workflow_graph_source_ast.py is the authoritative
source of symbols and edges; see ADR-0046.
"""
```

Then, in `workflow_graph_builder.py`, prefer `workflow_graph_source_ast.load_symbols()`
over `codebase_graph.resolve_all_imports()` whenever `ap_bridge.is_enabled()`.
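
Here is a minimal sketch of that precedence rule. It assumes the builder can receive the AP check and both loaders as callables. The function name `load_symbols_with_precedence`, its parameters and the `Symbol` alias are hypothetical. Only `ap_bridge.is_enabled`, `workflow_graph_source_ast.load_symbols` and `codebase_graph.resolve_all_imports` come from the text, and their real signatures may differ.

```python
from typing import Callable, Sequence

Symbol = dict  # hypothetical stand-in for the builder's symbol record type


def load_symbols_with_precedence(
    ap_enabled: Callable[[], bool],
    load_from_ap: Callable[[], Sequence[Symbol]],
    load_from_fallback: Callable[[], Sequence[Symbol]],
) -> tuple[str, list[Symbol]]:
    """Return (source_name, symbols), making AP authoritative when enabled.

    The in-house AST is invoked only when AP is disabled, so the two
    AST truths never coexist in one build.
    """
    if ap_enabled():
        return "ap", list(load_from_ap())
    return "fallback", list(load_from_fallback())
```

In the builder this would be called as `load_symbols_with_precedence(ap_bridge.is_enabled, workflow_graph_source_ast.load_symbols, codebase_graph.resolve_all_imports)`. An enabled but unreachable AP yields `("ap", [])` here rather than silently running the fallback. Making that state visible is the concern of Gap 3.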

**Note:** this REVERSES v1 Move 4 (retire `ingest_codebase.py` in favor
of `codebase_analyze.py`). The correct direction is the opposite:
`codebase_analyze.py` becomes the fallback; `ingest_codebase.py` (AP
path) is primary when AP is present.

### Gap 2 — Integrative: AP's efficient-cause signals never reach Cortex's memory subsystems.

**Evidence.** `ap_bridge.py` exposes `get_impact`, `detect_changes`, and
`get_context`. Consumers:

| Consumer | Uses AP output? |
|---|---|
| `core/memory_ingest.py` | No |
| `core/consolidation_engine.py` | No |
| `core/spreading_activation.py` | No |
| `core/coupled_neuromodulation.py` | No |
| `core/synaptic_plasticity.py` | No |
| `handlers/consolidation/plasticity.py` | No: uses substring co-access (v1 gap G3 still holds) |
| `handlers/consolidation/memify.py` | No |
| `hooks/pipeline_impact_bump.py` | **Yes**: calls `detect_changes` + ILIKE sweep on memory content |
| `workflow_graph_source_ast.py` | Yes: projects into visualization only |

AP produces a rich structural signal: `get_impact` returns the blast
radius across communities and processes. Cortex's neuromodulation
(`coupled_neuromodulation.py`, DA/NE/ACh/5-HT cascade) takes
"surprise" as input. A high-blast-radius edit IS surprise. The
signal exists; the wire doesn't.

**Root cause.** Phase 4 of ADR-0046 ("change-impact annotation via
post_tool_capture hook") is listed last and was never wired to the
memory side, only to the visualization pulse.

**Smallest fix.** In `hooks/post_tool_capture.py`, after a code edit:

1. Call `ap_bridge.detect_changes()` + `get_impact()`.
2. Map the blast-radius score to a surprise modulator in `core/coupled_neuromodulation.py`.
3. The modulator raises DA/NE transiently, which strengthens encoding of the memory
   the agent just wrote about the edit.
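
The three steps can be sketched as a small function that the hook calls after an edit. Everything in it is an assumption for illustration. The `ImpactBridge` and `Neuromodulator` protocols, the `detect_changes()` / `get_impact(symbols)` signatures, the `boost_surprise` method, the `on_code_edit` name and the dict shapes are hypothetical, not the real `ap_bridge` or `coupled_neuromodulation` APIs. The impact-to-surprise mapping is passed in, so the pure module described below can be plugged in unchanged.

```python
from typing import Any, Callable, Protocol


class ImpactBridge(Protocol):
    """Hypothetical subset of the AP bridge used by the hook."""
    def detect_changes(self) -> list[str]: ...
    def get_impact(self, symbols: list[str]) -> dict[str, Any]: ...


class Neuromodulator(Protocol):
    """Hypothetical entry point into the DA/NE cascade."""
    def boost_surprise(self, surprise: float) -> None: ...


def on_code_edit(
    bridge: ImpactBridge,
    modulator: Neuromodulator,
    to_surprise: Callable[[dict[str, Any]], float],
) -> float:
    """Post-edit wiring: changed symbols -> impact -> surprise -> modulation."""
    changed = bridge.detect_changes()                    # step 1a: which symbols moved
    if not changed:
        return 0.0                                       # nothing structural to report
    impact = bridge.get_impact(changed)                  # step 1b: blast radius
    surprise = min(1.0, max(0.0, to_surprise(impact)))  # step 2: clamp to [0, 1]
    modulator.boost_surprise(surprise)                   # step 3: transient DA/NE rise
    return surprise
```

The early return means an edit that AP reports as structurally empty never perturbs neuromodulation. The clamp guards the cascade against a mapping that strays outside `[0, 1]`.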

New pure module: `core/ap_impact_to_surprise.py` — takes
`{affected_symbols, community_count, process_count}`, returns
`surprise ∈ [0, 1]`. No infra dependency; testable in isolation.
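
One possible shape for that module is sketched below. The saturating form `1 - exp(-x / scale)`, the per-term scales and the weights are hypothetical tuning choices, not values from ADR-0046 or AP. The only properties taken from the text are the three inputs and the `[0, 1]` output range. Each term saturates, so one enormous count cannot dominate. The weights sum to 1, so the result stays in range.

```python
import math
from typing import Mapping

# Hypothetical tuning constants: the count at which each term reaches ~63%.
SCALES = {"affected_symbols": 20.0, "community_count": 3.0, "process_count": 2.0}
# Hypothetical weights; they sum to 1.0 so the output stays in [0, 1].
WEIGHTS = {"affected_symbols": 0.5, "community_count": 0.3, "process_count": 0.2}


def _saturate(count: float, scale: float) -> float:
    """Map a non-negative count to [0, 1) with diminishing returns."""
    return 1.0 - math.exp(-max(count, 0.0) / scale)


def impact_to_surprise(impact: Mapping[str, float]) -> float:
    """Pure mapping from an AP blast-radius summary to surprise in [0, 1].

    Missing keys count as zero, so a partial impact report degrades
    to lower surprise instead of raising.
    """
    total = 0.0
    for key, weight in WEIGHTS.items():
        total += weight * _saturate(float(impact.get(key, 0)), SCALES[key])
    return min(1.0, total)
```

With these constants, an edit touching 20 symbols across 3 communities and 2 processes scores about 0.63, because each term sits at `1 - e^-1`. A one-symbol edit in a single community with no processes scores about 0.11.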

**This is where Von Neumann's behavioral-overlay idea lives.** The
SYMBOL nodes get `heat` on the Cortex side (the projection), not on
AP's side (immutable). AP stays read-only. The heat is *Cortex's
memory of which symbols the agent cared about*, written into the
PostgreSQL `entities`/`relationships` tables that already carry heat.
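
The following sketch shows the overlay's update rule, kept in memory for illustration. The real target is the PostgreSQL `entities` / `relationships` tables, whose schema is not reproduced here. The `SymbolHeat` and `HeatOverlay` records, the additive bump with a ceiling, and the `co_edited` pair counter are assumptions modeled on the `heat` / `last_touched` / `co_edited` fields named in §3. AP's data is only read (symbol ids), never written.

```python
import time
from dataclasses import dataclass, field
from itertools import combinations
from typing import Optional


@dataclass
class SymbolHeat:
    """Cortex-side overlay for one AP symbol (hypothetical record)."""
    heat: float = 0.0
    last_touched: float = 0.0


@dataclass
class HeatOverlay:
    """In-memory stand-in for the heat-carrying entities/relationships rows."""
    symbols: dict[str, SymbolHeat] = field(default_factory=dict)
    co_edited: dict[tuple[str, str], int] = field(default_factory=dict)

    def touch(self, symbol_ids: list[str], bump: float = 0.2, now: Optional[float] = None) -> None:
        """Record that the agent edited these symbols in one change."""
        stamp = time.time() if now is None else now
        unique = sorted(set(symbol_ids))
        for sid in unique:
            rec = self.symbols.setdefault(sid, SymbolHeat())
            rec.heat = min(1.0, rec.heat + bump)  # ceiling keeps heat in [0, 1]
            rec.last_touched = stamp
        # Sorted ids make each pair canonical, so (a, b) and (b, a) never diverge.
        for pair in combinations(unique, 2):
            self.co_edited[pair] = self.co_edited.get(pair, 0) + 1
```

Decay between touches is left out on purpose. Under the overlay proposal it would reuse Cortex's existing memory thermodynamics rather than a rule defined here.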

### Gap 3 — Graceful degradation: ADR promises it, code delivers silence.

**Evidence.** When AP is absent:
- `ap_bridge.py:165` `connect()` returns `False` silently.
- `workflow_graph_source_ast.py` returns `[]` from every loader.
- The wiki-verify button, unified search, and impact pulse all become no-ops with no UI message.
- `ap_bridge.unavailable_reason` exists (line 158), but no caller surfaces it.

**Root cause.** Degradation was implemented as "return empty", not
"return empty + explain why".

**Smallest fix.** A new handler, `mcp_server/handlers/get_ap_status.py`,
wraps `APBridge().unavailable_reason`; the UI renders a pill: "AST depth
unavailable — install `automatised-pipeline` to enable symbol-level
view." One handler, one pill. Respects the ADR promise.
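
Here is a minimal sketch of such a handler. It rests on several hypothetical assumptions: the `StatusBridge` protocol, a `connect()` that returns a bool, an `unavailable_reason` that is `None` when AP is reachable, and the response keys. The real `APBridge` and Cortex's handler registration may differ. The point it illustrates is that the handler returns the reason alongside the empty state instead of returning nothing.

```python
from typing import Optional, Protocol

PILL_TEXT = (
    "AST depth unavailable: install `automatised-pipeline` "
    "to enable symbol-level view."
)


class StatusBridge(Protocol):
    """Hypothetical subset of APBridge needed for a status check."""
    unavailable_reason: Optional[str]

    def connect(self) -> bool: ...


def get_ap_status(bridge: StatusBridge) -> dict:
    """Return AP availability plus a human-readable explanation."""
    if bridge.connect():
        return {"available": True, "reason": None, "message": None}
    reason = bridge.unavailable_reason or "unknown (bridge gave no reason)"
    return {"available": False, "reason": reason, "message": PILL_TEXT}
```

Calling `connect()` on every status request assumes it is cheap or cached. If it is not, the handler could read a cached connection state instead, with the same response shape.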

---

## 5. On GitNexus (user's request — web search)

Web search surfaced [github.com/abhigyanpatwari/GitNexus](https://github.com/abhigyanpatwari/GitNexus).
It is a **separate TypeScript/JavaScript product**: a browser-based,
client-side knowledge-graph creator with an MCP server
(`npx gitnexus analyze`). Architecture: tree-sitter → knowledge graph →
Graph RAG agent → MCP tools. Features include Multi-File Rename,
Git-Diff Impact Analysis, Process-Grouped Search, 360° Context, Claude
Code Hooks, Multi-Repo MCP, Zero-Config Setup, and support for 9+ languages.

ADR-0046:106 mentions "Multi-repo / workgroup operations (GitNexus
`group_*`)", a single reference suggesting AP may borrow the
multi-repo concept. GitNexus is **NOT a dependency of Cortex or AP**;
it is a structural cousin (same genus: MCP + code graph; similar
differentia: structural-truth graph via tree-sitter).

**Implication for the gap analysis:** none direct. However, GitNexus is a
shipped product built on the same structural thesis as AP, which supports
the architectural direction: code-structure intelligence
lives as its own MCP server, not merged into a memory/belief system.

---

## 6. Corrected move sequence (5 ordered moves)

v1 proposed 7 moves, most of which would dissolve AP into Cortex.
The corrected sequence is 5 moves, none of which requires rewriting
AP functionality.

| # | Move | Gap | File |
|---|---|---|---|
| 1 | Guard in-house AST: add "fallback-only" docstring + precedence logic that prefers `workflow_graph_source_ast` when `ap_bridge.is_enabled()` | G1 | `core/codebase_graph.py`, `core/ast_parser.py`, `core/workflow_graph_builder.py` |
| 2 | New pure module `core/ap_impact_to_surprise.py` that maps AP blast radius → neuromodulation surprise signal | G2 | new |
| 3 | Wire Gap 2 in `hooks/post_tool_capture.py`: post-edit, call `ap_bridge.detect_changes` + `get_impact`, pipe to `core/coupled_neuromodulation.py` | G2 | `hooks/post_tool_capture.py` |
| 4 | New handler `handlers/get_ap_status.py`; UI pill in `ui/unified/js/workflow_graph.js` explaining degradation | G3 | new + UI |
| 5 | Split `workflow_graph_source_ast.py` (634 LOC) behavior-preservingly into facade + `_symbols.py` + `_edges.py` to satisfy §4.1 | size rule | `infrastructure/workflow_graph_source_ast.py` |

**v1 moves superseded by this list (retracted or revised):**
- v1 Move 1 "Build `core/git_diff_to_symbols.py`": AP's `detect_changes` already does this far better. Retracted.
- v1 Move 3 "Rewrite `hooks/pipeline_impact_bump.py` to in-process path": the CORRECT move is to make the existing AP call feed the memory side (G2 above), not to replicate AP. Retracted.
- v1 Move 6 "Add `store.get_code_edges()` + union into plasticity": partially retained. Plasticity can still consume AP edges via the bridge, but no new store method is needed; use `ap_bridge` directly from within the consolidation pipeline.
- v1 Move 7 "Replace `compute_impact._bfs` with `spread_activation`": retracted for AP's impact. However, the Cortex-side SYMBOL-heat update (Gap 2 wiring) implicitly provides spreading activation over code when the agent asks about a file. `recall` already walks entities via `spread_activation`, so adding SYMBOL entities whose heat encodes how much the agent recently cared about that code gives the same effect without new traversal code.
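
To see why heat-weighted SYMBOL entities are enough, consider a generic spreading-activation pass seeded by entity heat. This is a textbook sketch, not Cortex's `spread_activation`. The adjacency-dict graph, the `decay` factor, the hop limit, the `max` combination rule and the entity ids are assumptions.

```python
from collections import deque


def spread(
    graph: dict[str, list[str]],
    seeds: dict[str, float],
    decay: float = 0.5,
    max_hops: int = 2,
) -> dict[str, float]:
    """Breadth-first spreading activation from heat-weighted seeds.

    Each hop multiplies activation by `decay`; a node keeps the
    strongest activation that reaches it (max, not sum).
    """
    activation = dict(seeds)
    frontier = deque((node, level, 0) for node, level in seeds.items())
    while frontier:
        node, level, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for neighbor in graph.get(node, []):
            passed = level * decay
            if passed > activation.get(neighbor, 0.0):
                activation[neighbor] = passed
                frontier.append((neighbor, passed, hops + 1))
    return activation
```

With `graph = {"sym:parse": ["sym:load", "file:io.py"], "sym:load": ["sym:main"]}` and a hot seed `{"sym:parse": 0.8}`, the pass gives 0.4 to `sym:load` and `file:io.py` and 0.2 to `sym:main`. The agent's recent attention to one symbol flows to related code through ordinary edges, with no code-specific traversal.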

---

## 7. ADR-0046 status

After this correction, ADR-0046 is **not superseded**. It is **validated**.
Its score on Aristotle's 5 axes:

| Axis | Rating |
|---|---|
| Identifies both causes correctly | PASS |
| Names the right differentia | PARTIAL (frames it as temporal; the real differentia is truth semantics) |
| Proposes integration respecting differentia | PASS |
| Avoids category error | PASS |
| Graceful degradation | PARTIAL (promise vs. code; Gap 3 closes this) |

A one-line amendment to ADR-0046 should say: *The essential
differentia between Cortex and AP is the truth semantics of their
nodes: AP is structural truth (immutable), Cortex is behavioral
belief (decaying). Any integration that treats nodes as fungible
across the two stores is a category error.*

---

## 8. Status

- v1 document superseded. Keep v1 on disk as a historical record of the
  inferred-not-inspected analysis; it demonstrates why reading the
  actual dependent repo matters before designing the architecture.
- v2 (this document) is the current gap analysis.
- The 5 moves above are the recommended work, ordered by risk.
- GitNexus is informational only; no action item.

---

## References

- `docs/adr/ADR-0046-automatised-pipeline-integration.md`
- `/Users/cdeust/Developments/anthropic/ai-automatised-pipeline/` — `Cargo.toml`, `README.md`, `src/*`
- `docs/program/v3.14-gap-analysis-codebase-as-core.md` — v1 (superseded)
- [github.com/abhigyanpatwari/GitNexus](https://github.com/abhigyanpatwari/GitNexus) — reference only