docs: GitNexus competitive analysis + 5-move science-grounded plan

cdeust · cdeust · commit 7c061c932b0a · 2026-04-24T18:09:23.000+02:00
Deep competitive analysis of github.com/abhigyanpatwari/GitNexus
against Cortex + automatised-pipeline. Synthesis of three genius
passes:
- Popper (falsifiability): "Graph RAG &gt; vector RAG for structural
  questions" is unfalsifiable as stated — no published benchmark,
  no pre-registered question set, scope fitted to evidence after
  the fact. Our moat is provenance: paper citations prevent silent
  constant drift.
- Taleb (fragility): GitNexus is fragile on evidence (no papers,
  no benchmarks, single-author, PolyForm Noncommercial blocks
  enterprise). Cortex is antifragile on evidence but fragile on
  Postgres dependency, FlashRank coupling, and "biological fidelity"
  framing attack surface.
- Altshuller (TRIZ): the contradiction GitNexus cannot frame is
  "a fact can be simultaneously true-structurally and false-
  practically." Their graph has no variable for aliveness;
  Cortex has heat.

Genuine gaps both ways. They have: 14-language coverage,
multi-file rename, MRO, overload disambiguation with arity/type-
hash, browser-native UI, zero-config CLI. We have: persistent
cross-session memory, paper-cited mechanisms, benchmarks beating
published SOTA (97.8% / 92.6% / 0.543), cognitive profiling,
predictive-coding write gate, cascade consolidation, homeostatic
scaling, microglial pruning, security gates + PRD validator + SCC
semantic diff via AP, MIT license.

5-move competitive plan (in execution order):
1. M1 — Benchmark slap: run GitNexus on LongMemEval/LoCoMo/BEAM,
   publish reproducible delta. 2 weeks.
2. M2 — Ablation matrix + constant audit: every mechanism earns
   its keep on a benchmark; every constant has # source. 4 weeks.
3. M3 — Code-corpus benchmark: pre-registered questions, both
   systems run, publish. Tests the exact cross-session-code-memory
   intersection. 4 weeks.
4. M4 — Absorb 5 unused resources: git blame as heat prior, test
   coverage as edge weight, PR sentiment as valence, LSP as
   surprise signal, commit verbs as typed edges. 8 weeks.
5. M5 — Cortex-Edge WASM: privacy parity without abandoning
   science; DuckDB+VSS or sqlite-vec; retain thermodynamic core,
   degrade advanced mechanisms gracefully. 12 weeks.

Plus 4 subtractive moves from Taleb (via negativa):
- Stop "biological fidelity" marketing (keep the code, reframe)
- Delete dead code enforcement
- Unbind FlashRank into a port
- Demote 42 of 47 MCP tools to "advanced"

Everything traces to a citable justification. Nothing ships
without a measurement. The moat is that peer-reviewed
benchmarks cannot be out-marketed.
diff --git a/docs/program/gitnexus-competitive-analysis.md b/docs/program/gitnexus-competitive-analysis.md
@@ -0,0 +1,233 @@
+# GitNexus — Competitive Analysis and Science-Grounded Plan
+
+**Date:** 2026-04-24
+**Scope:** compare GitNexus (closest competitor) against Cortex + AP
+(automatised-pipeline); identify gaps both ways; produce a 5-move
+science-backed competitive plan.
+**Method:** web-fetch of GitNexus repo, architecture, Claude.md, and
+the pebblous.ai review post, synthesised through three geniuses —
+Popper (falsifiability), Taleb (fragility), Altshuller (TRIZ
+contradiction resolution). All three converged tightly; findings below
+are the consensus.
+
+---
+
+## 1. GitNexus — what it actually is
+
+[github.com/abhigyanpatwari/GitNexus](https://github.com/abhigyanpatwari/GitNexus)
+(PolyForm Noncommercial; commercial use via akonlabs.com SaaS).
+
+### 1.1 Capability surface (extracted from README + ARCHITECTURE.md)
+
+| Dimension | GitNexus |
+|---|---|
+| Tech stack | Node.js + TypeScript + React 18 + Vite + Tailwind v4; Sigma.js + Graphology for WebGL viz |
+| Parsers | Tree-sitter grammars for **14 languages** (TS/JS/Py/Java/C#/Go/Rust/PHP/Swift/Kotlin/Ruby/C/C++/Dart) |
+| Storage | LadybugDB (Kuzu-clone) native + WASM builds; CSV-streamed graph load; per-node-type FTS tables |
+| Embeddings | Snowflake arctic-embed-xs (384D) via transformers.js; skipped if >50 k nodes |
+| Search | BM25 + TF-IDF semantic + RRF K=60 |
+| Clustering | Leiden (community detection, no paper cited) |
+| Indexing pipeline | **12-phase DAG**: scan → structure → [markdown/cobol] → parse → [routes/tools/orm] → crossFile → mro → communities → processes |
+| Node kinds | **44** (File, Folder, Function, Class, Interface, Method, Constructor, Struct, Enum, Macro, Route, Tool, Community, Process, Module, etc.) |
+| Edge kinds | **21** (CONTAINS, DEFINES, CALLS, STEP_IN_PROCESS, IMPORTS, EXTENDS, IMPLEMENTS, HAS_METHOD, METHOD_OVERRIDES, METHOD_IMPLEMENTS, ACCESSES, USES, FETCHES, HANDLES_ROUTE, HANDLES_TOOL, ENTRY_POINT_OF, MEMBER_OF, …) — each carries `confidence` + `reason` |
+| Overload disambiguation | arity suffix `#N`, type hash `~<sig>`, const marker `$const` (C++) |
+| MCP tools | **16** — `list_repos`, `query`, `context`, `impact`, `detect_changes`, `rename`, `cypher`, `group_list`, `group_sync`, `group_contracts`, `group_query`, `group_status`, plus 2 prompts and 4 skills |
+| Unique features | multi-file `rename`, raw `cypher` queries, multi-repo `group_*` coordination, LLM-generated wiki with mermaid, C3/Ruby-mixin/first-wins MRO |
+| Deployment | Docker images (`ghcr.io/abhigyanpatwari/gitnexus:latest`), cosign-signed, browser mode ≤5k files |
+
+### 1.2 What their own docs admit
+
+- **No benchmarks published.** The pebblous.ai review states explicitly "no performance benchmarks or comparative metrics; no latency comparisons, accuracy measurements, or head-to-head evaluations."
+- **No paper citations.** ARCHITECTURE.md documents proprietary algorithms (DAG runner, scope-resolution pipeline, cross-impact walk) without academic references. Leiden, C3, RRF are *named* but not cited to specific papers.
+- **Tested it on their own HTML/CSS blog repo and said it was "not very useful"** — language/content scope is a real constraint.
+- **"Browser-native" is partial**: requires Node.js + local server on port 4747 even in web mode.
+- **No contributor pool visible** — single-author project.
+
+---
+
+## 2. The Cortex+AP stack in the same table
+
+| Dimension | Cortex | AP |
+|---|---|---|
+| Tech stack | Python 3.10 + FastMCP + pydantic + numpy | Rust 1.94 + lbug + tantivy + tree-sitter |
+| Parsers | tree-sitter Python/JS/TS/Go/Swift/Rust (file-level) | tree-sitter Rust/Python/TS (symbol-level, 5-layer resolution w/ LSP) |
+| Storage | PostgreSQL 15 + pgvector + pg_trgm | LadybugDB (same as GitNexus) |
+| Embeddings | sentence-transformers 384-dim + FlashRank ONNX cross-encoder rerank | TF-IDF only (per search/mod.rs) |
+| Search | PL/pgSQL WRRF over vector + FTS + trigram + heat + recency, client-side rerank | Tantivy BM25 + TF-IDF + RRF |
+| Clustering | Per-domain cognitive profile + cross-domain bridges | Louvain + Traag C2 repair (Blondel 2008, Traag 2019 — cited) |
+| Scale | 108 core modules, 47 MCP tools, 2500+ tests | 12 046 LOC, 23 MCP tools, 220 tests |
+| Scientific grounding | Every mechanism cites papers: cascade (Kandel 2001), homeostatic (Turrigiano 2008), neuromodulation (Doya 2002), synaptic tagging (Frey & Morris 1997), microglial pruning (Wang 2020), predictive coding (Friston 2010), … | Every stage cites papers: Louvain (Blondel 2008), Traag (2019), RRF K=60 (Cormack et al 2009), Tarjan SCC, tree-sitter … |
+| Benchmarks | **LongMemEval R@10 97.8%** (paper SOTA 78.4%); **LoCoMo 92.6%**; **BEAM 0.543** (paper SOTA 0.329) — all on clean DB, reproducible | 220 unit tests; no external benchmark yet |
+| Unique features | persistent cross-session memory, thermodynamic decay, cascade consolidation, neuromodulation, synaptic tagging, cognitive profile per domain, predictive-coding write gate, hippocampal replay | PRD validator (symbol hallucination check), security gates (auth-critical/unsafe/public API), Tarjan-SCC semantic diff, 5-layer resolver with LSP, macro expansion, stdlib indexing |
+| License | MIT | MIT-equivalent |
+
+---
+
+## 3. Gaps — GitNexus can do that we can't
+
+Honest enumeration. These are genuine capability gaps.
+
+| # | Gap | Why GitNexus has it | Impact |
+|---|---|---|---|
+| **N1** | 14 languages at symbol-level (we have 3 via AP + 6 file-level via Cortex core) | Tree-sitter grammars are commoditized; they ported more | Polyglot codebases retrieve poorly for us |
+| **N2** | Multi-file coordinated rename (`rename` tool) | Graph + text-search + confidence-weighted edits as one tool | We don't ship a write-path refactor tool at all |
+| **N3** | Method-resolution order (C3 / Ruby-mixin / first-wins) | Needed for accurate method-dispatch queries | Our method queries can misattribute overrides |
+| **N4** | Overload disambiguation (arity + type hash + const) | Node-ID format encodes it | We treat `foo(int)` and `foo(vector<int>)` as the same node |
+| **N5** | Browser-native UI with in-memory WASM graph store | transformers.js + LadybugDB WASM + Sigma.js | We require Postgres server — harder onboarding |
+| **N6** | Multi-repo `group_*` coordination (contract extraction across repos) | Registry + group-scoped Cypher | Our cross-project story is cognitive-profile-based, not graph-based |
+| **N7** | Zero-config CLI (`npx gitnexus analyze`) | One binary, no DB to provision | Our setup requires Postgres + pgvector |
+
+**Of these, only N1/N2/N3/N4/N5 are genuine capability gaps for a code-intelligence competitor. N6 is a different axis (we have it differently via `core/bridge_finder.py`); N7 is DX, addressable in a weekend.**
+
+## 4. Gaps — we can do that GitNexus can't
+
+The structural asymmetry.
+
+| # | Gap | Why only we have it | Impact |
+|---|---|---|---|
+| **C1** | Persistent memory across sessions | Full thermodynamic store (`core/thermodynamics.py`) + decay (`core/decay_cycle.py`) + reconsolidation (`core/reconsolidation.py`) — GitNexus is stateless-per-query code intelligence | The headline moat — see §6 |
+| **C2** | Paper-cited mechanisms | 100+ citations across `core/*.py`; GitNexus cites zero papers | Every challenge to our implementation has a paper retreat; every challenge to theirs has nothing |
+| **C3** | Reproducible benchmarks that beat published SOTA | LongMemEval 97.8% vs paper's 78.4%; BEAM 0.543 vs 0.329 | Concrete track record; GitNexus has none |
+| **C4** | Cognitive profile per domain (Felder-Silverman style) | `core/style_classifier.py` + `core/domain_detector.py` + behavioural persona vector | Tailors retrieval to the agent's actual reasoning pattern |
+| **C5** | Predictive-coding write gate (Friston 2010) | 4-signal novelty filter prevents contaminated memory | Their re-index-on-change model has no write gate — garbage accumulates |
+| **C6** | Security gates + PRD validator + Tarjan-SCC semantic diff (via AP) | AP `prd_validator.rs`, `security_gates.rs`, `semantic_diff.rs` | Structural-truth layer shields PRDs from symbol hallucination |
+| **C7** | Cascade consolidation (LABILE → EARLY_LTP → LATE_LTP → CONSOLIDATED) | `core/cascade.py` + `core/two_stage_model.py` (McClelland 1995) | Memories stabilize with replay; GitNexus has no notion of memory maturation |
+| **C8** | Homeostatic scaling + microglial pruning | `core/homeostatic_plasticity.py` + `core/microglial_pruning.py` | Self-regulating store; theirs has no feedback loop |
+| **C9** | MIT license | Commercial use allowed without payment | GitNexus PolyForm Noncommercial blocks the paying audience |
+
+---
+
+## 5. The science moat — why it is real (Popper)
+
+Paper citations prevent one specific failure mode: **silent constant drift under benchmark pressure**.
+
+When a benchmark goes from 94% → 97.8%, the temptation is to tune one more constant to get 98.3%. Without a paper anchor, the constant becomes corpus-fitted — a form of overfitting invisible until the next distribution shift (new corpus, new user, new language). With a paper anchor, moving the constant requires either (a) a new paper or (b) a public benchmark measurement — both leave an audit trail. GitNexus has no anchors. Their BM25+RRF fusion has no cited weights; their Leiden resolution parameter is unstated. They can tune freely, overfit invisibly, and collapse silently on the first independent evaluation.
+
+The moat is **provenance forces honesty**. It's the same moat peer-reviewed science has over blog-driven opinion.
+
+### 5.1 Our own unfalsifiable soft-spots (Popper's honesty move)
+
+| Soft spot | Current state | Falsifiable version |
+|---|---|---|
+| "Biologically inspired" | Suggestive framing, not testable | "Ablating mechanism X drops benchmark Y by ≥N%" — measurable |
+| "Thermodynamic memory" | Evocative, no thermodynamic law constrains our decay | "Memories heat<0.3 retrieve at R@10<20% after 30d" |
+| "Cognitive profiling works" | Uses Felder-Silverman which has weak independent validation | "Seeding profile reduces Claude tool-choice entropy by X%" |
+
+We need to ship **ablation evidence** before a critic runs it for us.
+
+---
+
+## 6. Five-move competitive plan
+
+Combining Popper's piecemeal bets + Taleb's barbell + Altshuller's inventive moves.
+
+### M1 — Benchmark-slap publication (2 weeks)
+
+**Action:** Run GitNexus on LongMemEval-S / LoCoMo / BEAM (the same benchmarks where we have numbers). Publish a reproducible one-page comparison: both systems, clean DB, same hardware, same questions. Include our ablation matrix (which mechanisms contribute how much) so the result isn't a black box.
+
+**Why this beats them:** they can't match it — no baseline, no harness, no published script. The public delta becomes permanent marketing.
+
+**Falsification:** if GitNexus ties or wins, our "benchmark moat" is a mirage and we pivot. Honest risk.
+
+**Science grounding:** Popper severity — a test we could lose, making victory informative.
+
+### M2 — Ablation + constant audit (4 weeks, in parallel with M1)
+
+**Action:** Two deliverables, both in `tasks/paper-implementation-audit.md`:
+1. **Ablation matrix** — for each of the 23 ablatable mechanisms (see `core/ablation.py`), measure the benchmark delta when it's turned off. Publish. Any mechanism with <1% delta is either non-load-bearing (kept but unclaimed) or wrong (removed).
+2. **Constant-justification audit** — every numeric constant in `core/thermodynamics.py`, `core/cascade.py`, `core/decay_cycle.py`, `core/homeostatic_plasticity.py` gets a `# source:` comment tracing to a paper equation, a benchmark measurement, or a measured constant. Zero unsourced numbers by end of month.
+
+**Why this beats them:** every mechanism we keep is justified; every mechanism we cut shrinks our surface area. GitNexus can never replicate the audit trail.
+
+**Science grounding:** Feynman-style "lean over backwards to report what might invalidate the result."
+
+### M3 — Code-corpus benchmark (4 weeks)
+
+**Action:** Our three benchmarks are conversational (LongMemEval, LoCoMo, BEAM). A critic could reasonably say "these don't test code retrieval — Cortex's real domain is chat, not code." Pre-empt: add SWE-bench-retrieval or a purpose-built code-memory corpus that tests "which file did we decide to change this behaviour in, 3 sessions ago?"-style questions. Pre-register the question set **before** either system runs.
+
+**Why this beats them:** code-memory is the exact intersection where our persistent-memory advantage (C1) meets their strength (structural intelligence). If we win, we win on their home turf. If we lose, we know exactly which mechanism failed.
+
+**Science grounding:** Popper pre-registration discipline; Fisher-style pre-specified hypothesis.
+
+### M4 — Absorb the five unused resources (8 weeks)
+
+Altshuller identified five signals available but not used in any code-intelligence system. We absorb them first:
+
+| Resource | Wire into | Effect |
+|---|---|---|
+| **Git blame age** | `core/decay_cycle.py` as initial-heat prior | Code touched yesterday starts hot; 3-year-old code starts cold. Zero-measurement freshness signal. |
+| **Test coverage %** | Edge weight in call graph (via `handlers/consolidation/plasticity.py`) | Uncovered calls get decayed faster — risk-weighted graph. |
+| **PR-review comment sentiment** | `core/emotional_tagging.py` valence | NACK comments reduce symbol priority; LGTM comments raise it. |
+| **LSP diagnostics stream** | `ap_bridge.py` + `core/ap_impact_to_surprise.py` (new) | Continuous validation gate; diagnostic errors raise surprise → stronger encoding of the edit. |
+| **Commit-message verbs** | Typed edges in `core/knowledge_graph.py` | "fix" / "refactor" / "deprecate" / "add" become relationship labels — causal recall becomes free. |
+
+**Why this beats them:** each resource is an unused asymmetric advantage. Once we ingest git blame and test coverage, GitNexus's pure AST graph looks flat. Retrofitting heat into their architecture is a rewrite, not a feature.
+
+**Science grounding:** every ingest has a citable justification (Snow-style outbreak tracing via git blame; Turrigiano-style heat as the integrative variable).
+
+### M5 — Privacy parity via Cortex-Edge (12 weeks)
+
+The one real gap where GitNexus has us beat: **browser-native, zero-server privacy**. We need parity without abandoning science.
+
+**Action:** Cortex-Edge — a subset of Cortex core compiled to WASM-compatible Python or transpiled. Swap PostgreSQL for DuckDB+VSS or sqlite-vec; embeddings via ONNX in-process; retain `core/thermodynamics.py`, `core/decay_cycle.py`, `core/write_gate.py`, `core/query_intent.py`, `core/fractal.py`, `core/hopfield.py`. Drop advanced mechanisms that require server state (homeostatic scaling, microglial pruning, cascade) — degrade gracefully and say so explicitly.
+
+**Why this beats them:** they ship "browser-only" but actually require Node.js server on port 4747. We ship true zero-server Cortex-Edge + a server-enhanced Cortex-Pro. Two tiers, clear story, their differentiator evaporated.
+
+**Science grounding:** TRIZ #1 (Segmentation) + #27 (Cheap disposable) — same core algorithms, segmented deployment.
+
+---
+
+## 7. Reframing and subtractive moves (Taleb via negativa)
+
+Alongside the 5 positive moves, 4 subtractive moves we ship immediately:
+
+1. **Stop marketing "biological fidelity."** Keep the 108 neuro-modules — they earn their keep on benchmarks. But reframe to "thermodynamic memory model" in user-facing copy. Eliminates the "cargo-cult neuroscience" Black Swan without losing the code.
+2. **Delete dead code** — enforce CLAUDE.md's "if it's built, it must be called" rule. Run `vulture` across `core/`; cut anything unwired. Every dead module is fragility surface area with zero benchmark weight.
+3. **Unbind FlashRank** — `core/reranker.py` becomes a port, not a binding. DIP-correct, one swap away from any future reranker. Removes the Black Swan of vendor disappearance.
+4. **Demote 42 of 47 MCP tools** — the five that drive 80% of value (`remember`, `recall`, `query_methodology`, `anchor`, `run_pipeline`) are the public surface; the rest become "advanced." Fewer tools = less to learn = less to document = less to break.
+
+---
+
+## 8. Execution order + gates
+
+| Week | Deliverable | Gate | Risk if it fails |
+|---|---|---|---|
+| 1-2 | M1: GitNexus benchmark run | Numbers published, reproducible script | "Benchmark moat" collapses, pivot to M4 |
+| 3-6 | M2: ablation + constant audit | `tasks/paper-implementation-audit.md` at 100% constant coverage | Mechanisms <1% delta cut; 2500-test baseline holds |
+| 3-6 | Subtractive moves (1-4) | Dead code gone, reranker port shipped, UI reframed | — |
+| 5-8 | M3: code-corpus benchmark | Pre-registered question set public; both systems run; numbers published | Reveals a real weakness, informs the next iteration |
+| 7-14 | M4: absorb 5 unused resources | Git-blame + coverage + PR-sentiment + LSP + commit-verbs all wired into consolidation + retrieval | Measures improve or don't; either way, honest |
+| 10-22 | M5: Cortex-Edge WASM | Shippable binary; zero-server deployment works | Privacy parity reached; differentiator collapsed |
+
+---
+
+## 9. Summary
+
+**We do not compete with GitNexus on structural intelligence** — their
+14-language tree-sitter surface is wider than AP's 3-language surface.
+We compete on **a dimension they structurally cannot enter**:
+persistent memory with thermodynamic decay, paper-cited mechanisms,
+benchmark-proven retrieval, cognitive profiling per domain, a write
+gate that prevents contamination, and a consolidation loop that
+matures memories over time.
+
+Their architecture cannot absorb these in <12 months. Our architecture
+can absorb their structural advantages (M4 resources + M5 edge mode)
+in <3 months. Time works for us.
+
+**The one thing we must do today:** run the benchmarks (M1). Every
+month we don't publish numbers is a month they have to narrow the gap
+on rhetoric alone. Once our numbers are published and reproducible,
+the science moat becomes permanent — and peer-reviewed benchmarks
+cannot be out-marketed.
+
+---
+
+## 10. Sources
+
+- [github.com/abhigyanpatwari/GitNexus](https://github.com/abhigyanpatwari/GitNexus) — repo
+- [GitNexus ARCHITECTURE.md](https://github.com/abhigyanpatwari/GitNexus/blob/main/ARCHITECTURE.md) — 12-phase DAG, 44 nodes, 21 edges
+- [GitNexus CLAUDE.md](https://github.com/abhigyanpatwari/GitNexus/blob/main/CLAUDE.md) — Claude Code integration
+- [Pebblous review](https://blog.pebblous.ai/blog/gitnexus-code-knowledge-graph-2026/en/) — acknowledged limitations, no benchmarks
+- ADR-0046 `docs/adr/ADR-0046-automatised-pipeline-integration.md`
+- v2 gap analysis `docs/program/v3.14-gap-analysis-v2-corrected.md`