|
| 1 | +# GitNexus — Competitive Analysis and Science-Grounded Plan |
| 2 | + |
| 3 | +**Date:** 2026-04-24 |
| 4 | +**Scope:** compare GitNexus (closest competitor) against Cortex + AP |
| 5 | +(automatised-pipeline); identify gaps both ways; produce a 5-move |
| 6 | +science-backed competitive plan. |
| 7 | +**Method:** web-fetch of GitNexus repo, architecture, Claude.md, and |
| 8 | +the pebblous.ai review post, synthesised through three geniuses — |
| 9 | +Popper (falsifiability), Taleb (fragility), Altshuller (TRIZ |
| 10 | +contradiction resolution). All three converged tightly; findings below |
| 11 | +are the consensus. |
| 12 | + |
| 13 | +--- |
| 14 | + |
| 15 | +## 1. GitNexus — what it actually is |
| 16 | + |
| 17 | +[github.com/abhigyanpatwari/GitNexus](https://github.com/abhigyanpatwari/GitNexus) |
| 18 | +(PolyForm Noncommercial; commercial use via akonlabs.com SaaS). |
| 19 | + |
| 20 | +### 1.1 Capability surface (extracted from README + ARCHITECTURE.md) |
| 21 | + |
| 22 | +| Dimension | GitNexus | |
| 23 | +|---|---| |
| 24 | +| Tech stack | Node.js + TypeScript + React 18 + Vite + Tailwind v4; Sigma.js + Graphology for WebGL viz | |
| 25 | +| Parsers | Tree-sitter grammars for **14 languages** (TS/JS/Py/Java/C#/Go/Rust/PHP/Swift/Kotlin/Ruby/C/C++/Dart) | |
| 26 | +| Storage | LadybugDB (Kuzu-clone) native + WASM builds; CSV-streamed graph load; per-node-type FTS tables | |
| 27 | +| Embeddings | Snowflake arctic-embed-xs (384D) via transformers.js; skipped if >50 k nodes | |
| 28 | +| Search | BM25 + TF-IDF semantic + RRF K=60 | |
| 29 | +| Clustering | Leiden (community detection, no paper cited) | |
| 30 | +| Indexing pipeline | **12-phase DAG**: scan → structure → [markdown/cobol] → parse → [routes/tools/orm] → crossFile → mro → communities → processes | |
| 31 | +| Node kinds | **44** (File, Folder, Function, Class, Interface, Method, Constructor, Struct, Enum, Macro, Route, Tool, Community, Process, Module, etc.) | |
| 32 | +| Edge kinds | **21** (CONTAINS, DEFINES, CALLS, STEP_IN_PROCESS, IMPORTS, EXTENDS, IMPLEMENTS, HAS_METHOD, METHOD_OVERRIDES, METHOD_IMPLEMENTS, ACCESSES, USES, FETCHES, HANDLES_ROUTE, HANDLES_TOOL, ENTRY_POINT_OF, MEMBER_OF, …) — each carries `confidence` + `reason` | |
| 33 | +| Overload disambiguation | arity suffix `#N`, type hash `~<sig>`, const marker `$const` (C++) | |
| 34 | +| MCP tools | **16** — `list_repos`, `query`, `context`, `impact`, `detect_changes`, `rename`, `cypher`, `group_list`, `group_sync`, `group_contracts`, `group_query`, `group_status`, plus 2 prompts and 4 skills | |
| 35 | +| Unique features | multi-file `rename`, raw `cypher` queries, multi-repo `group_*` coordination, LLM-generated wiki with mermaid, C3/Ruby-mixin/first-wins MRO | |
| 36 | +| Deployment | Docker images (`ghcr.io/abhigyanpatwari/gitnexus:latest`), cosign-signed, browser mode ≤5k files | |
| 37 | + |
| 38 | +### 1.2 What their own docs admit |
| 39 | + |
| 40 | +- **No benchmarks published.** The pebblous.ai review states explicitly "no performance benchmarks or comparative metrics; no latency comparisons, accuracy measurements, or head-to-head evaluations." |
| 41 | +- **No paper citations.** ARCHITECTURE.md documents proprietary algorithms (DAG runner, scope-resolution pipeline, cross-impact walk) without academic references. Leiden, C3, RRF are *named* but not cited to specific papers. |
| 42 | +- **Tested it on their own HTML/CSS blog repo and said it was "not very useful"** — language/content scope is a real constraint. |
| 43 | +- **"Browser-native" is partial**: requires Node.js + local server on port 4747 even in web mode. |
| 44 | +- **No contributor pool visible** — single-author project. |
| 45 | + |
| 46 | +--- |
| 47 | + |
| 48 | +## 2. The Cortex+AP stack in the same table |
| 49 | + |
| 50 | +| Dimension | Cortex | AP | |
| 51 | +|---|---|---| |
| 52 | +| Tech stack | Python 3.10 + FastMCP + pydantic + numpy | Rust 1.94 + lbug + tantivy + tree-sitter | |
| 53 | +| Parsers | tree-sitter Python/JS/TS/Go/Swift/Rust (file-level) | tree-sitter Rust/Python/TS (symbol-level, 5-layer resolution w/ LSP) | |
| 54 | +| Storage | PostgreSQL 15 + pgvector + pg_trgm | LadybugDB (same as GitNexus) | |
| 55 | +| Embeddings | sentence-transformers 384-dim + FlashRank ONNX cross-encoder rerank | TF-IDF only (per search/mod.rs) | |
| 56 | +| Search | PL/pgSQL WRRF over vector + FTS + trigram + heat + recency, client-side rerank | Tantivy BM25 + TF-IDF + RRF | |
| 57 | +| Clustering | Per-domain cognitive profile + cross-domain bridges | Louvain + Traag C2 repair (Blondel 2008, Traag 2019 — cited) | |
| 58 | +| Scale | 108 core modules, 47 MCP tools, 2500+ tests | 12 046 LOC, 23 MCP tools, 220 tests | |
| 59 | +| Scientific grounding | Every mechanism cites papers: cascade (Kandel 2001), homeostatic (Turrigiano 2008), neuromodulation (Doya 2002), synaptic tagging (Frey & Morris 1997), microglial pruning (Wang 2020), predictive coding (Friston 2010), … | Every stage cites papers: Louvain (Blondel 2008), Traag (2019), RRF K=60 (Cormack et al 2009), Tarjan SCC, tree-sitter … | |
| 60 | +| Benchmarks | **LongMemEval R@10 97.8%** (paper SOTA 78.4%); **LoCoMo 92.6%**; **BEAM 0.543** (paper SOTA 0.329) — all on clean DB, reproducible | 220 unit tests; no external benchmark yet | |
| 61 | +| Unique features | persistent cross-session memory, thermodynamic decay, cascade consolidation, neuromodulation, synaptic tagging, cognitive profile per domain, predictive-coding write gate, hippocampal replay | PRD validator (symbol hallucination check), security gates (auth-critical/unsafe/public API), Tarjan-SCC semantic diff, 5-layer resolver with LSP, macro expansion, stdlib indexing | |
| 62 | +| License | MIT | MIT-equivalent | |
| 63 | + |
| 64 | +--- |
| 65 | + |
| 66 | +## 3. Gaps — GitNexus can do that we can't |
| 67 | + |
| 68 | +Honest enumeration. These are genuine capability gaps. |
| 69 | + |
| 70 | +| # | Gap | Why GitNexus has it | Impact | |
| 71 | +|---|---|---|---| |
| 72 | +| **N1** | 14 languages at symbol-level (we have 3 via AP + 6 file-level via Cortex core) | Tree-sitter grammars are commoditized; they ported more | Polyglot codebases retrieve poorly for us | |
| 73 | +| **N2** | Multi-file coordinated rename (`rename` tool) | Graph + text-search + confidence-weighted edits as one tool | We don't ship a write-path refactor tool at all | |
| 74 | +| **N3** | Method-resolution order (C3 / Ruby-mixin / first-wins) | Needed for accurate method-dispatch queries | Our method queries can misattribute overrides | |
| 75 | +| **N4** | Overload disambiguation (arity + type hash + const) | Node-ID format encodes it | We treat `foo(int)` and `foo(vector<int>)` as the same node | |
| 76 | +| **N5** | Browser-native UI with in-memory WASM graph store | transformers.js + LadybugDB WASM + Sigma.js | We require Postgres server — harder onboarding | |
| 77 | +| **N6** | Multi-repo `group_*` coordination (contract extraction across repos) | Registry + group-scoped Cypher | Our cross-project story is cognitive-profile-based, not graph-based | |
| 78 | +| **N7** | Zero-config CLI (`npx gitnexus analyze`) | One binary, no DB to provision | Our setup requires Postgres + pgvector | |
| 79 | + |
| 80 | +**Of these, only N1/N2/N3/N4/N5 are genuine capability gaps for a code-intelligence competitor. N6 is a different axis (we have it differently via `core/bridge_finder.py`); N7 is DX, addressable in a weekend.** |
| 81 | + |
| 82 | +## 4. Gaps — we can do that GitNexus can't |
| 83 | + |
| 84 | +The structural asymmetry. |
| 85 | + |
| 86 | +| # | Gap | Why only we have it | Impact | |
| 87 | +|---|---|---|---| |
| 88 | +| **C1** | Persistent memory across sessions | Full thermodynamic store (`core/thermodynamics.py`) + decay (`core/decay_cycle.py`) + reconsolidation (`core/reconsolidation.py`) — GitNexus is stateless-per-query code intelligence | The headline moat — see §6 | |
| 89 | +| **C2** | Paper-cited mechanisms | 100+ citations across `core/*.py`; GitNexus cites zero papers | Every challenge to our implementation has a paper retreat; every challenge to theirs has nothing | |
| 90 | +| **C3** | Reproducible benchmarks that beat published SOTA | LongMemEval 97.8% vs paper's 78.4%; BEAM 0.543 vs 0.329 | Concrete track record; GitNexus has none | |
| 91 | +| **C4** | Cognitive profile per domain (Felder-Silverman style) | `core/style_classifier.py` + `core/domain_detector.py` + behavioural persona vector | Tailors retrieval to the agent's actual reasoning pattern | |
| 92 | +| **C5** | Predictive-coding write gate (Friston 2010) | 4-signal novelty filter prevents contaminated memory | Their re-index-on-change model has no write gate — garbage accumulates | |
| 93 | +| **C6** | Security gates + PRD validator + Tarjan-SCC semantic diff (via AP) | AP `prd_validator.rs`, `security_gates.rs`, `semantic_diff.rs` | Structural-truth layer shields PRDs from symbol hallucination | |
| 94 | +| **C7** | Cascade consolidation (LABILE → EARLY_LTP → LATE_LTP → CONSOLIDATED) | `core/cascade.py` + `core/two_stage_model.py` (McClelland 1995) | Memories stabilize with replay; GitNexus has no notion of memory maturation | |
| 95 | +| **C8** | Homeostatic scaling + microglial pruning | `core/homeostatic_plasticity.py` + `core/microglial_pruning.py` | Self-regulating store; theirs has no feedback loop | |
| 96 | +| **C9** | MIT license | Commercial use allowed without payment | GitNexus PolyForm Noncommercial blocks the paying audience | |
| 97 | + |
| 98 | +--- |
| 99 | + |
| 100 | +## 5. The science moat — why it is real (Popper) |
| 101 | + |
| 102 | +Paper citations prevent one specific failure mode: **silent constant drift under benchmark pressure**. |
| 103 | + |
| 104 | +When a benchmark goes from 94% → 97.8%, the temptation is to tune one more constant to get 98.3%. Without a paper anchor, the constant becomes corpus-fitted — a form of overfitting invisible until the next distribution shift (new corpus, new user, new language). With a paper anchor, moving the constant requires either (a) a new paper or (b) a public benchmark measurement — both leave an audit trail. GitNexus has no anchors. Their BM25+RRF fusion has no cited weights; their Leiden resolution parameter is unstated. They can tune freely, overfit invisibly, and collapse silently on the first independent evaluation. |
| 105 | + |
| 106 | +The moat is **provenance forces honesty**. It's the same moat peer-reviewed science has over blog-driven opinion. |
| 107 | + |
| 108 | +### 5.1 Our own unfalsifiable soft-spots (Popper's honesty move) |
| 109 | + |
| 110 | +| Soft spot | Current state | Falsifiable version | |
| 111 | +|---|---|---| |
| 112 | +| "Biologically inspired" | Suggestive framing, not testable | "Ablating mechanism X drops benchmark Y by ≥N%" — measurable | |
| 113 | +| "Thermodynamic memory" | Evocative, no thermodynamic law constrains our decay | "Memories heat<0.3 retrieve at R@10<20% after 30d" | |
| 114 | +| "Cognitive profiling works" | Uses Felder-Silverman which has weak independent validation | "Seeding profile reduces Claude tool-choice entropy by X%" | |
| 115 | + |
| 116 | +We need to ship **ablation evidence** before a critic runs it for us. |
| 117 | + |
| 118 | +--- |
| 119 | + |
| 120 | +## 6. Five-move competitive plan |
| 121 | + |
| 122 | +Combining Popper's piecemeal bets + Taleb's barbell + Altshuller's inventive moves. |
| 123 | + |
| 124 | +### M1 — Benchmark-slap publication (2 weeks) |
| 125 | + |
| 126 | +**Action:** Run GitNexus on LongMemEval-S / LoCoMo / BEAM (the same benchmarks where we have numbers). Publish a reproducible one-page comparison: both systems, clean DB, same hardware, same questions. Include our ablation matrix (which mechanisms contribute how much) so the result isn't a black box. |
| 127 | + |
| 128 | +**Why this beats them:** they can't match it — no baseline, no harness, no published script. The public delta becomes permanent marketing. |
| 129 | + |
| 130 | +**Falsification:** if GitNexus ties or wins, our "benchmark moat" is a mirage and we pivot. Honest risk. |
| 131 | + |
| 132 | +**Science grounding:** Popper severity — a test we could lose, making victory informative. |
| 133 | + |
| 134 | +### M2 — Ablation + constant audit (4 weeks, in parallel with M1) |
| 135 | + |
| 136 | +**Action:** Two deliverables, both in `tasks/paper-implementation-audit.md`: |
| 137 | +1. **Ablation matrix** — for each of the 23 ablatable mechanisms (see `core/ablation.py`), measure the benchmark delta when it's turned off. Publish. Any mechanism with <1% delta is either non-load-bearing (kept but unclaimed) or wrong (removed). |
| 138 | +2. **Constant-justification audit** — every numeric constant in `core/thermodynamics.py`, `core/cascade.py`, `core/decay_cycle.py`, `core/homeostatic_plasticity.py` gets a `# source:` comment tracing to a paper equation, a benchmark measurement, or a measured constant. Zero unsourced numbers by end of month. |
| 139 | + |
| 140 | +**Why this beats them:** every mechanism we keep is justified; every mechanism we cut shrinks our surface area. GitNexus can never replicate the audit trail. |
| 141 | + |
| 142 | +**Science grounding:** Feynman-style "lean over backwards to report what might invalidate the result." |
| 143 | + |
| 144 | +### M3 — Code-corpus benchmark (4 weeks) |
| 145 | + |
| 146 | +**Action:** Our three benchmarks are conversational (LongMemEval, LoCoMo, BEAM). A critic could reasonably say "these don't test code retrieval — Cortex's real domain is chat, not code." Pre-empt: add SWE-bench-retrieval or a purpose-built code-memory corpus that tests "which file did we decide to change this behaviour in, 3 sessions ago?"-style questions. Pre-register the question set **before** either system runs. |
| 147 | + |
| 148 | +**Why this beats them:** code-memory is the exact intersection where our persistent-memory advantage (C1) meets their strength (structural intelligence). If we win, we win on their home turf. If we lose, we know exactly which mechanism failed. |
| 149 | + |
| 150 | +**Science grounding:** Popper pre-registration discipline; Fisher-style pre-specified hypothesis. |
| 151 | + |
| 152 | +### M4 — Absorb the five unused resources (8 weeks) |
| 153 | + |
| 154 | +Altshuller identified five signals available but not used in any code-intelligence system. We absorb them first: |
| 155 | + |
| 156 | +| Resource | Wire into | Effect | |
| 157 | +|---|---|---| |
| 158 | +| **Git blame age** | `core/decay_cycle.py` as initial-heat prior | Code touched yesterday starts hot; 3-year-old code starts cold. Zero-measurement freshness signal. | |
| 159 | +| **Test coverage %** | Edge weight in call graph (via `handlers/consolidation/plasticity.py`) | Uncovered calls get decayed faster — risk-weighted graph. | |
| 160 | +| **PR-review comment sentiment** | `core/emotional_tagging.py` valence | NACK comments reduce symbol priority; LGTM comments raise it. | |
| 161 | +| **LSP diagnostics stream** | `ap_bridge.py` + `core/ap_impact_to_surprise.py` (new) | Continuous validation gate; diagnostic errors raise surprise → stronger encoding of the edit. | |
| 162 | +| **Commit-message verbs** | Typed edges in `core/knowledge_graph.py` | "fix" / "refactor" / "deprecate" / "add" become relationship labels — causal recall becomes free. | |
| 163 | + |
| 164 | +**Why this beats them:** each resource is an unused asymmetric advantage. Once we ingest git blame and test coverage, GitNexus's pure AST graph looks flat. Retrofitting heat into their architecture is a rewrite, not a feature. |
| 165 | + |
| 166 | +**Science grounding:** every ingest has a citable justification (Snow-style outbreak tracing via git blame; Turrigiano-style heat as the integrative variable). |
| 167 | + |
| 168 | +### M5 — Privacy parity via Cortex-Edge (12 weeks) |
| 169 | + |
| 170 | +The one real gap where GitNexus has us beat: **browser-native, zero-server privacy**. We need parity without abandoning science. |
| 171 | + |
| 172 | +**Action:** Cortex-Edge — a subset of Cortex core compiled to WASM-compatible Python or transpiled. Swap PostgreSQL for DuckDB+VSS or sqlite-vec; embeddings via ONNX in-process; retain `core/thermodynamics.py`, `core/decay_cycle.py`, `core/write_gate.py`, `core/query_intent.py`, `core/fractal.py`, `core/hopfield.py`. Drop advanced mechanisms that require server state (homeostatic scaling, microglial pruning, cascade) — degrade gracefully and say so explicitly. |
| 173 | + |
| 174 | +**Why this beats them:** they ship "browser-only" but actually require Node.js server on port 4747. We ship true zero-server Cortex-Edge + a server-enhanced Cortex-Pro. Two tiers, clear story, their differentiator evaporated. |
| 175 | + |
| 176 | +**Science grounding:** TRIZ #1 (Segmentation) + #27 (Cheap disposable) — same core algorithms, segmented deployment. |
| 177 | + |
| 178 | +--- |
| 179 | + |
| 180 | +## 7. Reframing and subtractive moves (Taleb via negativa) |
| 181 | + |
| 182 | +Alongside the 5 positive moves, 4 subtractive moves we ship immediately: |
| 183 | + |
| 184 | +1. **Stop marketing "biological fidelity."** Keep the 108 neuro-modules — they earn their keep on benchmarks. But reframe to "thermodynamic memory model" in user-facing copy. Eliminates the "cargo-cult neuroscience" Black Swan without losing the code. |
| 185 | +2. **Delete dead code** — enforce CLAUDE.md's "if it's built, it must be called" rule. Run `vulture` across `core/`; cut anything unwired. Every dead module is fragility surface area with zero benchmark weight. |
| 186 | +3. **Unbind FlashRank** — `core/reranker.py` becomes a port, not a binding. DIP-correct, one swap away from any future reranker. Removes the Black Swan of vendor disappearance. |
| 187 | +4. **Demote 42 of 47 MCP tools** — the five that drive 80% of value (`remember`, `recall`, `query_methodology`, `anchor`, `run_pipeline`) are the public surface; the rest become "advanced." Fewer tools = less to learn = less to document = less to break. |
| 188 | + |
| 189 | +--- |
| 190 | + |
| 191 | +## 8. Execution order + gates |
| 192 | + |
| 193 | +| Week | Deliverable | Gate | Risk if it fails | |
| 194 | +|---|---|---|---| |
| 195 | +| 1-2 | M1: GitNexus benchmark run | Numbers published, reproducible script | "Benchmark moat" collapses, pivot to M4 | |
| 196 | +| 3-6 | M2: ablation + constant audit | `tasks/paper-implementation-audit.md` at 100% constant coverage | Mechanisms <1% delta cut; 2500-test baseline holds | |
| 197 | +| 3-6 | Subtractive moves (1-4) | Dead code gone, reranker port shipped, UI reframed | — | |
| 198 | +| 5-8 | M3: code-corpus benchmark | Pre-registered question set public; both systems run; numbers published | Reveals a real weakness, informs the next iteration | |
| 199 | +| 7-14 | M4: absorb 5 unused resources | Git-blame + coverage + PR-sentiment + LSP + commit-verbs all wired into consolidation + retrieval | Measures improve or don't; either way, honest | |
| 200 | +| 10-22 | M5: Cortex-Edge WASM | Shippable binary; zero-server deployment works | Privacy parity reached; differentiator collapsed | |
| 201 | + |
| 202 | +--- |
| 203 | + |
| 204 | +## 9. Summary |
| 205 | + |
| 206 | +**We do not compete with GitNexus on structural intelligence** — their |
| 207 | +14-language tree-sitter surface is wider than AP's 3-language surface. |
| 208 | +We compete on **a dimension they structurally cannot enter**: |
| 209 | +persistent memory with thermodynamic decay, paper-cited mechanisms, |
| 210 | +benchmark-proven retrieval, cognitive profiling per domain, a write |
| 211 | +gate that prevents contamination, and a consolidation loop that |
| 212 | +matures memories over time. |
| 213 | + |
| 214 | +Their architecture cannot absorb these in <12 months. Our architecture |
| 215 | +can absorb their structural advantages (M4 resources + M5 edge mode) |
| 216 | +in <3 months. Time works for us. |
| 217 | + |
| 218 | +**The one thing we must do today:** run the benchmarks (M1). Every |
| 219 | +month we don't publish numbers is a month they have to narrow the gap |
| 220 | +on rhetoric alone. Once our numbers are published and reproducible, |
| 221 | +the science moat becomes permanent — and peer-reviewed benchmarks |
| 222 | +cannot be out-marketed. |
| 223 | + |
| 224 | +--- |
| 225 | + |
| 226 | +## 10. Sources |
| 227 | + |
| 228 | +- [github.com/abhigyanpatwari/GitNexus](https://github.com/abhigyanpatwari/GitNexus) — repo |
| 229 | +- [GitNexus ARCHITECTURE.md](https://github.com/abhigyanpatwari/GitNexus/blob/main/ARCHITECTURE.md) — 12-phase DAG, 44 nodes, 21 edges |
| 230 | +- [GitNexus CLAUDE.md](https://github.com/abhigyanpatwari/GitNexus/blob/main/CLAUDE.md) — Claude Code integration |
| 231 | +- [Pebblous review](https://blog.pebblous.ai/blog/gitnexus-code-knowledge-graph-2026/en/) — acknowledged limitations, no benchmarks |
| 232 | +- ADR-0046 `docs/adr/ADR-0046-automatised-pipeline-integration.md` |
| 233 | +- v2 gap analysis `docs/program/v3.14-gap-analysis-v2-corrected.md` |
0 commit comments