Commit 7c061c9

docs: GitNexus competitive analysis + 5-move science-grounded plan
Deep competitive analysis of github.com/abhigyanpatwari/GitNexus against Cortex + automatised-pipeline.

Synthesis of three genius passes:
- Popper (falsifiability): "Graph RAG > vector RAG for structural questions" is unfalsifiable as stated — no published benchmark, no pre-registered question set, scope fitted to evidence after the fact. Our moat is provenance: paper citations prevent silent constant drift.
- Taleb (fragility): GitNexus is fragile on evidence (no papers, no benchmarks, single-author, PolyForm Noncommercial blocks enterprise). Cortex is antifragile on evidence but fragile on Postgres dependency, FlashRank coupling, and "biological fidelity" framing attack surface.
- Altshuller (TRIZ): the contradiction GitNexus cannot frame is "a fact can be simultaneously true-structurally and false-practically." Their graph has no variable for aliveness; Cortex has heat.

Genuine gaps both ways.
They have: 14-language coverage, multi-file rename, MRO, overload disambiguation with arity/type-hash, browser-native UI, zero-config CLI.
We have: persistent cross-session memory, paper-cited mechanisms, benchmarks beating published SOTA (97.8% / 92.6% / 0.543), cognitive profiling, predictive-coding write gate, cascade consolidation, homeostatic scaling, microglial pruning, security gates + PRD validator + SCC semantic diff via AP, MIT license.

5-move competitive plan (in execution order):
1. M1 — Benchmark slap: run GitNexus on LongMemEval/LoCoMo/BEAM, publish reproducible delta. 2 weeks.
2. M2 — Ablation matrix + constant audit: every mechanism earns its keep on a benchmark; every constant has # source. 4 weeks.
3. M3 — Code-corpus benchmark: pre-registered questions, both systems run, publish. Tests the exact cross-session-code-memory intersection. 4 weeks.
4. M4 — Absorb 5 unused resources: git blame as heat prior, test coverage as edge weight, PR sentiment as valence, LSP as surprise signal, commit verbs as typed edges. 8 weeks.
5. M5 — Cortex-Edge WASM: privacy parity without abandoning science; DuckDB+VSS or sqlite-vec; retain thermodynamic core, degrade advanced mechanisms gracefully. 12 weeks.

Plus 4 subtractive moves from Taleb (via negativa):
- Stop "biological fidelity" marketing (keep the code, reframe)
- Delete dead code enforcement
- Unbind FlashRank into a port
- Demote 42 of 47 MCP tools to "advanced"

Everything traces to a citable justification. Nothing ships without a measurement. The moat is that peer-reviewed benchmarks cannot be out-marketed.
1 parent 66ee97b commit 7c061c9

1 file changed

Lines changed: 233 additions & 0 deletions

@@ -0,0 +1,233 @@
# GitNexus — Competitive Analysis and Science-Grounded Plan

**Date:** 2026-04-24
**Scope:** compare GitNexus (closest competitor) against Cortex + AP
(automatised-pipeline); identify gaps both ways; produce a 5-move
science-backed competitive plan.
**Method:** web-fetch of GitNexus repo, architecture, CLAUDE.md, and
the pebblous.ai review post, synthesised through three geniuses —
Popper (falsifiability), Taleb (fragility), Altshuller (TRIZ
contradiction resolution). All three converged tightly; findings below
are the consensus.

---

## 1. GitNexus — what it actually is

[github.com/abhigyanpatwari/GitNexus](https://github.com/abhigyanpatwari/GitNexus)
(PolyForm Noncommercial; commercial use via akonlabs.com SaaS).

### 1.1 Capability surface (extracted from README + ARCHITECTURE.md)

| Dimension | GitNexus |
|---|---|
| Tech stack | Node.js + TypeScript + React 18 + Vite + Tailwind v4; Sigma.js + Graphology for WebGL viz |
| Parsers | Tree-sitter grammars for **14 languages** (TS/JS/Py/Java/C#/Go/Rust/PHP/Swift/Kotlin/Ruby/C/C++/Dart) |
| Storage | LadybugDB (Kuzu-clone) native + WASM builds; CSV-streamed graph load; per-node-type FTS tables |
| Embeddings | Snowflake arctic-embed-xs (384D) via transformers.js; skipped if >50k nodes |
| Search | BM25 + TF-IDF semantic + RRF K=60 |
| Clustering | Leiden (community detection, no paper cited) |
| Indexing pipeline | **12-phase DAG**: scan → structure → [markdown/cobol] → parse → [routes/tools/orm] → crossFile → mro → communities → processes |
| Node kinds | **44** (File, Folder, Function, Class, Interface, Method, Constructor, Struct, Enum, Macro, Route, Tool, Community, Process, Module, etc.) |
| Edge kinds | **21** (CONTAINS, DEFINES, CALLS, STEP_IN_PROCESS, IMPORTS, EXTENDS, IMPLEMENTS, HAS_METHOD, METHOD_OVERRIDES, METHOD_IMPLEMENTS, ACCESSES, USES, FETCHES, HANDLES_ROUTE, HANDLES_TOOL, ENTRY_POINT_OF, MEMBER_OF, …) — each carries `confidence` + `reason` |
| Overload disambiguation | arity suffix `#N`, type hash `~<sig>`, const marker `$const` (C++) |
| MCP tools | **16** — `list_repos`, `query`, `context`, `impact`, `detect_changes`, `rename`, `cypher`, `group_list`, `group_sync`, `group_contracts`, `group_query`, `group_status`, plus 2 prompts and 4 skills |
| Unique features | multi-file `rename`, raw `cypher` queries, multi-repo `group_*` coordination, LLM-generated wiki with mermaid, C3/Ruby-mixin/first-wins MRO |
| Deployment | Docker images (`ghcr.io/abhigyanpatwari/gitnexus:latest`), cosign-signed, browser mode ≤5k files |
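The Search row names RRF with K=60. As a rough illustration of what that fusion step computes, here is a minimal sketch of reciprocal rank fusion. The function name and the sample result lists are hypothetical and are not taken from the GitNexus codebase; only the constant 60 comes from the table.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 matches the constant quoted in the table.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical lexical and semantic result lists for one query.
bm25_hits = ["auth.py", "session.py", "token.py"]
semantic_hits = ["token.py", "auth.py", "crypto.py"]
fused = reciprocal_rank_fusion([bm25_hits, semantic_hits])
```

Documents that appear in both lists rise to the top even when neither list ranks them first, which is the property that makes the fusion usable without tuned weights.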

### 1.2 What their own docs admit

- **No benchmarks published.** The pebblous.ai review states explicitly "no performance benchmarks or comparative metrics; no latency comparisons, accuracy measurements, or head-to-head evaluations."
- **No paper citations.** ARCHITECTURE.md documents proprietary algorithms (DAG runner, scope-resolution pipeline, cross-impact walk) without academic references. Leiden, C3, RRF are *named* but not cited to specific papers.
- **Tested it on their own HTML/CSS blog repo and said it was "not very useful"** — language/content scope is a real constraint.
- **"Browser-native" is partial**: requires Node.js + local server on port 4747 even in web mode.
- **No contributor pool visible** — single-author project.

---

## 2. The Cortex+AP stack in the same table

| Dimension | Cortex | AP |
|---|---|---|
| Tech stack | Python 3.10 + FastMCP + pydantic + numpy | Rust 1.94 + lbug + tantivy + tree-sitter |
| Parsers | tree-sitter Python/JS/TS/Go/Swift/Rust (file-level) | tree-sitter Rust/Python/TS (symbol-level, 5-layer resolution w/ LSP) |
| Storage | PostgreSQL 15 + pgvector + pg_trgm | LadybugDB (same as GitNexus) |
| Embeddings | sentence-transformers 384-dim + FlashRank ONNX cross-encoder rerank | TF-IDF only (per search/mod.rs) |
| Search | PL/pgSQL WRRF over vector + FTS + trigram + heat + recency, client-side rerank | Tantivy BM25 + TF-IDF + RRF |
| Clustering | Per-domain cognitive profile + cross-domain bridges | Louvain + Traag C2 repair (Blondel 2008, Traag 2019 — cited) |
| Scale | 108 core modules, 47 MCP tools, 2500+ tests | 12 046 LOC, 23 MCP tools, 220 tests |
| Scientific grounding | Every mechanism cites papers: cascade (Kandel 2001), homeostatic (Turrigiano 2008), neuromodulation (Doya 2002), synaptic tagging (Frey & Morris 1997), microglial pruning (Wang 2020), predictive coding (Friston 2010), … | Every stage cites papers: Louvain (Blondel 2008), Traag (2019), RRF K=60 (Cormack et al. 2009), Tarjan SCC, tree-sitter … |
| Benchmarks | **LongMemEval R@10 97.8%** (paper SOTA 78.4%); **LoCoMo 92.6%**; **BEAM 0.543** (paper SOTA 0.329) — all on clean DB, reproducible | 220 unit tests; no external benchmark yet |
| Unique features | persistent cross-session memory, thermodynamic decay, cascade consolidation, neuromodulation, synaptic tagging, cognitive profile per domain, predictive-coding write gate, hippocampal replay | PRD validator (symbol hallucination check), security gates (auth-critical/unsafe/public API), Tarjan-SCC semantic diff, 5-layer resolver with LSP, macro expansion, stdlib indexing |
| License | MIT | MIT-equivalent |

---

## 3. Gaps — GitNexus can do that we can't

Honest enumeration of what GitNexus does that we do not.

| # | Gap | Why GitNexus has it | Impact |
|---|---|---|---|
| **N1** | 14 languages at symbol-level (we have 3 via AP + 6 file-level via Cortex core) | Tree-sitter grammars are commoditized; they ported more | Polyglot codebases retrieve poorly for us |
| **N2** | Multi-file coordinated rename (`rename` tool) | Graph + text-search + confidence-weighted edits as one tool | We don't ship a write-path refactor tool at all |
| **N3** | Method-resolution order (C3 / Ruby-mixin / first-wins) | Needed for accurate method-dispatch queries | Our method queries can misattribute overrides |
| **N4** | Overload disambiguation (arity + type hash + const) | Node-ID format encodes it | We treat `foo(int)` and `foo(vector<int>)` as the same node |
| **N5** | Browser-native UI with in-memory WASM graph store | transformers.js + LadybugDB WASM + Sigma.js | We require Postgres server — harder onboarding |
| **N6** | Multi-repo `group_*` coordination (contract extraction across repos) | Registry + group-scoped Cypher | Our cross-project story is cognitive-profile-based, not graph-based |
| **N7** | Zero-config CLI (`npx gitnexus analyze`) | One binary, no DB to provision | Our setup requires Postgres + pgvector |

**Of these, N1 through N5 are the genuine capability gaps for a code-intelligence competitor. N6 is a different axis (we cover it differently via `core/bridge_finder.py`); N7 is DX, addressable in a weekend.**
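To make gap N4 concrete, the sketch below shows how a node ID can keep overloads apart using the three markers listed in the capability table (arity suffix `#N`, type hash `~<sig>`, `$const`). The exact layout, the choice of SHA-1, and the 8-character digest length are assumptions for illustration; the GitNexus format may differ in all three.

```python
import hashlib


def overload_node_id(qualified_name, param_types, is_const=False):
    """Build a node ID that keeps overloads apart.

    Layout (an assumption for illustration): name#arity~typehash[$const].
    """
    arity = len(param_types)
    signature = ",".join(param_types)
    # Short, stable digest of the parameter types; 8 hex chars is arbitrary.
    type_hash = hashlib.sha1(signature.encode("utf-8")).hexdigest()[:8]
    node_id = f"{qualified_name}#{arity}~{type_hash}"
    if is_const:
        node_id += "$const"
    return node_id


# The two overloads from row N4 now map to different graph nodes.
a = overload_node_id("foo", ["int"])
b = overload_node_id("foo", ["vector<int>"])
c = overload_node_id("Widget::size", [], is_const=True)
```

Both `foo` overloads share the prefix `foo#1~` and differ only in the hash, so a query by name and arity can still find them together.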

## 4. Gaps — we can do that GitNexus can't

The structural asymmetry.

| # | Gap | Why only we have it | Impact |
|---|---|---|---|
| **C1** | Persistent memory across sessions | Full thermodynamic store (`core/thermodynamics.py`) + decay (`core/decay_cycle.py`) + reconsolidation (`core/reconsolidation.py`) — GitNexus is stateless-per-query code intelligence | The headline moat — see §6 |
| **C2** | Paper-cited mechanisms | 100+ citations across `core/*.py`; GitNexus cites zero papers | Every challenge to our implementation has a paper retreat; every challenge to theirs has nothing |
| **C3** | Reproducible benchmarks that beat published SOTA | LongMemEval 97.8% vs paper's 78.4%; BEAM 0.543 vs 0.329 | Concrete track record; GitNexus has none |
| **C4** | Cognitive profile per domain (Felder-Silverman style) | `core/style_classifier.py` + `core/domain_detector.py` + behavioural persona vector | Tailors retrieval to the agent's actual reasoning pattern |
| **C5** | Predictive-coding write gate (Friston 2010) | 4-signal novelty filter prevents contaminated memory | Their re-index-on-change model has no write gate — garbage accumulates |
| **C6** | Security gates + PRD validator + Tarjan-SCC semantic diff (via AP) | AP `prd_validator.rs`, `security_gates.rs`, `semantic_diff.rs` | Structural-truth layer shields PRDs from symbol hallucination |
| **C7** | Cascade consolidation (LABILE → EARLY_LTP → LATE_LTP → CONSOLIDATED) | `core/cascade.py` + `core/two_stage_model.py` (McClelland 1995) | Memories stabilize with replay; GitNexus has no notion of memory maturation |
| **C8** | Homeostatic scaling + microglial pruning | `core/homeostatic_plasticity.py` + `core/microglial_pruning.py` | Self-regulating store; theirs has no feedback loop |
| **C9** | MIT license | Commercial use allowed without payment | GitNexus PolyForm Noncommercial blocks the paying audience |

---

## 5. The science moat — why it is real (Popper)

Paper citations prevent one specific failure mode: **silent constant drift under benchmark pressure**.

When a benchmark goes from 94% → 97.8%, the temptation is to tune one more constant to get 98.3%. Without a paper anchor, the constant becomes corpus-fitted — a form of overfitting invisible until the next distribution shift (new corpus, new user, new language). With a paper anchor, moving the constant requires either (a) a new paper or (b) a public benchmark measurement — both leave an audit trail. GitNexus has no anchors. Their BM25+RRF fusion has no cited weights; their Leiden resolution parameter is unstated. They can tune freely, overfit invisibly, and collapse silently on the first independent evaluation.

The moat is that **provenance forces honesty**. It's the same moat peer-reviewed science has over blog-driven opinion.
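A provenance rule of this kind can be checked mechanically. The following is a minimal sketch, assuming constants are annotated with the `# source:` comment convention that M2 proposes later in this document; the function name, the sample module text, and the restriction to simple module-level assignments are all illustrative assumptions, not the project's actual audit tooling.

```python
import ast


def unsourced_constants(source_code):
    """Return (name, line_number) for numeric constants lacking '# source:'.

    Only simple module-level assignments like DECAY_RATE = 0.95 are checked;
    negative literals and constants inside functions are out of scope here.
    """
    lines = source_code.splitlines()
    missing = []
    for node in ast.parse(source_code).body:
        if not isinstance(node, ast.Assign):
            continue
        value = node.value
        is_number = (
            isinstance(value, ast.Constant)
            and isinstance(value.value, (int, float))
            and not isinstance(value.value, bool)
        )
        if not is_number:
            continue
        # The provenance comment is expected on the same line as the value.
        if "# source:" not in lines[node.lineno - 1]:
            for target in node.targets:
                if isinstance(target, ast.Name):
                    missing.append((target.id, node.lineno))
    return missing


# Hypothetical module text; the citation in the comment is a placeholder.
sample = (
    "HALF_LIFE_DAYS = 30.0  # source: hypothetical paper, eq. 4\n"
    "HEAT_FLOOR = 0.3\n"
    "NAME = 'cortex'\n"
)
report = unsourced_constants(sample)
```

Run in CI, a non-empty report would fail the build, so a constant cannot change without its justification changing in the same diff.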

### 5.1 Our own unfalsifiable soft-spots (Popper's honesty move)

| Soft spot | Current state | Falsifiable version |
|---|---|---|
| "Biologically inspired" | Suggestive framing, not testable | "Ablating mechanism X drops benchmark Y by ≥N%" — measurable |
| "Thermodynamic memory" | Evocative, no thermodynamic law constrains our decay | "Memories with heat < 0.3 retrieve at R@10 < 20% after 30 days" |
| "Cognitive profiling works" | Uses Felder-Silverman which has weak independent validation | "Seeding profile reduces Claude tool-choice entropy by X%" |

We need to ship **ablation evidence** before a critic runs it for us.

---

## 6. Five-move competitive plan

Combining Popper's piecemeal bets + Taleb's barbell + Altshuller's inventive moves.

### M1 — Benchmark-slap publication (2 weeks)

**Action:** Run GitNexus on LongMemEval-S / LoCoMo / BEAM (the same benchmarks where we have numbers). Publish a reproducible one-page comparison: both systems, clean DB, same hardware, same questions. Include our ablation matrix (which mechanisms contribute how much) so the result isn't a black box.

**Why this beats them:** they can't match it — no baseline, no harness, no published script. The public delta becomes permanent marketing.

**Falsification:** if GitNexus ties or wins, our "benchmark moat" is a mirage and we pivot. Honest risk.

**Science grounding:** Popper severity — a test we could lose, making victory informative.
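The published delta in M1 rests on a retrieval metric such as R@10. The sketch below uses one common definition (a question counts as a hit when at least one gold item appears in the top k); each benchmark's official scorer may define recall differently, so this is an assumption rather than the harness itself. All IDs and system outputs are made up.

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of questions with at least one relevant ID in the top k."""
    if not retrieved:
        return 0.0
    hits = 0
    for question_id, ranked_ids in retrieved.items():
        gold = relevant.get(question_id, set())
        if gold & set(ranked_ids[:k]):
            hits += 1
    return hits / len(retrieved)


# Hypothetical outputs of two systems on the same three questions.
gold = {"q1": {"m7"}, "q2": {"m2", "m9"}, "q3": {"m4"}}
system_a = {"q1": ["m7", "m1"], "q2": ["m3", "m9"], "q3": ["m4"]}
system_b = {"q1": ["m1", "m7"], "q2": ["m3", "m5"], "q3": ["m8"]}

# Same questions, same k, same scorer: the only thing that varies is the system.
delta = recall_at_k(system_a, gold, k=1) - recall_at_k(system_b, gold, k=1)
```

Publishing the scorer alongside the numbers is what makes the comparison reproducible: a reader can rerun it on the raw rankings without trusting either party.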

### M2 — Ablation + constant audit (4 weeks, in parallel with M1)

**Action:** Two deliverables, both in `tasks/paper-implementation-audit.md`:
1. **Ablation matrix** — for each of the 23 ablatable mechanisms (see `core/ablation.py`), measure the benchmark delta when it's turned off. Publish. Any mechanism with <1% delta is either non-load-bearing (kept but unclaimed) or wrong (removed).
2. **Constant-justification audit** — every numeric constant in `core/thermodynamics.py`, `core/cascade.py`, `core/decay_cycle.py`, `core/homeostatic_plasticity.py` gets a `# source:` comment tracing to a paper equation, a benchmark measurement, or a measured constant. Zero unsourced numbers by end of month.

**Why this beats them:** every mechanism we keep is justified; every mechanism we cut shrinks our surface area. GitNexus can never replicate the audit trail.

**Science grounding:** Feynman-style "lean over backwards to report what might invalidate the result."
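The ablation matrix in the first deliverable can be tabulated with a small loop. This is a sketch under stated assumptions: `run_benchmark` is a hypothetical callable standing in for a real benchmark run, the interface of `core/ablation.py` is not known here, and the 1% cutoff is read as an absolute drop of 0.01 on a 0-to-1 score.

```python
def ablation_matrix(run_benchmark, mechanisms, threshold=0.01):
    """Measure the score drop from disabling each mechanism in turn.

    run_benchmark(disabled) must return a score in [0, 1] for a run with the
    given set of mechanism names switched off.
    """
    baseline = run_benchmark(frozenset())
    rows = []
    for name in mechanisms:
        score = run_benchmark(frozenset([name]))
        delta = baseline - score
        rows.append({
            "mechanism": name,
            "score": score,
            "delta": delta,
            # Below the threshold the mechanism is not load-bearing here.
            "load_bearing": delta >= threshold,
        })
    return baseline, rows


# Stand-in benchmark with made-up contributions, for illustration only.
FAKE_CONTRIBUTION = {"write_gate": 0.040, "cascade": 0.005}


def fake_benchmark(disabled):
    return 0.90 - sum(FAKE_CONTRIBUTION[m] for m in disabled)


baseline, rows = ablation_matrix(fake_benchmark, ["write_gate", "cascade"])
```

Single-mechanism ablation like this misses interactions between mechanisms; a fuller matrix would also disable pairs, at the cost of many more benchmark runs.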

### M3 — Code-corpus benchmark (4 weeks)

**Action:** Our three benchmarks are conversational (LongMemEval, LoCoMo, BEAM). A critic could reasonably say "these don't test code retrieval — Cortex's real domain is chat, not code." Pre-empt: add SWE-bench-retrieval or a purpose-built code-memory corpus that tests "which file did we decide to change this behaviour in, 3 sessions ago?"-style questions. Pre-register the question set **before** either system runs.

**Why this beats them:** code-memory is the exact intersection where our persistent-memory advantage (C1) meets their strength (structural intelligence). If we win, we win on their home turf. If we lose, we know exactly which mechanism failed.

**Science grounding:** Popper pre-registration discipline; Fisher-style pre-specified hypothesis.
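One lightweight way to make pre-registration verifiable is to publish a cryptographic digest of the question set before either system runs. The sketch below assumes the questions are stored as a list of JSON-serialisable records; the function name, record fields, and sample questions are hypothetical.

```python
import hashlib
import json


def preregistration_digest(questions):
    """Return a SHA-256 digest of a question set in canonical JSON form."""
    # Sorted keys and fixed separators make the digest independent of
    # dictionary ordering and whitespace.
    canonical = json.dumps(questions, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


questions = [
    {"id": "q1", "text": "Which file did we decide to change this behaviour in?"},
    {"id": "q2", "text": "Which module owned the retry logic two sessions ago?"},
]
published = preregistration_digest(questions)

# After the runs, anyone can re-hash the question file and compare.
unchanged = preregistration_digest(questions) == published
tampered = preregistration_digest(questions + [{"id": "q3", "text": "new"}]) == published
```

The digest can be posted publicly (for example in a tagged commit) while the questions themselves stay private until both systems have run.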

### M4 — Absorb the five unused resources (8 weeks)

Altshuller identified five signals available but not used in any code-intelligence system. We absorb them first:

| Resource | Wire into | Effect |
|---|---|---|
| **Git blame age** | `core/decay_cycle.py` as initial-heat prior | Code touched yesterday starts hot; 3-year-old code starts cold. Zero-measurement freshness signal. |
| **Test coverage %** | Edge weight in call graph (via `handlers/consolidation/plasticity.py`) | Uncovered calls get decayed faster — risk-weighted graph. |
| **PR-review comment sentiment** | `core/emotional_tagging.py` valence | NACK comments reduce symbol priority; LGTM comments raise it. |
| **LSP diagnostics stream** | `ap_bridge.py` + `core/ap_impact_to_surprise.py` (new) | Continuous validation gate; diagnostic errors raise surprise → stronger encoding of the edit. |
| **Commit-message verbs** | Typed edges in `core/knowledge_graph.py` | "fix" / "refactor" / "deprecate" / "add" become relationship labels — causal recall becomes free. |

**Why this beats them:** each resource is an unused asymmetric advantage. Once we ingest git blame and test coverage, GitNexus's pure AST graph looks flat. Retrofitting heat into their architecture is a rewrite, not a feature.

**Science grounding:** every ingest has a citable justification (Snow-style outbreak tracing via git blame; Turrigiano-style heat as the integrative variable).
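The first row of the table (git blame age as an initial-heat prior) could look roughly like the sketch below. The exponential form, the 90-day half-life, and the 0.05 floor are placeholder assumptions, not values from `core/decay_cycle.py`; under M2 each would need its own `# source:` justification. The timestamp is assumed to come from something like the `committer-time` field of `git blame --line-porcelain`.

```python
import math
from datetime import datetime, timezone


def initial_heat(last_touched, now, half_life_days=90.0, floor=0.05):
    """Map the age of a line's last commit to a starting heat in (0, 1].

    Exponential decay with a half-life; both constants are placeholders.
    """
    age_days = max((now - last_touched).total_seconds() / 86400.0, 0.0)
    heat = math.pow(0.5, age_days / half_life_days)
    # The floor keeps very old code retrievable instead of decaying to zero.
    return max(heat, floor)


now = datetime(2026, 4, 24, tzinfo=timezone.utc)
yesterday = datetime(2026, 4, 23, tzinfo=timezone.utc)
three_years_ago = datetime(2023, 4, 24, tzinfo=timezone.utc)

hot = initial_heat(yesterday, now)         # code touched yesterday starts hot
cold = initial_heat(three_years_ago, now)  # three-year-old code hits the floor
```

Because the prior only sets the starting value, later recall events can still reheat old code that turns out to matter.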

### M5 — Privacy parity via Cortex-Edge (12 weeks)

The one real gap where GitNexus has us beat: **browser-native, zero-server privacy**. We need parity without abandoning science.

**Action:** Cortex-Edge — a subset of Cortex core compiled to WASM-compatible Python or transpiled. Swap PostgreSQL for DuckDB+VSS or sqlite-vec; embeddings via ONNX in-process; retain `core/thermodynamics.py`, `core/decay_cycle.py`, `core/write_gate.py`, `core/query_intent.py`, `core/fractal.py`, `core/hopfield.py`. Drop advanced mechanisms that require server state (homeostatic scaling, microglial pruning, cascade) — degrade gracefully and say so explicitly.

**Why this beats them:** they ship "browser-only" but actually require Node.js server on port 4747. We ship true zero-server Cortex-Edge + a server-enhanced Cortex-Pro. Two tiers, clear story, their differentiator evaporated.

**Science grounding:** TRIZ #1 (Segmentation) + #27 (Cheap disposable) — same core algorithms, segmented deployment.

---

## 7. Reframing and subtractive moves (Taleb via negativa)

Alongside the 5 positive moves, 4 subtractive moves we ship immediately:

1. **Stop marketing "biological fidelity."** Keep the 108 neuro-modules — they earn their keep on benchmarks. But reframe to "thermodynamic memory model" in user-facing copy. Eliminates the "cargo-cult neuroscience" Black Swan without losing the code.
2. **Delete dead code** — enforce CLAUDE.md's "if it's built, it must be called" rule. Run `vulture` across `core/`; cut anything unwired. Every dead module is fragility surface area with zero benchmark weight.
3. **Unbind FlashRank** — `core/reranker.py` becomes a port, not a binding. DIP-correct, one swap away from any future reranker. Removes the Black Swan of vendor disappearance.
4. **Demote 42 of 47 MCP tools** — the five that drive 80% of value (`remember`, `recall`, `query_methodology`, `anchor`, `run_pipeline`) are the public surface; the rest become "advanced." Fewer tools = less to learn = less to document = less to break.
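Move 3 (a reranker port instead of a binding) can be sketched with a structural interface. Everything named here is hypothetical: the `Reranker` protocol, the toy adapter, and `rerank_candidates` illustrate the dependency-inversion shape and do not reflect the current contents of `core/reranker.py` or any vendor API.

```python
from typing import Protocol, Sequence


class Reranker(Protocol):
    """Port: anything that can reorder candidate passages for a query."""

    def rerank(self, query: str, passages: Sequence[str]) -> list[str]:
        ...


class KeywordOverlapReranker:
    """Toy adapter used here in place of a real cross-encoder."""

    def rerank(self, query: str, passages: Sequence[str]) -> list[str]:
        terms = set(query.lower().split())
        # More shared words first; sorted() is stable, so ties keep input order.
        return sorted(passages, key=lambda p: -len(terms & set(p.lower().split())))


def rerank_candidates(query: str, candidates: Sequence[str], reranker: Reranker) -> list[str]:
    """Core logic depends only on the port, never on a vendor library."""
    return reranker.rerank(query, candidates)


results = rerank_candidates(
    "decay half life",
    ["write gate novelty filter", "half life of heat decay", "decay cycle"],
    KeywordOverlapReranker(),
)
```

A vendor-backed adapter would implement the same `rerank` method and be injected at startup, so swapping rerankers touches one constructor call rather than the retrieval code.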

---

## 8. Execution order + gates

| Week | Deliverable | Gate | Risk if it fails |
|---|---|---|---|
| 1-2 | M1: GitNexus benchmark run | Numbers published, reproducible script | "Benchmark moat" collapses, pivot to M4 |
| 3-6 | M2: ablation + constant audit | `tasks/paper-implementation-audit.md` at 100% constant coverage | Mechanisms <1% delta cut; 2500-test baseline holds |
| 3-6 | Subtractive moves (1-4) | Dead code gone, reranker port shipped, UI reframed | |
| 5-8 | M3: code-corpus benchmark | Pre-registered question set public; both systems run; numbers published | Reveals a real weakness, informs the next iteration |
| 7-14 | M4: absorb 5 unused resources | Git-blame + coverage + PR-sentiment + LSP + commit-verbs all wired into consolidation + retrieval | Measures improve or don't; either way, honest |
| 10-22 | M5: Cortex-Edge WASM | Shippable binary; zero-server deployment works | Privacy parity reached; differentiator collapsed |

---

## 9. Summary

**We do not compete with GitNexus on structural intelligence** — their
14-language tree-sitter surface is wider than AP's 3-language surface.
We compete on **a dimension they structurally cannot enter**:
persistent memory with thermodynamic decay, paper-cited mechanisms,
benchmark-proven retrieval, cognitive profiling per domain, a write
gate that prevents contamination, and a consolidation loop that
matures memories over time.

Their architecture cannot absorb these in <12 months. Our architecture
can absorb their structural advantages (M4 resources + M5 edge mode)
in <3 months. Time works for us.

**The one thing we must do today:** run the benchmarks (M1). Every
month we don't publish numbers is a month they have to narrow the gap
on rhetoric alone. Once our numbers are published and reproducible,
the science moat becomes permanent — and peer-reviewed benchmarks
cannot be out-marketed.

---

## 10. Sources

- [github.com/abhigyanpatwari/GitNexus](https://github.com/abhigyanpatwari/GitNexus) — repo
- [GitNexus ARCHITECTURE.md](https://github.com/abhigyanpatwari/GitNexus/blob/main/ARCHITECTURE.md) — 12-phase DAG, 44 nodes, 21 edges
- [GitNexus CLAUDE.md](https://github.com/abhigyanpatwari/GitNexus/blob/main/CLAUDE.md) — Claude Code integration
- [Pebblous review](https://blog.pebblous.ai/blog/gitnexus-code-knowledge-graph-2026/en/) — acknowledged limitations, no benchmarks
- ADR-0046 `docs/adr/ADR-0046-automatised-pipeline-integration.md`
- v2 gap analysis `docs/program/v3.14-gap-analysis-v2-corrected.md`
