docs: add colbymchenry/codegraph to comparison table and backlog (#1239)

carlos-alm · web-flow · commit 8c4bfc2b629f · 2026-05-28T01:41:01.000-06:00
* docs: add colbymchenry/codegraph to comparison table and backlog

  Replace cpg in the feature comparison table with colbymchenry/codegraph (28.5k stars, context-serving focus). Footnote ⁶ disambiguates the two tools sharing the same name.

  Add four backlog items from the competitive analysis:

  - #105: framework-aware HTTP route linking
  - #106: cross-language symbol bridging (Swift↔ObjC, React Native, Expo)
  - #107: prominent auto-sync UX (codegraph sync alias, staleness message)
  - #108: agent ecosystem integration guides

* docs: fix footnote order, star count, and backlog tier for items 107-108 (#1239)
diff --git a/README.md b/README.md
@@ -76,24 +76,24 @@ No config files, no Docker, no JVM, no API keys, no accounts. Point your agent a
 
 ### Feature comparison
 
-<sub>Comparison last verified: March 2026. Claims verified against each repo's README/docs. Full analysis: <a href="generated/competitive/COMPETITIVE_ANALYSIS.md">COMPETITIVE_ANALYSIS.md</a></sub>
+<sub>Comparison last verified: May 2026. Claims verified against each repo's README/docs. Full analysis: <a href="generated/competitive/COMPETITIVE_ANALYSIS.md">COMPETITIVE_ANALYSIS.md</a></sub>
 
-| Capability | codegraph | [joern](https://github.com/joernio/joern) | [narsil-mcp](https://github.com/postrv/narsil-mcp) | [cpg](https://github.com/Fraunhofer-AISEC/cpg) | [axon](https://github.com/harshkedia177/axon) | [GitNexus](https://github.com/abhigyanpatwari/GitNexus) |
+| Capability | codegraph | [joern](https://github.com/joernio/joern) | [narsil-mcp](https://github.com/postrv/narsil-mcp) | [codegraph⁴](https://github.com/colbymchenry/codegraph) | [axon](https://github.com/harshkedia177/axon) | [GitNexus](https://github.com/abhigyanpatwari/GitNexus) |
 |---|:---:|:---:|:---:|:---:|:---:|:---:|
-| Languages | **34** | ~12 | **32** | ~10 | 3 | 13 |
+| Languages | **34** | ~12 | **32** | ~20 | 3 | 13 |
 | MCP server | **Yes** | — | **Yes** | **Yes** | **Yes** | **Yes** |
-| Dataflow + CFG + AST querying | **Yes** | **Yes** | **Yes**¹ | **Yes** | — | — |
-| Hybrid search (BM25 + semantic) | **Yes** | — | — | — | **Yes** | **Yes** |
+| Dataflow + CFG + AST querying | **Yes** | **Yes** | **Yes**¹ | — | — | — |
+| Hybrid search (BM25 + semantic) | **Yes** | — | — | Keyword only | **Yes** | **Yes** |
 | Git-aware (diff impact, co-change, branch diff) | **All 3** | — | — | — | **All 3** | — |
 | Dead code / role classification | **Yes** | — | **Yes** | — | **Yes** | — |
-| Incremental rebuilds | **O(changed)** | — | O(n) | — | **Yes** | Commit-level⁴ |
+| Incremental rebuilds | **O(changed)** | — | O(n) | File-watcher⁵ | **Yes** | Commit-level⁶ |
 | Architecture rules + CI gate | **Yes** | — | — | — | — | — |
-| Security scanning (SAST / vuln detection) | Intentionally out of scope² | **Yes** | **Yes** | **Yes** | — | — |
-| Zero config, `npm install` | **Yes** | — | **Yes** | — | **Yes** | **Yes** |
+| Security scanning (SAST / vuln detection) | Intentionally out of scope² | **Yes** | **Yes** | — | — | — |
+| Zero config, `npm install` | **Yes** | — | **Yes** | **Yes** | **Yes** | **Yes** |
 | Graph export (GraphML / Neo4j / DOT) | **Yes** | **Yes** | — | — | — | — |
-| Open source + commercial use | **Yes** (Apache-2.0) | **Yes** (Apache-2.0) | **Yes** (MIT/Apache-2.0) | **Yes** (Apache-2.0) | Source-available³ | Non-commercial⁵ |
+| Open source + commercial use | **Yes** (Apache-2.0) | **Yes** (Apache-2.0) | **Yes** (MIT/Apache-2.0) | **Yes** (MIT) | Source-available³ | Non-commercial⁷ |
 
-<sup>¹ narsil-mcp added CFG and dataflow in recent versions. ² Codegraph focuses on structural understanding, not vulnerability detection — use dedicated SAST tools (Semgrep, CodeQL, Snyk) for that. ³ axon claims MIT in pyproject.toml but has no LICENSE file in the repo. ⁴ GitNexus skips re-index if the git commit hasn't changed, but re-processes the entire repo when it does — no per-file incremental parsing. ⁵ GitNexus uses the PolyForm Noncommercial 1.0.0 license.</sup>
+<sup>¹ narsil-mcp added CFG and dataflow in recent versions. ² Codegraph focuses on structural understanding, not vulnerability detection — use dedicated SAST tools (Semgrep, CodeQL, Snyk) for that. ³ axon claims MIT in pyproject.toml but has no LICENSE file in the repo. ⁴ colbymchenry/codegraph is an unrelated tool that shares the name. It focuses on reducing AI agent token consumption by pre-indexing code structure for fast context retrieval — not on structural analysis, CI gates, or complexity metrics. 28.1k stars. ⁵ colbymchenry/codegraph uses OS file watchers (chokidar) for auto-sync — rebuild triggers on file change but re-parses from scratch per file, not O(changed) hashing. ⁶ GitNexus skips re-index if the git commit hasn't changed, but re-processes the entire repo when it does — no per-file incremental parsing. ⁷ GitNexus uses the PolyForm Noncommercial 1.0.0 license.</sup>
 
 ### What makes codegraph different
 
diff --git a/docs/roadmap/BACKLOG.md b/docs/roadmap/BACKLOG.md
@@ -1,6 +1,6 @@
 # Codegraph Feature Backlog
 
-**Last updated:** 2026-05-01
+**Last updated:** 2026-05-27
 **Source:** Features derived from [COMPETITIVE_ANALYSIS.md](../../generated/competitive/COMPETITIVE_ANALYSIS.md) and internal roadmap discussions.
 
 ---
@@ -170,6 +170,7 @@ These address fundamental limitations in the parsing and resolution pipeline tha
 | 72 | Interprocedural dataflow analysis | Extend the existing intraprocedural dataflow (ID 14) to propagate `flows_to`/`returns`/`mutates` edges across function boundaries. When function A calls B with argument X, and B's dataflow shows X flows to its return value, connect A's call site to the downstream consumers of B's return. Requires stitching per-function dataflow summaries at call edges — no new parsing, just graph traversal over existing `dataflow` + `edges` tables. Start with single-level propagation (caller↔callee), not transitive closure. | Analysis | Current dataflow stops at function boundaries, missing the most important flows — data passing through helper functions, middleware chains, and factory patterns. Single-function scope means `dataflow` can't answer "where does this user input end up?" across call boundaries. Cross-function propagation is the difference between toy dataflow and useful taint-like analysis | ✓ | ✓ | 5 | No | 14 |
 | 73 | ~~Improved dynamic call resolution~~ | ~~Upgrade the current "best-effort" dynamic dispatch resolution for Python, Ruby, and JavaScript. Three concrete improvements: **(a)** receiver-type tracking — when `x = SomeClass()` is followed by `x.method()`, resolve `method` to `SomeClass.method` using the assignment chain (leverages existing `ast_nodes` + `dataflow` tables); **(b)** common pattern recognition — resolve `EventEmitter.on('event', handler)` callback registration, `Promise.then/catch` chains, `Array.map/filter/reduce` with named function arguments, and decorator/annotation patterns; **(c)** confidence-tiered edges — mark dynamically-resolved edges with a confidence score (high for direct assignment, medium for pattern match, low for heuristic) so consumers can filter by reliability.~~ | Resolution | ~~In Python/Ruby/JS, 30-60% of real calls go through dynamic dispatch — method calls on variables, callbacks, event handlers, higher-order functions. The current best-effort resolution misses most of these, leaving massive gaps in the call graph for the languages where codegraph is most commonly used. Even partial improvement here has outsized impact on graph completeness~~ | ✓ | ✓ | 5 | No | — | **PROMOTED** — Moved to ROADMAP Phase 4.2 (Receiver Type Tracking for Method Dispatch) |
 | 81 | Track dynamic `import()` and re-exports as graph edges | Extract `import()` expressions as `dynamic-imports` edges in both WASM extraction paths (query-based and walk-based). Destructured names (`const { a } = await import(...)`) feed into `importedNames` for call resolution. **Partially done:** WASM JS/TS extraction works (PR #389). Remaining: **(a)** native Rust engine support — `crates/codegraph-core/src/extractors/javascript.rs` doesn't extract `import()` calls; **(b)** non-static paths (`import(\`./plugins/${name}.js\`)`, `import(variable)`) are skipped with a debug warning; **(c)** re-export consumer counting in `exports --unused` only checks `calls` edges, not `imports`/`dynamic-imports` — symbols consumed only via import edges show as zero-consumer false positives. | Resolution | Fixes false "zero consumers" reports for symbols consumed via dynamic imports. 95 `dynamic-imports` edges found in codegraph's own codebase — these were previously invisible to impact analysis, exports audit, and dead-export hooks | ✓ | ✓ | 5 | No | — |
+| 106 | Cross-language symbol bridging | Detect and link call boundaries that single-language tree-sitter parsing misses: **(a)** Swift↔Objective-C — `@objc`-annotated Swift methods callable from ObjC; ObjC bridging headers exposing ObjC symbols to Swift; **(b)** React Native bridge modules — `RCT_EXPORT_METHOD` in ObjC/Swift maps to `NativeModules.ModuleName.method` in JS; **(c)** Expo modules — `ExpoModule` definitions map to `requireNativeModule('ModuleName')` calls in JS/TS. Store cross-language edges as a `cross-lang-calls` type with confidence score. New `codegraph bridges` command lists all cross-language linkages. | Resolution | React Native and Expo codebases have significant logic in native modules that is currently completely invisible to impact analysis — changing a native method breaks JS callers with no warning from codegraph. Closes a gap for a large and growing category of mobile codebases | ✓ | ✓ | 3 | No | — |
 | 82 | Extract names from `import().then()` callback patterns | `extractDynamicImportNames` only extracts destructured names from `const { a } = await import(...)` (walks up to `variable_declarator`). The `.then()` pattern — `import('./foo.js').then(({ a, b }) => ...)` — produces an edge with empty names because the destructured parameters live in the `.then()` callback, not a `variable_declarator`. Detect when an `import()` call's parent is a `member_expression` with `.then`, find the arrow/function callback in `.then()`'s arguments, and extract parameter names from its destructuring pattern. | Resolution | `.then()`-style dynamic imports are common in older codebases and lazy-loading patterns (React.lazy, Webpack code splitting). Without name extraction, these produce file-level edges only — no symbol-level `calls` edges, so the imported symbols still appear as zero-consumer false positives | ✓ | ✓ | 4 | No | 81 |
 
 ### Tier 1i — Search, navigation, and monitoring improvements
@@ -184,6 +185,7 @@ These close gaps in search expressiveness, cross-repo navigation, implementation
 | 77 | Metric trend tracking (code insights) | `codegraph trends` computes key graph metrics (total symbols, avg complexity, dead code count, cycle count, community drift score, boundary violations) at historical git revisions and outputs a time-series table or JSON. Uses `git stash && git checkout <rev> && build && collect && restore` loop over sampled commits (configurable `--samples N` defaulting to 10 evenly-spaced commits). Stores results in a `metric_snapshots` table for incremental updates. `--since` and `--until` for date range. `--metric` to select specific metrics. Enables tracking migration progress ("how many files still use old API?"), tech debt trends, and codebase growth over time without external dashboards. | Intelligence | Agents and teams can answer "is our codebase getting healthier or worse?" with data instead of intuition — tracks complexity trends, dead code accumulation, architectural drift, and migration progress over time. Historical backfill from git history means instant visibility into months of trends | ✓ | ✓ | 3 | No | — |
 | 78 | Cross-repo symbol resolution | In multi-repo mode, resolve import edges that cross repository boundaries. When repo A imports `@org/shared-lib`, and repo B is `@org/shared-lib` in the registry, create cross-repo edges linking A's import to B's actual exported symbol. Requires matching npm/pip/go package names to registered repos. Store cross-repo edges with a `repo` qualifier in the `edges` table. Enables cross-repo `fn-impact` (changing a shared library function shows impact across all consuming repos), cross-repo `path` queries, and cross-repo `diff-impact`. | Navigation | Multi-repo mode currently treats each repo as isolated — agents can search across repos but can't trace dependencies between them. Cross-repo edges enable "if I change this shared utility, which downstream repos break?" — the highest-value question in monorepo and multi-repo architectures | ✓ | ✓ | 5 | No | — |
 | 79 | Advanced query language with boolean operators and output shaping | Extend `codegraph search` and `codegraph where` with a structured query syntax supporting: **(a)** boolean operators — `kind:function AND file:src/` , `name:parse OR name:extract`, `NOT kind:class`; **(b)** compound filters — `kind:method AND complexity.cognitive>15 AND role:core`; **(c)** output shaping — `--select symbols` (just names), `--select files` (distinct files), `--select owners` (CODEOWNERS for matches), `--select stats` (aggregate counts by kind/file/role); **(d)** result aggregation — `--group-by file`, `--group-by kind`, `--group-by community` with counts. Parse the query into a SQL WHERE clause against the `nodes`/`function_complexity`/`edges` tables. Expose as `query_language` MCP tool parameter. | Search | Current search is either keyword/semantic (fuzzy) or exact-name (`where`). Agents needing "all core functions with cognitive complexity > 15 in src/api/" must chain multiple commands and filter manually — wasting tokens on intermediate results. A structured query language answers compound questions in one call | ✓ | ✓ | 4 | No | — |
+| 105 | Framework-aware HTTP route linking | Extract route definitions from framework-specific patterns (Express `app.get('/path', handler)`, NestJS `@Get('/path')`, Django `path('url', view)`, Flask `@app.route('/path')`, FastAPI `@router.get('/path')`, Gin `r.GET(...)`, Spring `@GetMapping`) and store them as nodes with a `route` kind and a `handles_route` edge linking the pattern to its handler function. `codegraph flow` gains a `--route "POST /login"` mode that traces the full execution path from HTTP entry to leaf. `diff-impact` shows the associated route pattern when a handler changes. | Navigation | Agents can answer "what handles POST /login?" in one call, which is currently impossible without reading every file. Directly reduces orientation tokens in backend codebases. Covers at least 7 web frameworks across TS/JS, Python, Go, and Java | ✓ | ✓ | 4 | No | — |
 | 97 | Unified multi-repo graph | New `codegraph build --repos <path1> <path2> ...` (or `.codegraphrc.json` `repos[]` list) that builds a single unified graph spanning multiple repositories. Each repo is parsed independently, then a merge step stitches them into one SQLite DB with repo-qualified file paths (`repo:path`). Three connection modes: **(a)** npm/pip/go package imports — repo A imports `@org/lib` which is repo B, resolved via `package.json`/`setup.py`/`go.mod` name matching; **(b)** API boundary inference — repo A calls `fetch('/api/users')` and repo B defines an Express/Flask/Gin route for `/api/users`, linked as a `cross-repo-api` edge with lower confidence; **(c)** shared schema/proto — repos sharing `.proto`, OpenAPI, or GraphQL schema files get edges through the shared contract types. All existing query commands (`fn-impact`, `diff-impact`, `path`, `audit`, `triage`, `exports`) work transparently on the unified graph — changing a shared library function shows impact across all consuming repos in one query. Requires a `repos` registry mapping package names to local paths (extend existing `~/.codegraph/registry.json`). Store a `repo` column on `nodes` and `edges` tables to partition ownership. | Navigation | Current multi-repo mode (`--multi-repo`) keeps each repo's graph isolated — you can search across repos but can't trace how a change in one repo impacts another. Real-world systems span multiple repos connected by package imports, API integrations, or shared schemas. A unified graph answers "if I change this endpoint handler, which frontend components break?" or "if I update this shared utility, which downstream services are affected?" — the highest-value cross-cutting questions that currently require manual tracing across repo boundaries | ✓ | ✓ | 5 | Yes | 78 |
 | 80 | ~~Find implementations in impact analysis~~ | ~~When a function signature or interface definition changes, automatically include all implementations/subtypes in `fn-impact` and `diff-impact` blast radius. Currently impact only follows `calls` edges — changing an interface method signature breaks every implementor, but this is invisible. Requires ID 74's `implements` edges. Add `--include-implementations` flag (on by default) to impact commands.~~ | Analysis | ~~Catches the most dangerous class of missed blast radius — interface/trait changes that silently break all implementors. A single method signature change on a widely-implemented interface can break dozens of files, none of which appear in the current call-graph-only impact analysis~~ | ✓ | ✓ | 5 | No | 74 | **PROMOTED** — Folded into ROADMAP Phase 4.3 (`--include-implementations` flag on impact commands) |
 
@@ -205,6 +207,15 @@ Items identified by the architectural audit (v3.1.4) that don't fit existing tie
 | 95 | SARIF output for cycle detection | Add SARIF output format so cycle detection integrates with GitHub Code Scanning, showing issues inline in PRs. Currently planned for Phase 11 but could be delivered as early as Phase 7 since it's a pure output format addition. | CI | GitHub Code Scanning integration surfaces cycle violations directly in PR review — no separate CI step or comment bot needed | ✓ | ✓ | 3 | No | — |
 | 96 | Fix README runtime dependency count | README claims "Only 3 runtime dependencies" but there are 5 — it omits `graphology` and `graphology-communities-louvain` which are in `package.json` `dependencies` (not optional). Correct to 5. | Documentation | Accuracy — users and contributors should be able to trust the README | ✓ | ✓ | 1 | No | #545 | **SUPERSEDED** — PR #545 removes `graphology` and `graphology-communities-louvain`, making the README's "3 runtime dependencies" claim correct again. No further action needed once #545 merges. |
 
+### Tier 1k — Competitive-analysis gaps (from colbymchenry/codegraph comparison)
+
+Items identified through competitive analysis of [colbymchenry/codegraph](https://github.com/colbymchenry/codegraph) and similar tools. These address UX and ecosystem onboarding gaps highlighted by tools that lead on auto-sync and agent integration marketing. All are zero-dep and foundation-aligned.
+
+| ID | Title | Description | Category | Benefit | Zero-dep | Foundation-aligned | Problem-fit (1-5) | Breaking | Depends on |
+|----|-------|-------------|----------|---------|----------|-------------------|-------------------|----------|------------|
+| 107 | Prominent auto-sync UX (zero-config live mode) | Reposition `codegraph build --watch` as a first-class feature. Add `codegraph sync` as an alias. Add a startup staleness message: "graph last updated 3 minutes ago — run `codegraph sync` to keep it live." Consider a `codegraph daemon` subcommand for always-on background sync without a terminal. Document watch mode prominently in README alongside the init/build workflow. | Developer Experience | Users who forget to rebuild after editing silently lose graph accuracy. A prominent live-sync mode matches competitive tools that market "zero-config auto-sync" as a headline feature — it removes the most common source of stale graph results | ✓ | ✓ | 3 | No | — |
+| 108 | Agent ecosystem integration guides | Write explicit integration guides for top AI coding tools: Claude Code (MCP server), Cursor (MCP), VS Code Copilot (MCP), Codex/OpenCode, Gemini CLI, and Kiro. Each guide covers: install, configuration snippet, which MCP tools are exposed, and a worked example query. Add integration badges to README header. | Developer Experience | Lower onboarding friction for AI agent users — currently they must figure out MCP integration themselves. Competitors with explicit multi-agent documentation convert more users. Each guide is also a search/SEO surface. Documentation-only change; no code required | ✓ | ✓ | 2 | No | — |
+
 ### Tier 2 — Foundation-aligned, needs dependencies
 
 Ordered by problem-fit: