You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
refactor: migrate raw SQL into repository pattern (Phase 3.3) (#396)
* feat: unified AST analysis framework with pluggable visitor pattern
Implement Phase 3.1 of the architectural refactoring roadmap. Replace 4
independent AST analysis passes (complexity, dataflow, AST-store, CFG)
with a shared visitor framework that runs complexity, dataflow, and
AST-store visitors in a single DFS walk per file.
New framework files:
- visitor.js: shared DFS walker with enterNode/exitNode/enterFunction/
exitFunction hooks, per-visitor skipChildren, nesting/scope tracking
- engine.js: orchestrates all analyses in one coordinated pass, replaces
4 sequential buildXxx blocks + WASM pre-parse in builder.js
- metrics.js: extracted Halstead derived math, LOC, MI from complexity.js
- visitor-utils.js: extracted shared helpers from dataflow.js
- visitors/complexity-visitor.js: cognitive/cyclomatic/nesting + Halstead
- visitors/ast-store-visitor.js: new/throw/await/string/regex extraction
- visitors/dataflow-visitor.js: scope stack + define-use chains
- tests/unit/visitor.test.js: 9 tests for walker framework
Modified modules:
- builder.js: 4 sequential buildXxx blocks → single runAnalyses call
- complexity.js: delegates walk to complexity visitor
- dataflow.js: delegates walk to dataflow visitor
- ast.js: delegates walk to ast-store visitor, removed dead code
Bug fixes:
- Fix Ruby dataflow parity: skip nested return keyword tokens inside
return statements (tree-sitter Ruby grammar nests return/return)
Hook fixes:
- check-dead-exports.sh: exclude public API (index.js) and dynamic
import() consumers from dead export detection
- check-readme.sh: match ROADMAP.md at any path; allow commit when at
least one doc is staged (developer reviewed which docs need updating)
Docs:
- ROADMAP.md: update 3.1 with completed items and remaining work
- BACKLOG.md: add Tier 1h with dynamic import/re-export tracking (ID 71)
- CLAUDE.md: add ast-analysis/ to architecture table
Impact: 51 functions changed, 47 affected
* revert: restore original check-dead-exports.sh hook
The hook correctly catches a real limitation — codegraph doesn't track
dynamic import() consumers as graph edges. The proper fix is to add
dynamic import edge tracking (backlog ID 81), not to exclude them
from the hook. Reverting the workaround.
* fix: address Greptile review — indexOf→m.index, precise publicAPI regex, remove unused var
Impact: 2 functions changed, 23 affected
* fix: address Greptile review — halsteadSkip depth counter, debug logging, perf import
- Replace halsteadSkip boolean with halsteadSkipDepth counter to handle
nested skip-type nodes correctly (e.g. return_type containing
type_annotation in TypeScript)
- Add debug() logging to all silent catch blocks in engine.js so
failures are discoverable with --verbose
- Explicitly import performance from node:perf_hooks for clarity
Impact: 5 functions changed, 24 affected
* fix: remove function nodes from nestingNodeTypes and eliminate O(n²) lookup
Address two Greptile review comments on PR #388:
1. Function nodes in nestingNodeTypes inflates cognitive complexity by +1
for every function body — funcDepth already tracks function-level
nesting, so adding functionNodes to the walker's nestingNodeTypes
double-counts.
2. Replace O(n²) complexityResults.find() with O(1) resultByLine map
lookup by storing the full result object (metrics + funcNode) instead
of just metrics.
Impact: 1 functions changed, 0 affected
* fix: remove function nesting inflation in computeAllMetrics and pass langId to LOC
Address two Greptile review comments:
1. computeAllMetrics added functionNodes to nestingNodeTypes, inflating
cognitive complexity by +1 for every function body (same bug as the
engine.js fix in a47eb47, but in the per-function code path).
2. collectResult in complexity-visitor passed null as langId to
computeLOCMetrics, causing C-style comment detection for all
languages. Now accepts langId via options and forwards it, so
Python/Ruby/PHP get correct comment-line counts and MI values.
Impact: 4 functions changed, 5 affected
* fix: guard runAnalyses call, fix nested function nesting, rename _engineOpts
Address three Greptile review comments:
1. Wrap runAnalyses call in builder.js with try/catch + debug() so a
walk-phase throw doesn't crash the entire build — matches the
resilience of the previous individual buildXxx try/catch blocks.
2. In function-level mode, enterFunction/exitFunction were no-ops, so
funcDepth stayed 0 for nested functions (callbacks, closures).
Now increments/decrements funcDepth in function-level mode too,
restoring parity with the original computeAllMetrics algorithm.
3. Rename _engineOpts to engineOpts in engine.js since it's actively
forwarded to all four buildXxx delegate calls — the underscore
prefix falsely signaled an unused parameter.
Impact: 5 functions changed, 4 affected
* fix: remove redundant processed Set and fix multi-line destructuring in hook
Address two Greptile review comments:
1. Remove the `processed` Set from dataflow-visitor.js — walkWithVisitors
visits each node exactly once via DFS on a proper tree, so the
deduplication guard never fires and just wastes memory.
2. Replace single-line destructuring regex in check-dead-exports.sh with
a multi-line-safe pattern (using the `s` flag) so multi-line
`const { ... } = await import(...)` patterns are correctly detected
and their names added to publicAPI.
Impact: 2 functions changed, 0 affected
* refactor: migrate raw SQL into repository pattern (Phase 3.3)
Split src/db/repository.js into src/db/repository/ directory with 10
domain files: nodes, edges, build-stmts, complexity, cfg, dataflow,
cochange, embeddings, graph-read, and barrel index.
Extracted ~120 inline db.prepare() calls from 14 consumer modules
into shared repository functions, eliminating duplication of common
patterns like getNodeId (5 files), bulkNodeIdsByFile (3 files),
callee/caller joins (6+ call sites), and file purge cascades.
Net reduction: -437 lines across the codebase. All 1573 tests pass.
Impact: 73 functions changed, 112 affected
* fix: address Greptile review — deduplicate relatedTests, hoist prepared stmts, fix .raw() no-op
- queries.js: restore DISTINCT-by-file deduplication for relatedTests in
explainFunctionImpl (lost when switching to findCallers)
- build-stmts.js: prepare SQL statements once in preparePurgeStmts() and
loop with runPurge() instead of calling db.prepare() per file per table
- cfg.js: replace misleading .raw() with .get() in hasCfgTables
Impact: 7 functions changed, 14 affected
* perf: cache prepared statements in hot-path repository functions
bulkNodeIdsByFile, getNodeId, and getFunctionNodeId each called
db.prepare() on every invocation. These are called inside per-file
loops in engine.js, ast.js, builder.js, complexity.js, and cfg.js,
producing O(N) prepare() calls instead of O(1).
Add WeakMap caches keyed on the db instance so each statement is
prepared once per connection and reused on subsequent calls. For a
300-file project this eliminates ~900 redundant prepare() calls per
build pass.
Also fix commit-msg hook: IMPACT=$(...||...) under sh -e exits with
code 1 when both diff-impact fallbacks return non-zero; append || true
so the assignment is always tolerated.
docs check acknowledged
* fix: guard pre-push hook against sh -e failure when diff-impact unavailable
Same issue as commit-msg: IMPACT=$(...||...) exits non-zero under sh -e
when both codegraph and node fallbacks fail. Append || true to absorb it.
docs check acknowledged
Copy file name to clipboardExpand all lines: docs/roadmap/ROADMAP.md
+38-47Lines changed: 38 additions & 47 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -586,11 +586,11 @@ Key principles:
586
586
587
587
**Context:** Phases 2.5 and 2.7 added 38 modules and grew the codebase from 5K to 26,277 lines without introducing shared abstractions. The dual-function anti-pattern was replicated across 19 modules. Three independent AST analysis engines (complexity, CFG, dataflow) totaling 4,801 lines share the same fundamental pattern but no infrastructure. Raw SQL is scattered across 25+ modules touching 13 tables. The priority ordering has been revised based on actual growth patterns -- the new #1 priority is the unified AST analysis framework.
Unify the independent AST analysis engines (complexity, CFG, dataflow) plus AST node storage into a shared visitor framework. These four modules independently implement the same pattern: per-language rules map → AST walk → collect data → write to DB → query → format.
592
592
593
-
**Completed:**Phases 1-7 implemented a pluggable visitor framework with a shared DFS walker (`walkWithVisitors`), an analysis engine orchestrator (`runAnalyses`), and three visitors (complexity, dataflow, AST-store) that share a single tree traversal per file. `builder.js` collapsed from 4 sequential `buildXxx` blocks into one `runAnalyses` call.
593
+
**Completed:**All 4 analyses (complexity, CFG, dataflow, AST-store) now run in a single DFS walk via `walkWithVisitors`. The CFG visitor rewrite ([#392](https://github.com/optave/codegraph/pull/392)) eliminated the Mode A/B split, replaced the 813-line `buildFunctionCFG` with a node-level visitor, and derives cyclomatic complexity directly from CFG structure (`E - N + 2`). `cfg.js` reduced from 1,242 → 518 lines.
- ✅ Extracted pure computations to `metrics.js` (Halstead derived math, LOC, MI)
617
618
- ✅ Extracted shared helpers to `visitor-utils.js` (from dataflow.js)
618
-
- ✅ **CFG visitor rewrite** — `createCfgVisitor` in `ast-analysis/visitors/cfg-visitor.js`, integrated into engine.js unified walk, Mode A/B split eliminated
619
-
620
-
**Remaining: CFG visitor rewrite.**`buildFunctionCFG` (813 lines) uses a statement-level traversal (`getStatements` + `processStatement` with `loopStack`, `labelMap`, `blockIndex`) that is fundamentally incompatible with the node-level DFS used by `walkWithVisitors`. This is why the engine runs CFG as a separate Mode B pass — the only analysis that can't participate in the shared single-DFS walk.
621
-
622
-
Rewrite the CFG algorithm as a node-level visitor that builds basic blocks and edges incrementally via `enterNode`/`exitNode` hooks, tracking block boundaries at branch/loop/return nodes the same way the complexity visitor tracks nesting. This eliminates the last redundant tree traversal during build and lets CFG share the exact same DFS pass as complexity, dataflow, and AST extraction. The statement-level `getStatements` helper and per-language `CFG_RULES.statementTypes` can be replaced by detecting block-terminating node types in `enterNode`. Also simplifies `engine.js` by removing the Mode A/B split and WASM pre-parse special-casing for CFG.
623
-
624
-
**Remaining: Derive cyclomatic complexity from CFG.** Once CFG participates in the unified walk, cyclomatic complexity can be derived directly from CFG edge/block counts (`edges - nodes + 2`) rather than independently computed by the complexity visitor. This creates a single source of truth for control flow metrics and eliminates redundant computation. Can also be done as a simpler SQL-only approach against stored `cfg_blocks`/`cfg_edges` tables (see backlog ID 45).
625
-
626
-
**Follow-up tasks (post CFG visitor rewrite):**
627
-
- ✅ **Derive cyclomatic complexity from CFG** — CFG visitor computes `E - N + 2` per function; engine.js overrides complexity visitor's cyclomatic with CFG-derived value (single source of truth)
628
-
- ✅ **Remove `buildFunctionCFG` implementation** — 813-line standalone implementation replaced with thin visitor wrapper (~15 lines); `buildCFGData` WASM fallback uses file-level visitor walk instead of per-function `findFunctionNode` calls
- ✅ Cyclomatic complexity derived from CFG (`E - N + 2`) — single source of truth for control flow metrics ([#392](https://github.com/optave/codegraph/pull/392))
629
621
630
622
**Affected files:**`src/complexity.js`, `src/cfg.js`, `src/dataflow.js`, `src/ast.js` → split into `src/ast-analysis/`
631
623
632
-
### 3.2 -- Command/Query Separation ★ Critical 🔄
633
-
634
-
> **v3.1.1 progress:** CLI display wrappers for all query commands extracted to `queries-cli.js` (866 lines, 15 functions). Shared `result-formatter.js` (`outputResult()` for JSON/NDJSON) and `test-filter.js` created. `queries.js` reduced from 3,395 → 2,490 lines — all `*Data()` functions remain, CLI formatting fully separated ([#373](https://github.com/optave/codegraph/pull/373)).
- ✅ Move shared utilities to `src/infrastructure/` (result-formatter.js, test-filter.js)
642
-
- 🔲 Introduce `CommandRunner` shared lifecycle (command files vary too much for a single pattern today — revisit once commands stabilize)
643
-
644
-
Eliminate the `*Data()` / `*()` dual-function pattern replicated across 19 modules. Every analysis module (queries, audit, batch, check, cochange, communities, complexity, cfg, dataflow, ast, flow, manifesto, owners, structure, triage, branch-compare, viewer) currently implements both data extraction AND CLI formatting.
624
+
### 3.2 -- Command/Query Separation ★ Critical ✅
645
625
646
-
Introduce a shared `CommandRunner` that handles the open-DB -> validate -> execute -> format -> paginate -> output lifecycle. Each command only implements unique query + analysis logic. Formatting is always separate and pluggable (CLI text, JSON, NDJSON, Mermaid, DOT).
626
+
CLI display wrappers extracted from all 19 analysis modules into dedicated `src/commands/` files. Shared infrastructure (`result-formatter.js`, `test-filter.js`) moved to `src/infrastructure/`. `*Data()` functions remain in original modules — MCP dynamic imports unchanged. ~1,059 lines of CLI formatting code separated from analysis logic ([#373](https://github.com/optave/codegraph/pull/373), [#393](https://github.com/optave/codegraph/pull/393)).
- ✅ `src/infrastructure/` directory for shared utilities ([#393](https://github.com/optave/codegraph/pull/393))
646
+
- ⏭️ `CommandRunner` shared lifecycle — deferred (command files vary too much for a single pattern today)
647
+
665
648
**Affected files:** All 19 modules with dual-function pattern, `src/cli.js`, `src/mcp.js`
666
649
667
-
### 3.3 -- Repository Pattern for Data Access ★ Critical 🔄
650
+
### 3.3 -- Repository Pattern for Data Access ★ Critical ✅
668
651
669
652
> **v3.1.1 progress:**`src/db/` directory created with `repository.js` (134 lines), `query-builder.js` (280 lines), and `migrations.js` (312 lines). All db usage across the codebase wrapped in try/finally for reliable `db.close()` ([#371](https://github.com/optave/codegraph/pull/371), [#384](https://github.com/optave/codegraph/pull/384), [#383](https://github.com/optave/codegraph/pull/383)).
653
+
>
654
+
> **v3.1.2 progress:**`repository.js` split into `src/db/repository/` directory with 10 domain files (nodes, edges, build-stmts, complexity, cfg, dataflow, cochange, embeddings, graph-read, barrel). Raw SQL migrated from 14 src/ modules into repository layer. `connection.js` already complete (89 lines handling open/close/WAL/pragma/locks/readonly).
Consolidate all SQL into a single `Repository` class. Currently SQL is scattered across 25+ modules that each independently open the DB and write raw SQL inline across 13 tables.
662
+
- ✅ Migrate remaining raw SQL from 14 modules into Repository
graph-read.js # Cross-table reads for export/communities
688
681
```
689
682
690
-
Add a query builder for the common pattern "find nodes WHERE kind IN (...) AND file NOT LIKE '%test%' ORDER BY ... LIMIT ? OFFSET ?". Not an ORM -- a thin SQL builder that eliminates string construction across 25 modules.
691
-
692
-
**Affected files:**`src/db.js` -> split into `src/db/`, SQL extracted from all modules
683
+
**Affected files:**`src/db.js` barrel updated, raw SQL extracted from `queries.js`, `builder.js`, `watcher.js`, `structure.js`, `complexity.js`, `cfg.js`, `dataflow.js`, `ast.js`, `ast-analysis/engine.js`, `embedder.js`, `sequence.js`, `communities.js`
0 commit comments