Skip to content

Commit 698b509

Browse files
authored
feat: CFG visitor rewrite (Phase 3.1 completion) (#392)
* feat: cfg visitor rewrite — unified walk, derived cyclomatic, removed standalone impl Rewrite CFG construction as a visitor (createCfgVisitor) that plugs into the walkWithVisitors framework, eliminating the last redundant tree traversal (Mode B) in engine.js. All 4 analyses now run in a single DFS walk. - Create src/ast-analysis/visitors/cfg-visitor.js (~570 lines) with full control-flow support: if patterns A/B/C, for/while/do-while/infinite loops, switch/match, try/catch/finally, break/continue with labels, nested functions - Derive cyclomatic complexity from CFG (E - N + 2) as single source of truth, overriding the independently-computed value from the complexity visitor - Replace 813-line buildFunctionCFG with 15-line thin visitor wrapper - Simplify buildCFGData WASM fallback: file-level visitor walk instead of per-function findFunctionNode calls - Remove Mode A/B split from engine.js; simplify WASM pre-parse (CFG no longer triggers it) - cfg.js reduced from 1242 to 518 lines (-724 lines) - Add 31 new tests: 23 parity (visitor vs original) + 8 cyclomatic formula Impact: 35 functions changed, 61 affected * fix: allow nested function CFG and recompute MI on cyclomatic override Remove skipChildren from cfg-visitor's enterNode so the walker recurses into nested function definitions, enabling the funcStateStack to work as designed. Recompute maintainabilityIndex in engine.js whenever the CFG-derived cyclomatic overrides the complexity visitor's value, keeping metrics consistent. Addresses Greptile review feedback on PR #392. Impact: 3 functions changed, 1 affected * fix: match CFG results by node identity and disambiguate line collisions - buildFunctionCFG: use .find() with node identity instead of cfgResults[0] so nested functions don't shadow the target function's CFG - engine.js: replace line-only Map keying with line → results[] grouping, disambiguating by function name when multiple functions share a line Addresses Greptile re-review feedback on PR #392. Impact: 2 functions changed, 0 affected * fix: disambiguate line collisions in buildCFGData WASM fallback Apply the same array-plus-name-disambiguation pattern from engine.js to the visitorCfgByLine map in buildCFGData, preventing silent overwrites when multiple functions share a source line. Addresses Greptile re-review feedback on PR #392. Impact: 1 functions changed, 0 affected
1 parent d33388a commit 698b509

5 files changed

Lines changed: 1334 additions & 808 deletions

File tree

docs/roadmap/ROADMAP.md

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -591,14 +591,18 @@ src/
591591
-`builder.js` → single `runAnalyses` call replaces 4 sequential blocks + WASM pre-parse
592592
- ✅ Extracted pure computations to `metrics.js` (Halstead derived math, LOC, MI)
593593
- ✅ Extracted shared helpers to `visitor-utils.js` (from dataflow.js)
594-
- 🔲 **CFG visitor rewrite** (see below)
594+
- **CFG visitor rewrite** `createCfgVisitor` in `ast-analysis/visitors/cfg-visitor.js`, integrated into engine.js unified walk, Mode A/B split eliminated
595595

596596
**Remaining: CFG visitor rewrite.** `buildFunctionCFG` (813 lines) uses a statement-level traversal (`getStatements` + `processStatement` with `loopStack`, `labelMap`, `blockIndex`) that is fundamentally incompatible with the node-level DFS used by `walkWithVisitors`. This is why the engine runs CFG as a separate Mode B pass — the only analysis that can't participate in the shared single-DFS walk.
597597

598598
Rewrite the CFG algorithm as a node-level visitor that builds basic blocks and edges incrementally via `enterNode`/`exitNode` hooks, tracking block boundaries at branch/loop/return nodes the same way the complexity visitor tracks nesting. This eliminates the last redundant tree traversal during build and lets CFG share the exact same DFS pass as complexity, dataflow, and AST extraction. The statement-level `getStatements` helper and per-language `CFG_RULES.statementTypes` can be replaced by detecting block-terminating node types in `enterNode`. Also simplifies `engine.js` by removing the Mode A/B split and WASM pre-parse special-casing for CFG.
599599

600600
**Remaining: Derive cyclomatic complexity from CFG.** Once CFG participates in the unified walk, cyclomatic complexity can be derived directly from CFG edge/block counts (`edges - nodes + 2`) rather than independently computed by the complexity visitor. This creates a single source of truth for control flow metrics and eliminates redundant computation. Can also be done as a simpler SQL-only approach against stored `cfg_blocks`/`cfg_edges` tables (see backlog ID 45).
601601

602+
**Follow-up tasks (post CFG visitor rewrite):**
603+
-**Derive cyclomatic complexity from CFG** — CFG visitor computes `E - N + 2` per function; engine.js overrides complexity visitor's cyclomatic with CFG-derived value (single source of truth)
604+
-**Remove `buildFunctionCFG` implementation** — 813-line standalone implementation replaced with thin visitor wrapper (~15 lines); `buildCFGData` WASM fallback uses file-level visitor walk instead of per-function `findFunctionNode` calls
605+
602606
**Affected files:** `src/complexity.js`, `src/cfg.js`, `src/dataflow.js`, `src/ast.js` → split into `src/ast-analysis/`
603607

604608
### 3.2 -- Command/Query Separation ★ Critical 🔄

src/ast-analysis/engine.js

Lines changed: 86 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -7,14 +7,12 @@
77
* - CFG construction (basic blocks + edges)
88
* - Dataflow analysis (define-use chains, arg flows, mutations)
99
*
10-
* Two modes:
11-
* Mode A (node-level visitor): AST + complexity + dataflow — single DFS per file
12-
* Mode B (statement-level): CFG keeps its own traversal via buildFunctionCFG
10+
* All 4 analyses run as visitors in a single DFS walk via walkWithVisitors.
1311
*
1412
* Optimization strategy: for files with WASM trees, run all applicable visitors
15-
* in a single walkWithVisitors call, then store results in the format that the
16-
* existing buildXxx functions expect as pre-computed data. This eliminates ~3
17-
* redundant tree traversals per file.
13+
* in a single walkWithVisitors call. Store results in the format that buildXxx
14+
* functions already expect as pre-computed data (same fields as native engine
15+
* output). This eliminates redundant tree traversals per file.
1816
*/
1917

2018
import path from 'node:path';
@@ -32,6 +30,7 @@ import { buildExtensionSet, buildExtToLangMap } from './shared.js';
3230
import { walkWithVisitors } from './visitor.js';
3331
import { functionName as getFuncName } from './visitor-utils.js';
3432
import { createAstStoreVisitor } from './visitors/ast-store-visitor.js';
33+
import { createCfgVisitor } from './visitors/cfg-visitor.js';
3534
import { createComplexityVisitor } from './visitors/complexity-visitor.js';
3635
import { createDataflowVisitor } from './visitors/dataflow-visitor.js';
3736

@@ -74,25 +73,15 @@ export async function runAnalyses(db, fileSymbols, rootDir, opts, engineOpts) {
7473
const extToLang = buildExtToLangMap();
7574

7675
// ── WASM pre-parse for files that need it ───────────────────────────
77-
if (doCfg || doDataflow) {
76+
// CFG now runs as a visitor in the unified walk, so only dataflow
77+
// triggers WASM pre-parse when no tree exists.
78+
if (doDataflow) {
7879
let needsWasmTrees = false;
7980
for (const [relPath, symbols] of fileSymbols) {
8081
if (symbols._tree) continue;
8182
const ext = path.extname(relPath).toLowerCase();
8283

83-
if (doCfg && CFG_EXTENSIONS.has(ext)) {
84-
const fnDefs = (symbols.definitions || []).filter(
85-
(d) => (d.kind === 'function' || d.kind === 'method') && d.line,
86-
);
87-
if (
88-
fnDefs.length > 0 &&
89-
!fnDefs.every((d) => d.cfg === null || Array.isArray(d.cfg?.blocks))
90-
) {
91-
needsWasmTrees = true;
92-
break;
93-
}
94-
}
95-
if (doDataflow && !symbols.dataflow && DATAFLOW_EXTENSIONS.has(ext)) {
84+
if (!symbols.dataflow && DATAFLOW_EXTENSIONS.has(ext)) {
9685
needsWasmTrees = true;
9786
break;
9887
}
@@ -185,6 +174,24 @@ export async function runAnalyses(db, fileSymbols, rootDir, opts, engineOpts) {
185174
}
186175
}
187176

177+
// ─ CFG visitor ─
178+
const cfgRulesForLang = CFG_RULES.get(langId);
179+
let cfgVisitor = null;
180+
if (doCfg && cfgRulesForLang && CFG_EXTENSIONS.has(ext)) {
181+
// Only use visitor if some functions lack pre-computed CFG
182+
const needsWasmCfg = defs.some(
183+
(d) =>
184+
(d.kind === 'function' || d.kind === 'method') &&
185+
d.line &&
186+
d.cfg !== null &&
187+
!Array.isArray(d.cfg?.blocks),
188+
);
189+
if (needsWasmCfg) {
190+
cfgVisitor = createCfgVisitor(cfgRulesForLang);
191+
visitors.push(cfgVisitor);
192+
}
193+
}
194+
188195
// ─ Dataflow visitor ─
189196
const dfRules = DATAFLOW_RULES.get(langId);
190197
let dataflowVisitor = null;
@@ -216,12 +223,21 @@ export async function runAnalyses(db, fileSymbols, rootDir, opts, engineOpts) {
216223
for (const r of complexityResults) {
217224
if (r.funcNode) {
218225
const line = r.funcNode.startPosition.row + 1;
219-
resultByLine.set(line, r);
226+
if (!resultByLine.has(line)) resultByLine.set(line, []);
227+
resultByLine.get(line).push(r);
220228
}
221229
}
222230
for (const def of defs) {
223231
if ((def.kind === 'function' || def.kind === 'method') && def.line && !def.complexity) {
224-
const funcResult = resultByLine.get(def.line);
232+
const candidates = resultByLine.get(def.line);
233+
const funcResult = !candidates
234+
? undefined
235+
: candidates.length === 1
236+
? candidates[0]
237+
: (candidates.find((r) => {
238+
const n = r.funcNode.childForFieldName('name');
239+
return n && n.text === def.name;
240+
}) ?? candidates[0]);
225241
if (funcResult) {
226242
const { metrics } = funcResult;
227243
const loc = computeLOCMetrics(funcResult.funcNode, langId);
@@ -247,6 +263,54 @@ export async function runAnalyses(db, fileSymbols, rootDir, opts, engineOpts) {
247263
}
248264
}
249265

266+
// ─ Store CFG results on definitions (buildCFGData will find def.cfg and skip its walk) ─
267+
if (cfgVisitor) {
268+
const cfgResults = results.cfg || [];
269+
const cfgByLine = new Map();
270+
for (const r of cfgResults) {
271+
if (r.funcNode) {
272+
const line = r.funcNode.startPosition.row + 1;
273+
if (!cfgByLine.has(line)) cfgByLine.set(line, []);
274+
cfgByLine.get(line).push(r);
275+
}
276+
}
277+
for (const def of defs) {
278+
if (
279+
(def.kind === 'function' || def.kind === 'method') &&
280+
def.line &&
281+
!def.cfg?.blocks?.length
282+
) {
283+
const candidates = cfgByLine.get(def.line);
284+
const cfgResult = !candidates
285+
? undefined
286+
: candidates.length === 1
287+
? candidates[0]
288+
: (candidates.find((r) => {
289+
const n = r.funcNode.childForFieldName('name');
290+
return n && n.text === def.name;
291+
}) ?? candidates[0]);
292+
if (cfgResult) {
293+
def.cfg = { blocks: cfgResult.blocks, edges: cfgResult.edges };
294+
295+
// Override complexity's cyclomatic with CFG-derived value (single source of truth)
296+
// and recompute maintainability index to stay consistent
297+
if (def.complexity && cfgResult.cyclomatic != null) {
298+
def.complexity.cyclomatic = cfgResult.cyclomatic;
299+
const { loc, halstead } = def.complexity;
300+
const volume = halstead ? halstead.volume : 0;
301+
const commentRatio = loc?.loc > 0 ? loc.commentLines / loc.loc : 0;
302+
def.complexity.maintainabilityIndex = computeMaintainabilityIndex(
303+
volume,
304+
cfgResult.cyclomatic,
305+
loc?.sloc ?? 0,
306+
commentRatio,
307+
);
308+
}
309+
}
310+
}
311+
}
312+
}
313+
250314
// ─ Store dataflow results (buildDataflowEdges will find symbols.dataflow and skip its walk) ─
251315
if (dataflowVisitor) {
252316
symbols.dataflow = results.dataflow;

0 commit comments

Comments
 (0)