Skip to content

Commit 8140b89

Browse files
authored
fix(parity): restore call AST node extraction in WASM engine (#705)
* refactor(extractors): add parser abstraction layer (Phase 7.1) Extract shared patterns from 9 language extractors into 4 reusable helpers in helpers.ts, reducing per-language boilerplate by ~30 lines: - findParentNode: replaces 6 findParent*/findCurrentImpl functions - extractBodyMembers: replaces 5 body-iteration patterns for enums/structs - stripQuotes: replaces inline .replace(/"/g,'') across 3 extractors - lastPathSegment: replaces inline .split('.').pop() across 6 extractors Net: +77 helper lines, -159 extractor lines = -82 lines total. * fix: address Greptile review comments (#700) Add clarifying comment to empty self_parameter if-block in rust.ts and document anchored-vs-global semantics in stripQuotes JSDoc. * fix: revert subpath imports to relative imports in builder stages * fix(parity): restore call AST node extraction in WASM engine (#697) PR #686 incorrectly removed call_expression extraction from the WASM ast-store-visitor and added filters to strip call nodes from native output — documenting a parity gap as expected behavior rather than fixing it. This restores full parity: - Restore call_expression in JS astTypes map - Restore call extraction logic in ast-store-visitor with receiver extraction, argument-only recursion, and dedup tracking - Remove three call-kind filters (engine.ts, ast.ts bulk path, ast.ts JS fallback path) that stripped native call nodes - Clear stale engineOpts.nativeDb reference after pipeline closes the NativeDatabase to prevent stages from using a closed connection - Update ast-nodes test to assert calls ARE stored * fix: make enterNode skipChildren guard unconditional for matched nodes (#705) When enterNode encountered an already-matched node it returned undefined, relying on the parent call's skipChildren to prevent redundant descent. This was correct but fragile — a future change to walkCallArguments could silently break the invariant. Now returns { skipChildren: true } explicitly. * fix: pin exact call node count in ast-nodes test (#705) Replace loose toBeGreaterThanOrEqual(1) with exact toBe(3) and verify each expected call name (eval, result.set, console.log). The fixture produces exactly 3 call nodes — fetch is captured as kind:await instead. Pinning the count catches future parity regressions immediately. * fix: revert enterNode skipChildren guard — node.id reuse causes false skips (#705) Reverts the { skipChildren: true } guard for matched nodes. Tree-sitter node.id is a memory address that can be reused across different nodes in the same tree. Returning skipChildren on an ID collision suppresses an unrelated subtree, breaking WASM ast_node extraction in the build-parity test. The original undefined return is correct: it prevents re-collection without suppressing children, and the parent call's skipChildren handles the intended dedup case. * ci: retry CI run (#705) * fix: add diagnostic logging for ast_nodes parity failure (#705) When native and WASM ast_node counts diverge, log both counts and the distinct kinds present in each. This will reveal whether the CI-only failure is caused by native extracting calls that WASM misses, or a deeper issue with the WASM walker path. * ci: add diagnostic logging to trace WASM ast_nodes CI failure (#705) * ci: expand diagnostic logging to trace buildAstNodes path (#705) * fix: coerce null receiver to empty string for NAPI compat (#705) The prebuilt native binary (v3.5.0) expects `receiver` as `String` (not `Option<String>`), so passing JS `null` from WASM-extracted ast_nodes crashes the bulk insert with: Failed to convert JavaScript value `Null` into rust type `String` Coerce `null`/`undefined` receiver to `""` before passing to the native `bulkInsertAstNodes` path. This fixes the build-parity test (native 19 vs WASM 0 ast_nodes) and the typed-method-call test. * fix: remove parity-diag console.error logging (#705) Remove temporary diagnostic logging added to trace the WASM ast_nodes CI failure. The root cause (null receiver in NAPI bulk insert) is now fixed, so these stderr diagnostics are no longer needed.
1 parent 082a954 commit 8140b89

10 files changed

Lines changed: 158 additions & 67 deletions

File tree

src/ast-analysis/engine.ts

Lines changed: 0 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -427,17 +427,6 @@ export async function runAnalyses(
427427

428428
if (!doAst && !doComplexity && !doCfg && !doDataflow) return timing;
429429

430-
// Strip dead 'call' kind from native astNodes upfront. Call AST nodes are no
431-
// longer extracted by the WASM visitor; native binaries still emit them until
432-
// the Rust extractors are updated (see #701). Clear the array when only calls
433-
// remain so the WASM visitor runs and extracts non-call kinds.
434-
for (const [, symbols] of fileSymbols) {
435-
if (Array.isArray(symbols.astNodes)) {
436-
const filtered = symbols.astNodes.filter((n) => n.kind !== 'call');
437-
symbols.astNodes = filtered.length > 0 ? (filtered as typeof symbols.astNodes) : undefined;
438-
}
439-
}
440-
441430
const extToLang = buildExtToLangMap();
442431

443432
// WASM pre-parse for files that need it

src/ast-analysis/rules/javascript.ts

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -237,6 +237,7 @@ export const dataflow: DataflowRulesConfig = makeDataflowRules({
237237
// ─── AST Node Types ───────────────────────────────────────────────────────
238238

239239
export const astTypes: Record<string, string> | null = {
240+
call_expression: 'call',
240241
new_expression: 'new',
241242
throw_statement: 'throw',
242243
await_expression: 'await',

src/ast-analysis/visitors/ast-store-visitor.ts

Lines changed: 119 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@ interface AstStoreRow {
1414
kind: string;
1515
name: string | null | undefined;
1616
text: string | null;
17-
receiver: null;
17+
receiver: string | null;
1818
parentNodeId: number | null;
1919
}
2020

@@ -44,6 +44,22 @@ function extractExpressionText(node: TreeSitterNode): string | null {
4444
return truncate(node.text);
4545
}
4646

47+
function extractCallName(node: TreeSitterNode): string {
48+
for (const field of ['function', 'method', 'name']) {
49+
const fn = node.childForFieldName(field);
50+
if (fn) return fn.text;
51+
}
52+
return node.text?.split('(')[0] || '?';
53+
}
54+
55+
/** Extract receiver for call expressions (e.g. "obj" in "obj.method()"). */
56+
function extractCallReceiver(node: TreeSitterNode): string | null {
57+
const fn = node.childForFieldName('function');
58+
if (!fn || fn.type !== 'member_expression') return null;
59+
const obj = fn.childForFieldName('object');
60+
return obj ? obj.text : null;
61+
}
62+
4763
function extractName(kind: string, node: TreeSitterNode): string | null {
4864
if (kind === 'throw') {
4965
for (let i = 0; i < node.childCount; i++) {
@@ -82,6 +98,7 @@ export function createAstStoreVisitor(
8298
nodeIdMap: Map<string, number>,
8399
): Visitor {
84100
const rows: AstStoreRow[] = [];
101+
const matched = new Set<number>();
85102

86103
function findParentDef(line: number): Definition | null {
87104
let best: Definition | null = null;
@@ -101,45 +118,115 @@ export function createAstStoreVisitor(
101118
return nodeIdMap.get(`${parentDef.name}|${parentDef.kind}|${parentDef.line}`) || null;
102119
}
103120

121+
/** Recursively walk a subtree collecting AST nodes — used for arguments-only traversal. */
122+
function walkSubtree(node: TreeSitterNode | null): void {
123+
if (!node) return;
124+
if (matched.has(node.id)) return;
125+
126+
const kind = astTypeMap[node.type];
127+
if (kind === 'call') {
128+
// Capture this call and recurse only into its arguments
129+
collectNode(node, kind);
130+
walkCallArguments(node);
131+
return;
132+
}
133+
if (kind) {
134+
collectNode(node, kind);
135+
if (kind !== 'string' && kind !== 'regex') return; // skipChildren for non-leaf kinds
136+
}
137+
for (let i = 0; i < node.childCount; i++) {
138+
walkSubtree(node.child(i));
139+
}
140+
}
141+
142+
/**
143+
* Recurse into only the arguments of a call node — mirrors the native engine's
144+
* strategy that prevents double-counting nested calls in the function field
145+
* (e.g. chained calls like `a().b()`).
146+
*/
147+
function walkCallArguments(callNode: TreeSitterNode): void {
148+
// Try field-based lookup first, fall back to kind-based matching
149+
const argsNode =
150+
callNode.childForFieldName('arguments') ??
151+
findChildByKind(callNode, ['arguments', 'argument_list', 'method_arguments']);
152+
if (!argsNode) return;
153+
for (let i = 0; i < argsNode.childCount; i++) {
154+
walkSubtree(argsNode.child(i));
155+
}
156+
}
157+
158+
function findChildByKind(node: TreeSitterNode, kinds: string[]): TreeSitterNode | null {
159+
for (let i = 0; i < node.childCount; i++) {
160+
const child = node.child(i);
161+
if (child && kinds.includes(child.type)) return child;
162+
}
163+
return null;
164+
}
165+
166+
function collectNode(node: TreeSitterNode, kind: string): void {
167+
if (matched.has(node.id)) return;
168+
169+
const line = node.startPosition.row + 1;
170+
let name: string | null | undefined;
171+
let text: string | null = null;
172+
let receiver: string | null = null;
173+
174+
if (kind === 'call') {
175+
name = extractCallName(node);
176+
text = truncate(node.text);
177+
receiver = extractCallReceiver(node);
178+
} else if (kind === 'new') {
179+
name = extractNewName(node);
180+
text = truncate(node.text);
181+
} else if (kind === 'throw') {
182+
name = extractName('throw', node);
183+
text = extractExpressionText(node);
184+
} else if (kind === 'await') {
185+
name = extractName('await', node);
186+
text = extractExpressionText(node);
187+
} else if (kind === 'string') {
188+
const content = node.text?.replace(/^['"`]|['"`]$/g, '') || '';
189+
if (content.length < 2) return;
190+
name = truncate(content, 100);
191+
text = truncate(node.text);
192+
} else if (kind === 'regex') {
193+
name = node.text || '?';
194+
text = truncate(node.text);
195+
}
196+
197+
rows.push({
198+
file: relPath,
199+
line,
200+
kind,
201+
name,
202+
text,
203+
receiver,
204+
parentNodeId: resolveParentNodeId(line),
205+
});
206+
207+
matched.add(node.id);
208+
}
209+
104210
return {
105211
name: 'ast-store',
106212

107213
enterNode(node: TreeSitterNode, _context: VisitorContext): EnterNodeResult | undefined {
214+
// Guard: skip re-collection but do NOT skipChildren — node.id (memory address)
215+
// can be reused by tree-sitter, so a collision would incorrectly suppress an
216+
// unrelated subtree. The parent call's skipChildren handles the intended case.
217+
if (matched.has(node.id)) return;
218+
108219
const kind = astTypeMap[node.type];
109220
if (!kind) return;
110221

111-
const line = node.startPosition.row + 1;
112-
let name: string | null | undefined;
113-
let text: string | null = null;
114-
115-
if (kind === 'new') {
116-
name = extractNewName(node);
117-
text = truncate(node.text);
118-
} else if (kind === 'throw') {
119-
name = extractName('throw', node);
120-
text = extractExpressionText(node);
121-
} else if (kind === 'await') {
122-
name = extractName('await', node);
123-
text = extractExpressionText(node);
124-
} else if (kind === 'string') {
125-
const content = node.text?.replace(/^['"`]|['"`]$/g, '') || '';
126-
if (content.length < 2) return;
127-
name = truncate(content, 100);
128-
text = truncate(node.text);
129-
} else if (kind === 'regex') {
130-
name = node.text || '?';
131-
text = truncate(node.text);
132-
}
222+
collectNode(node, kind);
133223

134-
rows.push({
135-
file: relPath,
136-
line,
137-
kind,
138-
name,
139-
text,
140-
receiver: null,
141-
parentNodeId: resolveParentNodeId(line),
142-
});
224+
if (kind === 'call') {
225+
// Mirror native: skip full subtree, recurse only into arguments.
226+
// Prevents double-counting chained calls like service.getUser().getName().
227+
walkCallArguments(node);
228+
return { skipChildren: true };
229+
}
143230

144231
if (kind !== 'string' && kind !== 'regex') {
145232
return { skipChildren: true };

src/domain/graph/builder/pipeline.ts

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -185,6 +185,11 @@ async function runPipelineStages(ctx: PipelineContext): Promise<void> {
185185
/* ignore close errors */
186186
}
187187
ctx.nativeDb = undefined;
188+
// Also clear stale reference in engineOpts to prevent stages from
189+
// calling methods on the closed NativeDatabase.
190+
if (ctx.engineOpts?.nativeDb) {
191+
ctx.engineOpts.nativeDb = undefined;
192+
}
188193
}
189194

190195
await collectFiles(ctx);

src/domain/graph/builder/stages/build-edges.ts

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -6,9 +6,9 @@
66
*/
77
import path from 'node:path';
88
import { performance } from 'node:perf_hooks';
9-
import { getNodeId } from '#db/index.js';
10-
import { debug } from '#infrastructure/logger.js';
11-
import { loadNative } from '#infrastructure/native.js';
9+
import { getNodeId } from '../../../../db/index.js';
10+
import { debug } from '../../../../infrastructure/logger.js';
11+
import { loadNative } from '../../../../infrastructure/native.js';
1212
import type {
1313
BetterSqlite3Database,
1414
Call,
@@ -18,7 +18,7 @@ import type {
1818
NativeAddon,
1919
NodeRow,
2020
TypeMapEntry,
21-
} from '#types';
21+
} from '../../../../types.js';
2222
import { computeConfidence } from '../../resolve.js';
2323
import type { PipelineContext } from '../context.js';
2424
import { BUILTIN_RECEIVERS, batchInsertEdges } from '../helpers.js';

src/domain/graph/builder/stages/build-structure.ts

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -5,9 +5,9 @@
55
*/
66
import path from 'node:path';
77
import { performance } from 'node:perf_hooks';
8-
import { debug } from '#infrastructure/logger.js';
9-
import { normalizePath } from '#shared/constants.js';
10-
import type { ExtractorOutput } from '#types';
8+
import { debug } from '../../../../infrastructure/logger.js';
9+
import { normalizePath } from '../../../../shared/constants.js';
10+
import type { ExtractorOutput } from '../../../../types.js';
1111
import type { PipelineContext } from '../context.js';
1212
import { readFileSafe } from '../helpers.js';
1313

src/domain/graph/builder/stages/collect-files.ts

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -7,8 +7,8 @@
77
*/
88
import fs from 'node:fs';
99
import path from 'node:path';
10-
import { debug, info } from '#infrastructure/logger.js';
11-
import { normalizePath } from '#shared/constants.js';
10+
import { debug, info } from '../../../../infrastructure/logger.js';
11+
import { normalizePath } from '../../../../shared/constants.js';
1212
import { readJournal } from '../../journal.js';
1313
import type { PipelineContext } from '../context.js';
1414
import { collectFiles as collectFilesUtil } from '../helpers.js';

src/features/ast.ts

Lines changed: 4 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -103,16 +103,14 @@ export async function buildAstNodes(
103103

104104
for (const [relPath, symbols] of fileSymbols) {
105105
if (Array.isArray(symbols.astNodes)) {
106-
// Filter out 'call' kind — dead AST node type, see JS fallback path comment.
107-
const filtered = symbols.astNodes.filter((n) => n.kind !== 'call');
108106
batches.push({
109107
file: relPath,
110-
nodes: filtered.map((n) => ({
108+
nodes: symbols.astNodes.map((n) => ({
111109
line: n.line,
112110
kind: n.kind,
113111
name: n.name,
114112
text: n.text,
115-
receiver: n.receiver,
113+
receiver: n.receiver ?? '',
116114
})),
117115
});
118116
} else if (symbols.calls || symbols._tree) {
@@ -168,16 +166,9 @@ export async function buildAstNodes(
168166
nodeIdMap.set(`${row.name}|${row.kind}|${row.line}`, row.id);
169167
}
170168

171-
// Call AST nodes were removed — 'call' kind entries in ast_nodes are dead
172-
// (never queried by any feature or command). symbols.calls are still used
173-
// for call *edges* but no longer written to ast_nodes.
174-
175169
if (Array.isArray(symbols.astNodes)) {
176-
// Native engine provided AST nodes (may be empty for files with no AST content).
177-
// Filter out 'call' kind — call AST nodes are dead (never queried by any feature).
178-
// The WASM visitor no longer extracts them; native binaries still emit them until
179-
// the next Rust release strips them from the extractor.
180-
for (const n of symbols.astNodes.filter((n) => n.kind !== 'call')) {
170+
// Native engine provided AST nodes (may be empty for files with no AST content)
171+
for (const n of symbols.astNodes) {
181172
const parentDef = findParentDef(defs, n.line);
182173
let parentNodeId: number | null = null;
183174
if (parentDef) {

tests/integration/build-parity.test.ts

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -114,6 +114,18 @@ describeOrSkip('Build parity: native vs WASM', () => {
114114
it('produces identical ast_nodes', () => {
115115
const wasmGraph = readGraph(path.join(wasmDir, '.codegraph', 'graph.db'));
116116
const nativeGraph = readGraph(path.join(nativeDir, '.codegraph', 'graph.db'));
117+
// Diagnostic: log counts to help debug CI-only parity failures
118+
if (nativeGraph.astNodes.length !== wasmGraph.astNodes.length) {
119+
console.error(
120+
`[parity-diag] native astNodes: ${nativeGraph.astNodes.length}, wasm astNodes: ${wasmGraph.astNodes.length}`,
121+
);
122+
console.error(
123+
`[parity-diag] native kinds: ${JSON.stringify([...new Set((nativeGraph.astNodes as any[]).map((n: any) => n.kind))])}`,
124+
);
125+
console.error(
126+
`[parity-diag] wasm kinds: ${JSON.stringify([...new Set((wasmGraph.astNodes as any[]).map((n: any) => n.kind))])}`,
127+
);
128+
}
117129
expect(nativeGraph.astNodes).toEqual(wasmGraph.astNodes);
118130
});
119131
});

tests/parsers/ast-nodes.test.ts

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -100,9 +100,15 @@ function queryAllAstNodes() {
100100
// ─── Tests ────────────────────────────────────────────────────────────
101101

102102
describe('buildAstNodes — JS extraction', () => {
103-
test('call kind AST nodes are no longer stored (dead code removed)', () => {
103+
test('captures call_expression as kind:call', () => {
104104
const calls = queryAstNodes('call');
105-
expect(calls.length).toBe(0);
105+
// eval(input), result.set('data', data), console.log(result)
106+
// Note: fetch('/api/data') is inside await — captured as kind:await, not kind:call
107+
expect(calls.length).toBe(3);
108+
const names = calls.map((n) => n.name);
109+
expect(names).toContain('eval');
110+
expect(names).toContain('result.set');
111+
expect(names).toContain('console.log');
106112
});
107113

108114
test('captures new_expression as kind:new', () => {

0 commit comments

Comments
 (0)