Commit 251b2c4

fix(perf): scope WASM grammar load in engine-parity backfill (#1054) (#1058)
* fix(perf): scope WASM grammar load in engine-parity backfill

  The native engine drops files in some build environments (#1054), triggering a WASM backfill via the worker pool. The pool's first-call overhead is fine for full builds (amortized over hundreds of files) but dwarfs the actual parse work for small backfill batches — on slow CI runners, ~1.7s for 4 fixture files in one language.

  Add `parseFilesWasmInline`: a main-thread, no-worker parse path that loads only the grammars matching the input extensions and returns symbols with `_tree` set so the unified walker in `runAnalyses` populates AST/CFG/dataflow data downstream.

  New `parseFilesWasmForBackfill` chooses inline for batches ≤ 16 files, keeping worker isolation for larger batches where tree-sitter WASM crash protection matters more (#965).

  Routes both backfill sites through the new helper:
  - `parseFilesAuto`'s per-call inline backfill in `domain/parser.ts`
  - `backfillNativeDroppedFiles` in `domain/graph/builder/pipeline.ts`

  Refs #1054

  Impact: 4 functions changed, 13 affected

* fix(perf): free WASM trees after inline backfill (#1058)

  The inline backfill path sets symbols._tree (live web-tree-sitter Tree backed by WASM linear memory) on every result, but those symbols are consumed locally for DB row construction in backfillNativeDroppedFiles and never added to ctx.allSymbols, so the finalize-stage releaseWasmTrees sweep never frees them. Without explicit cleanup, trees leak WASM memory until process exit — bounded per run but cumulative across in-process integration tests.

  Adds a cleanup loop after batchInsertNodes that mirrors releaseWasmTrees, and drops the now-unused parseFilesAuto import.

* docs(parser): explain INLINE_BACKFILL_THRESHOLD rationale (#1058)

  Adds context for the 16-file threshold per Claude review feedback: sized for typical engine-parity drops (recurring HCL case is 4 files); above it, the worker-pool's IPC + crash-isolation cost is amortized over enough parse work to be worth paying; below it, the cold-start dominates.
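The threshold rationale above can be made concrete with a little arithmetic. The following is a minimal sketch, not code from this commit: `choosePath` and `overheadShare` are hypothetical helper names, and the figures plugged in by the usage note (about 1.7s total for a 4-file backfill, of which about 10ms is parse work) are the approximate numbers quoted in the commit message and code comments.

```typescript
// Same value as INLINE_BACKFILL_THRESHOLD in the diff (hypothetical local copy).
const INLINE_THRESHOLD = 16;

// Pick the parse path for a backfill batch of `fileCount` files.
// Batches at or below the threshold stay on the main thread.
function choosePath(
  fileCount: number,
  threshold: number = INLINE_THRESHOLD,
): 'inline' | 'worker' {
  return fileCount <= threshold ? 'inline' : 'worker';
}

// Fraction of wall time that is fixed start-up cost rather than parse work.
// `totalMs` is the observed wall time; `parseMs` is the part spent parsing.
function overheadShare(totalMs: number, parseMs: number): number {
  if (totalMs <= 0) return 0;
  return Math.max(0, totalMs - parseMs) / totalMs;
}
```

With the quoted figures, `overheadShare(1700, 10)` comes out above 0.99, which is the sense in which the cold start "dominates" a 4-file batch. The boundary is inclusive: a 16-file batch still takes the inline path, and a 17-file batch goes to the worker pool.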
1 parent 1e292ef commit 251b2c4

2 files changed

Lines changed: 89 additions & 3 deletions

src/domain/graph/builder/pipeline.ts

Lines changed: 23 additions & 2 deletions
@@ -37,7 +37,7 @@ import {
   formatDropExtensionSummary,
   getActiveEngine,
   getInstalledWasmExtensions,
-  parseFilesAuto,
+  parseFilesWasmForBackfill,
 } from '../../parser.js';
 import { setWorkspaces } from '../resolve.js';
 import { PipelineContext } from './context.js';
@@ -793,7 +793,7 @@ async function backfillNativeDroppedFiles(ctx: PipelineContext): Promise<void> {
       `Native orchestrator dropped ${totals['native-extractor-failure']} file(s) in natively-supported languages — likely a Rust extractor bug. Backfilling via WASM: ${formatDropExtensionSummary(byReason['native-extractor-failure'])}`,
     );
   }
-  const wasmResults = await parseFilesAuto(missingAbs, ctx.rootDir, { engine: 'wasm' });
+  const wasmResults = await parseFilesWasmForBackfill(missingAbs, ctx.rootDir);
 
   const rows: unknown[][] = [];
   const exportKeys: unknown[][] = [];
@@ -853,6 +853,27 @@ async function backfillNativeDroppedFiles(ctx: PipelineContext): Promise<void> {
       updateStmt.run(...vals);
     }
   }
+
+  // Free WASM parse trees from the inline backfill path (#1058).
+  // `parseFilesWasmInline` sets `symbols._tree` (a live web-tree-sitter Tree
+  // backed by WASM linear memory) on every result, but these symbols are
+  // consumed locally for DB row construction and never added to
+  // `ctx.allSymbols`, so the finalize-stage `releaseWasmTrees` sweep never
+  // sees them. Without this, trees leak WASM memory until process exit —
+  // bounded per run but cumulative across in-process integration tests.
+  // Mirrors the cleanup discipline established for #931.
+  for (const [, symbols] of wasmResults) {
+    const tree = (symbols as { _tree?: { delete?: () => void } })._tree;
+    if (tree && typeof tree.delete === 'function') {
+      try {
+        tree.delete();
+      } catch {
+        /* ignore cleanup errors */
+      }
+    }
+    (symbols as { _tree?: unknown; _langId?: unknown })._tree = undefined;
+    (symbols as { _tree?: unknown; _langId?: unknown })._langId = undefined;
+  }
 }
 
 // ── Pipeline stages execution ───────────────────────────────────────────
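The cleanup loop in the hunk above frees each tree and then clears the references. As an illustration of the same discipline in isolation, here is a minimal sketch. `DisposableTree`, `ParsedSymbols`, `releaseTrees`, and `withTrees` are hypothetical names, not part of this commit, and the `try`/`finally` wrapper is a variation: the commit runs its loop after the DB rows are inserted rather than in a `finally` block.

```typescript
// Minimal stand-in for a web-tree-sitter Tree; only delete() matters here.
interface DisposableTree {
  delete(): void;
}

// Hypothetical shape of a parse result that may carry a live tree.
interface ParsedSymbols {
  _tree?: DisposableTree;
  _langId?: string;
}

// Free every tree in a result map and clear the references.
// Returns how many delete() calls succeeded.
function releaseTrees(results: Map<string, ParsedSymbols>): number {
  let freed = 0;
  for (const symbols of results.values()) {
    const tree = symbols._tree;
    if (tree) {
      try {
        tree.delete();
        freed++;
      } catch {
        // Best-effort cleanup: one failed delete must not abort the loop.
      }
    }
    symbols._tree = undefined;
    symbols._langId = undefined;
  }
  return freed;
}

// Run `consume`, then release the trees even if `consume` throws.
function withTrees<T>(
  results: Map<string, ParsedSymbols>,
  consume: (r: Map<string, ParsedSymbols>) => T,
): T {
  try {
    return consume(results);
  } finally {
    releaseTrees(results);
  }
}
```

Clearing `_tree` after the delete attempt matters as much as the delete itself: a stale reference to a freed tree would otherwise invite a use-after-free if a later sweep touched the same symbols.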

src/domain/parser.ts

Lines changed: 66 additions & 1 deletion
@@ -1067,6 +1067,71 @@ async function parseFilesWasm(
   return result;
 }
 
+/**
+ * Files at or below this count use the inline parse path (no worker spawn).
+ *
+ * Sized for typical engine-parity drops: a handful of fixture files in one
+ * or two languages (the recurring HCL case is 4 files). Above this, the
+ * worker-pool's IPC + crash-isolation cost (#965) is amortized over enough
+ * parse work to be worth paying; below it, the ~1–2s cold-start dominates.
+ */
+const INLINE_BACKFILL_THRESHOLD = 16;
+
+/**
+ * Inline WASM parse (no worker) for small file batches.
+ *
+ * Used by the engine-parity backfill path when the native engine drops a
+ * handful of files (typically test fixtures). The worker pool's per-call
+ * IPC + grammar-init overhead can cost 1–2s on slow CI runners — for a
+ * 4-file backfill, that dwarfs the ~10ms of actual parse work.
+ *
+ * Returns symbols with `_tree` set so `runAnalyses` can run AST/CFG/dataflow
+ * visitors via the unified walker (mirrors how WASM-engine results behaved
+ * before the worker pool was introduced).
+ */
+async function parseFilesWasmInline(
+  filePaths: string[],
+  rootDir: string,
+): Promise<Map<string, ExtractorOutput>> {
+  const result = new Map<string, ExtractorOutput>();
+  if (filePaths.length === 0) return result;
+  const parsers = await ensureParsersForFiles(filePaths);
+  for (const filePath of filePaths) {
+    if (!_extToLang.has(path.extname(filePath).toLowerCase())) continue;
+    let code: string;
+    try {
+      code = fs.readFileSync(filePath, 'utf-8');
+    } catch (err: unknown) {
+      warn(`Skipping ${path.relative(rootDir, filePath)}: ${(err as Error).message}`);
+      continue;
+    }
+    const extracted = wasmExtractSymbols(parsers, filePath, code);
+    if (!extracted) continue;
+    const relPath = path.relative(rootDir, filePath).split(path.sep).join('/');
+    const symbols = extracted.symbols as ExtractorOutput & { _tree?: unknown; _langId?: string };
+    symbols._tree = extracted.tree;
+    symbols._langId = extracted.langId;
+    result.set(relPath, symbols);
+  }
+  return result;
+}
+
+/**
+ * Backfill helper: small batches use the inline (main-thread) path; larger
+ * batches keep the worker-pool isolation against tree-sitter WASM crashes
+ * (#965). Threshold matches typical engine-parity drop sizes (a few fixture
+ * files in one or two languages).
+ */
+export async function parseFilesWasmForBackfill(
+  filePaths: string[],
+  rootDir: string,
+): Promise<Map<string, ExtractorOutput>> {
+  if (filePaths.length <= INLINE_BACKFILL_THRESHOLD) {
+    return parseFilesWasmInline(filePaths, rootDir);
+  }
+  return parseFilesWasm(filePaths, rootDir);
+}
+
 /**
  * Parse multiple files in bulk and return a Map<relPath, symbols>.
  */
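The inline path above relies on `ensureParsersForFiles` to load only the grammars the batch needs, but that function's body is not part of this diff. The following is a rough sketch of what extension-scoped loading could look like, under stated assumptions: `EXT_TO_LANG`, `extOf`, `languagesForFiles`, and `loadScopedGrammars` are all hypothetical names, the extension table is illustrative, and the grammar loader is injected rather than tied to any real tree-sitter API.

```typescript
// Hypothetical extension-to-language table; the project's real mapping
// lives in its parser module and is not shown in this diff.
const EXT_TO_LANG: ReadonlyMap<string, string> = new Map([
  ['.ts', 'typescript'],
  ['.py', 'python'],
  ['.tf', 'hcl'],
  ['.hcl', 'hcl'],
]);

// Lower-cased extension of the final path segment, or '' when there is none.
function extOf(filePath: string): string {
  const lastSep = Math.max(filePath.lastIndexOf('/'), filePath.lastIndexOf('\\'));
  const base = filePath.slice(lastSep + 1);
  const dot = base.lastIndexOf('.');
  return dot > 0 ? base.slice(dot).toLowerCase() : '';
}

// Collect the distinct languages a batch needs; unknown extensions are skipped.
function languagesForFiles(filePaths: readonly string[]): Set<string> {
  const langs = new Set<string>();
  for (const filePath of filePaths) {
    const lang = EXT_TO_LANG.get(extOf(filePath));
    if (lang !== undefined) langs.add(lang);
  }
  return langs;
}

// Load each needed grammar exactly once through an injected loader.
async function loadScopedGrammars<G>(
  filePaths: readonly string[],
  loadGrammar: (lang: string) => Promise<G>,
): Promise<Map<string, G>> {
  const loaded = new Map<string, G>();
  for (const lang of languagesForFiles(filePaths)) {
    loaded.set(lang, await loadGrammar(lang));
  }
  return loaded;
}
```

For the recurring case described in the commit message (4 fixture files in one language), a scheme like this would trigger a single grammar load instead of initializing every installed grammar.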
@@ -1117,7 +1182,7 @@ export async function parseFilesAuto(
     );
     if (dropped.length > 0) {
       warn(`Native engine dropped ${dropped.length} file(s); falling back to WASM for parity`);
-      const wasmResults = await parseFilesWasm(dropped, rootDir);
+      const wasmResults = await parseFilesWasmForBackfill(dropped, rootDir);
       for (const [relPath, symbols] of wasmResults) {
         result.set(relPath, symbols);
       }
