Skip to content

Commit 8bec68c

Browse files
authored
fix(perf): correct INLINE_BACKFILL_THRESHOLD docstring and raise threshold (#1492)
* chore: gitignore napi-generated artifacts in crates/codegraph-core * chore(tests): remove unused biome suppression in visitor.test.ts * fix(titan-run): sync --start-from enum and phase-timestamp list with actual phases * fix(hooks): track Bash file modifications via before/after git status diff Adds snapshot-pre-bash.sh (PreToolUse Bash) + track-bash-writes.sh (PostToolUse Bash): the pre-hook captures git status --porcelain to a per-worktree temp file before each Bash call; the post-hook diffs the before/after state and appends newly modified or created files to .claude/session-edits.log. This closes the gap where files written by sed -i, printf redirects, tee, heredocs, or build tools (Cargo.lock, lockfiles) were never recorded, causing guard-git.sh to emit false-positive BLOCKED errors. Closes #1457 * chore(native): remove dead code (unused var, method, variant, fields) - clojure.rs: annotate lifetime-anchor assignment to silence false-positive - cfg.rs: remove never-called start_line_of method - complexity.rs: remove never-constructed NotHandled variant; convert irrefutable if-let patterns to plain let destructures - dataflow.rs: remove never-read callee fields from CallReturn/Destructured - incremental.rs: remove never-read lang field from CacheEntry cargo check and cargo clippy both clean after these changes. * refactor(native): extract emit_pts_alias_edges params into PtsAliasCtx struct * fix(wasm): sort call targets by confidence before emit to match native engine * fix(bench): add 2 warmup runs and raise INCREMENTAL_RUNS to 5 for incremental tiers * ci(bench): add per-PR perf canary for extractor/graph/native changes Adds .github/workflows/perf-canary.yml — a path-filtered workflow that fires on PRs touching src/extractors/, src/domain/graph/, or crates/** and runs only the incremental-benchmark suite (full build + no-op + 1-file rebuild, both engines). Catches the class of regressions that accumulated invisibly across the Phase 8.x PRs and were only detected at v3.12.0 publish time. The regression guard gains BENCH_CANARY=1 mode: raises thresholds to 50%/100%/150% (standard/noisy/WASM) and skips the build, query, and resolution suites — only incremental checks run. This absorbs shared- runner timing variance while still blocking catastrophic regressions (+98% full build, +1827% 1-file rebuild from v3.12.0). Closes #1433 * fix(perf): plumb symbolsOnly through parseFilesWasmInline to skip analysis visitors * fix(perf): scope runPostNativeCha to changed files on incremental builds On incremental builds, runPostNativeCha previously scanned all call→qualified-method edges in the DB (~12ms flat, O(graph size)), even for 1-file changes where no hierarchy or RTA evidence changed. Add two cheap indexed gate queries. Gate A checks whether any changed file introduced a class/interface/trait/struct/record node (hierarchy may have new implementors reachable from unchanged call sites). Gate B checks whether any changed file added a call edge to a class-kind target (RTA set may have grown, enabling previously filtered expansions in unchanged callers). If neither gate fires, restrict the candidate query to src.file IN changedFiles — safe because the hierarchy and instantiated set are unchanged for all other files. Full builds (isFullBuild=true) and cases where either gate fires retain the existing full-scan behaviour. Mirrors the changed-files scoping pattern of runPostNativeThisDispatch. Closes #1441 * fix(native): add post-pass phase timings to result.phases Times each JS post-pass in tryNativeOrchestrator and exposes the measurements in BuildResult.phases: - gapDetectMs — dropped-language gap detection + backfill - chaMs — CHA expansion (interface dispatch) - thisDispatchMs — this/super dispatch WASM re-parse (was already tracked but now properly named alongside the rest) - reclassifyMs — scoped role re-classification after edge insertion - techniqueBackfillMs — technique-column UPDATE on native-written edges Previously only thisDispatchMs was reported, causing wall-clock vs phaseSum to diverge by 1.1s+ on 1-file rebuilds and making benchmark regressions undiagnosable from committed history. Updates update-incremental-report.ts to render the new phases in a collapsible details block under each engine's 1-file rebuild section. Closes #1434 * fix(perf): correct INLINE_BACKFILL_THRESHOLD docstring; raise threshold for required-tier grammars The docstring claimed pool cost was "amortised over enough parse work" — measurements show IPC overhead scales linearly (~55–64ms/file pool vs ~8–10ms/file inline). The real motivation is crash safety for exotic WASM grammars (#965); JS/TS/TSX (required-tier, used in all this-dispatch backfill calls) have never triggered the V8 fatal crash class and are safe to run inline. Raise threshold 16 → 32 to keep typical this-dispatch batches (≤ 18 files on the codegraph corpus) on the inline fast path. Exotic-language drops are almost always well under 32 files and also benefit from the inline path without meaningful crash risk increase. Closes #1435 * fix(perf): update stale parseFilesWasmForBackfill docstring to reference threshold constant The secondary docstring still described the old 16-value rationale ("engine-parity drop sizes"). Replace with a pointer to INLINE_BACKFILL_THRESHOLD where the full rationale now lives. * fix(perf): guard post-native passes on 1-file incremental rebuilds (#1493) * fix(perf): guard post-native passes against unnecessary work on 1-file incremental rebuilds On 1-file native incremental builds, two JS post-passes ran unconditionally even when they had no work to do: - `backfillNativeDroppedFiles`: called whenever changedCount > 0, even when detectDroppedLanguageGap returned an empty gap. Gate now checks gap.missingAbs.length > 0 || gap.staleRel.length > 0 directly, matching backfillNativeDroppedFiles's own internal early-exit guard. - Node/edge COUNT(*) re-count: ran unconditionally after all post-passes even when none of them wrote any edges. COUNT(*) over 50K+ edge tables is non-trivial, especially via the NativeDbProxy napi-rs round-trip. Now gated on postPassWroteData (backfill | CHA edges | this-dispatch edges). Closes #1454 * refactor(perf): hoist backfillHappened before if to avoid duplicate expression Greptile suggested hoisting the backfillHappened variable declaration above the conditional that guards backfillNativeDroppedFiles, so the boolean expression is written exactly once. Previously the condition was evaluated in both the if-guard and the const declaration on the following line.
1 parent f718b8f commit 8bec68c

2 files changed

Lines changed: 45 additions & 35 deletions

File tree

src/domain/graph/builder/stages/native-orchestrator.ts

Lines changed: 26 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -1538,26 +1538,16 @@ export async function tryNativeOrchestrator(
15381538
// stale native binaries). WASM handles those — backfill via WASM so both
15391539
// engines process the same file set (#967).
15401540
//
1541-
// Detect the gap once (fs walk + 2 DB queries, ~20–30ms) and use it for
1542-
// both gating and the backfill itself. On dirty incrementals/full builds
1543-
// the orchestrator signals trigger backfill, so the walk happens once
1544-
// (instead of redundantly inside backfill). On quiet incrementals we
1545-
// still pay the walk so we can detect brand-new files in dropped-language
1546-
// extensions — a gap that the orchestrator's `detect_removed_files`
1547-
// filter (#1070) leaves open (#1083, #1091). The pre-check is cheap
1548-
// because the expensive part (WASM re-parse of the missing set) is
1549-
// gated below.
1550-
const removedCount = result.removedCount ?? 0;
1551-
const changedCount = result.changedCount ?? 0;
1541+
// Detect the gap once (fs walk + 2 DB queries) and use it for both gating
1542+
// and the backfill itself. On quiet incrementals we still pay the walk so
1543+
// we can detect brand-new files in dropped-language extensions — a gap that
1544+
// the orchestrator's `detect_removed_files` filter (#1070) leaves open
1545+
// (#1083, #1091). The pre-check is cheap because the expensive part (WASM
1546+
// re-parse of the missing set) is gated below.
15521547
const gapDetectStart = performance.now();
15531548
const gap = detectDroppedLanguageGap(ctx);
1554-
if (
1555-
result.isFullBuild ||
1556-
removedCount > 0 ||
1557-
changedCount > 0 ||
1558-
gap.missingAbs.length > 0 ||
1559-
gap.staleRel.length > 0
1560-
) {
1549+
const backfillHappened = gap.missingAbs.length > 0 || gap.staleRel.length > 0;
1550+
if (backfillHappened) {
15611551
await backfillNativeDroppedFiles(ctx, gap);
15621552
}
15631553
const gapDetectMs = performance.now() - gapDetectStart;
@@ -1638,19 +1628,27 @@ export async function tryNativeOrchestrator(
16381628
// Re-count nodes/edges now that all edge-writing post-passes have run: the
16391629
// Rust orchestrator captured its counts before the JS post-passes added
16401630
// edges, so both its summary and build_meta under-report (#1452).
1631+
//
1632+
// Fast path: skip the COUNT(*) scan when no post-pass wrote any edges.
1633+
// COUNT(*) on large tables (50K+ edges) is non-trivial, especially via the
1634+
// NativeDbProxy napi-rs round-trip. When all post-passes were no-ops, the
1635+
// Rust orchestrator's counts are still accurate — no re-count needed.
16411636
let finalNodeCount = result.nodeCount ?? 0;
16421637
let finalEdgeCount = result.edgeCount ?? 0;
1643-
try {
1644-
const counts = (ctx.db as unknown as BetterSqlite3Database)
1645-
.prepare('SELECT (SELECT COUNT(*) FROM nodes) AS n, (SELECT COUNT(*) FROM edges) AS e')
1646-
.get() as { n: number; e: number };
1647-
if (counts.n !== finalNodeCount || counts.e !== finalEdgeCount) {
1648-
finalNodeCount = counts.n;
1649-
finalEdgeCount = counts.e;
1650-
setBuildMeta(ctx.db, { node_count: finalNodeCount, edge_count: finalEdgeCount });
1638+
const postPassWroteData = backfillHappened || chaEdgeCount > 0 || thisDispatchTargetIds.size > 0;
1639+
if (postPassWroteData) {
1640+
try {
1641+
const counts = (ctx.db as unknown as BetterSqlite3Database)
1642+
.prepare('SELECT (SELECT COUNT(*) FROM nodes) AS n, (SELECT COUNT(*) FROM edges) AS e')
1643+
.get() as { n: number; e: number };
1644+
if (counts.n !== finalNodeCount || counts.e !== finalEdgeCount) {
1645+
finalNodeCount = counts.n;
1646+
finalEdgeCount = counts.e;
1647+
setBuildMeta(ctx.db, { node_count: finalNodeCount, edge_count: finalEdgeCount });
1648+
}
1649+
} catch (err) {
1650+
debug(`Post-pass node/edge re-count failed: ${toErrorMessage(err)}`);
16511651
}
1652-
} catch (err) {
1653-
debug(`Post-pass node/edge re-count failed: ${toErrorMessage(err)}`);
16541652
}
16551653
info(
16561654
`Native build orchestrator completed: ${finalNodeCount} nodes, ${finalEdgeCount} edges, ${result.fileCount ?? 0} files`,

src/domain/parser.ts

Lines changed: 19 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1181,12 +1181,25 @@ async function parseFilesWasm(
11811181
/**
11821182
* Files at or below this count use the inline parse path (no worker spawn).
11831183
*
1184-
* Sized for typical engine-parity drops: a handful of fixture files in one
1185-
* or two languages (the recurring HCL case is 4 files). Above this, the
1186-
* worker-pool's IPC + crash-isolation cost (#965) is amortized over enough
1187-
* parse work to be worth paying; below it, the ~1–2s cold-start dominates.
1184+
* The worker pool exists for crash safety (#965): exotic (non-required) WASM
1185+
* grammars can trigger uncatchable V8 fatal errors that would kill the main
1186+
* process. Running them in a worker means only the worker dies; the pool
1187+
* detects the exit, skips the file, respawns, and continues.
1188+
*
1189+
* JS/TS/TSX are required-tier grammars — they have never triggered the V8
1190+
* fatal crash class and are safe to run inline. The primary hot caller
1191+
* (this/super dispatch post-pass) exclusively handles JS/TS/TSX files and
1192+
* measured ~55–64ms/file through the pool vs ~8–10ms/file inline (#1435);
1193+
* IPC overhead scales linearly with file count, not amortised.
1194+
*
1195+
* The threshold is set high enough to keep typical this-dispatch batches
1196+
* (≤ 18 files on the codegraph corpus) on the inline path, while still
1197+
* routing truly large exotic-language drops (rare; typical HCL case is 4
1198+
* files) through the pool for crash isolation. Exotic-language drops are
1199+
* almost always well under this limit anyway, so they benefit from the
1200+
* inline fast path too without meaningful crash risk increase.
11881201
*/
1189-
const INLINE_BACKFILL_THRESHOLD = 16;
1202+
const INLINE_BACKFILL_THRESHOLD = 32;
11901203

11911204
/**
11921205
* Inline WASM parse (no worker) for small file batches.
@@ -1246,8 +1259,7 @@ async function parseFilesWasmInline(
12461259
/**
12471260
* Backfill helper: small batches use the inline (main-thread) path; larger
12481261
* batches keep the worker-pool isolation against tree-sitter WASM crashes
1249-
* (#965). Threshold matches typical engine-parity drop sizes (a few fixture
1250-
* files in one or two languages).
1262+
* (#965). See INLINE_BACKFILL_THRESHOLD for threshold rationale.
12511263
*
12521264
* `opts.symbolsOnly` skips the AST/complexity/CFG/dataflow visitors in the
12531265
* worker (and their result serialization across the thread boundary) for

0 commit comments

Comments
 (0)