Skip to content

fix(perf): correct INLINE_BACKFILL_THRESHOLD docstring and raise threshold#1492

Merged
carlos-alm merged 16 commits into
mainfrom
fix/backfill-threshold-1435
Jun 14, 2026
Merged

fix(perf): correct INLINE_BACKFILL_THRESHOLD docstring and raise threshold#1492
carlos-alm merged 16 commits into
mainfrom
fix/backfill-threshold-1435

Conversation

@carlos-alm

Copy link
Copy Markdown
Contributor

The INLINE_BACKFILL_THRESHOLD docstring claimed pool cost was "amortised over enough parse work" — measurements show IPC overhead scales linearly (~55–64ms/file pool vs ~8–10ms/file inline). The real motivation is grammar-crash safety for exotic parsers (#965); JS/TS/TSX (required-tier, used in all this-dispatch backfill calls) have never triggered the V8 fatal crash class and are safe to run inline.

Changes

  • Docstring corrected: explains the crash-safety rationale (exotic grammars can trigger uncatchable V8 fatal errors), clarifies that JS/TS/TSX are required-tier and safe inline, and accurately states that IPC overhead scales linearly rather than amortising
  • Threshold raised 16 → 32: keeps typical this-dispatch batches (≤ 18 files on the codegraph corpus) on the inline fast path. Exotic-language drops (e.g. recurring HCL case is 4 files) are almost always well below 32 and also benefit from the inline path without meaningful crash risk increase

Why 32

The primary hot caller (runPostNativeThisDispatch) measured 18 JS/TS files at 55–64ms/file through the pool vs 8–10ms/file inline — a 3–6x penalty. Raising to 32 covers that batch with headroom. Anything larger (truly exotic-language drops at scale) still routes through the pool for crash isolation.

Closes #1435

… diff

Adds snapshot-pre-bash.sh (PreToolUse Bash) + track-bash-writes.sh
(PostToolUse Bash): the pre-hook captures git status --porcelain to a
per-worktree temp file before each Bash call; the post-hook diffs the
before/after state and appends newly modified or created files to
.claude/session-edits.log.

This closes the gap where files written by sed -i, printf redirects,
tee, heredocs, or build tools (Cargo.lock, lockfiles) were never
recorded, causing guard-git.sh to emit false-positive BLOCKED errors.

Closes #1457
- clojure.rs: annotate lifetime-anchor assignment to silence false-positive
- cfg.rs: remove never-called start_line_of method
- complexity.rs: remove never-constructed NotHandled variant; convert
  irrefutable if-let patterns to plain let destructures
- dataflow.rs: remove never-read callee fields from CallReturn/Destructured
- incremental.rs: remove never-read lang field from CacheEntry

cargo check and cargo clippy both clean after these changes.
Adds .github/workflows/perf-canary.yml — a path-filtered workflow that
fires on PRs touching src/extractors/, src/domain/graph/, or crates/**
and runs only the incremental-benchmark suite (full build + no-op +
1-file rebuild, both engines). Catches the class of regressions that
accumulated invisibly across the Phase 8.x PRs and were only detected
at v3.12.0 publish time.

The regression guard gains BENCH_CANARY=1 mode: raises thresholds to
50%/100%/150% (standard/noisy/WASM) and skips the build, query, and
resolution suites — only incremental checks run. This absorbs shared-
runner timing variance while still blocking catastrophic regressions
(+98% full build, +1827% 1-file rebuild from v3.12.0).

Closes #1433
On incremental builds, runPostNativeCha previously scanned all
call→qualified-method edges in the DB (~12ms flat, O(graph size)),
even for 1-file changes where no hierarchy or RTA evidence changed.

Add two cheap indexed gate queries. Gate A checks whether any changed
file introduced a class/interface/trait/struct/record node (hierarchy
may have new implementors reachable from unchanged call sites). Gate B
checks whether any changed file added a call edge to a class-kind target
(RTA set may have grown, enabling previously filtered expansions in
unchanged callers). If neither gate fires, restrict the candidate query
to src.file IN changedFiles — safe because the hierarchy and instantiated
set are unchanged for all other files.

Full builds (isFullBuild=true) and cases where either gate fires retain
the existing full-scan behaviour. Mirrors the changed-files scoping
pattern of runPostNativeThisDispatch.

Closes #1441
Times each JS post-pass in tryNativeOrchestrator and exposes the
measurements in BuildResult.phases:

- gapDetectMs  — dropped-language gap detection + backfill
- chaMs        — CHA expansion (interface dispatch)
- thisDispatchMs — this/super dispatch WASM re-parse (was already
                   tracked but now properly named alongside the rest)
- reclassifyMs — scoped role re-classification after edge insertion
- techniqueBackfillMs — technique-column UPDATE on native-written edges

Previously only thisDispatchMs was reported, causing wall-clock vs
phaseSum to diverge by 1.1s+ on 1-file rebuilds and making benchmark
regressions undiagnosable from committed history.

Updates update-incremental-report.ts to render the new phases in a
collapsible details block under each engine's 1-file rebuild section.

Closes #1434
…ld for required-tier grammars

The docstring claimed pool cost was "amortised over enough parse work" —
measurements show IPC overhead scales linearly (~55–64ms/file pool vs
~8–10ms/file inline). The real motivation is crash safety for exotic WASM
grammars (#965); JS/TS/TSX (required-tier, used in all this-dispatch
backfill calls) have never triggered the V8 fatal crash class and are safe
to run inline.

Raise threshold 16 → 32 to keep typical this-dispatch batches (≤ 18 files
on the codegraph corpus) on the inline fast path. Exotic-language drops are
almost always well under 32 files and also benefit from the inline path
without meaningful crash risk increase.

Closes #1435
@claude

claude Bot commented Jun 13, 2026

Copy link
Copy Markdown

Claude finished @carlos-alm's task —— View job


PR Review

  • Read changed files and gather context
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity
  • Phase 4: Code quality
  • Phase 5: Backlog compliance
  • Final assessment

Phase 1: Root Cause Analysis

Problem Being Solved: The INLINE_BACKFILL_THRESHOLD was set to 16, but the primary hot caller (runPostNativeThisDispatch) processes up to ~18 JS/TS files and was hitting the pool path unnecessarily. Measurements showed IPC overhead at ~55–64ms/file through the pool vs ~8–10ms/file inline — a ~3–6x penalty. The old docstring falsely implied pool overhead was "amortised over parse work," when in reality it scales linearly.

Why This Approach: Two orthogonal fixes in one minimal change — (1) raise the threshold past the measured batch size of 18, and (2) correct the misleading docstring to accurately explain the crash-safety rationale (not cost amortisation). The threshold of 32 gives headroom over 18 while still routing truly large batches through the pool.

Risk Assessment: Low. JS/TS/TSX are required-tier grammars with no history of the V8 fatal crash class. Raising the inline ceiling to 32 from 16 for files that call parseFilesWasmForBackfill means the only meaningful difference is performance. The exotic-language backfill path (e.g. HCL, ~4 files) benefits from inline too, with negligible additional crash risk.


Phase 2: Configuration Change Detection

No linting rules, test timeouts, or build configuration were changed. The only constant modified is INLINE_BACKFILL_THRESHOLD — a behavioral performance tuning parameter, not a quality gate. No concerns here.


Phase 3: Test Integrity

No tests were modified. There are no direct unit tests for parseFilesWasmForBackfill or INLINE_BACKFILL_THRESHOLD (confirmed via search) — the threshold is an internal constant exercised through integration tests. No assertions were weakened or removed.

Minor observation: the threshold is a magic constant with no test asserting the inline path is taken for expected batch sizes. This isn't a blocker, but a test that exercises parseFilesWasmForBackfill with a batch at the boundary (e.g. 31 vs 33 files) would prevent future regressions. Not required for this PR.


Phase 4: Code Quality

Docstring correctness (src/domain/parser.ts:1181–1201): The new docstring is materially more accurate:

  • Correctly identifies crash safety (not amortisation) as the core rationale ✓
  • Cites the specific measured numbers (55–64ms/file pool, 8–10ms/file inline) with issue reference ✓
  • Explains why JS/TS/TSX are safe inline (required-tier, no V8 fatal crash history) ✓
  • Acknowledges exotic drops also benefit from inline given they're typically ≤ 4 files ✓

Threshold value (src/domain/parser.ts:1202): 32 is well-justified. The measured batch is ≤ 18; 32 gives ~1.7x headroom. No realistic exotic-language scenario is expected to exceed 32 based on the documented HCL case being 4 files.

Secondary docstring (src/domain/parser.ts:1253–1265, parseFilesWasmForBackfill): This docstring still reads "Threshold matches typical engine-parity drop sizes (a few fixture files in one or two languages)" — that was accurate for the old 16 value but is now slightly misleading since the threshold was raised specifically to cover the larger this-dispatch batches, not just engine-parity drops. This is a minor inconsistency that could be addressed in a follow-up, but it's not incorrect.

No bugs, security issues, or architectural concerns introduced.


Phase 5: Backlog Compliance

  • Zero-dep: ✓ — No new runtime dependencies
  • Foundation-aligned: ✓ — Pure internal performance tuning; no cloud APIs, no external services, no agent behavior
  • Problem-fit: 3/5 — Improves build performance (especially in native-engine post-pass), which indirectly speeds up graph generation for AI agents. Doesn't directly address token waste or hallucinated deps, but faster indexing is a meaningful quality-of-life improvement
  • Breaking: No — internal constant, no API or CLI contract changes
  • Tier: 1

Final Assessment

## Root Cause Analysis
Problem Being Solved: INLINE_BACKFILL_THRESHOLD=16 was below the measured
  batch size of ~18 JS/TS files in runPostNativeThisDispatch, causing a
  3–6x performance penalty from unnecessary IPC overhead. The old docstring
  also incorrectly described cost amortisation as the rationale.
Why This Approach: Raise to 32 (covers the 18-file batch with headroom),
  and correct the docstring to explain the real crash-safety motivation.
Risk Assessment: Low — JS/TS/TSX are required-tier with no V8 fatal crash
  history; raising the inline ceiling for them carries negligible crash risk.

## Backlog Compliance
- Zero-dep: ✓
- Foundation-aligned: ✓
- Problem-fit: 3/5 — performance improvement to graph indexing pipeline
- Breaking: No
- Tier: 1

## Minor Concerns
- src/domain/parser.ts:1253–1265: The `parseFilesWasmForBackfill` docstring
  still says "threshold matches typical engine-parity drop sizes" which is
  now somewhat misleading — the threshold was raised for this-dispatch batch
  coverage. Not blocking; can be addressed in follow-up.
- No boundary test exists for the threshold (31 vs 33 files). Not required
  but would prevent silent regressions.

## Final Recommendation
- Rating: ⭐⭐⭐⭐ (4/5)
- Action: APPROVE
- Reasoning: The change is well-motivated with cited measurements and a
  referenced issue (#1435). The docstring correction accurately replaces
  misleading "amortisation" framing with the true crash-safety rationale.
  The threshold value is justified by the measured batch size. Minor
  inconsistency in the secondary docstring is cosmetic and non-blocking.

@greptile-apps

greptile-apps Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR corrects the INLINE_BACKFILL_THRESHOLD docstring (from a misleading "IPC cost is amortised" rationale to the accurate crash-safety explanation) and raises the threshold from 16 to 32 to keep typical JS/TS/TSX this-dispatch batches on the faster inline path. Two incidental optimisations are also included in native-orchestrator.ts.

  • parser.ts: Threshold doubled to 32; docstring now explains the V8 crash-isolation purpose of the worker pool, the safety of JS/TS/TSX inline, and the measured 3–6× IPC penalty that motivates the higher limit.
  • native-orchestrator.ts: Backfill guard simplified to check only the gap arrays (safe — backfillNativeDroppedFiles already returns immediately on an empty gap); COUNT(*) re-count now skipped via postPassWroteData when no post-pass wrote new rows.

Confidence Score: 5/5

Safe to merge — the threshold change is a well-measured constant bump, and the native-orchestrator logic simplifications are guarded by the function's own early-return.

The threshold raise is backed by concrete measurements and narrows crash risk minimally (JS/TS/TSX have never triggered the V8 fatal class). The backfill-condition simplification is safe because backfillNativeDroppedFiles already returns immediately when both gap arrays are empty. The COUNT(*) fast path correctly excludes only backfillEdgeTechniquesAfterNativeOrchestrator, which updates an existing column rather than inserting new rows.

No files require special attention.

Important Files Changed

Filename Overview
src/domain/parser.ts Threshold raised 16→32 with a rewritten docstring explaining crash-safety rationale; parseFilesWasmForBackfill docstring now defers to the constant for rationale, removing the stale 16-oriented text.
src/domain/graph/builder/stages/native-orchestrator.ts Two bonus optimizations alongside the parser.ts change: (1) simplified backfill guard — safe because backfillNativeDroppedFiles already returns early on an empty gap; (2) COUNT(*) fast path skips the DB scan when no post-pass wrote new rows.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[parseFilesWasmForBackfill\ncalled with N files] --> B{N ≤ INLINE_BACKFILL_THRESHOLD\nnow 32, was 16}
    B -- yes\nJS/TS/TSX this-dispatch\nbatches ≤18 files --> C[parseFilesWasmInline\nmain-thread ~8–10ms/file]
    B -- no\nlarge or exotic-language drops --> D[parseFilesWasm via worker pool\ncrash-isolated ~55–64ms/file]
    C --> E[Return results]
    D --> E

    subgraph native-orchestrator fast paths
        F[detectDroppedLanguageGap] --> G{gap empty?}
        G -- yes --> H[Skip backfillNativeDroppedFiles]
        G -- no --> I[backfillNativeDroppedFiles\nwrites nodes+edges]
        I --> J{postPassWroteData?\nbackfill OR CHA edges\nOR this-dispatch targets}
        J -- no --> K[Skip COUNT star scan\nRust counts still accurate]
        J -- yes --> L[COUNT star re-count\nupdate build_meta]
    end
Loading

Reviews (5): Last reviewed commit: "fix(perf): guard post-native passes on 1..." | Re-trigger Greptile

…nce threshold constant

The secondary docstring still described the old 16-value rationale
("engine-parity drop sizes"). Replace with a pointer to
INLINE_BACKFILL_THRESHOLD where the full rationale now lives.
@carlos-alm

Copy link
Copy Markdown
Contributor Author

Fixed — updated the stale parseFilesWasmForBackfill docstring at line 1256 to replace the old 16-value rationale ("engine-parity drop sizes") with a pointer to INLINE_BACKFILL_THRESHOLD where the full rationale now lives (f90d4ba).

@carlos-alm

Copy link
Copy Markdown
Contributor Author

@greptileai

Base automatically changed from fix/native-phase-timings-1434 to main June 14, 2026 01:40
carlos-alm and others added 2 commits June 13, 2026 19:46
docs check acknowledged — merge resolution only; no new features or API changes to document
…1493)

* fix(perf): guard post-native passes against unnecessary work on 1-file incremental rebuilds

On 1-file native incremental builds, two JS post-passes ran unconditionally
even when they had no work to do:

- `backfillNativeDroppedFiles`: called whenever changedCount > 0, even when
  detectDroppedLanguageGap returned an empty gap. Gate now checks
  gap.missingAbs.length > 0 || gap.staleRel.length > 0 directly, matching
  backfillNativeDroppedFiles's own internal early-exit guard.

- Node/edge COUNT(*) re-count: ran unconditionally after all post-passes even
  when none of them wrote any edges. COUNT(*) over 50K+ edge tables is
  non-trivial, especially via the NativeDbProxy napi-rs round-trip. Now gated
  on postPassWroteData (backfill | CHA edges | this-dispatch edges).

Closes #1454

* refactor(perf): hoist backfillHappened before if to avoid duplicate expression

Greptile suggested hoisting the backfillHappened variable declaration above
the conditional that guards backfillNativeDroppedFiles, so the boolean
expression is written exactly once. Previously the condition was evaluated
in both the if-guard and the const declaration on the following line.
@github-actions

Copy link
Copy Markdown
Contributor

Codegraph Impact Analysis

1 functions changed5 callers affected across 5 files

  • tryNativeOrchestrator in src/domain/graph/builder/stages/native-orchestrator.ts:1384 (5 transitive callers)

@carlos-alm carlos-alm merged commit 8bec68c into main Jun 14, 2026
23 checks passed
@carlos-alm carlos-alm deleted the fix/backfill-threshold-1435 branch June 14, 2026 05:53
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 14, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Backfill worker-pool path: per-file cost ~3x the wasm-engine pipeline; INLINE_BACKFILL_THRESHOLD rationale outdated

1 participant