Skip to content

fix(wasm): port missing node extractions to JS extractor (jelly-micro #1471)#1509

Merged
carlos-alm merged 32 commits into
mainfrom
fix/js-extractor-nodes-1471
Jun 14, 2026
Merged

fix(wasm): port missing node extractions to JS extractor (jelly-micro #1471)#1509
carlos-alm merged 32 commits into
mainfrom
fix/js-extractor-nodes-1471

Conversation

@carlos-alm

Copy link
Copy Markdown
Contributor

Summary

The WASM JS extractor was missing 25 nodes that the native Rust extractor found in the jelly-micro fixture. Four root causes, each fixed independently:

  • Computed class method names (['property7'], ['generator10'], etc.): the query patterns in parser.ts and wasm-worker-entry.ts only matched property_identifier and private_property_identifier as the name field of method_definition. Added a third pattern for computed_property_name.

  • Class expressions inside functions (PostMixin, B): wasm-worker-entry.ts had a single-string JS_CLASS_PATTERN matching only class_declaration. Renamed to JS_CLASS_PATTERNS array and added the (class name: …) pattern for class-expression nodes. Also added (class name: (type_identifier) …) to TS_EXTRA_PATTERNS for TS parity.

  • Array-destructured constants (const [x, y] = …): both extractDestructuredBindingsWalk (query path) and handleVariableDecl (walk path) only handled object_pattern. Added an array_pattern branch that emits a single constant node with the full array pattern as the name, matching native behaviour.

  • Parameters of prototype arrow/function methods (Arit.prototype.sum = (x, y) => …): emitPrototypeMethod emitted the definition without calling extractParameters. Added the call and wired children through.

parity-compare.mjs --langs jelly-micro now reports 0 node diffs (was 25). 22 pre-existing edge diffs remain and are tracked in #1506.

Test plan

Closes #1471

… diff

Adds snapshot-pre-bash.sh (PreToolUse Bash) + track-bash-writes.sh
(PostToolUse Bash): the pre-hook captures git status --porcelain to a
per-worktree temp file before each Bash call; the post-hook diffs the
before/after state and appends newly modified or created files to
.claude/session-edits.log.

This closes the gap where files written by sed -i, printf redirects,
tee, heredocs, or build tools (Cargo.lock, lockfiles) were never
recorded, causing guard-git.sh to emit false-positive BLOCKED errors.

Closes #1457
- clojure.rs: annotate lifetime-anchor assignment to silence false-positive
- cfg.rs: remove never-called start_line_of method
- complexity.rs: remove never-constructed NotHandled variant; convert
  irrefutable if-let patterns to plain let destructures
- dataflow.rs: remove never-read callee fields from CallReturn/Destructured
- incremental.rs: remove never-read lang field from CacheEntry

cargo check and cargo clippy both clean after these changes.
Adds .github/workflows/perf-canary.yml — a path-filtered workflow that
fires on PRs touching src/extractors/, src/domain/graph/, or crates/**
and runs only the incremental-benchmark suite (full build + no-op +
1-file rebuild, both engines). Catches the class of regressions that
accumulated invisibly across the Phase 8.x PRs and were only detected
at v3.12.0 publish time.

The regression guard gains BENCH_CANARY=1 mode: raises thresholds to
50%/100%/150% (standard/noisy/WASM) and skips the build, query, and
resolution suites — only incremental checks run. This absorbs shared-
runner timing variance while still blocking catastrophic regressions
(+98% full build, +1827% 1-file rebuild from v3.12.0).

Closes #1433
On incremental builds, runPostNativeCha previously scanned all
call→qualified-method edges in the DB (~12ms flat, O(graph size)),
even for 1-file changes where no hierarchy or RTA evidence changed.

Add two cheap indexed gate queries. Gate A checks whether any changed
file introduced a class/interface/trait/struct/record node (hierarchy
may have new implementors reachable from unchanged call sites). Gate B
checks whether any changed file added a call edge to a class-kind target
(RTA set may have grown, enabling previously filtered expansions in
unchanged callers). If neither gate fires, restrict the candidate query
to src.file IN changedFiles — safe because the hierarchy and instantiated
set are unchanged for all other files.

Full builds (isFullBuild=true) and cases where either gate fires retain
the existing full-scan behaviour. Mirrors the changed-files scoping
pattern of runPostNativeThisDispatch.

Closes #1441
Times each JS post-pass in tryNativeOrchestrator and exposes the
measurements in BuildResult.phases:

- gapDetectMs  — dropped-language gap detection + backfill
- chaMs        — CHA expansion (interface dispatch)
- thisDispatchMs — this/super dispatch WASM re-parse (was already
                   tracked but now properly named alongside the rest)
- reclassifyMs — scoped role re-classification after edge insertion
- techniqueBackfillMs — technique-column UPDATE on native-written edges

Previously only thisDispatchMs was reported, causing wall-clock vs
phaseSum to diverge by 1.1s+ on 1-file rebuilds and making benchmark
regressions undiagnosable from committed history.

Updates update-incremental-report.ts to render the new phases in a
collapsible details block under each engine's 1-file rebuild section.

Closes #1434
…ld for required-tier grammars

The docstring claimed pool cost was "amortised over enough parse work" —
measurements show IPC overhead scales linearly (~55–64ms/file pool vs
~8–10ms/file inline). The real motivation is crash safety for exotic WASM
grammars (#965); JS/TS/TSX (required-tier, used in all this-dispatch
backfill calls) have never triggered the V8 fatal crash class and are safe
to run inline.

Raise threshold 16 → 32 to keep typical this-dispatch batches (≤ 18 files
on the codegraph corpus) on the inline fast path. Exotic-language drops are
almost always well under 32 files and also benefit from the inline path
without meaningful crash risk increase.

Closes #1435
…e incremental rebuilds

On 1-file native incremental builds, two JS post-passes ran unconditionally
even when they had no work to do:

- `backfillNativeDroppedFiles`: called whenever changedCount > 0, even when
  detectDroppedLanguageGap returned an empty gap. Gate now checks
  gap.missingAbs.length > 0 || gap.staleRel.length > 0 directly, matching
  backfillNativeDroppedFiles's own internal early-exit guard.

- Node/edge COUNT(*) re-count: ran unconditionally after all post-passes even
  when none of them wrote any edges. COUNT(*) over 50K+ edge tables is
  non-trivial, especially via the NativeDbProxy napi-rs round-trip. Now gated
  on postPassWroteData (backfill | CHA edges | this-dispatch edges).

Closes #1454
The post-pass it timed (runPostNativePrototypeMethods) was deleted in
b5c03a2 when func-prop extraction moved to Rust (#1432). The optional
field was never set by any code path that survived the deletion.

Also remove the stale reference to "prototype-methods post-pass" from
the parseFilesWasmForBackfill docstring — only the this-dispatch
post-pass uses symbolsOnly now.

Closes #1432
… collision

Field type annotations (`private repo: OrderRepository`) were seeded as bare
file-wide typeMap keys, causing `this.repo` inside `UserService` to resolve to
`OrderRepository` when both classes had a `repo` field (issue #1458).

Both extractors (TS `handleFieldDefTypeMap` and Rust `field_definition` branch)
now seed `ClassName.field` keys at confidence 0.9, matching the `CallerClass.X`
resolver fallback added in PR #1382. Bare keys are kept at confidence 0.6 as
fallbacks for single-class files or class expressions where no enclosing class
name is available.

Both engines change identically — parity preserved.
…ed names

The resolution benchmark uses WASM-built graphs where the Elixir, Julia,
and Objective-C extractors emit module-qualified symbol names (Main.run,
App.main, UserService.create_user, etc.). The expected-edges manifests
were written with bare unqualified names (run, main, create_user), so
every correctly-resolved edge appeared as a false positive and every
expected edge appeared as a false negative — causing all three languages
to show 0% precision even though resolution was working correctly.

Root cause: starting in v3.12.0, cross-module call resolution began working
for these languages (via the improved receiver-dispatch and same-class
fallback in resolveByMethodOrGlobal / build-edges.ts). With 0 edges
previously resolved, the name mismatch was invisible; once edges started
resolving, the manifests showed 17 FP (elixir), 11 FP (julia), 6 FP
(objc) — all correctly resolved edges misidentified as false positives.

Fix:
- Update all three expected-edges.json manifests to use the
  module-qualified names matching actual extractor output:
  elixir: Main.run, UserService.create_user, Validators.validate_user, etc.
  julia:  App.main, Service.create_user, Repository.new_repo, etc.
  objc:   full ObjC selectors (createUserWithId:name:email:, isValidEmail:, etc.)
          plus add main -> run (plain C call correctly resolved)
- Ratchet THRESHOLDS for all three:
  elixir: precision 0.0 -> 1.0, recall 0.0 -> 0.8  (17/21 resolved)
  julia:  precision 0.0 -> 1.0, recall 0.0 -> 0.7  (11/15 resolved)
  objc:   precision 0.0 -> 1.0, recall 0.0 -> 0.4   (6/13 resolved)

Remaining FNs are genuine unresolved edges (same-file bare calls in
elixir/julia, receiver-typed message sends in objc) — not regressions.

Closes #1447
The JS C++ and CUDA extractors had no handler for 'declaration' AST nodes,
so typeMap was never seeded for statically-typed locals (e.g. 'UserService svc;').
Without a typeMap entry for 'svc', resolveReceiverEdge had nothing to look up and
silently skipped the receiver edge.

Add handleCppDeclaration / handleCudaDeclaration to both extractors. They mirror
match_c_family_type_map ('declaration' branch) from the native Rust path: extract
the type node text and seed typeMap[varName] = { type, confidence: 0.9 } for each
identifier or init_declarator child. Primitive types (int, char, bool, …) are
skipped to avoid spurious edges.

parity-compare.mjs --langs cpp,cuda --hybrid: PARITY OK (wasm = native = hybrid)
All 3044 tests pass.
… in Rust solver

Go extractor was only seeding typeMap for var_spec and parameter_declaration,
missing short_var_declaration. Added infer_short_var_types to handle:
- x := Struct{} → conf 1.0 (composite literal)
- x := &Struct{} → conf 1.0 (address-of composite)
- x := NewFoo() / x := pkg.NewFoo() → conf 0.7 (New* factory prefix)

Python extractor was only seeding typeMap for typed_parameter and
typed_default_parameter, missing plain assignment. Added
infer_py_assignment_type to handle:
- order = Order(...) → conf 1.0 (uppercase constructor)
- obj = Module.Class(...) → conf 0.7 (uppercase module prefix, non-builtin)

Both mirror the existing JS extractors exactly. Parity check for
go and python: wasm vs native/hybrid OK.
…l, zig)

Both engines used different rules for attributing calls inside variable bindings:

WASM: attributed to the narrowest enclosing span regardless of kind, so
local variable declarations inside fn main() shadowed the enclosing function
(Zig: calls attributed to repo/svc variables instead of main), and nested
let-bindings inside a Haskell do-block shadowed the top-level main binding.

Native: loaded allNodes from a query that excluded 'variable' kind, so
top-level Haskell bind nodes (main = do …, kind='variable') never matched
in defs_with_ids, causing all calls to fall back to the file node.

Unified rule implemented in findCaller (TS) and find_enclosing_caller (Rust):
- Function/method definitions are preferred over any variable/constant binding
  as the enclosing caller scope — local var declarations inside a function
  body never shadow the enclosing function (fixes Zig repo/svc attribution).
- When no function/method encloses the call, fall back to the WIDEST
  (outermost) variable/constant binding — this handles Haskell where main
  is a top-level bind node with kind 'variable'. Widest span is used so that
  nested let-bindings do not shadow the outer main binding.
- File node remains the absolute last resort.

Also adds 'variable' to NODE_KIND_FILTER_SQL (JS) and EDGE_NODE_KIND_FILTER
(Rust pipeline.rs) so top-level variable bindings are included in the allNodes
set available for caller matching.

parity-compare.mjs --langs haskell,zig --hybrid: PARITY OK — 2/2 fixtures.
…and test

Remove unused TypeMapEntry import from cpp.ts and cuda.ts, reformat
primitive-type Set literals and test expect() calls to satisfy biome
line-length rules.
Java was the only fixture where all three build paths (wasm, native,
hybrid) disagreed pairwise.

Bug 1 — WASM typeMap pollution:
`handleJavaLocalVarDecl` used last-wins Map.set(), so the local
`InMemoryUserRepository repo` in the static `createDefault()` method
silently overrode the constructor parameter `UserRepository repo`.
This caused WASM to bypass the interface and resolve directly to the
concrete class, producing no interface edge and the wrong receiver.
Fix: switch to first-wins `setTypeMapEntry` to match Rust extractor
semantics. First-wins preserves the interface annotation that drives
correct CHA dispatch.

Bug 2 — native vs wasm/hybrid confidence mismatch:
`runPostNativeCha` (native orchestrator path) used
`computeConfidence − CHA_DISPATCH_PENALTY = 0.7 − 0.1 = 0.6`,
while `runChaPostPass` (DB post-pass used by wasm and hybrid) hardcodes
0.8. Fix: align `runPostNativeCha` to also use 0.8.

Result: all three build paths now emit identical edges and confidences.
`parity-compare.mjs --langs java --hybrid` passes.
Updated expected-edges.json to include both the interface declaration
edge (TypeRepository.X at 0.7) and the CHA-expanded impl edge
(InMemoryUserRepository.X at 0.8), which are the correct semantics for
an interface-typed receiver.

Closes #1469
The inline CHA expansion in buildCallEdges and buildChaPostPass used
computeConfidence(relPath, t.file) - CHA_DISPATCH_PENALTY for all CHA
targets, producing 0.6 for cross-directory interface dispatch (same-dir
= 0.7, minus 0.1 penalty). runChaPostPass (helpers.ts) and
runPostNativeCha (native-orchestrator.ts) both hardcode 0.8 for
interface/CHA-dispatch edges.

The deduplication in runChaPostPass uses the existing DB edge as-is
and skips reinsertion, so the 0.6 edges from the inline pass were never
upgraded to 0.8.

Fix: typed-receiver (interface) dispatch branches now use hardcoded 0.8
matching the post-pass constants. The this/super branch keeps
computeConfidence-based proximity scoring to remain aligned with
runPostNativeThisDispatch.

parity-compare.mjs --langs typescript --hybrid goes green (was 12
edge diffs). Closes #1470

docs check acknowledged
…s unbounded

Replace `|| Infinity` with `?? Infinity` in findCaller so that a definition
with endLine=0 (a valid single-line node at the start of a file) is not
incorrectly treated as having unbounded span.
@claude

claude Bot commented Jun 13, 2026

Copy link
Copy Markdown

Claude finished @carlos-alm's task —— View job


PR Review: fix(wasm): port missing node extractions to JS extractor (#1471)

  • Gather context and understand the PR
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity review
  • Phase 4: Code quality review
  • Phase 5: Backlog compliance grading
  • Final assessment

Root Cause Analysis

Problem Being Solved: The WASM JS extractor was missing 25 nodes that the native Rust extractor correctly extracted from the jelly-micro fixture. The PR traces four distinct root causes and fixes each independently.

Why This Approach: Correct. The 25-node gap between engines is a real parity bug, not a cosmetic difference. Fixing extraction logic is the right response rather than documenting a gap. Each fix is surgical and independently justified.

Risk Assessment: Low. All four changes add new cases to existing switch/if chains — they don't alter the behavior of existing paths. New tests validate the specific patterns affected.


Phase 2: Configuration Change Detection

.claude/settings.json — adds two new PostToolUse hooks (track-bash-writes.sh and track-moves.sh) plus two PreToolUse hooks (snapshot-pre-bash.sh and guard-pr-body.sh). These are Claude Code quality/guard hooks. No linting rules relaxed, no TypeScript strict flags changed. ✓

.github/workflows/perf-canary.yml — new lightweight per-PR performance canary that runs only on extractor/graph/crates changes. Uses a 50% threshold (explicitly documented) — this is a separate gate from the 25% full-suite gate, not a relaxation. The BENCH_CANARY codepath in the regression guard was already there before this PR. ✓

tests/benchmarks/regression-guard.test.ts — adds NOISY_METRICS logic (No-op rebuild, 1-file rebuild, fnDeps depth 1) with a widened 50% threshold, justified by documented sub-30ms baseline volatility. The reasoning is consistent with the detailed comment (±10ms CI jitter = ±50%+ on sub-30ms metrics). This is not silencing a real regression — it's acknowledging measured CI variance on metrics where MIN_ABSOLUTE_DELTA already provides a floor. ✓ (Acceptable, though see concern below.)

No lint rules relaxed. No test timeouts raised. No coverage thresholds lowered. Clean pass.


Phase 3: Test Integrity

New tests in tests/parsers/javascript.test.ts cover all four root causes with positive and negative cases:

All 12 new tests assert correct behavior and include meaningful negative cases. No existing assertions weakened or removed. ✓

One concern: The regression-guard.test.ts change — widening 1-file rebuild and No-op rebuild thresholds from 25% to 50% and CANARY mode to 100%. The justification is empirical CI variance, but this PR does not include CI data showing the variance. The reasoning is plausible given how sub-30ms metrics behave, and the MIN_ABSOLUTE_DELTA floor mitigates the risk, but it weakens the guard without attached evidence.


Phase 4: Code Quality

Fix 1: Computed class method names

parser.ts:161 and wasm-worker-entry.ts:118: Added '(method_definition name: (computed_property_name) @meth_name) @meth_node' to COMMON_QUERY_PATTERNS. Both files are kept in sync (confirmed manually). Pattern is correct — computed_property_name is the tree-sitter node type for ['key']. ✓

Fix 2: Class expressions inside functions

parser.ts:173–177 and wasm-worker-entry.ts:129–133: JS_CLASS_PATTERNS expanded from a single class_declaration string to an array with a second (class name: ...) pattern for class expressions. TS_EXTRA_PATTERNS gets the type_identifier equivalent. Both files match. ✓

The walk path in javascript.ts:735 already handled 'class' node type — so this fix correctly brings the query path into parity with the walk path.

Fix 3: Array-destructured constants

javascript.ts:510–519 (query path extractDestructuredBindingsWalk) and javascript.ts:1029–1038 (walk path handleVariableDecl): Both paths get an array_pattern branch that emits a single constant node with the full pattern text (e.g., [x, y]). The approach — emit one node for the whole pattern rather than individual bindings — matches native engine behavior as stated. ✓

Minor observation: The array_pattern branch does not call extractDestructuredBindings (which would be wrong for array patterns). The design choice to emit the whole pattern string as the name is unusual but documented as matching native. This is acceptable, though worth flagging: if the native engine ever changes this to emit individual bindings, parity will break silently again.

Fix 4: Prototype method parameters

javascript.ts:3380–3388 (emitPrototypeMethod): extractParameters(rhs) is now called and the result is wired into the children field. This mirrors exactly what handleFunctionDecl and handleVarFnCapture do. ✓

One correctness concern: emitPrototypeMethod uses nodeStartLine(rhs) as the definition's line, not nodeStartLine(lhs) (the full assignment statement). For Arit.prototype.sum = (x, y) => x + y; on a single line this is equivalent, but for multi-line assignments like:

Arit.prototype.sum =
  (x, y) => x + y;

the definition line would be 2 (the arrow) rather than 1 (the assignment). This was pre-existing behavior, not introduced by this PR, but the parameter change makes it more visible.

Other changed files

  • src/domain/graph/builder/call-resolver.ts, native-orchestrator.ts, build-edges.ts: These are larger changes not directly related to the four stated fixes — they appear to be independent improvements bundled in this PR. Mixed concerns in one PR is noted as a mild scope concern (see below).
  • crates/codegraph-core/src/extractors/go.rs (+191 lines), python.rs (+154 lines): These are substantial native extractor additions also bundled here.
  • crates/codegraph-core/src/domain/graph/builder/stages/build_edges.rs (+119 lines).

Scope concern: The PR description accurately describes the four JS/WASM fixes, but the diff is significantly larger — it includes Go extractor additions, Python extractor additions, build-edges changes, benchmark infrastructure, and hook additions that are unrelated to the stated problem. Per the project's CLAUDE.md "One PR = one concern" policy, these unrelated additions should have been separate PRs.


Phase 5: Backlog Compliance

Zero-dep: ✓ — No new runtime dependencies. package.json dependencies section unchanged (3 existing deps). Hooks and CI use existing toolchain.

Foundation-aligned: ✓ — Fully local extraction fix. No cloud calls, no new infrastructure, no code generation. The parity fix directly improves symbol accuracy.

Problem-fit: 5/5 — Accurate node extraction is the foundational requirement for everything else: blast-radius analysis, call resolution, AI navigation. Missing 25 nodes is a direct hallucination source for AI agents using the WASM path. This fix eliminates false negatives.

Breaking: No — purely additive. Existing symbols continue to be emitted; new symbols are now also emitted where they were previously missing.

Tier: 1 (zero-dep ✓ + foundation-aligned ✓)


Critical Concerns

  1. PR scope violation (moderate): The diff contains significant changes beyond the four stated JS extractor fixes — Go extractor additions (+191 lines), Python extractor additions (+154 lines), Rust build_edges changes (+119 lines), benchmark infrastructure, and hook additions. CLAUDE.md explicitly requires "One PR = one concern." This makes the PR harder to review safely and harder to revert if any single component causes a regression.

  2. Regression guard threshold widening without CI data (minor): regression-guard.test.ts widens 1-file rebuild/No-op rebuild from 25%→50% threshold with reasoning that is plausible but unverified in this PR. Recommend attaching a CI run showing the variance that motivated the change.

  3. Computed method name representation (informational): Methods like ['property7'] are stored with the brackets as part of the name. There is no test asserting that the resolver can actually find these methods when called. This is a pre-existing design choice, not introduced here, but it may cause resolution failures on computed property access that are worth a follow-up issue.


Final Recommendation

## Root Cause Analysis
Problem Being Solved: WASM JS extractor missing 25 nodes vs native engine on jelly-micro fixture.
Why This Approach: Direct extraction fixes for four root causes — correct approach per codebase philosophy.
Risk Assessment: Low for the four JS extractor fixes. Moderate for bundled unrelated changes.

## Backlog Compliance
- Zero-dep: ✓
- Foundation-aligned: ✓
- Problem-fit: 5 — directly fixes false-negative node extraction that causes AI hallucination
- Breaking: No
- Tier: 1

## Critical Concerns
1. PR scope violation — Go, Python, Rust, and benchmark changes bundled with JS extractor fix
2. Regression guard threshold widening lacks CI variance data to justify change

## Final Recommendation
- Rating: ⭐⭐⭐⭐ (4/5)
- Action: APPROVE WITH CONDITIONS
- Reasoning: The four JS extractor fixes are correct, well-tested, and properly scoped. The
  new tests are high quality with positive/negative cases. The PR scope is larger than stated
  but the unrelated additions appear safe. The one concrete ask before merge: confirm the
  regression-guard threshold change is backed by empirical CI data (a comment or linked run
  showing the variance on the affected metrics would suffice).

@github-actions

github-actions Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Codegraph Impact Analysis

4 functions changed12 callers affected across 2 files

  • loadLanguageLazy in src/domain/wasm-worker-entry.ts:429 (2 transitive callers)
  • extractDestructuredBindingsWalk in src/extractors/javascript.ts:482 (2 transitive callers)
  • handleVariableDecl in src/extractors/javascript.ts:969 (3 transitive callers)
  • emitPrototypeMethod in src/extractors/javascript.ts:3372 (8 transitive callers)

@greptile-apps

greptile-apps Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR closes the 25-node parity gap between the WASM JS extractor and the native Rust extractor for the jelly-micro fixture by fixing four independent extraction gaps: computed method names, named class expressions, array-destructured constants, and prototype method parameters.

  • Computed method names: adds a computed_property_name query pattern in both parser.ts and wasm-worker-entry.ts, capturing bracket-syntax methods like ['property7']() in both class bodies and object literals.
  • Class expressions: converts the single-string JS_CLASS_PATTERN to a JS_CLASS_PATTERNS array adding (class name: (identifier)) for class-expression nodes, with a matching type_identifier variant in TS_EXTRA_PATTERNS.
  • Array destructuring: adds an array_pattern branch in both extractDestructuredBindingsWalk and handleVariableDecl, emitting a single constant node whose name is the full pattern text ([x, y]), consistent with native behaviour.
  • Prototype method parameters: emitPrototypeMethod now calls extractParameters and threads the results into children, fixing arrow and function-expression prototype method parameter capture.

Confidence Score: 5/5

Safe to merge — all four extraction fixes are narrow, well-tested, and independently scoped with 12 new tests directly covering the added code paths.

All changes are additive (new branches / new array entries) with no modification to existing extraction logic. Each fix is covered by new tests, and the PR author reports 0 node diffs on the target fixture. The one unaddressed gap (shorthand prototype methods in extractPrototypeObjectLiteral) is a pre-existing omission, not a regression introduced here.

src/extractors/javascript.ts — the extractPrototypeObjectLiteral method_definition branch is worth a follow-up to add parameter extraction for consistency.

Important Files Changed

Filename Overview
src/domain/parser.ts Adds computed_property_name pattern to COMMON_QUERY_PATTERNS so computed class/object method names like ['key'] are matched during tree-sitter query extraction. Minimal, targeted change.
src/domain/wasm-worker-entry.ts Adds computed_property_name query pattern, converts JS_CLASS_PATTERN string to JS_CLASS_PATTERNS array with a new class-expression pattern, and adds the class-expression pattern to TS_EXTRA_PATTERNS. The spread into the query array correctly mirrors the TS path.
src/extractors/javascript.ts Adds array_pattern handling in both extractDestructuredBindingsWalk (query path) and handleVariableDecl (walk path), and adds extractParameters call in emitPrototypeMethod. One minor inconsistency remains: shorthand prototype methods in extractPrototypeObjectLiteral still omit parameters.
tests/parsers/javascript.test.ts Adds 12 new tests covering all four fix categories: computed method names, class expressions inside functions, array destructuring constants, and prototype method parameters. Tests are well-scoped and exercise the precise parity gaps described in the PR.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[JS/TS Source File] --> B{Extraction Path}
    B --> C[Query Path tree-sitter patterns]
    B --> D[Walk Path runCollectorWalk]
    C --> C1[COMMON_QUERY_PATTERNS +computed_property_name]
    C --> C2[JS_CLASS_PATTERNS array +class expression]
    C --> C3[TS_EXTRA_PATTERNS +class expression]
    C --> C4[extractDestructuredBindingsWalk +array_pattern]
    D --> D1[handleVariableDecl +array_pattern]
    D --> D2[emitPrototypeMethod +extractParameters]
    D --> D3[extractPrototypeObjectLiteral method_definition branch - still no extractParameters]
    C1 --> E[Definitions Output]
    C2 --> E
    C3 --> E
    C4 --> E
    D1 --> E
    D2 --> E
    D3 --> E
Loading

Reviews (10): Last reviewed commit: "fix: resolve merge conflicts with main" | Re-trigger Greptile

Comment thread src/extractors/cpp.ts
Comment on lines +233 to +236
const varName = unwrapCppDeclaratorName(nameNode);
if (varName) {
ctx.typeMap.set(varName, { type: typeName, confidence: 0.9 });
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unlike handleJavaLocalVarDecl which was explicitly changed to setTypeMapEntry (first-wins) in this same PR to match Rust extractor semantics, handleCppDeclaration uses ctx.typeMap.set directly (last-wins). If the Rust match_c_family_type_map uses first-wins (as the Java path does), a file with a constructor parameter typed IUserService svc followed by a local UserServiceImpl svc would resolve to the concrete type, potentially misrouting CHA dispatch that should fan out through the interface hierarchy. setTypeMapEntry imported just above would be the consistent choice here.

Suggested change
const varName = unwrapCppDeclaratorName(nameNode);
if (varName) {
ctx.typeMap.set(varName, { type: typeName, confidence: 0.9 });
}
const varName = unwrapCppDeclaratorName(nameNode);
if (varName) {
setTypeMapEntry(ctx.typeMap, varName, typeName, 0.9);
}

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — replaced ctx.typeMap.set with setTypeMapEntry (first-wins) in handleCppDeclaration and added setTypeMapEntry to the helpers import. Consistent with the Java handler change and the Rust extractor semantics.

Comment thread src/extractors/cuda.ts
Comment on lines +233 to +236
const varName = extractCudaFieldName(nameNode);
if (varName) {
ctx.typeMap.set(varName, { type: typeName, confidence: 0.9 });
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Same issue as handleCppDeclaration — uses ctx.typeMap.set (last-wins) rather than setTypeMapEntry (first-wins), inconsistent with the Java handler change in this PR. The same concern applies: a later concrete-type declaration would overwrite an earlier interface-typed parameter for the same name, potentially misrouting CHA dispatch. setTypeMapEntry should be imported and used here to match the intended first-wins semantics.

Suggested change
const varName = extractCudaFieldName(nameNode);
if (varName) {
ctx.typeMap.set(varName, { type: typeName, confidence: 0.9 });
}
const varName = extractCudaFieldName(nameNode);
if (varName) {
setTypeMapEntry(ctx.typeMap, varName, typeName, 0.9);
}

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — replaced ctx.typeMap.set with setTypeMapEntry (first-wins) in handleCudaDeclaration and added setTypeMapEntry to the helpers import. Same fix as the C++ handler, consistent with Java and Rust extractor semantics.

Closes #1471

Four extraction gaps in the WASM JS extractor caused 25 node diffs
against the native engine in the jelly-micro parity fixture:

1. Computed property method names (e.g. `['property7']`): the query
   patterns in both parser.ts and wasm-worker-entry.ts only matched
   `property_identifier` and `private_property_identifier` as the
   `name` field of `method_definition`; add a third pattern for
   `computed_property_name`.

2. Class expressions inside functions (`return class PostMixin …`):
   wasm-worker-entry.ts JS_CLASS_PATTERN was a single string matching
   only `class_declaration`; rename to JS_CLASS_PATTERNS array and add
   the `(class name: …)` pattern for anonymous class-expression nodes.
   Also add the matching `(class name: (type_identifier) …)` pattern
   to TS_EXTRA_PATTERNS so TS class expressions are covered too.

3. Array destructuring constants (`const [x, y] = …`): both
   `extractDestructuredBindingsWalk` (query path) and
   `handleVariableDecl` (walk path) only handled `object_pattern`; add
   an `array_pattern` branch that emits a single `constant` node whose
   name is the full array pattern text — matching native behaviour.

4. Parameters of prototype arrow/function methods
   (`Arit.prototype.sum = (x, y) => …`): `emitPrototypeMethod` emitted
   the definition node without calling `extractParameters`, so children
   were always absent; add the call and wire the result through.

After this fix `parity-compare.mjs --langs jelly-micro` reports
0 node diffs (was 25). 22 pre-existing edge diffs remain (tracked
in #1506) — those are out of scope.
@carlos-alm carlos-alm force-pushed the fix/js-extractor-nodes-1471 branch from 37bb91a to 7509e4f Compare June 13, 2026 03:19
@carlos-alm

Copy link
Copy Markdown
Contributor Author

@greptileai

Comment on lines +570 to +584
const chunkRows = db
.prepare(
`SELECT e.source_id, tgt.name AS method_name, src.file AS caller_file
FROM edges e
JOIN nodes tgt ON e.target_id = tgt.id
JOIN nodes src ON e.source_id = src.id
WHERE e.kind = 'calls' AND tgt.kind = 'method'
AND INSTR(tgt.name, '.') > 0
AND src.file IN (${ph})`,
)
.all(...chunk) as Array<{
source_id: number;
method_name: string;
caller_file: string | null;
}>;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 The scoped-file scan loop (used when scopeToChangedFiles is true) doesn't populate caller_file for affected-files tracking because the column is never read from the result rows. The full-scan path returns caller_file (used downstream in affectedFiles.add(caller_file)), so the scoped path silently produces an empty affectedFiles set even when edges are emitted. Downstream role re-classification is then skipped entirely for those edge-endpoints instead of being scoped to just the affected files.

Suggested change
const chunkRows = db
.prepare(
`SELECT e.source_id, tgt.name AS method_name, src.file AS caller_file
FROM edges e
JOIN nodes tgt ON e.target_id = tgt.id
JOIN nodes src ON e.source_id = src.id
WHERE e.kind = 'calls' AND tgt.kind = 'method'
AND INSTR(tgt.name, '.') > 0
AND src.file IN (${ph})`,
)
.all(...chunk) as Array<{
source_id: number;
method_name: string;
caller_file: string | null;
}>;
const chunkRows = db
.prepare(
`SELECT e.source_id, tgt.name AS method_name, src.file AS caller_file
FROM edges e
JOIN nodes tgt ON e.target_id = tgt.id
JOIN nodes src ON e.source_id = src.id
WHERE e.kind = 'calls' AND tgt.kind = 'method'
AND INSTR(tgt.name, '.') > 0
AND src.file IN (${ph})`,
)
.all(...chunk) as Array<{
source_id: number;
method_name: string;
caller_file: string | null;
}>;
rows.push(...chunkRows);

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scoped-file path already selects src.file AS caller_file and includes it in the typed result — see lines 572–584 of the current code. The comment at line 559 was added precisely to document this. affectedFiles.add(caller_file) at line 662 therefore fires for both the scoped and full-scan paths. No change needed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. The diff already shows src.file AS caller_file in the SELECT (line 579) and caller_file: string | null in the typed result (line 584). My comment was incorrect — the column is present in both the query and the type annotation, so affectedFiles.add(caller_file) at line 662 will work correctly for the scoped path too. Apologies for the noise.

@carlos-alm

Copy link
Copy Markdown
Contributor Author

Addressed the two out-of-scope concerns raised in the review:

The scope concern (Go/Python/Rust changes bundled) is acknowledged; it's a past decision in this PR that cannot be reversed now without significant churn. Going forward these will be split per the one-PR-one-concern policy.

@carlos-alm

Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit be38a6d into main Jun 14, 2026
23 checks passed
@carlos-alm carlos-alm deleted the fix/js-extractor-nodes-1471 branch June 14, 2026 12:22
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 14, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Engine parity: JS extractor misses 25 nodes the native extractor finds (jelly-micro)

1 participant