feat(skills): /parity engine-parity audit skill + titan-run integration#1474
Conversation
|
Claude finished @carlos-alm's task —— View job PR #1474 Review — feat(skills): /parity engine-parity audit skill + titan-run integration
Root Cause AnalysisProblem Being Solved: This PR adds a permanent, automated gate to enforce the core CLAUDE.md invariant that WASM and native engines must produce identical results. The problem is real and well-evidenced — the script's first run caught the missing Phase 8.2 return-type propagation (fixed in the base branch #1465), and the full sweep exposed 10/36 fixtures with pre-existing divergences (filed as #1466–#1472). Without this gate, engine-parity regressions are discovered incidentally or not at all. Why This Approach: The approach is correct. Rather than embedding engine comparisons in the existing vitest resolution-benchmark (which tests accuracy against hand-annotated ground truth), this script directly compares engine outputs against each other — a complementary, orthogonal signal. The three-path (wasm/hybrid/native) localization table in Risk Assessment: Low. All three changed files are in non-compiled paths ( Phase 2: Configuration Change Detection✓ No linting rules disabled, no test timeouts increased, no TypeScript strict flags touched. Clean. Phase 3: Test Integrity✓ No test files modified. No assertions weakened. The script itself adds a new verification surface that is stricter, not looser. Phase 4: Code Quality
|
Greptile SummaryThis PR adds a correctness-parity gate (
Confidence Score: 5/5Safe to merge — all three files are purely additive and don't touch existing production code paths. All previously flagged blocking issues have been resolved in the head commit. The remaining comments are style-level improvements that don't affect correctness of the gate itself. scripts/parity-compare.mjs warrants a second look for the minor nits; the two SKILL.md files are clean. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[titan-run start] --> B[RECON]
B --> C[GAUNTLET]
C --> D[SYNC]
D --> E[FORGE]
E --> F[GRIND]
F --> G{.claude/skills/parity/SKILL.md exists?}
G -- NO --> K[Print: PARITY skipped]
G -- YES --> H{Code changes this run?}
H -- NO --> K
K --> L[CLOSE]
H -- YES --> I[Step 4.6 PARITY - Dispatch /parity sub-agent]
I --> J{Divergences introduced by this run?}
J -- None or pre-existing --> L
J -- New drift unfixed --> M[STOP - block CLOSE]
subgraph parity-compare.mjs
P1[Pre-flight: dist + native + codesign + require.resolve]
P2[Build each fixture: wasm / native / hybrid]
P3[readMultisets: node + edge multisets]
P4[diffMultisets: wasm vs native, wasm vs hybrid]
P5{Diffs?}
P6[exit 0 PARITY OK]
P7[exit 1 DIVERGED]
P1 --> P2 --> P3 --> P4 --> P5
P5 -- No --> P6
P5 -- Yes --> P7
end
I -.-> parity-compare.mjs
Reviews (7): Last reviewed commit: "fix(parity): guard empty --langs, captur..." | Re-trigger Greptile |
| function diffMultisets(base, other) { | ||
| const diffs = []; | ||
| const keys = new Set([...base.keys(), ...other.keys()]); | ||
| for (const key of keys) { |
There was a problem hiding this comment.
__TOTAL_ROWS__ sentinel leaks into diff output and inflates counts. diffMultisets iterates over all keys in both maps, including the synthetic __TOTAL_ROWS__ entry. Whenever node or edge counts differ this sentinel appears as an extra diff entry (e.g. [node] __TOTAL_ROWS__ wasm=100 native=101) in human-readable output, increments nodeDiffs.length/edgeDiffs.length by one, and appears in the JSON report's nodeDiffs/edgeDiffs arrays — making automated consumers see N+1 diffs and making the human output harder to read. Filter it out before diffing.
| function diffMultisets(base, other) { | |
| const diffs = []; | |
| const keys = new Set([...base.keys(), ...other.keys()]); | |
| for (const key of keys) { | |
| function diffMultisets(base, other) { | |
| const diffs = []; | |
| const keys = new Set([...base.keys(), ...other.keys()]); | |
| keys.delete('__TOTAL_ROWS__'); | |
| for (const key of keys) { |
There was a problem hiding this comment.
Fixed — added keys.delete('__TOTAL_ROWS__') in diffMultisets before iterating, so the sentinel is never included in diffs or diff counts (commit f062fb9).
| async function buildEngine(fixtureDir, engine, label) { | ||
| const dir = mkdtempSync(join(tmpdir(), `parity-${label}-`)); | ||
| cpSync(fixtureDir, dir, { recursive: true }); | ||
| await buildGraph(dir, { ...BUILD_OPTS, engine }); | ||
| return dir; | ||
| } | ||
|
|
||
| // Hybrid path: an incremental wasm build followed by an incremental native | ||
| // build on the same dir triggers "Engine changed (wasm -> native), promoting | ||
| // to full rebuild", which sets forceFullRebuild and skips the orchestrator — | ||
| // the JS pipeline then drives the napi buildCallEdges resolver. | ||
| async function buildHybrid(fixtureDir, label) { | ||
| const dir = mkdtempSync(join(tmpdir(), `parity-${label}-`)); | ||
| cpSync(fixtureDir, dir, { recursive: true }); | ||
| await buildGraph(dir, { ...BUILD_OPTS, incremental: true, engine: 'wasm' }); | ||
| await buildGraph(dir, { ...BUILD_OPTS, incremental: true, engine: 'native' }); | ||
| return dir; | ||
| } |
There was a problem hiding this comment.
Temp dir leak when
buildGraph throws inside buildEngine/buildHybrid. Both helpers mkdtempSync a directory and then call buildGraph. If buildGraph rejects, the function throws before returning dir, so the caller never reaches tempDirs.push(...) — the directory is orphaned and the finally block won't clean it up. On a 36-fixture run where several fixtures fail, this can silently accumulate multiple copied-fixture trees in /tmp. A try/finally inside each helper (or tracking the dir on the caller side before awaiting) would ensure cleanup on failure.
There was a problem hiding this comment.
Fixed — tempDirs.push(dir) is now called immediately after mkdtempSync, before the first await buildGraph(), in both buildEngine and buildHybrid. The helpers now accept tempDirs as a parameter. Call sites updated accordingly (commit f062fb9).
| ```bash | ||
| codesign --sign - --force crates/codegraph-core/*.node | ||
| ``` | ||
| 4. Verify the loader picks it up: | ||
| ```bash | ||
| node -e "import('./dist/infrastructure/native.js').then(m => console.log(m.isNativeAvailable()))" | ||
| ``` | ||
| If `false`, stop and report — auditing parity without the native engine is | ||
| meaningless. Note: if the repo (or a parent) has |
There was a problem hiding this comment.
Pre-flight check can't distinguish locally-built binary from the installed package binary. The skill asks the agent to verify
isNativeAvailable() returns true, but that call returns true for the published node_modules/@optave/codegraph-<platform>-<arch>/ binary just as readily as for the freshly built one. If the installed package is present, Node resolves it first, isNativeAvailable() passes, and the entire parity audit silently benchmarks the published binary — missing any bug introduced by local Rust changes. The skill documents the manual override (copy the .node file), but doesn't instruct the agent to verify which path is actually resolved (e.g. via require.resolve or checking the binary's mtime against the Rust source). Without that check, Phase 0 can pass even when auditing the wrong binary.
There was a problem hiding this comment.
Fixed — Phase 0 step 4 now instructs the agent to run require.resolve to check the actual resolved path first. If it points to the installed package binary, the agent copies the freshly built binary over it before running the availability check (commit f062fb9).
Codegraph Impact Analysis8 functions changed → 7 callers affected across 2 files
|
|
Addressed Greptile and Claude review feedback:
Claude review observations acknowledged:
|
|
Claude finished @carlos-alm's task —— View job Re-review: Fixes from f062fb9 + 8c967d0
All four issues raised in the first review and by Greptile are correctly addressed. Walking through each: Fix 1 — Fix 2 — Temp dir leak on Fix 3 — Pre-flight can't detect installed vs. locally built binary ( Fix 4 — Step numbering gap ( Rating: ⭐⭐⭐⭐⭐ (5/5) All previously raised issues are resolved cleanly. The |
| let fixtures = allFixtures; | ||
| if (langsFilter) { | ||
| const unknown = langsFilter.filter((l) => !allFixtures.includes(l)); | ||
| if (unknown.length > 0) { |
There was a problem hiding this comment.
--langs with no value (e.g. the flag is the last argument or passed as --langs=) produces langsFilter = [] via ''.split(',').filter(Boolean). An empty array is truthy, so the if (langsFilter) branch executes, fixtures is filtered to nothing, the loop runs zero iterations, report.ok stays true, and the script exits 0 with "PARITY OK — 0 fixture(s), all engines identical". A CI gate that passes on zero audited fixtures gives a completely false signal.
| let fixtures = allFixtures; | |
| if (langsFilter) { | |
| const unknown = langsFilter.filter((l) => !allFixtures.includes(l)); | |
| if (unknown.length > 0) { | |
| let fixtures = allFixtures; | |
| if (langsFilter) { | |
| if (langsFilter.length === 0) { | |
| console.error('parity-compare: --langs requires at least one fixture name.'); | |
| process.exit(2); | |
| } | |
| const unknown = langsFilter.filter((l) => !allFixtures.includes(l)); | |
| if (unknown.length > 0) { |
There was a problem hiding this comment.
Fixed — added an early guard: if langsFilter.length === 0 (empty after parse), exits 2 with an explicit error message before reaching the fixture filter. Also fixed the related headBefore/headAfter capture gap in titan-run step 4.6b/4.6e noted in the Greptile summary.
0d590ea to
14f5232
Compare
|
Addressed latest Greptile feedback:
|
…/titan-run scripts/parity-compare.mjs builds every resolution-benchmark fixture with the wasm, native, and (--hybrid) hybrid paths and compares the full node and edge multisets (kind, name, file, line, confidence, dynamic flag) — zero divergence is the only passing state. It caught the Phase 8.2 cross-file return-type propagation gap in the native orchestrator on its first run, plus the pre-existing divergences now tracked in #1466-#1472. The /parity skill runs the audit, localizes each divergence by which build paths disagree (wasm/hybrid/native table), fixes the root cause in whichever engine is wrong, and re-verifies until clean. /titan-run gains a conditional Step 4.7 — PARITY between grind and close. The orchestrator stays repo-agnostic: a repo opts in by shipping its own .claude/skills/parity/SKILL.md; without one the step prints a skip note and continues.
…y binary path check
…p 4.6 - Guard against --langs with no value: an empty langsFilter array is truthy so filtering produces zero fixtures and exits 0 — a false pass. Now exits 2 with an explicit error message. - Add headBefore capture to step 4.6b and headAfter capture to step 4.6e so the commit-audit command in 4.6e has concrete variables to expand.
d7e5d0e to
4296323
Compare
Stacked on #1465 (
refactor/engine-structure-parity) — only the last commit (1709a2c) is new here.Summary
scripts/parity-compare.mjs— permanent correctness-parity gate. Builds every resolution-benchmark fixture (36 today) with the wasm and native engines (plus the hybrid JS-pipeline + napi-solver path with--hybrid) into isolated temp dirs and compares the full node and edge multisets — kind, name, file, line, confidence, dynamic flag. Zero divergence is the only passing state; exit 1 on any diff,--jsonfor machine-readable reports,--langsto scope..claude/skills/parity/SKILL.md— the/parityskill: pre-flight (fresh dist + napi build + macOS codesign + loader check), audit, root-cause localization via the wasm/hybrid/native disagreement table, fix rules (mirror semantics exactly in both engines, never document a gap as expected), verification loop, report..claude/skills/titan-run/SKILL.md— new conditional Step 4.7 — PARITY between GRIND and CLOSE. titan-run is shared across repos, so the step is repo-agnostic: a repo opts in by shipping its own/parityskill; without one the orchestrator prints a skip note and continues. Drift introduced by the run blocks CLOSE; pre-existing divergences are filed as issues per scope discipline.Proven in anger
The script's first run caught a real native-engine bug — the missing Phase 8.2 cross-file return-type propagation, fixed in #1465 (aa88fda). The subsequent full sweep (
--hybrid) found 26/36 fixtures at exact three-path parity and surfaced pre-existing divergences in the other 10, filed as #1466, #1467, #1468, #1469, #1470, #1471, #1472. Until those land, a full-sweep audit honestly exits 1 — that is the gate working as designed, not a defect of this PR.Test plan
node scripts/parity-compare.mjs --langs javascript,pts-javascript --hybrid→ OK (exit 0)--langs)