feat(skills): add /parity engine-parity audit skill and wire it into /titan-run

carlos-alm · carlos-alm · commit 2b79216c2201 · 2026-06-11T21:54:17.000-06:00
scripts/parity-compare.mjs builds every resolution-benchmark fixture with the wasm, native, and (--hybrid) hybrid paths and compares the full node and edge multisets (kind, name, file, line, confidence, dynamic flag) — zero divergence is the only passing state. It caught the Phase 8.2 cross-file return-type propagation gap in the native orchestrator on its first run, plus the pre-existing divergences now tracked in #1466-#1472. The /parity skill runs the audit, localizes each divergence by which build paths disagree (wasm/hybrid/native table), fixes the root cause in whichever engine is wrong, and re-verifies until clean. /titan-run gains a conditional Step 4.7 — PARITY between grind and close. The orchestrator stays repo-agnostic: a repo opts in by shipping its own .claude/skills/parity/SKILL.md; without one the step prints a skip note and continues.
diff --git a/.claude/skills/parity/SKILL.md b/.claude/skills/parity/SKILL.md
@@ -0,0 +1,177 @@
+---
+name: parity
+description: Audit WASM/native engine correctness parity across all resolution fixtures and fix any divergence at the root cause — both engines must produce identical graphs
+argument-hint: "[--langs js,python] [--hybrid] [--audit-only]"
+allowed-tools: Bash, Read, Write, Edit, Glob, Grep, Agent
+---
+
+# /parity — Engine Correctness Parity Audit & Fix
+
+Codegraph has two engines that MUST produce identical results (see CLAUDE.md):
+
+- **wasm** — JS pipeline + JS extractors + JS edge resolution
+- **native** — full Rust orchestrator (`crates/codegraph-core/src/domain/graph/builder/pipeline.rs`)
+- **hybrid** — JS pipeline + napi `buildCallEdges` (the fallback when the
+  orchestrator is skipped: forced full rebuilds, older addons)
+
+This skill runs `scripts/parity-compare.mjs`, which builds every
+resolution-benchmark fixture with each engine and compares the **full node and
+edge multisets** (kind, name, file, line, confidence, dynamic flag). Any
+difference is a bug in the less-accurate engine — never an acceptable gap, and
+never something to document as expected. The skill finds the root cause, fixes
+it, and re-verifies until the audit is clean.
+
+## Arguments
+
+- `$ARGUMENTS` may contain:
+  - `--langs a,b,c` — restrict to specific fixture names (e.g. `javascript,pts-javascript`)
+  - `--hybrid` — also audit the hybrid path (recommended; slower)
+  - `--audit-only` — report divergences without fixing them
+  - No arguments — full audit across all fixtures, then fix divergences
+
+## Phase 0 — Pre-flight
+
+All steps run from the repo root.
+
+1. Confirm `scripts/parity-compare.mjs` exists. If not, this repo doesn't have
+   the parity tooling — stop and report.
+2. Build the TypeScript dist (the script imports `dist/index.js`, and extractor
+   changes in `src/` are invisible until rebuilt):
+   ```bash
+   npm run build
+   ```
+3. Ensure the native addon reflects the local Rust source:
+   ```bash
+   cd crates/codegraph-core && npx napi build --platform --release && cd ../..
+   ```
+   On macOS, locally built binaries must be re-signed or Node kills the process
+   (exit 137):
+   ```bash
+   codesign --sign - --force crates/codegraph-core/*.node
+   ```
+4. Verify the loader picks it up:
+   ```bash
+   node -e "import('./dist/infrastructure/native.js').then(m => console.log(m.isNativeAvailable()))"
+   ```
+   If `false`, stop and report — auditing parity without the native engine is
+   meaningless. Note: if the repo (or a parent) has
+   `node_modules/@optave/codegraph-<platform>-<arch>/` installed, Node resolves
+   that package **before** the crate-local build — copy the freshly built
+   binary over `codegraph-core.node` in that package dir, or the audit will
+   silently test the published binary instead of your changes.
+
+## Phase 1 — Audit
+
+Run the comparison (pass through `--langs` / `--hybrid` from `$ARGUMENTS`):
+
+```bash
+node scripts/parity-compare.mjs [--langs ...] [--hybrid] 2>/dev/null
+```
+
+- Exit 0 → parity holds. Skip to Phase 4 and report a clean audit.
+- Exit 1 → divergences or fixture build failures. Collect every `[node]` /
+  `[edge]` diff line and any `BUILD FAILED` fixtures.
+- Exit 2 → pre-flight failure; go back to Phase 0.
+
+For machine-readable output (useful when many fixtures diverge), re-run with
+`--json` and parse `fixtures[].comparisons[].nodeDiffs/edgeDiffs`.
+
+If `--audit-only` was passed: report the diffs (Phase 4 format) and stop.
+
+## Phase 2 — Root-cause and fix
+
+For each divergence, identify which engine is wrong — the one missing edges or
+producing lower-quality resolution is usually the buggy one, but verify by
+reading the fixture source and deciding what the *correct* graph is.
+
+**Localize the bug by which paths disagree:**
+
+| wasm | hybrid | native | Bug location |
+|------|--------|--------|--------------|
+| A | A | B | Rust pipeline prep (`pipeline.rs`) or Rust extractor (`crates/.../extractors/`) — the napi solver gets correct input from JS but the orchestrator's own input differs |
+| A | B | B | Rust `build_edges.rs` solver (shared by hybrid + native) |
+| A | B | A | JS↔napi boundary: `NativeFileEntry` plumbing in `build-edges.ts` or the wasm-worker protocol |
+| B | A | A | JS extractor or JS resolution (`src/extractors/`, `src/domain/graph/builder/stages/build-edges.ts`) |
+
+**Fix rules (from CLAUDE.md — non-negotiable):**
+
+- Fix the extraction/resolution layer that produces incorrect results. Never
+  add comments, tests, or fixture exclusions that frame wrong output as
+  expected.
+- Changes may land in either language or both — create the best version based
+  on both implementations, don't restrict the fix to one side.
+- The module layout is mirrored between `src/` and `crates/codegraph-core/src/`
+  — read the TS and Rust counterparts side by side (e.g.
+  `src/domain/graph/builder/stages/build-edges.ts` ↔
+  `crates/.../domain/graph/builder/stages/build_edges.rs`).
+- Mirror *semantics exactly*: confidence constants, hop penalties, tie-breaking
+  order, first-wins vs highest-wins rules. A 0.05 confidence difference is a
+  parity failure.
+- Add a focused unit test next to the fix (Rust `#[cfg(test)]` or vitest) that
+  pins the behavior.
+
+**Gotchas that mask fixes:**
+
+- `src/` changes need `npm run build` before the script (which imports dist)
+  sees them.
+- Rust changes need the napi rebuild + macOS codesign from Phase 0.
+- New `ExtractorOutput` fields must be added to `SerializedExtractorOutput` in
+  `src/domain/wasm-worker-{protocol,entry,pool}.ts` or they are silently
+  dropped at the Worker-thread boundary.
+- New per-file fields crossing the napi boundary need: the `FileSymbols` /
+  `FileEdgeInput` structs in `crates/.../types.rs` & `build_edges.rs`, the
+  `NativeFileEntry` assembly in `build-edges.ts`, and the orchestrator's own
+  assembly in `pipeline.rs` (`build_and_insert_call_edges`). Missing the last
+  one produces hybrid-OK/native-broken splits.
+- Out-of-scope findings discovered along the way (pre-existing bugs, refactor
+  opportunities) → `gh issue create` immediately, then continue.
+
+## Phase 3 — Verify
+
+Repeat until the audit is clean — never stop at "fewer diffs than before":
+
+1. Rebuild whichever side changed (`npm run build` / napi build + codesign).
+2. Re-run the Phase 1 audit command. Any remaining divergence → back to Phase 2.
+3. Once clean, run the full verification suite — all must pass:
+   ```bash
+   cargo test --manifest-path crates/codegraph-core/Cargo.toml
+   npm test
+   npx vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts
+   ```
+   (From a `.claude` worktree, vitest needs the worktree override config —
+   check memory/project notes if no tests are found.)
+4. If any verification step cannot run, STOP and report it — never proceed
+   with unverified changes.
+
+## Phase 4 — Report
+
+Print a summary:
+
+```
+PARITY AUDIT — <date>
+Fixtures audited: N (wasm vs native[, hybrid])
+Divergences found: M
+Fixed: <file:line summary per fix, with engine + root cause>
+Verification: cargo test ✓ | npm test ✓ | resolution benchmark ✓
+Issues filed: #NNN (out-of-scope findings)
+```
+
+- If divergences were found and fixed, list each root cause in one line —
+  which engine was wrong, which layer, what semantic was mismatched.
+- If `--audit-only`: list divergences grouped by fixture with the
+  wasm/hybrid/native localization table applied.
+- Suggest committing engine fixes separately from unrelated work (one PR = one
+  concern).
+
+## Rules
+
+- **Zero divergence is the only passing state** — a single edge differing in
+  confidence is a failure.
+- **Never exclude a fixture or file to make the audit pass.**
+- **Never run the audit against a stale dist or stale native binary** — Phase 0
+  is mandatory after any code change.
+- **The wasm/hybrid/native disagreement pattern localizes the bug** — use the
+  table before reading code.
+- **Both engines evolve together**: a feature added to one engine without the
+  other is a parity bug from day one. New resolution techniques must land in
+  `src/` and `crates/codegraph-core/src/` in the same PR.
diff --git a/.claude/skills/titan-run/SKILL.md b/.claude/skills/titan-run/SKILL.md
@@ -1,7 +1,7 @@
 ---
 name: titan-run
-description: Run the full Titan Paradigm pipeline end-to-end by dispatching each phase to sub-agents with fresh context windows. Orchestrates recon → gauntlet → sync → forge → grind automatically.
-argument-hint: <path (default: .)> <--skip-recon> <--skip-gauntlet> <--start-from recon|gauntlet|sync|forge|grind> <--gauntlet-batch-size 5> <--yes>
+description: Run the full Titan Paradigm pipeline end-to-end by dispatching each phase to sub-agents with fresh context windows. Orchestrates recon → gauntlet → sync → forge → grind (+ repo-provided parity audit) automatically.
+argument-hint: <path (default: .)> <--skip-recon> <--skip-gauntlet> <--start-from recon|gauntlet|sync|forge|grind|parity> <--gauntlet-batch-size 5> <--yes>
 allowed-tools: Agent, Read, Bash, Glob, Write, Edit
 ---
 
@@ -16,7 +16,7 @@ You are the **orchestrator** for the full Titan Paradigm pipeline. Your job is t
 - `<path>` → target path (passed to recon)
 - `--skip-recon` → skip recon (assumes artifacts exist)
 - `--skip-gauntlet` → skip gauntlet (assumes artifacts exist)
-- `--start-from <phase>` → jump to phase: `recon`, `gauntlet`, `sync`, `forge`, `grind`
+- `--start-from <phase>` → jump to phase: `recon`, `gauntlet`, `sync`, `forge`, `grind`, `parity`
 - `--gauntlet-batch-size <N>` → batch size for gauntlet (default: 5)
 - `--yes` → skip all confirmation prompts in the orchestrator (pre-pipeline, forge checkpoint, and resume prompts) and in forge (per-phase confirmation)
 
@@ -50,7 +50,7 @@ You are the **orchestrator** for the full Titan Paradigm pipeline. Your job is t
    node -e "const fs=require('fs');const s=JSON.parse(fs.readFileSync('.codegraph/titan/titan-state.json','utf8'));s.phaseTimestamps=s.phaseTimestamps||{};s.phaseTimestamps['<PHASE>']=s.phaseTimestamps['<PHASE>']||{};s.phaseTimestamps['<PHASE>'].completedAt=new Date().toISOString();fs.writeFileSync('.codegraph/titan/titan-state.json',JSON.stringify(s,null,2));"
    ```
 
-   Replace `<PHASE>` with `recon`, `gauntlet`, `sync`, `forge`, or `close`. **Run the start command immediately before dispatching each phase's first sub-agent, and the completion command immediately after post-phase validation passes.** If resuming a phase (e.g., gauntlet loop iteration 2+), do NOT overwrite `startedAt` — only set it if it doesn't already exist.
+   Replace `<PHASE>` with `recon`, `gauntlet`, `sync`, `forge`, `parity`, or `close`. **Run the start command immediately before dispatching each phase's first sub-agent, and the completion command immediately after post-phase validation passes.** If resuming a phase (e.g., gauntlet loop iteration 2+), do NOT overwrite `startedAt` — only set it if it doesn't already exist.
 
    **Timestamp validation:** After recording `completedAt` for any phase, verify `startedAt < completedAt`:
    ```bash
@@ -639,7 +639,7 @@ Record `phaseTimestamps.forge.completedAt`.
 
 Grind runs after forge to close the adoption loop. Forge extracts helpers; grind wires them into consumers and removes dead code. Without grind, the dead symbol count inflates with every forge phase.
 
-**Skip if:** `--start-from` is `close`, or `titan-state.json → grind.completedPhases` already covers all forge phases.
+**Skip if:** `--start-from` is `parity` or `close`, or `titan-state.json → grind.completedPhases` already covers all forge phases.
 
 ### 4.5a. Pre-loop check
 
@@ -742,6 +742,54 @@ Record `phaseTimestamps.grind.completedAt`.
 
 ---
 
+## Step 4.7 — PARITY (conditional, repo-provided)
+
+Some repos ship multiple implementations of the same logic that must stay in lockstep (e.g. a dual native/WASM engine, a client and server copy of a validator). Forge and grind edit code across the tree; this step verifies those edits didn't leave one implementation behind.
+
+**titan-run is repo-agnostic** — never assume the target repo has engines, fixtures, or any parity surface. The contract: a repo opts in by shipping its own `/parity` skill at `.claude/skills/parity/SKILL.md` (wrapping whatever audit mechanism it uses internally). No skill → no parity phase.
+
+### 4.7a. Detect the repo's parity mechanism
+
+```bash
+test -f .claude/skills/parity/SKILL.md && echo "PARITY SKILL FOUND" || echo "NO PARITY SKILL"
+```
+
+- **NO PARITY SKILL** → print `"PARITY skipped — repo provides no /parity skill."` and continue to Step 5. Absence is normal for most repos; do not warn.
+- **PARITY SKILL FOUND** → continue below.
+
+**Skip also if:** `--start-from` is `close`, or the pipeline made no code changes this run (`titan-state.json → execution.commits` empty/absent AND no grind adoption commits) — unless `--start-from parity` was given explicitly, which always runs the audit.
+
+### 4.7b. Record phase start
+
+Record `phaseTimestamps.parity.startedAt`.
+
+### 4.7c. Run Pre-Agent Gate (G1-G4)
+
+### 4.7d. Dispatch sub-agent
+
+```
+Agent → "Run /parity. Read .claude/skills/parity/SKILL.md and follow it exactly.
+         Skip worktree check — already handled.
+         Audit every surface the skill covers. Fix any divergence introduced by
+         recent commits at the root cause, commit the fixes, and re-verify until
+         the audit is clean. If a divergence pre-dates this run (verify via
+         git log on the relevant files), follow the skill's and repo's rules for
+         pre-existing findings (typically: file an issue, don't expand scope)."
+```
+
+### 4.7e. Post-phase validation
+
+After the agent returns:
+- `git status --short` → the working tree must be clean. The sub-agent commits its fixes; uncommitted changes mean it stopped mid-fix → **stop** and report.
+- If the agent fixed divergences, run V16-style commit audit: `git log --oneline <headBefore>..<headAfter>` and print the parity-fix commits.
+- If the agent reports divergences introduced by THIS run that it could not fix → **stop**: "PARITY failed — this run introduced implementation drift. Fix before CLOSE or revert the offending commits." Pre-existing divergences filed as issues are not blockers; print the issue URLs.
+
+Print: `"PARITY complete: <clean | N divergences fixed | N pre-existing filed as issues>"`
+
+Record `phaseTimestamps.parity.completedAt`.
+
+---
+
 ## Step 5 — CLOSE (report + PRs)
 
 After forge completes, dispatch `/titan-close` to produce the final report with before/after metrics and split commits into focused PRs.
@@ -778,6 +826,7 @@ Record `phaseTimestamps.close.completedAt`.
 - **NDJSON corrupt lines:** Warn but continue — partial results are better than none. The corrupt lines are logged so the user knows which targets to re-audit.
 - **Merge conflict detected by pre-agent gate:** Stop immediately with the conflicting files listed.
 - **Tests fail after forge phase:** Stop immediately. Print the failing phase's commits so the user can revert.
+- **Parity audit fails on drift introduced by this run:** Stop before CLOSE. Retry with `/titan-run --start-from parity` after fixing or reverting.
 - **Validation failure (any V-check marked FAILED):** Stop with details. Warn-level V-checks are logged but don't stop the pipeline.
 
 ---
@@ -786,7 +835,7 @@ Record `phaseTimestamps.close.completedAt`.
 
 - **You are the orchestrator, not the executor.** Never run codegraph commands, edit source files, or make commits yourself. Only spawn sub-agents and read state files. Exceptions (pure validation/snapshot, no code changes): the post-forge test run (V13), NDJSON integrity checks, the V3 baseline snapshot check (`codegraph snapshot list`), and the pre-forge architectural snapshot capture (Step 3.5a) are run directly by the orchestrator.
 - **Run the Pre-Agent Gate (G1-G4) before EVERY sub-agent.** No exceptions.
-- **One sub-agent at a time.** Phases are sequential — recon before gauntlet, gauntlet before sync, sync before forge, forge before grind, grind before close.
+- **One sub-agent at a time.** Phases are sequential — recon before gauntlet, gauntlet before sync, sync before forge, forge before grind, grind before parity (when the repo provides one), parity before close.
 - **Fresh context per sub-agent.** This is the whole point — each sub-agent gets a clean context window.
 - **Read AND validate state files after every sub-agent.** Trust the on-disk state, not the sub-agent's text output — but verify the state is structurally sound.
 - **Back up state before every sub-agent.** The `.bak` file is your safety net against mid-write crashes.
diff --git a/scripts/parity-compare.mjs b/scripts/parity-compare.mjs