Commit 06e7452

feat(skills): /parity engine-parity audit skill + titan-run integration (#1474)
* feat(skills): add /parity engine-parity audit skill and wire it into /titan-run

  scripts/parity-compare.mjs builds every resolution-benchmark fixture with the wasm, native, and (--hybrid) hybrid paths and compares the full node and edge multisets (kind, name, file, line, confidence, dynamic flag) — zero divergence is the only passing state. It caught the Phase 8.2 cross-file return-type propagation gap in the native orchestrator on its first run, plus the pre-existing divergences now tracked in #1466-#1472.

  The /parity skill runs the audit, localizes each divergence by which build paths disagree (wasm/hybrid/native table), fixes the root cause in whichever engine is wrong, and re-verifies until clean.

  /titan-run gains a conditional Step 4.7 — PARITY between grind and close. The orchestrator stays repo-agnostic: a repo opts in by shipping its own .claude/skills/parity/SKILL.md; without one the step prints a skip note and continues.

* fix(parity): filter sentinel, register temp dirs before await, clarify binary path check

* fix(titan-run): renumber parity step from 4.7 to 4.6 to close gap after grind (4.5)

* fix(parity): guard empty --langs, capture headBefore/headAfter in step 4.6

  - Guard against --langs with no value: an empty langsFilter array is truthy so filtering produces zero fixtures and exits 0 — a false pass. Now exits 2 with an explicit error message.
  - Add headBefore capture to step 4.6b and headAfter capture to step 4.6e so the commit-audit command in 4.6e has concrete variables to expand.
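
The empty `--langs` guard described in the last fix can be sketched as follows. This is a minimal illustration, not the code in `scripts/parity-compare.mjs`: the helper name `parseLangs`, its return shape, and the error text are all assumptions made for the example. Only the behavior (an empty filter is rejected with exit code 2 instead of passing with zero fixtures) comes from the commit message.

```javascript
// Hypothetical helper; the real argument parsing in scripts/parity-compare.mjs
// is not shown on this page.
function parseLangs(argv) {
  const i = argv.indexOf("--langs");
  if (i === -1) return { ok: true, langs: null }; // flag absent: audit every fixture

  const raw = argv[i + 1];
  // A missing value (or another flag in its place) would otherwise become an empty filter.
  if (raw === undefined || raw.startsWith("--")) {
    return { ok: false, exitCode: 2, error: "--langs requires a comma-separated value" };
  }

  const langs = raw.split(",").map((s) => s.trim()).filter(Boolean);
  if (langs.length === 0) {
    return { ok: false, exitCode: 2, error: "--langs was given no fixture names" };
  }
  return { ok: true, langs };
}

// Why the guard is needed: an empty array is truthy in JavaScript, so a bare
// `if (langsFilter)` check cannot tell "no filter" apart from "filter everything out".
const emptyArrayIsTruthy = Boolean([]);
```

Checking the array length explicitly, rather than its truthiness, is what turns the silent false pass into a visible failure.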
1 parent 204862c commit 06e7452

3 files changed

Lines changed: 550 additions & 6 deletions

.claude/skills/parity/SKILL.md

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
---
name: parity
description: Audit WASM/native engine correctness parity across all resolution fixtures and fix any divergence at the root cause — both engines must produce identical graphs
argument-hint: "[--langs js,python] [--hybrid] [--audit-only]"
allowed-tools: Bash, Read, Write, Edit, Glob, Grep, Agent
---

# /parity — Engine Correctness Parity Audit & Fix

Codegraph has two engines that MUST produce identical results (see CLAUDE.md), exercised through three build paths:

- **wasm** — JS pipeline + JS extractors + JS edge resolution
- **native** — full Rust orchestrator (`crates/codegraph-core/src/domain/graph/builder/pipeline.rs`)
- **hybrid** — JS pipeline + napi `buildCallEdges` (the fallback when the
  orchestrator is skipped: forced full rebuilds, older addons)

This skill runs `scripts/parity-compare.mjs`, which builds every
resolution-benchmark fixture with each engine and compares the **full node and
edge multisets** (kind, name, file, line, confidence, dynamic flag). Any
difference is a bug in the less-accurate engine — never an acceptable gap, and
never something to document as expected. The skill finds the root cause, fixes
it, and re-verifies until the audit is clean.
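
A multiset comparison differs from a set comparison in that duplicate counts matter: two identical edges on one side and one on the other is still a divergence. The sketch below shows one way such a comparison could work. It is not the implementation in `scripts/parity-compare.mjs`; the record shape and the function names `recordKey`, `toMultiset`, and `multisetDiff` are assumptions modeled on the compared fields listed above.

```javascript
// Hypothetical record shape: one object per node or edge, carrying the
// compared fields (kind, name, file, line, confidence, dynamic flag).
function recordKey(item) {
  return JSON.stringify([item.kind, item.name, item.file, item.line, item.confidence, item.dynamic]);
}

// Count how many times each distinct record occurs.
function toMultiset(items) {
  const counts = new Map();
  for (const item of items) {
    const k = recordKey(item);
    counts.set(k, (counts.get(k) || 0) + 1);
  }
  return counts;
}

// Return every key whose count differs between the two sides.
// delta > 0: more occurrences on the left; delta < 0: more on the right.
function multisetDiff(leftItems, rightItems) {
  const left = toMultiset(leftItems);
  const right = toMultiset(rightItems);
  const diffs = [];
  for (const k of new Set([...left.keys(), ...right.keys()])) {
    const delta = (left.get(k) || 0) - (right.get(k) || 0);
    if (delta !== 0) diffs.push({ key: k, delta });
  }
  return diffs;
}

// Made-up data: the wasm side emits the same edge twice, the native side once.
const edge = { kind: "calls", name: "foo", file: "a.js", line: 3, confidence: 0.9, dynamic: false };
const wasmEdges = [edge, { ...edge }];
const nativeEdges = [{ ...edge }];
const edgeDiffs = multisetDiff(wasmEdges, nativeEdges);
```

Because `confidence` is part of the key, two edges that differ only in confidence produce two diff entries (one per side) rather than matching.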

## Arguments

- `$ARGUMENTS` may contain:
  - `--langs a,b,c` — restrict to specific fixture names (e.g. `javascript,pts-javascript`)
  - `--hybrid` — also audit the hybrid path (recommended; slower)
  - `--audit-only` — report divergences without fixing them
- No arguments — full audit across all fixtures, then fix divergences

## Phase 0 — Pre-flight

All steps run from the repo root.

1. Confirm `scripts/parity-compare.mjs` exists. If not, this repo doesn't have
   the parity tooling — stop and report.
2. Build the TypeScript dist (the script imports `dist/index.js`, and extractor
   changes in `src/` are invisible until rebuilt):
   ```bash
   npm run build
   ```
3. Ensure the native addon reflects the local Rust source:
   ```bash
   cd crates/codegraph-core && npx napi build --platform --release && cd ../..
   ```
   On macOS, locally built binaries must be re-signed or Node kills the process
   (exit 137):
   ```bash
   codesign --sign - --force crates/codegraph-core/*.node
   ```
4. Verify the loader picks up the **locally built** binary, not the published
   package. First check which path is actually resolved:
   ```bash
   node -e "
   const { createRequire } = require('node:module');
   const r = createRequire(require.resolve('./dist/index.js'));
   try { console.log(r.resolve('codegraph-core')); } catch { console.log('not found via require'); }
   "
   ```
   If the resolved path points to
   `node_modules/@optave/codegraph-<platform>-<arch>/codegraph-core.node`
   (the installed package), copy your freshly built binary over it:
   ```bash
   cp crates/codegraph-core/*.node node_modules/@optave/codegraph-<platform>-<arch>/codegraph-core.node
   ```
   Then confirm the loader picks it up:
   ```bash
   node -e "import('./dist/infrastructure/native.js').then(m => console.log(m.isNativeAvailable()))"
   ```
   If `false`, stop and report — auditing parity without the native engine is
   meaningless.

## Phase 1 — Audit

Run the comparison (pass through `--langs` / `--hybrid` from `$ARGUMENTS`):

```bash
node scripts/parity-compare.mjs [--langs ...] [--hybrid] 2>/dev/null
```

- Exit 0 → parity holds. Skip to Phase 4 and report a clean audit.
- Exit 1 → divergences or fixture build failures. Collect every `[node]` /
  `[edge]` diff line and any `BUILD FAILED` fixtures.
- Exit 2 → pre-flight failure; go back to Phase 0.

For machine-readable output (useful when many fixtures diverge), re-run with
`--json` and parse `fixtures[].comparisons[].nodeDiffs/edgeDiffs`.

If `--audit-only` was passed: report the diffs (Phase 4 format) and stop.
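
The exit-code contract of the Phase 1 audit, combined with the `--audit-only` rule, amounts to a small dispatch table. A rough sketch follows; the function name `nextStep` and the returned labels are hypothetical, and the handling of codes other than 0, 1, and 2 is an assumption, since the skill does not define them.

```javascript
// Hypothetical dispatcher over the audit's exit code.
// Only codes 0, 1 and 2 are defined by the skill text.
function nextStep(exitCode, auditOnly) {
  switch (exitCode) {
    case 0:
      return "phase-4-report-clean"; // parity holds
    case 1:
      // Divergences or fixture build failures were reported.
      return auditOnly ? "phase-4-report-diffs" : "phase-2-root-cause";
    case 2:
      return "phase-0-preflight"; // pre-flight failure
    default:
      // Assumption: anything else (e.g. a crash) is treated as a hard stop.
      return "stop-and-report";
  }
}

const afterDivergence = nextStep(1, false);
```

Treating unknown exit codes as a stop, instead of folding them into exit 1, keeps a crashed audit from being mistaken for an ordinary divergence report.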

## Phase 2 — Root-cause and fix

For each divergence, identify which engine is wrong — the one missing edges or
producing lower-quality resolution is usually the buggy one, but verify by
reading the fixture source and deciding what the *correct* graph is.

**Localize the bug by which paths disagree:**

| wasm | hybrid | native | Bug location |
|------|--------|--------|--------------|
| A | A | B | Rust pipeline prep (`pipeline.rs`) or Rust extractor (`crates/.../extractors/`) — the napi solver gets correct input from JS but the orchestrator's own input differs |
| A | B | B | Rust `build_edges.rs` solver (shared by hybrid + native) |
| A | B | A | JS↔napi boundary: `NativeFileEntry` plumbing in `build-edges.ts` or the wasm-worker protocol |
| B | A | A | JS extractor or JS resolution (`src/extractors/`, `src/domain/graph/builder/stages/build-edges.ts`) |
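
The table can be read as a lookup keyed on which paths produced the correct graph. The sketch below assumes that `A` stands for the correct output and `B` for the incorrect one. That reading is an interpretation: the table does not say so, but rows 2 and 4 show the same disagreement shape and can only be told apart once the correct graph is known. The function name `localize` and the fallback message are hypothetical.

```javascript
// Each argument says whether that build path produced the correct graph for
// the fixture (assumption: "A" in the table means correct, "B" means wrong).
function localize(wasmOk, hybridOk, nativeOk) {
  const pattern = [wasmOk, hybridOk, nativeOk].map((ok) => (ok ? "A" : "B")).join("");
  const table = {
    AAA: "no divergence",
    AAB: "Rust pipeline prep (pipeline.rs) or Rust extractor",
    ABB: "Rust build_edges.rs solver (shared by hybrid + native)",
    ABA: "JS-napi boundary: NativeFileEntry plumbing or the wasm-worker protocol",
    BAA: "JS extractor or JS resolution",
  };
  // Patterns the table does not list (e.g. BBA) likely involve more than one bug.
  return table[pattern] || "not covered by the table: investigate each path separately";
}

const nativeOnlyWrong = localize(true, true, false);
```

This is why deciding what the correct graph is comes before the table: without it, a wasm-versus-rest split cannot be attributed to either the JS side or the Rust solver.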

**Fix rules (from CLAUDE.md — non-negotiable):**

- Fix the extraction/resolution layer that produces incorrect results. Never
  add comments, tests, or fixture exclusions that frame wrong output as
  expected.
- Changes may land in either language or both — create the best version based
  on both implementations, don't restrict the fix to one side.
- The module layout is mirrored between `src/` and `crates/codegraph-core/src/`
  — read the TS and Rust counterparts side by side (e.g.
  `src/domain/graph/builder/stages/build-edges.ts` and
  `crates/.../domain/graph/builder/stages/build_edges.rs`).
- Mirror *semantics exactly*: confidence constants, hop penalties, tie-breaking
  order, first-wins vs highest-wins rules. A 0.05 confidence difference is a
  parity failure.
- Add a focused unit test next to the fix (Rust `#[cfg(test)]` or vitest) that
  pins the behavior.

**Gotchas that mask fixes:**

- `src/` changes need `npm run build` before the script (which imports dist)
  sees them.
- Rust changes need the napi rebuild + macOS codesign from Phase 0.
- New `ExtractorOutput` fields must be added to `SerializedExtractorOutput` in
  `src/domain/wasm-worker-{protocol,entry,pool}.ts` or they are silently
  dropped at the Worker-thread boundary.
- New per-file fields crossing the napi boundary need: the `FileSymbols` /
  `FileEdgeInput` structs in `crates/.../types.rs` & `build_edges.rs`, the
  `NativeFileEntry` assembly in `build-edges.ts`, and the orchestrator's own
  assembly in `pipeline.rs` (`build_and_insert_call_edges`). Missing the last
  one produces hybrid-OK/native-broken splits.
- Out-of-scope findings discovered along the way (pre-existing bugs, refactor
  opportunities) → `gh issue create` immediately, then continue.

## Phase 3 — Verify

Repeat until the audit is clean — never stop at "fewer diffs than before":

1. Rebuild whichever side changed (`npm run build` / napi build + codesign).
2. Re-run the Phase 1 audit command. Any remaining divergence → back to Phase 2.
3. Once clean, run the full verification suite — all must pass:
   ```bash
   cargo test --manifest-path crates/codegraph-core/Cargo.toml
   npm test
   npx vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts
   ```
   (From a `.claude` worktree, vitest needs the worktree override config —
   check memory/project notes if no tests are found.)
4. If any verification step cannot run, STOP and report it — never proceed
   with unverified changes.

## Phase 4 — Report

Print a summary:

```
PARITY AUDIT — <date>
Fixtures audited: N (wasm vs native[, hybrid])
Divergences found: M
Fixed: <file:line summary per fix, with engine + root cause>
Verification: cargo test ✓ | npm test ✓ | resolution benchmark ✓
Issues filed: #NNN (out-of-scope findings)
```

- If divergences were found and fixed, list each root cause in one line —
  which engine was wrong, which layer, what semantic was mismatched.
- If `--audit-only`: list divergences grouped by fixture with the
  wasm/hybrid/native localization table applied.
- Suggest committing engine fixes separately from unrelated work (one PR = one
  concern).

## Rules

- **Zero divergence is the only passing state** — a single edge differing in
  confidence is a failure.
- **Never exclude a fixture or file to make the audit pass.**
- **Never run the audit against a stale dist or stale native binary** — Phase 0
  is mandatory after any code change.
- **The wasm/hybrid/native disagreement pattern localizes the bug** — use the
  table before reading code.
- **Both engines evolve together**: a feature added to one engine without the
  other is a parity bug from day one. New resolution techniques must land in
  `src/` and `crates/codegraph-core/src/` in the same PR.

.claude/skills/titan-run/SKILL.md

Lines changed: 64 additions & 6 deletions
@@ -1,7 +1,7 @@
 ---
 name: titan-run
-description: Run the full Titan Paradigm pipeline end-to-end by dispatching each phase to sub-agents with fresh context windows. Orchestrates recon → gauntlet → sync → forge → grind automatically.
-argument-hint: <path (default: .)> <--skip-recon> <--skip-gauntlet> <--start-from recon|gauntlet|sync|forge|grind> <--gauntlet-batch-size 5> <--yes>
+description: Run the full Titan Paradigm pipeline end-to-end by dispatching each phase to sub-agents with fresh context windows. Orchestrates recon → gauntlet → sync → forge → grind (+ repo-provided parity audit) automatically.
+argument-hint: <path (default: .)> <--skip-recon> <--skip-gauntlet> <--start-from recon|gauntlet|sync|forge|grind|parity> <--gauntlet-batch-size 5> <--yes>
 allowed-tools: Agent, Read, Bash, Glob, Write, Edit
 ---

@@ -16,7 +16,7 @@ You are the **orchestrator** for the full Titan Paradigm pipeline. Your job is t
 - `<path>` → target path (passed to recon)
 - `--skip-recon` → skip recon (assumes artifacts exist)
 - `--skip-gauntlet` → skip gauntlet (assumes artifacts exist)
-- `--start-from <phase>` → jump to phase: `recon`, `gauntlet`, `sync`, `forge`, `grind`
+- `--start-from <phase>` → jump to phase: `recon`, `gauntlet`, `sync`, `forge`, `grind`, `parity`
 - `--gauntlet-batch-size <N>` → batch size for gauntlet (default: 5)
 - `--yes` → skip all confirmation prompts in the orchestrator (pre-pipeline, forge checkpoint, and resume prompts) and in forge (per-phase confirmation)

@@ -50,7 +50,7 @@ You are the **orchestrator** for the full Titan Paradigm pipeline. Your job is t
 node -e "const fs=require('fs');const s=JSON.parse(fs.readFileSync('.codegraph/titan/titan-state.json','utf8'));s.phaseTimestamps=s.phaseTimestamps||{};s.phaseTimestamps['<PHASE>']=s.phaseTimestamps['<PHASE>']||{};s.phaseTimestamps['<PHASE>'].completedAt=new Date().toISOString();fs.writeFileSync('.codegraph/titan/titan-state.json',JSON.stringify(s,null,2));"
 ```

-Replace `<PHASE>` with `recon`, `gauntlet`, `sync`, `forge`, or `close`. **Run the start command immediately before dispatching each phase's first sub-agent, and the completion command immediately after post-phase validation passes.** If resuming a phase (e.g., gauntlet loop iteration 2+), do NOT overwrite `startedAt` — only set it if it doesn't already exist.
+Replace `<PHASE>` with `recon`, `gauntlet`, `sync`, `forge`, `parity`, or `close`. **Run the start command immediately before dispatching each phase's first sub-agent, and the completion command immediately after post-phase validation passes.** If resuming a phase (e.g., gauntlet loop iteration 2+), do NOT overwrite `startedAt` — only set it if it doesn't already exist.

 **Timestamp validation:** After recording `completedAt` for any phase, verify `startedAt < completedAt`:
 ```bash
@@ -639,7 +639,7 @@ Record `phaseTimestamps.forge.completedAt`.

 Grind runs after forge to close the adoption loop. Forge extracts helpers; grind wires them into consumers and removes dead code. Without grind, the dead symbol count inflates with every forge phase.

-**Skip if:** `--start-from` is `close`, or `titan-state.json → grind.completedPhases` already covers all forge phases.
+**Skip if:** `--start-from` is `parity` or `close`, or `titan-state.json → grind.completedPhases` already covers all forge phases.

 ### 4.5a. Pre-loop check

@@ -742,6 +742,63 @@ Record `phaseTimestamps.grind.completedAt`.

 ---

+## Step 4.6 — PARITY (conditional, repo-provided)
+
+Some repos ship multiple implementations of the same logic that must stay in lockstep (e.g. a dual native/WASM engine, a client and server copy of a validator). Forge and grind edit code across the tree; this step verifies those edits didn't leave one implementation behind.
+
+**titan-run is repo-agnostic** — never assume the target repo has engines, fixtures, or any parity surface. The contract: a repo opts in by shipping its own `/parity` skill at `.claude/skills/parity/SKILL.md` (wrapping whatever audit mechanism it uses internally). No skill → no parity phase.
+
+### 4.6a. Detect the repo's parity mechanism
+
+```bash
+test -f .claude/skills/parity/SKILL.md && echo "PARITY SKILL FOUND" || echo "NO PARITY SKILL"
+```
+
+- **NO PARITY SKILL** → print `"PARITY skipped — repo provides no /parity skill."` and continue to Step 5. Absence is normal for most repos; do not warn.
+- **PARITY SKILL FOUND** → continue below.
+
+**Skip also if:** `--start-from` is `close`, or the pipeline made no code changes this run (`titan-state.json → execution.commits` empty/absent AND no grind adoption commits) — unless `--start-from parity` was given explicitly, which always runs the audit.
+
+### 4.6b. Record phase start
+
+Record `phaseTimestamps.parity.startedAt`.
+
+```bash
+headBefore=$(git rev-parse HEAD)
+```
+
+### 4.6c. Run Pre-Agent Gate (G1-G4)
+
+### 4.6d. Dispatch sub-agent
+
+```
+Agent → "Run /parity. Read .claude/skills/parity/SKILL.md and follow it exactly.
+Skip worktree check — already handled.
+Audit every surface the skill covers. Fix any divergence introduced by
+recent commits at the root cause, commit the fixes, and re-verify until
+the audit is clean. If a divergence pre-dates this run (verify via
+git log on the relevant files), follow the skill's and repo's rules for
+pre-existing findings (typically: file an issue, don't expand scope)."
+```
+
+### 4.6e. Post-phase validation
+
+After the agent returns:
+
+```bash
+headAfter=$(git rev-parse HEAD)
+```
+
+- `git status --short` → the working tree must be clean. The sub-agent commits its fixes; uncommitted changes mean it stopped mid-fix → **stop** and report.
+- If the agent fixed divergences, run V16-style commit audit: `git log --oneline $headBefore..$headAfter` and print the parity-fix commits.
+- If the agent reports divergences introduced by THIS run that it could not fix → **stop**: "PARITY failed — this run introduced implementation drift. Fix before CLOSE or revert the offending commits." Pre-existing divergences filed as issues are not blockers; print the issue URLs.
+
+Print: `"PARITY complete: <clean | N divergences fixed | N pre-existing filed as issues>"`
+
+Record `phaseTimestamps.parity.completedAt`.
+
+---
+
 ## Step 5 — CLOSE (report + PRs)

 After forge completes, dispatch `/titan-close` to produce the final report with before/after metrics and split commits into focused PRs.
@@ -778,6 +835,7 @@ Record `phaseTimestamps.close.completedAt`.
 - **NDJSON corrupt lines:** Warn but continue — partial results are better than none. The corrupt lines are logged so the user knows which targets to re-audit.
 - **Merge conflict detected by pre-agent gate:** Stop immediately with the conflicting files listed.
 - **Tests fail after forge phase:** Stop immediately. Print the failing phase's commits so the user can revert.
+- **Parity audit fails on drift introduced by this run:** Stop before CLOSE. Retry with `/titan-run --start-from parity` after fixing or reverting.
 - **Validation failure (any V-check marked FAILED):** Stop with details. Warn-level V-checks are logged but don't stop the pipeline.

 ---
@@ -786,7 +844,7 @@ Record `phaseTimestamps.close.completedAt`.

 - **You are the orchestrator, not the executor.** Never run codegraph commands, edit source files, or make commits yourself. Only spawn sub-agents and read state files. Exceptions (pure validation/snapshot, no code changes): the post-forge test run (V13), NDJSON integrity checks, the V3 baseline snapshot check (`codegraph snapshot list`), and the pre-forge architectural snapshot capture (Step 3.5a) are run directly by the orchestrator.
 - **Run the Pre-Agent Gate (G1-G4) before EVERY sub-agent.** No exceptions.
-- **One sub-agent at a time.** Phases are sequential — recon before gauntlet, gauntlet before sync, sync before forge, forge before grind, grind before close.
+- **One sub-agent at a time.** Phases are sequential — recon before gauntlet, gauntlet before sync, sync before forge, forge before grind, grind before parity (when the repo provides one), parity before close.
 - **Fresh context per sub-agent.** This is the whole point — each sub-agent gets a clean context window.
 - **Read AND validate state files after every sub-agent.** Trust the on-disk state, not the sub-agent's text output — but verify the state is structurally sound.
 - **Back up state before every sub-agent.** The `.bak` file is your safety net against mid-write crashes.
