
Commit 2b79216

feat(skills): add /parity engine-parity audit skill and wire it into /titan-run
scripts/parity-compare.mjs builds every resolution-benchmark fixture with the wasm, native, and (with --hybrid) hybrid paths and compares the full node and edge multisets (kind, name, file, line, confidence, dynamic flag); zero divergence is the only passing state. It caught the Phase 8.2 cross-file return-type propagation gap in the native orchestrator on its first run, plus the pre-existing divergences now tracked in #1466-#1472.

The /parity skill runs the audit, localizes each divergence by which build paths disagree (wasm/hybrid/native table), fixes the root cause in whichever engine is wrong, and re-verifies until clean.

/titan-run gains a conditional Step 4.7 (PARITY) between grind and close. The orchestrator stays repo-agnostic: a repo opts in by shipping its own .claude/skills/parity/SKILL.md; without one, the step prints a skip note and continues.
1 parent 204862c · commit 2b79216

3 files changed

Lines changed: 525 additions & 6 deletions

.claude/skills/parity/SKILL.md

Lines changed: 177 additions & 0 deletions
@@ -0,0 +1,177 @@
---
name: parity
description: Audit WASM/native engine correctness parity across all resolution fixtures and fix any divergence at the root cause — both engines must produce identical graphs
argument-hint: "[--langs js,python] [--hybrid] [--audit-only]"
allowed-tools: Bash, Read, Write, Edit, Glob, Grep, Agent
---

# /parity — Engine Correctness Parity Audit & Fix

Codegraph has two engines that MUST produce identical results (see CLAUDE.md), exercised through three build paths:

- **wasm** — JS pipeline + JS extractors + JS edge resolution
- **native** — full Rust orchestrator (`crates/codegraph-core/src/domain/graph/builder/pipeline.rs`)
- **hybrid** — JS pipeline + napi `buildCallEdges` (the fallback when the
  orchestrator is skipped: forced full rebuilds, older addons)

This skill runs `scripts/parity-compare.mjs`, which builds every
resolution-benchmark fixture through each build path and compares the **full node and
edge multisets** (kind, name, file, line, confidence, dynamic flag). Any
difference is a bug in the less-accurate engine, never an acceptable gap and
never something to document as expected. The skill finds the root cause, fixes
it, and re-verifies until the audit is clean.
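A minimal sketch of what comparing *full multisets* means, assuming each node or edge arrives as a flat object. The field names in `FIELDS` are hypothetical stand-ins for the six attributes listed above, not the actual schema used by `parity-compare.mjs`. Keying on every field means duplicates are counted rather than collapsed, and any confidence difference at all produces a diff.

```javascript
// Hypothetical record fields; the real script's schema may differ.
const FIELDS = ['kind', 'name', 'file', 'line', 'confidence', 'dynamic'];

// Two records match only if every compared field is identical,
// so confidence 0.95 vs 1 yields two different keys.
function recordKey(record) {
  return JSON.stringify(FIELDS.map((field) => record[field] ?? null));
}

// Multiset: count occurrences instead of collapsing duplicates.
function countKeys(records) {
  const counts = new Map();
  for (const record of records) {
    const key = recordKey(record);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Every key whose multiplicity differs between the two engines.
function multisetDiff(left, right) {
  const a = countKeys(left);
  const b = countKeys(right);
  const diffs = [];
  for (const key of new Set([...a.keys(), ...b.keys()])) {
    const inLeft = a.get(key) ?? 0;
    const inRight = b.get(key) ?? 0;
    if (inLeft !== inRight) diffs.push({ key, left: inLeft, right: inRight });
  }
  return diffs;
}

// Same edge twice in wasm; native emits it once plus a lower-confidence copy.
const edge = { kind: 'calls', name: 'main->parse', file: 'a.js', line: 3, confidence: 1, dynamic: false };
const wasmEdges = [edge, { ...edge }];
const nativeEdges = [edge, { ...edge, confidence: 0.95 }];
const diffs = multisetDiff(wasmEdges, nativeEdges);
console.log(diffs.length); // 2
```

A set comparison would report only the 0.95 edge; counting multiplicities also catches the missing duplicate.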

## Arguments

- `$ARGUMENTS` may contain:
  - `--langs a,b,c` — restrict to specific fixture names (e.g. `javascript,pts-javascript`)
  - `--hybrid` — also audit the hybrid path (recommended; slower)
  - `--audit-only` — report divergences without fixing them
- No arguments — full audit across all fixtures, then fix divergences
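The skill receives these options as free text in `$ARGUMENTS`. The hypothetical helpers below (not part of the repo) show the intended split: `--langs` and `--hybrid` are forwarded to `parity-compare.mjs`, while `--audit-only` only changes what the skill does with the result.

```javascript
// Hypothetical parser for the /parity argument string; not part of the repo.
function parseParityArgs(argString = '') {
  const tokens = argString.trim().split(/\s+/).filter(Boolean);
  const opts = { langs: null, hybrid: false, auditOnly: false };
  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i];
    if (token === '--hybrid') opts.hybrid = true;
    else if (token === '--audit-only') opts.auditOnly = true;
    else if (token === '--langs') {
      const value = tokens[++i];
      if (!value || value.startsWith('--')) throw new Error('--langs needs a comma-separated list');
      opts.langs = value.split(',').filter(Boolean);
    } else {
      throw new Error(`unknown argument: ${token}`);
    }
  }
  return opts;
}

// Only --langs and --hybrid are passed through to the comparison script.
function compareScriptArgs(opts) {
  const args = ['scripts/parity-compare.mjs'];
  if (opts.langs) args.push('--langs', opts.langs.join(','));
  if (opts.hybrid) args.push('--hybrid');
  return args;
}

const opts = parseParityArgs('--langs javascript,pts-javascript --hybrid');
console.log(compareScriptArgs(opts).join(' '));
// scripts/parity-compare.mjs --langs javascript,pts-javascript --hybrid
```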

## Phase 0 — Pre-flight

All steps run from the repo root.

1. Confirm `scripts/parity-compare.mjs` exists. If not, this repo doesn't have
   the parity tooling — stop and report.
2. Build the TypeScript dist (the script imports `dist/index.js`, and extractor
   changes in `src/` are invisible until rebuilt):
   ```bash
   npm run build
   ```
3. Ensure the native addon reflects the local Rust source (the subshell keeps
   you at the repo root even if the build fails):
   ```bash
   (cd crates/codegraph-core && npx napi build --platform --release)
   ```
   On macOS, locally built binaries must be re-signed or Node kills the process
   (exit 137):
   ```bash
   codesign --sign - --force crates/codegraph-core/*.node
   ```
4. Verify the loader picks it up:
   ```bash
   node -e "import('./dist/infrastructure/native.js').then(m => console.log(m.isNativeAvailable()))"
   ```
   If `false`, stop and report — auditing parity without the native engine is
   meaningless. Note: if the repo (or a parent) has
   `node_modules/@optave/codegraph-<platform>-<arch>/` installed, Node resolves
   that package **before** the crate-local build — copy the freshly built
   binary over `codegraph-core.node` in that package dir, or the audit will
   silently test the published binary instead of your changes.
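Because a stale platform package silently wins, it can help to check for one before auditing. The sketch below walks up from a starting directory roughly the way Node's `node_modules` lookup does. It assumes the package directory is named `@optave/codegraph-${process.platform}-${process.arch}` and contains `codegraph-core.node`; real published package names may carry extra suffixes (such as a libc tag), so treat the naming, and the helper name, as assumptions.

```javascript
import fs from 'node:fs';
import path from 'node:path';

// Walk from startDir up to the filesystem root, returning the first
// platform-package binary that would shadow the crate-local build.
// Package naming is an assumption (see the lead-in).
function findShadowingBinary(startDir, {
  platform = process.platform,
  arch = process.arch,
  exists = fs.existsSync, // injectable for testing
} = {}) {
  const rel = path.join('node_modules', '@optave', `codegraph-${platform}-${arch}`, 'codegraph-core.node');
  let dir = path.resolve(startDir);
  for (;;) {
    const candidate = path.join(dir, rel);
    if (exists(candidate)) return candidate;
    const parent = path.dirname(dir);
    if (parent === dir) return null; // reached the root without a match
    dir = parent;
  }
}

// Usage from the repo root:
const shadow = findShadowingBinary(process.cwd());
if (shadow) {
  console.log(`Node will load ${shadow}; copy the fresh crate build over it before auditing.`);
}
```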

## Phase 1 — Audit

Run the comparison (pass through `--langs` / `--hybrid` from `$ARGUMENTS`):

```bash
node scripts/parity-compare.mjs [--langs ...] [--hybrid] 2>/dev/null
```

- Exit 0 → parity holds. Skip to Phase 4 and report a clean audit.
- Exit 1 → divergences or fixture build failures. Collect every `[node]` /
  `[edge]` diff line and any `BUILD FAILED` fixtures.
- Exit 2 → pre-flight failure; go back to Phase 0.
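When the audit is driven from a script rather than by hand, the exit-code contract above maps directly onto the next step. This is a sketch: `nextStep`, `collectDiffLines`, and `runAudit` are hypothetical names, and the line filter assumes only that diff lines contain the literal `[node]` / `[edge]` tags and failed builds contain `BUILD FAILED`.

```javascript
import { spawnSync } from 'node:child_process';

// Exit-code contract of parity-compare.mjs, as described in Phase 1.
function nextStep(exitCode) {
  switch (exitCode) {
    case 0: return 'report-clean';       // parity holds: go to Phase 4
    case 1: return 'collect-diffs';      // divergences or fixture build failures
    case 2: return 'back-to-preflight';  // pre-flight failure: go to Phase 0
    default: return 'stop-and-report';   // unexpected code, or null if killed by a signal
  }
}

// Keep only the lines worth triaging from the script's stdout.
function collectDiffLines(stdout) {
  return stdout
    .split('\n')
    .filter((line) => line.includes('[node]') || line.includes('[edge]') || line.includes('BUILD FAILED'));
}

// stderr is captured separately and ignored, like the `2>/dev/null` above.
function runAudit(extraArgs = []) {
  const res = spawnSync(process.execPath, ['scripts/parity-compare.mjs', ...extraArgs], { encoding: 'utf8' });
  return { step: nextStep(res.status), diffLines: collectDiffLines(res.stdout ?? '') };
}
```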

For machine-readable output (useful when many fixtures diverge), re-run with
`--json` and parse `fixtures[].comparisons[].nodeDiffs/edgeDiffs`.
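For the `--json` route, a small reducer can rank comparisons by how much they diverge, so the worst fixtures are triaged first. Only the `fixtures[].comparisons[].nodeDiffs/edgeDiffs` path comes from the text above; the `name` and `engines` label fields are assumptions about the report shape, and the sketch assumes stdout contains only the JSON document.

```javascript
// Rank divergent fixture comparisons, largest first.
// `fixture.name` and `comparison.engines` are assumed label fields.
function rankDivergences(report) {
  const rows = [];
  for (const fixture of report.fixtures ?? []) {
    for (const comparison of fixture.comparisons ?? []) {
      const nodes = comparison.nodeDiffs?.length ?? 0;
      const edges = comparison.edgeDiffs?.length ?? 0;
      if (nodes + edges > 0) {
        rows.push({ fixture: fixture.name, engines: comparison.engines, nodes, edges });
      }
    }
  }
  return rows.sort((a, b) => (b.nodes + b.edges) - (a.nodes + a.edges));
}

// Usage with a hand-written report in the assumed shape:
const report = JSON.parse(`{"fixtures": [
  {"name": "javascript", "comparisons": [{"engines": "wasm-vs-native", "nodeDiffs": [], "edgeDiffs": [{}, {}]}]},
  {"name": "python", "comparisons": [{"engines": "wasm-vs-native", "nodeDiffs": [{}], "edgeDiffs": [{}, {}, {}]}]},
  {"name": "go", "comparisons": [{"engines": "wasm-vs-native", "nodeDiffs": [], "edgeDiffs": []}]}
]}`);
console.log(rankDivergences(report).map((row) => row.fixture)); // [ 'python', 'javascript' ]
```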

If `--audit-only` was passed: report the diffs (Phase 4 format) and stop.

## Phase 2 — Root-cause and fix

For each divergence, identify which engine is wrong — the one missing edges or
producing lower-quality resolution is usually the buggy one, but verify by
reading the fixture source and deciding what the *correct* graph is.

**Localize the bug by which paths disagree** (A = the correct graph, B = a divergent one):

| wasm | hybrid | native | Bug location |
|------|--------|--------|--------------|
| A | A | B | Rust pipeline prep (`pipeline.rs`) or Rust extractor (`crates/.../extractors/`) — the napi solver gets correct input from JS but the orchestrator's own input differs |
| A | B | B | Rust `build_edges.rs` solver (shared by hybrid + native) |
| A | B | A | JS↔napi boundary: `NativeFileEntry` plumbing in `build-edges.ts` or the wasm-worker protocol |
| B | A | A | JS extractor or JS resolution (`src/extractors/`, `src/domain/graph/builder/stages/build-edges.ts`) |
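Reading A as the correct graph and B as a divergent one, the table becomes a lookup keyed by which paths got it right. The sketch below encodes that lookup; deciding what "correct" means still requires reading the fixture source, so the three booleans are inputs, and `localize` is a hypothetical name.

```javascript
// Lookup version of the localization table. Each flag is true when that
// path's graph matches the correct graph ("A"), false when it diverges ("B").
const LOCATIONS = {
  AAB: 'Rust pipeline prep (pipeline.rs) or Rust extractor',
  ABB: 'Rust build_edges.rs solver (shared by hybrid + native)',
  ABA: 'JS<->napi boundary: NativeFileEntry plumbing or wasm-worker protocol',
  BAA: 'JS extractor or JS resolution',
};

function localize({ wasm, hybrid, native }) {
  // Without --hybrid the three-way table cannot be applied.
  if (hybrid === undefined) return 're-run the audit with --hybrid to localize';
  const pattern = [wasm, hybrid, native].map((ok) => (ok ? 'A' : 'B')).join('');
  if (pattern === 'AAA') return 'no divergence';
  return LOCATIONS[pattern] ?? `pattern ${pattern} is not in the table; inspect each pairwise diff`;
}

console.log(localize({ wasm: true, hybrid: false, native: false }));
// Rust build_edges.rs solver (shared by hybrid + native)
```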

**Fix rules (from CLAUDE.md — non-negotiable):**

- Fix the extraction/resolution layer that produces incorrect results. Never
  add comments, tests, or fixture exclusions that frame wrong output as
  expected.
- Changes may land in either language or both: build the best version from
  both implementations rather than restricting the fix to one side.
- The module layout is mirrored between `src/` and `crates/codegraph-core/src/`
  — read the TS and Rust counterparts side by side (e.g.
  `src/domain/graph/builder/stages/build-edges.ts` ↔
  `crates/.../domain/graph/builder/stages/build_edges.rs`).
- Mirror semantics *exactly*: confidence constants, hop penalties, tie-breaking
  order, first-wins vs highest-wins rules. A 0.05 confidence difference is a
  parity failure.
- Add a focused unit test next to the fix (Rust `#[cfg(test)]` or vitest) that
  pins the behavior.
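Why selection rules must match exactly: the two rules below are each reasonable on their own, yet fed the same candidates they pick different targets, which would surface as an edge diff. The candidate shape and target names are illustrative, not taken from either engine.

```javascript
// Two plausible rules for choosing among resolution candidates.
function firstWins(candidates) {
  return candidates[0] ?? null;
}

function highestWins(candidates) {
  let best = null;
  for (const candidate of candidates) {
    // Strict '>' keeps the earliest candidate on a confidence tie.
    if (best === null || candidate.confidence > best.confidence) best = candidate;
  }
  return best;
}

const candidates = [
  { target: 'utils.js#parse', confidence: 0.7 },
  { target: 'lib/parser.js#parse', confidence: 0.9 },
];
console.log(firstWins(candidates).target);   // utils.js#parse
console.log(highestWins(candidates).target); // lib/parser.js#parse
```

Even two engines that both use highest-wins can still diverge if one breaks ties with `>=` (last wins) and the other with `>` (first wins).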

**Gotchas that mask fixes:**

- `src/` changes need `npm run build` before the script (which imports dist)
  sees them.
- Rust changes need the napi rebuild + macOS codesign from Phase 0.
- New `ExtractorOutput` fields must be added to `SerializedExtractorOutput` in
  `src/domain/wasm-worker-{protocol,entry,pool}.ts` or they are silently
  dropped at the Worker-thread boundary.
- New per-file fields crossing the napi boundary need: the `FileSymbols` /
  `FileEdgeInput` structs in `crates/.../types.rs` & `build_edges.rs`, the
  `NativeFileEntry` assembly in `build-edges.ts`, and the orchestrator's own
  assembly in `pipeline.rs` (`build_and_insert_call_edges`). Missing the last
  one produces hybrid-OK/native-broken splits.
- Out-of-scope findings discovered along the way (pre-existing bugs, refactor
  opportunities) → `gh issue create` immediately, then continue.
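The Worker-thread gotcha is easy to reproduce in miniature. Assuming the serializer copies an explicit list of fields (the field names here, including `returnTypes`, are hypothetical and not the real `ExtractorOutput` shape), a new field vanishes without any error. A guard that compares keys turns the silent drop into a loud failure.

```javascript
// Hypothetical allow-list serializer standing in for the worker protocol.
const SERIALIZED_FIELDS = ['definitions', 'calls', 'imports'];

function serializeForWorker(output) {
  const serialized = {};
  for (const field of SERIALIZED_FIELDS) serialized[field] = output[field];
  return serialized; // anything not listed is dropped here, silently
}

// Guard: fail loudly if the extractor produced a field the protocol does not carry.
function assertAllFieldsSerialized(output) {
  const dropped = Object.keys(output).filter((key) => !SERIALIZED_FIELDS.includes(key));
  if (dropped.length > 0) {
    throw new Error(`dropped at the worker boundary: ${dropped.join(', ')}`);
  }
}

const extracted = { definitions: [], calls: [], imports: [], returnTypes: { makeUser: 'User' } };
console.log('returnTypes' in serializeForWorker(extracted)); // false
```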

## Phase 3 — Verify

Repeat until the audit is clean — never stop at "fewer diffs than before":

1. Rebuild whichever side changed (`npm run build` / napi build + codesign).
2. Re-run the Phase 1 audit command. Any remaining divergence → back to Phase 2.
3. Once clean, run the full verification suite — all must pass:
   ```bash
   cargo test --manifest-path crates/codegraph-core/Cargo.toml
   npm test
   npx vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts
   ```
   (From a `.claude` worktree, vitest needs the worktree override config —
   check memory/project notes if no tests are found.)
4. If any verification step cannot run, STOP and report it — never proceed
   with unverified changes.
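Step 3's all-must-pass rule, together with step 4's stop-if-it-cannot-run rule, amounts to a fail-fast loop. A minimal Node sketch, assuming `cargo`, `npm`, and `npx` are on `PATH`; `runVerification` is a hypothetical name, and `shell` is enabled only on Windows, where `npm` and `npx` are `.cmd` shims.

```javascript
import { spawnSync } from 'node:child_process';

// The three verification commands from step 3, in order.
const VERIFY_STEPS = [
  ['cargo', ['test', '--manifest-path', 'crates/codegraph-core/Cargo.toml']],
  ['npm', ['test']],
  ['npx', ['vitest', 'run', 'tests/benchmarks/resolution/resolution-benchmark.test.ts']],
];

// Stop at the first step that fails or cannot start (step 4: never proceed unverified).
function runVerification(steps = VERIFY_STEPS) {
  for (const [command, args] of steps) {
    const res = spawnSync(command, args, { stdio: 'inherit', shell: process.platform === 'win32' });
    if (res.error) return { ok: false, command, reason: `could not run: ${res.error.message}` };
    if (res.status !== 0) return { ok: false, command, reason: `exit status ${res.status}` };
  }
  return { ok: true };
}
```

`runVerification()` returns `{ ok: true }` only when every command exits 0; a missing tool is reported as "could not run" rather than skipped.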

## Phase 4 — Report

Print a summary:

```
PARITY AUDIT — <date>
Fixtures audited: N (wasm vs native[, hybrid])
Divergences found: M
Fixed: <file:line summary per fix, with engine + root cause>
Verification: cargo test ✓ | npm test ✓ | resolution benchmark ✓
Issues filed: #NNN (out-of-scope findings)
```

- If divergences were found and fixed, list each root cause in one line —
  which engine was wrong, which layer, what semantic was mismatched.
- If `--audit-only`: list divergences grouped by fixture with the
  wasm/hybrid/native localization table applied.
- Suggest committing engine fixes separately from unrelated work (one PR = one
  concern).

## Rules

- **Zero divergence is the only passing state** — a single edge differing in
  confidence is a failure.
- **Never exclude a fixture or file to make the audit pass.**
- **Never run the audit against a stale dist or stale native binary** — Phase 0
  is mandatory after any code change.
- **The wasm/hybrid/native disagreement pattern localizes the bug** — use the
  table before reading code.
- **Both engines evolve together**: a feature added to one engine without the
  other is a parity bug from day one. New resolution techniques must land in
  `src/` and `crates/codegraph-core/src/` in the same PR.

.claude/skills/titan-run/SKILL.md

Lines changed: 55 additions & 6 deletions
@@ -1,7 +1,7 @@
 ---
 name: titan-run
-description: Run the full Titan Paradigm pipeline end-to-end by dispatching each phase to sub-agents with fresh context windows. Orchestrates recon → gauntlet → sync → forge → grind automatically.
-argument-hint: <path (default: .)> <--skip-recon> <--skip-gauntlet> <--start-from recon|gauntlet|sync|forge|grind> <--gauntlet-batch-size 5> <--yes>
+description: Run the full Titan Paradigm pipeline end-to-end by dispatching each phase to sub-agents with fresh context windows. Orchestrates recon → gauntlet → sync → forge → grind (+ repo-provided parity audit) automatically.
+argument-hint: <path (default: .)> <--skip-recon> <--skip-gauntlet> <--start-from recon|gauntlet|sync|forge|grind|parity> <--gauntlet-batch-size 5> <--yes>
 allowed-tools: Agent, Read, Bash, Glob, Write, Edit
 ---

@@ -16,7 +16,7 @@ You are the **orchestrator** for the full Titan Paradigm pipeline. Your job is t
 - `<path>` → target path (passed to recon)
 - `--skip-recon` → skip recon (assumes artifacts exist)
 - `--skip-gauntlet` → skip gauntlet (assumes artifacts exist)
-- `--start-from <phase>` → jump to phase: `recon`, `gauntlet`, `sync`, `forge`, `grind`
+- `--start-from <phase>` → jump to phase: `recon`, `gauntlet`, `sync`, `forge`, `grind`, `parity`
 - `--gauntlet-batch-size <N>` → batch size for gauntlet (default: 5)
 - `--yes` → skip all confirmation prompts in the orchestrator (pre-pipeline, forge checkpoint, and resume prompts) and in forge (per-phase confirmation)

@@ -50,7 +50,7 @@ You are the **orchestrator** for the full Titan Paradigm pipeline. Your job is t
 node -e "const fs=require('fs');const s=JSON.parse(fs.readFileSync('.codegraph/titan/titan-state.json','utf8'));s.phaseTimestamps=s.phaseTimestamps||{};s.phaseTimestamps['<PHASE>']=s.phaseTimestamps['<PHASE>']||{};s.phaseTimestamps['<PHASE>'].completedAt=new Date().toISOString();fs.writeFileSync('.codegraph/titan/titan-state.json',JSON.stringify(s,null,2));"
 ```

-Replace `<PHASE>` with `recon`, `gauntlet`, `sync`, `forge`, or `close`. **Run the start command immediately before dispatching each phase's first sub-agent, and the completion command immediately after post-phase validation passes.** If resuming a phase (e.g., gauntlet loop iteration 2+), do NOT overwrite `startedAt` — only set it if it doesn't already exist.
+Replace `<PHASE>` with `recon`, `gauntlet`, `sync`, `forge`, `parity`, or `close`. **Run the start command immediately before dispatching each phase's first sub-agent, and the completion command immediately after post-phase validation passes.** If resuming a phase (e.g., gauntlet loop iteration 2+), do NOT overwrite `startedAt` — only set it if it doesn't already exist.

 **Timestamp validation:** After recording `completedAt` for any phase, verify `startedAt < completedAt`:
 ```bash
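The timestamp commands in this hunk encode three rules: create missing objects, never overwrite an existing `startedAt` when resuming, and require `startedAt < completedAt`. A pure-function sketch of those rules follows; it works on the parsed `titan-state.json` object, leaves file I/O out, and `markPhase` is a hypothetical name.

```javascript
// Pure-function sketch of the phase-timestamp rules; reading and writing
// .codegraph/titan/titan-state.json is left out.
function markPhase(state, phase, event, now = new Date()) {
  state.phaseTimestamps ??= {};
  state.phaseTimestamps[phase] ??= {};
  const entry = state.phaseTimestamps[phase];
  if (event === 'start') {
    // Resuming a phase (e.g. gauntlet iteration 2+) keeps the original start.
    entry.startedAt ??= now.toISOString();
  } else if (event === 'complete') {
    entry.completedAt = now.toISOString();
    // Timestamp validation: startedAt must be strictly earlier.
    if (entry.startedAt && !(Date.parse(entry.startedAt) < Date.parse(entry.completedAt))) {
      throw new Error(`${phase}: startedAt ${entry.startedAt} is not before completedAt ${entry.completedAt}`);
    }
  } else {
    throw new Error(`unknown event: ${event}`);
  }
  return state;
}

// Usage: the new `parity` phase goes through the same calls as every other phase.
const state = {};
markPhase(state, 'parity', 'start', new Date('2025-01-01T10:00:00Z'));
markPhase(state, 'parity', 'start', new Date('2025-01-01T11:00:00Z')); // resume: no overwrite
markPhase(state, 'parity', 'complete', new Date('2025-01-01T12:00:00Z'));
console.log(state.phaseTimestamps.parity);
```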
@@ -639,7 +639,7 @@ Record `phaseTimestamps.forge.completedAt`.

 Grind runs after forge to close the adoption loop. Forge extracts helpers; grind wires them into consumers and removes dead code. Without grind, the dead symbol count inflates with every forge phase.

-**Skip if:** `--start-from` is `close`, or `titan-state.json → grind.completedPhases` already covers all forge phases.
+**Skip if:** `--start-from` is `parity` or `close`, or `titan-state.json → grind.completedPhases` already covers all forge phases.

 ### 4.5a. Pre-loop check

@@ -742,6 +742,54 @@ Record `phaseTimestamps.grind.completedAt`.

 ---

+## Step 4.7 — PARITY (conditional, repo-provided)
+
+Some repos ship multiple implementations of the same logic that must stay in lockstep (e.g. a dual native/WASM engine, a client and server copy of a validator). Forge and grind edit code across the tree; this step verifies those edits didn't leave one implementation behind.
+
+**titan-run is repo-agnostic** — never assume the target repo has engines, fixtures, or any parity surface. The contract: a repo opts in by shipping its own `/parity` skill at `.claude/skills/parity/SKILL.md` (wrapping whatever audit mechanism it uses internally). No skill → no parity phase.
+
+### 4.7a. Detect the repo's parity mechanism
+
+```bash
+test -f .claude/skills/parity/SKILL.md && echo "PARITY SKILL FOUND" || echo "NO PARITY SKILL"
+```
+
+- **NO PARITY SKILL** → print `"PARITY skipped — repo provides no /parity skill."` and continue to Step 5. Absence is normal for most repos; do not warn.
+- **PARITY SKILL FOUND** → continue below.
+
+**Skip also if:** `--start-from` is `close`, or the pipeline made no code changes this run (`titan-state.json → execution.commits` empty/absent AND no grind adoption commits) — unless `--start-from parity` was given explicitly, which always runs the audit.
+
+### 4.7b. Record phase start
+
+Record `phaseTimestamps.parity.startedAt`.
+
+### 4.7c. Run Pre-Agent Gate (G1-G4)
+
+### 4.7d. Dispatch sub-agent
+
+```
+Agent → "Run /parity. Read .claude/skills/parity/SKILL.md and follow it exactly.
+Skip worktree check — already handled.
+Audit every surface the skill covers. Fix any divergence introduced by
+recent commits at the root cause, commit the fixes, and re-verify until
+the audit is clean. If a divergence pre-dates this run (verify via
+git log on the relevant files), follow the skill's and repo's rules for
+pre-existing findings (typically: file an issue, don't expand scope)."
+```
+
+### 4.7e. Post-phase validation
+
+After the agent returns:
+- `git status --short` → the working tree must be clean. The sub-agent commits its fixes; uncommitted changes mean it stopped mid-fix → **stop** and report.
+- If the agent fixed divergences, run V16-style commit audit: `git log --oneline <headBefore>..<headAfter>` and print the parity-fix commits.
+- If the agent reports divergences introduced by THIS run that it could not fix → **stop**: "PARITY failed — this run introduced implementation drift. Fix before CLOSE or revert the offending commits." Pre-existing divergences filed as issues are not blockers; print the issue URLs.
+
+Print: `"PARITY complete: <clean | N divergences fixed | N pre-existing filed as issues>"`
+
+Record `phaseTimestamps.parity.completedAt`.
+
+---
+
 ## Step 5 — CLOSE (report + PRs)

 After forge completes, dispatch `/titan-close` to produce the final report with before/after metrics and split commits into focused PRs.
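Step 4.7's gating reduces to a small decision function. The sketch assumes the orchestrator already knows the `--start-from` value and the commit lists it read from `titan-state.json` and the grind loop; `parityDecision`, `hasParitySkill`, and the reason strings are hypothetical names. `hasParitySkill` accepts only a regular file, mirroring `test -f`.

```javascript
import fs from 'node:fs';
import path from 'node:path';

// Mirrors `test -f .claude/skills/parity/SKILL.md`: true only for a regular file.
function hasParitySkill(root = '.') {
  const stat = fs.statSync(path.join(root, '.claude', 'skills', 'parity', 'SKILL.md'), { throwIfNoEntry: false });
  return stat?.isFile() ?? false;
}

// Gating rules from Step 4.7, in the order the step applies them.
function parityDecision({ skillExists, startFrom = null, executionCommits = [], grindCommits = [] }) {
  if (!skillExists) return { run: false, reason: 'no-parity-skill' }; // normal; no warning
  if (startFrom === 'parity') return { run: true, reason: 'explicit-start-from-parity' };
  if (startFrom === 'close') return { run: false, reason: 'start-from-close' };
  if (executionCommits.length === 0 && grindCommits.length === 0) {
    return { run: false, reason: 'no-code-changes' };
  }
  return { run: true, reason: 'code-changed' };
}

console.log(parityDecision({ skillExists: hasParitySkill(), executionCommits: ['abc1234'] }));
```

Checking skill presence first keeps the step a no-op for repos that never opted in, even when `--start-from parity` is passed.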
@@ -778,6 +826,7 @@ Record `phaseTimestamps.close.completedAt`.
 - **NDJSON corrupt lines:** Warn but continue — partial results are better than none. The corrupt lines are logged so the user knows which targets to re-audit.
 - **Merge conflict detected by pre-agent gate:** Stop immediately with the conflicting files listed.
 - **Tests fail after forge phase:** Stop immediately. Print the failing phase's commits so the user can revert.
+- **Parity audit fails on drift introduced by this run:** Stop before CLOSE. Retry with `/titan-run --start-from parity` after fixing or reverting.
 - **Validation failure (any V-check marked FAILED):** Stop with details. Warn-level V-checks are logged but don't stop the pipeline.

 ---
@@ -786,7 +835,7 @@ Record `phaseTimestamps.close.completedAt`.

 - **You are the orchestrator, not the executor.** Never run codegraph commands, edit source files, or make commits yourself. Only spawn sub-agents and read state files. Exceptions (pure validation/snapshot, no code changes): the post-forge test run (V13), NDJSON integrity checks, the V3 baseline snapshot check (`codegraph snapshot list`), and the pre-forge architectural snapshot capture (Step 3.5a) are run directly by the orchestrator.
 - **Run the Pre-Agent Gate (G1-G4) before EVERY sub-agent.** No exceptions.
-- **One sub-agent at a time.** Phases are sequential — recon before gauntlet, gauntlet before sync, sync before forge, forge before grind, grind before close.
+- **One sub-agent at a time.** Phases are sequential — recon before gauntlet, gauntlet before sync, sync before forge, forge before grind, grind before parity (when the repo provides one), parity before close.
 - **Fresh context per sub-agent.** This is the whole point — each sub-agent gets a clean context window.
 - **Read AND validate state files after every sub-agent.** Trust the on-disk state, not the sub-agent's text output — but verify the state is structurally sound.
 - **Back up state before every sub-agent.** The `.bak` file is your safety net against mid-write crashes.