Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
188 changes: 188 additions & 0 deletions .claude/skills/parity/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,188 @@
---
name: parity
description: Audit WASM/native engine correctness parity across all resolution fixtures and fix any divergence at the root cause — both engines must produce identical graphs
argument-hint: "[--langs js,python] [--hybrid] [--audit-only]"
allowed-tools: Bash, Read, Write, Edit, Glob, Grep, Agent
---

# /parity — Engine Correctness Parity Audit & Fix

Codegraph has two engines that MUST produce identical results (see CLAUDE.md):

- **wasm** — JS pipeline + JS extractors + JS edge resolution
- **native** — full Rust orchestrator (`crates/codegraph-core/src/domain/graph/builder/pipeline.rs`)
- **hybrid** — JS pipeline + napi `buildCallEdges` (the fallback when the
orchestrator is skipped: forced full rebuilds, older addons)

This skill runs `scripts/parity-compare.mjs`, which builds every
resolution-benchmark fixture with each engine and compares the **full node and
edge multisets** (kind, name, file, line, confidence, dynamic flag). Any
difference is a bug in the less-accurate engine — never an acceptable gap, and
never something to document as expected. The skill finds the root cause, fixes
it, and re-verifies until the audit is clean.

## Arguments

- `$ARGUMENTS` may contain:
- `--langs a,b,c` — restrict to specific fixture names (e.g. `javascript,pts-javascript`)
- `--hybrid` — also audit the hybrid path (recommended; slower)
- `--audit-only` — report divergences without fixing them
- No arguments — full audit across all fixtures, then fix divergences

## Phase 0 — Pre-flight

All steps run from the repo root.

1. Confirm `scripts/parity-compare.mjs` exists. If not, this repo doesn't have
the parity tooling — stop and report.
2. Build the TypeScript dist (the script imports `dist/index.js`, and extractor
changes in `src/` are invisible until rebuilt):
```bash
npm run build
```
3. Ensure the native addon reflects the local Rust source:
```bash
cd crates/codegraph-core && npx napi build --platform --release && cd ../..
```
On macOS, locally built binaries must be re-signed or Node kills the process
(exit 137):
```bash
codesign --sign - --force crates/codegraph-core/*.node
```
4. Verify the loader picks up the **locally built** binary, not the published
package. First check which path is actually resolved:
```bash
node -e "
const { createRequire } = require('node:module');
const r = createRequire(require.resolve('./dist/index.js'));
try { console.log(r.resolve('codegraph-core')); } catch { console.log('not found via require'); }
"
```
If the resolved path points to
`node_modules/@optave/codegraph-<platform>-<arch>/codegraph-core.node`
(the installed package), copy your freshly built binary over it:
```bash
cp crates/codegraph-core/*.node node_modules/@optave/codegraph-<platform>-<arch>/codegraph-core.node
```
Then confirm the loader picks it up:
```bash
node -e "import('./dist/infrastructure/native.js').then(m => console.log(m.isNativeAvailable()))"
```
If `false`, stop and report — auditing parity without the native engine is
meaningless.

## Phase 1 — Audit

Run the comparison (pass through `--langs` / `--hybrid` from `$ARGUMENTS`):

```bash
node scripts/parity-compare.mjs [--langs ...] [--hybrid] 2>/dev/null
```

- Exit 0 → parity holds. Skip to Phase 4 and report a clean audit.
- Exit 1 → divergences or fixture build failures. Collect every `[node]` /
`[edge]` diff line and any `BUILD FAILED` fixtures.
- Exit 2 → pre-flight failure; go back to Phase 0.

For machine-readable output (useful when many fixtures diverge), re-run with
`--json` and parse `fixtures[].comparisons[].nodeDiffs/edgeDiffs`.

If `--audit-only` was passed: report the diffs (Phase 4 format) and stop.

## Phase 2 — Root-cause and fix

For each divergence, identify which engine is wrong — the one missing edges or
producing lower-quality resolution is usually the buggy one, but verify by
reading the fixture source and deciding what the *correct* graph is.

**Localize the bug by which paths disagree:**

| wasm | hybrid | native | Bug location |
|------|--------|--------|--------------|
| A | A | B | Rust pipeline prep (`pipeline.rs`) or Rust extractor (`crates/.../extractors/`) — the napi solver gets correct input from JS but the orchestrator's own input differs |
| A | B | B | Rust `build_edges.rs` solver (shared by hybrid + native) |
| A | B | A | JS↔napi boundary: `NativeFileEntry` plumbing in `build-edges.ts` or the wasm-worker protocol |
| B | A | A | JS extractor or JS resolution (`src/extractors/`, `src/domain/graph/builder/stages/build-edges.ts`) |

**Fix rules (from CLAUDE.md — non-negotiable):**

- Fix the extraction/resolution layer that produces incorrect results. Never
add comments, tests, or fixture exclusions that frame wrong output as
expected.
- Changes may land in either language or both — create the best version based
on both implementations, don't restrict the fix to one side.
- The module layout is mirrored between `src/` and `crates/codegraph-core/src/`
— read the TS and Rust counterparts side by side (e.g.
`src/domain/graph/builder/stages/build-edges.ts` ↔
`crates/.../domain/graph/builder/stages/build_edges.rs`).
- Mirror *semantics exactly*: confidence constants, hop penalties, tie-breaking
order, first-wins vs highest-wins rules. A 0.05 confidence difference is a
parity failure.
- Add a focused unit test next to the fix (Rust `#[cfg(test)]` or vitest) that
pins the behavior.

**Gotchas that mask fixes:**

- `src/` changes need `npm run build` before the script (which imports dist)
sees them.
- Rust changes need the napi rebuild + macOS codesign from Phase 0.
- New `ExtractorOutput` fields must be added to `SerializedExtractorOutput` in
`src/domain/wasm-worker-{protocol,entry,pool}.ts` or they are silently
dropped at the Worker-thread boundary.
- New per-file fields crossing the napi boundary need: the `FileSymbols` /
`FileEdgeInput` structs in `crates/.../types.rs` & `build_edges.rs`, the
`NativeFileEntry` assembly in `build-edges.ts`, and the orchestrator's own
assembly in `pipeline.rs` (`build_and_insert_call_edges`). Missing the last
one produces hybrid-OK/native-broken splits.
- Out-of-scope findings discovered along the way (pre-existing bugs, refactor
opportunities) → `gh issue create` immediately, then continue.

## Phase 3 — Verify

Repeat until the audit is clean — never stop at "fewer diffs than before":

1. Rebuild whichever side changed (`npm run build` / napi build + codesign).
2. Re-run the Phase 1 audit command. Any remaining divergence → back to Phase 2.
3. Once clean, run the full verification suite — all must pass:
```bash
cargo test --manifest-path crates/codegraph-core/Cargo.toml
npm test
npx vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts
```
(From a `.claude` worktree, vitest needs the worktree override config —
check memory/project notes if no tests are found.)
4. If any verification step cannot run, STOP and report it — never proceed
with unverified changes.

## Phase 4 — Report

Print a summary:

```
PARITY AUDIT — <date>
Fixtures audited: N (wasm vs native[, hybrid])
Divergences found: M
Fixed: <file:line summary per fix, with engine + root cause>
Verification: cargo test ✓ | npm test ✓ | resolution benchmark ✓
Issues filed: #NNN (out-of-scope findings)
```

- If divergences were found and fixed, list each root cause in one line —
which engine was wrong, which layer, what semantic was mismatched.
- If `--audit-only`: list divergences grouped by fixture with the
wasm/hybrid/native localization table applied.
- Suggest committing engine fixes separately from unrelated work (one PR = one
concern).

## Rules

- **Zero divergence is the only passing state** — a single edge differing in
confidence is a failure.
- **Never exclude a fixture or file to make the audit pass.**
- **Never run the audit against a stale dist or stale native binary** — Phase 0
is mandatory after any code change.
- **The wasm/hybrid/native disagreement pattern localizes the bug** — use the
table before reading code.
- **Both engines evolve together**: a feature added to one engine without the
other is a parity bug from day one. New resolution techniques must land in
`src/` and `crates/codegraph-core/src/` in the same PR.
70 changes: 64 additions & 6 deletions .claude/skills/titan-run/SKILL.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
name: titan-run
description: Run the full Titan Paradigm pipeline end-to-end by dispatching each phase to sub-agents with fresh context windows. Orchestrates recon → gauntlet → sync → forge → grind automatically.
argument-hint: <path (default: .)> <--skip-recon> <--skip-gauntlet> <--start-from recon|gauntlet|sync|forge|grind> <--gauntlet-batch-size 5> <--yes>
description: Run the full Titan Paradigm pipeline end-to-end by dispatching each phase to sub-agents with fresh context windows. Orchestrates recon → gauntlet → sync → forge → grind (+ repo-provided parity audit) automatically.
argument-hint: <path (default: .)> <--skip-recon> <--skip-gauntlet> <--start-from recon|gauntlet|sync|forge|grind|parity> <--gauntlet-batch-size 5> <--yes>
allowed-tools: Agent, Read, Bash, Glob, Write, Edit
---

Expand All @@ -16,7 +16,7 @@ You are the **orchestrator** for the full Titan Paradigm pipeline. Your job is t
- `<path>` → target path (passed to recon)
- `--skip-recon` → skip recon (assumes artifacts exist)
- `--skip-gauntlet` → skip gauntlet (assumes artifacts exist)
- `--start-from <phase>` → jump to phase: `recon`, `gauntlet`, `sync`, `forge`, `grind`
- `--start-from <phase>` → jump to phase: `recon`, `gauntlet`, `sync`, `forge`, `grind`, `parity`
- `--gauntlet-batch-size <N>` → batch size for gauntlet (default: 5)
- `--yes` → skip all confirmation prompts in the orchestrator (pre-pipeline, forge checkpoint, and resume prompts) and in forge (per-phase confirmation)

Expand Down Expand Up @@ -50,7 +50,7 @@ You are the **orchestrator** for the full Titan Paradigm pipeline. Your job is t
node -e "const fs=require('fs');const s=JSON.parse(fs.readFileSync('.codegraph/titan/titan-state.json','utf8'));s.phaseTimestamps=s.phaseTimestamps||{};s.phaseTimestamps['<PHASE>']=s.phaseTimestamps['<PHASE>']||{};s.phaseTimestamps['<PHASE>'].completedAt=new Date().toISOString();fs.writeFileSync('.codegraph/titan/titan-state.json',JSON.stringify(s,null,2));"
```

Replace `<PHASE>` with `recon`, `gauntlet`, `sync`, `forge`, or `close`. **Run the start command immediately before dispatching each phase's first sub-agent, and the completion command immediately after post-phase validation passes.** If resuming a phase (e.g., gauntlet loop iteration 2+), do NOT overwrite `startedAt` — only set it if it doesn't already exist.
Replace `<PHASE>` with `recon`, `gauntlet`, `sync`, `forge`, `parity`, or `close`. **Run the start command immediately before dispatching each phase's first sub-agent, and the completion command immediately after post-phase validation passes.** If resuming a phase (e.g., gauntlet loop iteration 2+), do NOT overwrite `startedAt` — only set it if it doesn't already exist.

**Timestamp validation:** After recording `completedAt` for any phase, verify `startedAt < completedAt`:
```bash
Expand Down Expand Up @@ -639,7 +639,7 @@ Record `phaseTimestamps.forge.completedAt`.

Grind runs after forge to close the adoption loop. Forge extracts helpers; grind wires them into consumers and removes dead code. Without grind, the dead symbol count inflates with every forge phase.

**Skip if:** `--start-from` is `close`, or `titan-state.json → grind.completedPhases` already covers all forge phases.
**Skip if:** `--start-from` is `parity` or `close`, or `titan-state.json → grind.completedPhases` already covers all forge phases.

### 4.5a. Pre-loop check

Expand Down Expand Up @@ -742,6 +742,63 @@ Record `phaseTimestamps.grind.completedAt`.

---

## Step 4.6 — PARITY (conditional, repo-provided)

Some repos ship multiple implementations of the same logic that must stay in lockstep (e.g. a dual native/WASM engine, a client and server copy of a validator). Forge and grind edit code across the tree; this step verifies those edits didn't leave one implementation behind.

**titan-run is repo-agnostic** — never assume the target repo has engines, fixtures, or any parity surface. The contract: a repo opts in by shipping its own `/parity` skill at `.claude/skills/parity/SKILL.md` (wrapping whatever audit mechanism it uses internally). No skill → no parity phase.

### 4.6a. Detect the repo's parity mechanism

```bash
test -f .claude/skills/parity/SKILL.md && echo "PARITY SKILL FOUND" || echo "NO PARITY SKILL"
```

- **NO PARITY SKILL** → print `"PARITY skipped — repo provides no /parity skill."` and continue to Step 5. Absence is normal for most repos; do not warn.
- **PARITY SKILL FOUND** → continue below.

**Skip also if:** `--start-from` is `close`, or the pipeline made no code changes this run (`titan-state.json → execution.commits` empty/absent AND no grind adoption commits) — unless `--start-from parity` was given explicitly, which always runs the audit.

### 4.6b. Record phase start

Record `phaseTimestamps.parity.startedAt`.

```bash
headBefore=$(git rev-parse HEAD)
```

### 4.6c. Run Pre-Agent Gate (G1-G4)

### 4.6d. Dispatch sub-agent

```
Agent → "Run /parity. Read .claude/skills/parity/SKILL.md and follow it exactly.
Skip worktree check — already handled.
Audit every surface the skill covers. Fix any divergence introduced by
recent commits at the root cause, commit the fixes, and re-verify until
the audit is clean. If a divergence pre-dates this run (verify via
git log on the relevant files), follow the skill's and repo's rules for
pre-existing findings (typically: file an issue, don't expand scope)."
```

### 4.6e. Post-phase validation

After the agent returns:

```bash
headAfter=$(git rev-parse HEAD)
```

- `git status --short` → the working tree must be clean. The sub-agent commits its fixes; uncommitted changes mean it stopped mid-fix → **stop** and report.
- If the agent fixed divergences, run V16-style commit audit: `git log --oneline $headBefore..$headAfter` and print the parity-fix commits.
- If the agent reports divergences introduced by THIS run that it could not fix → **stop**: "PARITY failed — this run introduced implementation drift. Fix before CLOSE or revert the offending commits." Pre-existing divergences filed as issues are not blockers; print the issue URLs.

Print: `"PARITY complete: <clean | N divergences fixed | N pre-existing filed as issues>"`

Record `phaseTimestamps.parity.completedAt`.

---

## Step 5 — CLOSE (report + PRs)

After forge completes, dispatch `/titan-close` to produce the final report with before/after metrics and split commits into focused PRs.
Expand Down Expand Up @@ -778,6 +835,7 @@ Record `phaseTimestamps.close.completedAt`.
- **NDJSON corrupt lines:** Warn but continue — partial results are better than none. The corrupt lines are logged so the user knows which targets to re-audit.
- **Merge conflict detected by pre-agent gate:** Stop immediately with the conflicting files listed.
- **Tests fail after forge phase:** Stop immediately. Print the failing phase's commits so the user can revert.
- **Parity audit fails on drift introduced by this run:** Stop before CLOSE. Retry with `/titan-run --start-from parity` after fixing or reverting.
- **Validation failure (any V-check marked FAILED):** Stop with details. Warn-level V-checks are logged but don't stop the pipeline.

---
Expand All @@ -786,7 +844,7 @@ Record `phaseTimestamps.close.completedAt`.

- **You are the orchestrator, not the executor.** Never run codegraph commands, edit source files, or make commits yourself. Only spawn sub-agents and read state files. Exceptions (pure validation/snapshot, no code changes): the post-forge test run (V13), NDJSON integrity checks, the V3 baseline snapshot check (`codegraph snapshot list`), and the pre-forge architectural snapshot capture (Step 3.5a) are run directly by the orchestrator.
- **Run the Pre-Agent Gate (G1-G4) before EVERY sub-agent.** No exceptions.
- **One sub-agent at a time.** Phases are sequential — recon before gauntlet, gauntlet before sync, sync before forge, forge before grind, grind before close.
- **One sub-agent at a time.** Phases are sequential — recon before gauntlet, gauntlet before sync, sync before forge, forge before grind, grind before parity (when the repo provides one), parity before close.
- **Fresh context per sub-agent.** This is the whole point — each sub-agent gets a clean context window.
- **Read AND validate state files after every sub-agent.** Trust the on-disk state, not the sub-agent's text output — but verify the state is structurally sound.
- **Back up state before every sub-agent.** The `.bak` file is your safety net against mid-write crashes.
Expand Down
Loading
Loading