feat: add verify-reduction skill (#979)

zazabap · claude · GiggleLiu · web-flow · commit 07caf144d3dd · 2026-04-06T23:49:14.000+08:00
* docs: add design spec for proposed reductions Typst note

Covers 9 reductions: 2 NP-hardness chain extensions (#973, #198), 4 Tier 1a blocked issues (#379, #380, #888, #822), and 3 Tier 1b blocked issues (#892, #894, #890).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add verify-reduction skill for mathematical verification of reductions

New skill: /verify-reduction <issue-number>

End-to-end pipeline that takes a reduction rule issue and produces:
1. Typst proof (Construction/Correctness/Extraction/Overhead + YES/NO examples)
2. Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5)
3. Lean 4 lemmas (non-trivial structural proofs required)

Follows issue-to-pr conventions: creates worktree, works in isolation, submits PR.

Strict quality gates (zero tolerance):
- No "trivial" category — every reduction ≥5000 checks
- 7 mandatory Python sections including NO (infeasible) example
- Non-trivial Lean required (rfl/omega tautologies rejected)
- Zero hand-waving in Typst ("clearly", "obviously" → rejected)
- Mandatory gap analysis: every proof claim must have a test
- Self-review checklist with 20+ items across 4 categories

Developed and validated through PR #975 (800K+ checks, 3 bugs caught) and tested on issues #868 (caught wrong example) and #841 (35K checks).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: add YAML frontmatter + professional tone to verify-reduction skill

- Added frontmatter (name, description) matching other skills' convention
- Toned down aggressive language ("ZERO TOLERANCE", "THE HARSHEST STEP", "NON-NEGOTIABLE") to professional but firm language
- All quality gates unchanged — same strictness, better presentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: adversarial multi-agent verification in verify-reduction skill

Replaces Lean-required gates with adversarial second agent:
- Step 5: Adversary agent independently implements reduce() and extract_solution() from theorem statement only (not constructor's script)
- Step 5c: Cross-comparison of both implementations on 1000 instances
- Lean downgraded from required to optional
- hypothesis property-based testing for n up to 50
- Quality gates: 2 independent scripts ≥5000 checks each + cross-comparison

Design rationale (docs/superpowers/specs/2026-04-01-adversarial-verification-design.md):
- Same agent writing proof + test is the #1 risk for AI verification
- Two independent implementations agreeing > one + trivial Lean
- Lean caught 0 bugs in PR #975; Python caught all 4

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: design spec for verify-reduction skill enhancements

Typst↔Python auto-matching, test vectors JSON for downstream consumption by add-rule and review-pipeline, adversary tailoring by reduction type, compositional verification via pred CLI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: implementation plan for verify-reduction enhancements

5 tasks: update verify-reduction (Step 4.5 auto-matching, Step 5 typed adversary, Step 8 downstream artifacts), create add-reduction skill, register in CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: enhance verify-reduction with test vectors export, typed adversary, pipeline integration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: create add-reduction skill — consumes verified artifacts from verify-reduction

* feat: register add-reduction skill in CLAUDE.md, update verify-reduction description

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: three improvements to verify-reduction and add-reduction skills

1. verify-reduction Step 1: type compatibility gate — checks source/target Value types before proceeding. Stops and comments on issue if types are incompatible (e.g., optimization → decision needs K parameter).
2. add-reduction Step 7: mandatory cleanup of verification artifacts from docs/paper/verify-reductions/ — Python scripts, JSON, Typst, PDF must not get into the library.
3. add-reduction Steps 4/4b/5: mandatory requirements from #974 — canonical example in rule_builders.rs (Check 9), example-db lookup test (Check 10), paper reduction-rule entry (Check 11).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: concise verify-reduction (761→124 lines) + self-contained add-reduction

verify-reduction: removed verbose templates, condensed checklists into prose, kept all requirements but removed boilerplate code blocks that the agent can derive from context.

add-reduction: integrated add-rule Steps 1-6, write-rule-in-paper Steps 1-6, and #974 requirements (Checks 9/10/11) into a single self-contained skill. No need to read 3 other skills.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: restore structural requirements in verify-reduction (124→274 lines)

The previous rewrite over-condensed the skill, removing gates that agents need to follow: 7-section descriptions with table, minimum check count table, check count audit template, gap analysis format, common mistakes table, and self-review checklist with checkboxes.

Restored: all structural gates and requirements.
Kept concise: no verbose Python/Typst code templates (agent derives these).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: harden add-reduction with file-level verification gates

Steps 4, 4b, 5 now have HARD GATE labels with verification commands that check the SPECIFIC required files appear in `git diff --name-only`. Step 8 has a pre-commit gate that lists all 6 required files and blocks commit if any is missing.

Root cause: subagents skipped Steps 4 (put example in rule file instead of rule_builders.rs) and 5 (skipped paper entry entirely) because the skill said "MANDATORY" but had no mechanical enforcement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add CI-equivalent checks to add-reduction pre-commit gate

Root cause: PRs #985 and #991 failed CI because:
1. Local clippy doesn't use -D warnings but CI does (caught needless_range_loop)
2. New reductions can create paths that dominate existing direct reductions (test_find_dominated_rules_returns_known_set has hardcoded known set)

Added to Step 6:
- Mandatory `cargo clippy -- -D warnings` (not just `cargo clippy`)
- Mandatory `cargo test` (full suite, not filtered)
- Explicit dominated-rules gate with fix instructions

Added to Common Mistakes:
- clippy without -D warnings
- dominated rules test
- skipping full cargo test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: correct add-reduction HARD GATE for canonical examples

rule_builders.rs is a 4-line pass-through — canonical examples are registered via canonical_rule_example_specs() in each rule file, wired through mod.rs. Updated Step 4 to match actual architecture.

Also added analysis.rs to git add list (for dominated-rules updates).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: parent-side verification + pre-commit hook for add-reduction

Two enforcement mechanisms that don't rely on subagent compliance:

1. Parent-side verification (Step 8a): After subagent reports DONE, the parent runs file gate checks independently. If any required file is missing, sends the subagent back — doesn't trust self-report.
2. Pre-commit hook (.claude/hooks/add-reduction-precommit.sh): Mechanically blocks commits of new rule files unless example_db.rs, reductions.typ, and mod.rs are also staged. Subagents cannot bypass.

Root cause: subagents skip HARD GATE steps despite skill text saying "MANDATORY". Text-based enforcement doesn't work — need mechanical checks that run after the subagent, not instructions the subagent reads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: strengthen type compatibility gate in verify-reduction skill

Expanded the type compatibility table to explicitly list all incompatible pairs (Min→Or, Max→Or, Max→Min, Min→Max, Or→Sum, etc.) with concrete examples from batch verification (#198 MVC→HamCircuit, #890 MaxCut→OLA). Added common mistake entry for proceeding past the type gate.

Learned from batch run: 5 out of 50 reductions were mathematically verified but turned out to be unimplementable as ReduceTo due to type mismatches that the original gate didn't catch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Delete docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md

* Delete docs/superpowers/specs/2026-03-31-proposed-reductions-note-design.md

* Delete .claude/CLAUDE.md

* Revert "Delete .claude/CLAUDE.md"

This reverts commit 71c1444.

* chore: remove docs/superpowers/specs/ directory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: integrate verify-reduction into add-rule pipeline

- Delete /add-reduction skill and pre-commit hook (absorbed into /add-rule)
- /add-rule now runs /verify-reduction by default; --no-verify to skip
- /verify-reduction simplified: no worktree, no PR, no saved artifacts
- /issue-to-pr passes --no-verify flag through to /add-rule
- Update CLAUDE.md skill descriptions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jinguo Liu <cacate0129@gmail.com>
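
The adversarial cross-comparison described in the commit message (two independently written `reduce()` implementations checked against each other on random instances) can be sketched roughly as follows. This is a minimal illustration, not the skill's actual script: the complement-graph reduction used as the subject, the function names `reduce_constructor`, `reduce_adversary`, `random_graph`, and `cross_compare`, and the random instance generator are all assumptions chosen for the example.

```python
import itertools
import random


def reduce_constructor(n, edges):
    """Hypothetical constructor version: complement graph via set difference."""
    all_pairs = set(itertools.combinations(range(n), 2))
    present = {tuple(sorted(e)) for e in edges}
    return n, sorted(all_pairs - present)


def reduce_adversary(n, edges):
    """Hypothetical adversary version: complement graph via adjacency matrix."""
    adj = [[False] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = True
    return n, [(u, v) for u in range(n) for v in range(u + 1, n) if not adj[u][v]]


def random_graph(rng, max_n=6):
    """Draw a small random graph; each possible edge is kept with probability 1/2."""
    n = rng.randint(0, max_n)
    edges = [p for p in itertools.combinations(range(n), 2) if rng.random() < 0.5]
    return n, edges


def cross_compare(trials=1000, seed=0):
    """Return every instance on which the two implementations disagree."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        n, edges = random_graph(rng)
        if reduce_constructor(n, edges) != reduce_adversary(n, edges):
            mismatches.append((n, edges))
    return mismatches


if __name__ == "__main__":
    bad = cross_compare()
    print(f"{len(bad)} mismatching instances out of 1000")
```

An empty mismatch list only shows that the two implementations agree with each other; it complements, rather than replaces, the per-script correctness checks.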
diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
@@ -12,7 +12,7 @@ These repo-local skills live under `.claude/skills/*/SKILL.md`.
 - [run-pipeline](skills/run-pipeline/SKILL.md) -- Pick a Ready issue from the GitHub Project board, move it through In Progress -> issue-to-pr -> Review pool. One issue at a time; forever-loop handles iteration.
 - [issue-to-pr](skills/issue-to-pr/SKILL.md) -- Convert a GitHub issue into a PR with an implementation plan. Default rule: one item per PR. Exception: a `[Model]` issue that explicitly claims direct ILP solvability should implement the model and its direct `<Model> -> ILP` rule together; `[Rule]` issues still require both models to exist on `main`.
 - [add-model](skills/add-model/SKILL.md) -- Add a new problem model. Can be used standalone (brainstorms with user) or called from `issue-to-pr`.
-- [add-rule](skills/add-rule/SKILL.md) -- Add a new reduction rule. Can be used standalone (brainstorms with user) or called from `issue-to-pr`.
+- [add-rule](skills/add-rule/SKILL.md) -- Add a new reduction rule. Runs mathematical verification by default (via `/verify-reduction`); pass `--no-verify` to skip for trivial reductions. Can be used standalone or called from `issue-to-pr`.
 - [review-structural](skills/review-structural/SKILL.md) -- Project-specific structural completeness check: model/rule checklists, build, semantic correctness, issue compliance. Read-only, no code changes. Called by `review-pipeline`.
 - [review-quality](skills/review-quality/SKILL.md) -- Generic code quality review: DRY, KISS, cohesion/coupling, test quality, HCI. Read-only, no code changes. Called by `review-pipeline`.
 - [fix-pr](skills/fix-pr/SKILL.md) -- Resolve PR review comments, fix CI failures, and address codecov coverage gaps. Uses `gh api` for codecov (not local `cargo-llvm-cov`).
@@ -29,6 +29,7 @@ These repo-local skills live under `.claude/skills/*/SKILL.md`.
 - [propose](skills/propose/SKILL.md) -- Interactive brainstorming to help domain experts propose a new model or rule. Asks one question at a time, uses mathematical language (no programming jargon), and files a GitHub issue.
 - [final-review](skills/final-review/SKILL.md) -- Interactive maintainer review for PRs in "Final review" column. Merges main, walks through agentic review bullets with human, then merge or hold.
 - [dev-setup](skills/dev-setup/SKILL.md) -- Interactive wizard to install and configure all development tools for new maintainers.
+- [verify-reduction](skills/verify-reduction/SKILL.md) -- Standalone mathematical verification of a reduction rule: Typst proof, constructor Python (≥5000 checks), adversary Python (≥5000 independent checks). Reports verdict, no artifacts saved. Also called as a subroutine by `/add-rule` (default behavior).
 - [tutorial](skills/tutorial/SKILL.md) -- Interactive tutorial — walk through the pred CLI to explore, reduce, and solve NP-hard problems. No Rust internals.
 
 ## Codex Compatibility
diff --git a/.claude/skills/add-rule/SKILL.md b/.claude/skills/add-rule/SKILL.md
@@ -5,7 +5,16 @@ description: Use when adding a new reduction rule to the codebase, either from a
 
 # Add Rule
 
-Step-by-step guide for adding a new reduction rule (A -> B) to the codebase.
+Step-by-step guide for adding a new reduction rule (A -> B) to the codebase. By default, every rule goes through mathematical verification (via `/verify-reduction`) before implementation. Pass `--no-verify` to skip verification for trivial reductions.
+
+## Invocation
+
+```
+/add-rule                     # interactive, with verification (default)
+/add-rule --no-verify         # interactive, skip verification
+```
+
+When called from `/issue-to-pr`, the `--no-verify` flag is passed through if present.
 
 ## Step 0: Gather Required Information
 
@@ -27,6 +36,26 @@ Before any implementation, collect all required information. If called from `iss
 
 If any item is missing, ask the user to provide it. Put a high standard on item 7 (concrete example): it must be in tutorial style with clear intuition and easy to understand. Do NOT proceed until the checklist is complete.
 
+## Step 0.5: Type Compatibility Gate
+
+Check source/target `Value` types before any work:
+
+```bash
+grep "type Value = " src/models/*/<source_file>.rs src/models/*/<target_file>.rs
+```
+
+**Compatible pairs for `ReduceTo` (witness-capable):**
+- `Or`->`Or`, `Min`->`Min`, `Max`->`Max` (same type)
+- `Or`->`Min`, `Or`->`Max` (feasibility embeds into optimization)
+
+**Incompatible — STOP if any of these:**
+- `Min`->`Or` or `Max`->`Or` — optimization source has no threshold K; needs a decision-variant source model
+- `Max`->`Min` or `Min`->`Max` — opposite optimization directions; needs `ReduceToAggregate` or a decision-variant wrapper
+- `Or`->`Sum` or `Min`->`Sum` — Sum is aggregate-only; needs `ReduceToAggregate`
+- Any pair involving `And` or `Sum` on the target side
+
+If incompatible, STOP and comment on the issue explaining the type mismatch and options. Do NOT proceed.
+
 ## Reference Implementations
 
 Read these first to understand the patterns:
@@ -35,7 +64,19 @@ Read these first to understand the patterns:
 - **Paper entry:** search `docs/paper/reductions.typ` for `MinimumVertexCover` `MaximumIndependentSet`
 - **Traits:** `src/rules/traits.rs` (`ReduceTo<T>`, `ReduceToAggregate<T>`, `ReductionResult`, `AggregateReductionResult`)
 
-## Step 1: Implement the reduction
+## Step 1: Mathematical Verification (default, skip with `--no-verify`)
+
+**If `--no-verify` was passed, skip to Step 2.**
+
+Invoke the `/verify-reduction` skill to mathematically verify the reduction before writing Rust code. This runs the full verification pipeline: Typst proof, constructor Python script (>=5000 checks), adversary subagent (>=5000 independent checks), and cross-comparison.
+
+All verification artifacts are ephemeral — they exist only in conversation context and temp files. Nothing is committed to the repository.
+
+**If verification FAILS: STOP. Report to user. Do NOT proceed to implementation.**
+
+If verification passes, the verified Python `reduce()` and `extract_solution()` functions, along with the YES/NO instances, carry forward in conversation context to inform Steps 2-5. Use them as the canonical spec for the Rust implementation.
+
+## Step 2: Implement the reduction
 
 Create `src/rules/<source>_<target>.rs` (all lowercase, no underscores between words within a problem name):
 
@@ -67,6 +108,7 @@ impl ReductionResult for ReductionXToY {
     fn target_problem(&self) -> &Self::Target { &self.target }
     fn extract_solution(&self, target_solution: &[usize]) -> Vec<usize> {
         // Map target solution back to source solution
+        // If Step 1 ran: translate the verified Python extract_solution() logic
     }
 }
 ```
@@ -78,21 +120,23 @@ impl ReductionResult for ReductionXToY {
 })]
 impl ReduceTo<TargetType> for SourceType {
     type Result = ReductionXToY;
-    fn reduce_to(&self) -> Self::Result { ... }
+    fn reduce_to(&self) -> Self::Result {
+        // If Step 1 ran: translate the verified Python reduce() logic
+    }
 }
 ```
 
 Each primitive reduction is determined by the exact source/target variant pair. Keep one primitive registration per endpoint pair and use only the `overhead` form of `#[reduction]`.
 
 **Aggregate-only reductions:** when the rule preserves aggregate values but cannot recover a source witness from a target witness, implement `AggregateReductionResult` + `ReduceToAggregate<T>` instead of `ReductionResult` + `ReduceTo<T>`. Those edges are not auto-registered by `#[reduction]` yet; register them manually with `ReductionEntry { reduce_aggregate_fn: ..., capabilities: EdgeCapabilities::aggregate_only(), ... }`. See `src/unit_tests/rules/traits.rs` and `src/unit_tests/rules/graph.rs` for the reference pattern.
 
-## Step 2: Register in mod.rs
+## Step 3: Register in mod.rs
 
 Add to `src/rules/mod.rs`:
 - `mod <source>_<target>;`
 - If feature-gated (e.g., ILP): wrap with `#[cfg(feature = "ilp-solver")]`
 
-## Step 3: Write unit tests
+## Step 4: Write unit tests
 
 Create `src/unit_tests/rules/<source>_<target>.rs`:
 
@@ -105,6 +149,8 @@ Create `src/unit_tests/rules/<source>_<target>.rs`:
 // 5. Verify: extracted solution is valid and optimal for source
 ```
 
+If Step 1 ran, use the verified YES/NO instances from conversation context to construct test cases. Include both a feasible (closed-loop) and infeasible (no witnesses) test.
+
 Additional recommended tests:
 - Verify target problem structure (correct size, edges, constraints)
 - Edge cases (empty graph, single vertex, etc.)
@@ -117,17 +163,19 @@ For aggregate-only reductions, replace the closed-loop witness test with value-c
 
 Link via `#[cfg(test)] #[path = "..."] mod tests;` at the bottom of the rule file.
 
-## Step 4: Add canonical example to example_db
+## Step 5: Add canonical example to example_db
 
 Add a builder function in `src/example_db/rule_builders.rs` that constructs a small, canonical instance for this reduction. Follow the existing patterns in that file. Register the builder in `build_rule_examples()`.
 
-## Step 5: Document in paper (MANDATORY — DO NOT SKIP)
+## Step 6: Document in paper (MANDATORY — DO NOT SKIP)
 
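-**This step is NOT optional.** Every reduction rule MUST have a corresponding `reduction-rule` entry in the paper. Skipping documentation is a blocking error — the PR will be rejected in review. Do not proceed to Step 6 until the paper entry is written and `make paper` compiles.
+**This step is NOT optional.** Every reduction rule MUST have a corresponding `reduction-rule` entry in the paper. Skipping documentation is a blocking error — the PR will be rejected in review. Do not proceed to Step 7 until the paper entry is written and `make paper` compiles.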
 
 Write a `reduction-rule` entry in `docs/paper/reductions.typ`. **Reference example:** search for `reduction-rule("KColoring", "QUBO"` to see the gold-standard entry — use it as a template. For a minimal example, see MinimumVertexCover -> MaximumIndependentSet.
 
-### 5a. Write theorem body (rule statement)
+If Step 1 ran, adapt the verified Typst proof into the paper's macros. Do not rewrite the proof from scratch — reformat it.
+
+### 6a. Write theorem body (rule statement)
 
 ```typst
 #reduction-rule("Source", "Target",
@@ -140,7 +188,7 @@ Write a `reduction-rule` entry in `docs/paper/reductions.typ`. **Reference examp
 
 Three parts: complexity with citation, construction summary, overhead hint.
 
-### 5b. Write proof body
+### 6b. Write proof body
 
 Use these subsections with italic labels:
 
@@ -158,7 +206,7 @@ Use these subsections with italic labels:
 
 Must be self-contained (all notation defined) and reproducible.
 
-### 5c. Write worked example (extra block)
+### 6c. Write worked example (extra block)
 
 Step-by-step walkthrough with concrete numbers from JSON data. Required steps:
 1. Show source instance (dimensions, structure, graph visualization if applicable)
@@ -170,15 +218,15 @@ Use `graph-colors`, `g-node()`, `g-edge()` for graph visualization — see refer
 
 **Reproducibility:** The `extra:` block must start with a `pred-commands()` call showing the create/reduce/solve/evaluate pipeline. The source-side `pred create --example ...` spec must be derived from the loaded canonical example data via the helper pattern in `write-rule-in-paper`; do not hand-write a bare alias and assume the default variant matches.
 
-### 5d. Build and verify
+### 6d. Build and verify
 
 ```bash
 make paper     # Must compile without errors
 ```
 
 Checklist: notation self-contained, complexity cited, overhead consistent, example uses JSON data (not hardcoded), solution verified end-to-end, witness semantics respected, paper compiles.
 
-## Step 6: Regenerate exports and verify
+## Step 7: Regenerate exports and verify
 
 ```bash
 cargo run --example export_graph    # Generate reduction_graph.json for docs/paper builds
@@ -187,7 +235,7 @@ make regenerate-fixtures            # Regenerate example_db/fixtures/examples.js
 make test clippy                    # Must pass
 ```
 
-`make regenerate-fixtures` is required so the paper can load the new rule's example data from `src/example_db/fixtures/examples.json`. Without it, the `reduction-rule` entry in Step 5 will reference missing fixture data.
+`make regenerate-fixtures` is required so the paper can load the new rule's example data from `src/example_db/fixtures/examples.json`. Without it, the `reduction-rule` entry in Step 6 will reference missing fixture data.
 
 Structural and quality review is handled by the `review-pipeline` stage, not here. The run stage just needs to produce working code.
 
@@ -229,3 +277,4 @@ Aggregate-only reductions currently have a narrower CLI surface:
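-| Skipping Step 5 (paper documentation) | **Every rule MUST have a `reduction-rule` entry in the paper. This is mandatory, not optional. PRs without documentation will be rejected.** |
+| Skipping Step 6 (paper documentation) | **Every rule MUST have a `reduction-rule` entry in the paper. This is mandatory, not optional. PRs without documentation will be rejected.** |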
 | Source/target model not fully registered | Both problems must already have `declare_variants!`, aliases as needed, and CLI create support -- use `add-model` skill first |
 | Treating a direct-to-ILP rule as a toy stub | Direct ILP reductions need exact overhead metadata and strong semantic regression tests, just like other production ILP rules |
+| Skipping verification for complex reductions | Verification is default for a reason — `--no-verify` is for trivial identity/complement reductions only |
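
The type compatibility gate added as Step 0.5 of `add-rule` amounts to a lookup on the (source, target) `Value` type pair. A minimal sketch of that table follows, assuming the type names are passed as plain strings. The names `COMPATIBLE`, `REASONS`, and `type_gate` are hypothetical, and treating every pair not listed as compatible as a STOP is an assumption of this sketch rather than something the skill states.

```python
# Hypothetical encoding of the Step 0.5 table; Value type names are plain strings.
COMPATIBLE = {
    ("Or", "Or"), ("Min", "Min"), ("Max", "Max"),  # same type
    ("Or", "Min"), ("Or", "Max"),                  # feasibility embeds into optimization
}

NO_THRESHOLD = "optimization source has no threshold K; needs a decision-variant source model"
OPPOSITE = "opposite optimization directions; needs ReduceToAggregate or a decision-variant wrapper"

REASONS = {
    ("Min", "Or"): NO_THRESHOLD,
    ("Max", "Or"): NO_THRESHOLD,
    ("Max", "Min"): OPPOSITE,
    ("Min", "Max"): OPPOSITE,
}


def type_gate(source, target):
    """Return (ok, message) for a source -> target Value type pair."""
    pair = (source, target)
    if pair in COMPATIBLE:
        return True, "compatible for ReduceTo"
    if target in ("And", "Sum"):
        # The skill lists any pair with And or Sum on the target side as incompatible.
        return False, "And/Sum on the target side is listed as incompatible for ReduceTo"
    # Assumption: anything else that is not explicitly compatible also stops.
    return False, REASONS.get(pair, "pair is not listed as compatible")


if __name__ == "__main__":
    for src, dst in [("Or", "Max"), ("Min", "Or"), ("Min", "Sum")]:
        ok, msg = type_gate(src, dst)
        print(f"{src}->{dst}: {'OK' if ok else 'STOP'} ({msg})")
```

In the skill itself the two types come from the `grep "type Value = "` command. The sketch covers only the decision made once they are known.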
diff --git a/.claude/skills/issue-to-pr/SKILL.md b/.claude/skills/issue-to-pr/SKILL.md
@@ -9,7 +9,8 @@ Convert a GitHub issue into a PR: write a plan, create the PR, then execute the
 
 ## Invocation
 
-- `/issue-to-pr 42` — create PR with plan, then execute
+- `/issue-to-pr 42` — create PR with plan, then execute (for `[Rule]` issues, verification runs by default)
+- `/issue-to-pr 42 --no-verify` — skip mathematical verification for `[Rule]` issues
 
 For Codex, open this `SKILL.md` directly and treat the slash-command forms above as aliases. The Makefile `run-issue` target already does this translation.
 
@@ -37,6 +38,7 @@ Normalize to:
 - `ISSUE=<number>`
 - `REPO=<owner/repo>` (default `CodingThrust/problem-reductions`)
 - `EXECUTE=true|false`
+- `NO_VERIFY=true|false` (default `false`; pass `--no-verify` to skip mathematical verification for `[Rule]` issues)
 
 ### 2. Fetch Issue + Preflight Guards
 
@@ -91,7 +93,7 @@ The plan MUST reference the appropriate implementation skill and follow its step
 
 - **For ordinary `[Model]` issues:** Follow [add-model](../add-model/SKILL.md) Steps 1-7 as the action pipeline
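-- **For `[Model]` issues that explicitly claim direct ILP solving:** Follow [add-model](../add-model/SKILL.md) Steps 1-7 **and** [add-rule](../add-rule/SKILL.md) Steps 1-6 for the direct `<Problem> -> ILP` rule in the same plan / PR
+- **For `[Model]` issues that explicitly claim direct ILP solving:** Follow [add-model](../add-model/SKILL.md) Steps 1-7 **and** [add-rule](../add-rule/SKILL.md) Steps 1-7 for the direct `<Problem> -> ILP` rule in the same plan / PR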
-- **For `[Rule]` issues:** Follow [add-rule](../add-rule/SKILL.md) Steps 1-6 as the action pipeline
+- **For `[Rule]` issues:** Follow [add-rule](../add-rule/SKILL.md) Steps 1-7 as the action pipeline. By default, `/add-rule` runs mathematical verification (Step 1) before implementation. If `--no-verify` was passed, include `--no-verify` when invoking `/add-rule` to skip verification.
 
 Include the concrete details from the issue (problem definition, reduction algorithm, example, etc.) mapped onto each step.
 
diff --git a/.claude/skills/verify-reduction/SKILL.md b/.claude/skills/verify-reduction/SKILL.md