You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Revert review-structural Step 4b; let auto-pipeline Step 3 cover it
Step 4b mixed three different concerns into review-structural (a
read-only structural skill):
- 4b-2 ran `cargo test --exact <closed_loop>` for the new rule —
redundant with auto-pipeline Step 3's workspace-wide `make check`.
- 4b-3 read the test source to score it against four criteria — that's
a test-quality judgment, not a structural check; review-quality is
the right home for that and it already covers test quality.
- 4b-4 ran `pred solve` end-to-end via the new rule — overlaps with
the agentic feature tests review-pipeline already runs.
- 4b-1 (grep for the closed_loop test) is subsumed by Step 4's
existing checklist of required rule artefacts.
Restoring review-structural to its main version. Also removing the
Phase 4 prompt's mandatory-Step-4b clause from auto-pipeline.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Run the rule **end to end** on its canonical example(s), not just confirm a `closed_loop` test exists. The goal is to catch reductions that compile but silently lose information or drop corner cases.
124
-
125
-
### 4b-1: Locate the test
126
-
127
-
```bash
128
-
# The closed-loop test for this rule. The reference helper is:
### 4b-2: Run it by name and confirm it actually executes
135
-
136
-
```bash
137
-
cargo test --lib --no-fail-fast -- --exact test_{rule_module}::test_{...}_closed_loop
138
-
```
139
-
140
-
- Confirm the binary prints `test result: ok. 1 passed` for **each** named test.
141
-
- "0 passed; 0 failed; 0 ignored" means the test name was wrong → flag as ISSUE.
142
-
- If the closed-loop test is `#[ignore]`'d → FAIL.
143
-
144
-
### 4b-3: Confirm the round-trip is real, not a type check
145
-
146
-
Read each closed-loop test and verify it does ALL FOUR of the following:
147
-
148
-
| # | What the test must do | Red flag |
149
-
|---|---|---|
150
-
| 1 | Build a concrete source instance with **multiple feasible solutions of different objective values**| Trivial instance (single edge, single clause, empty graph) — anything where the answer is unique by inspection |
151
-
| 2 | Call `ReduceTo::<Target>::reduce_to(&source)` to produce the target | Test only asserts on `target_problem()` fields without actually solving |
152
-
| 3 | Solve the target (BruteForce / ILP / appropriate solver) AND solve the source independently | Test compares only target value, never source value |
153
-
| 4 | Use `extract_solution` (or the appropriate helper, e.g. `assert_optimization_round_trip_from_optimization_target`) to map the target witness back, then assert the extracted source configuration is **optimal** for the source | Test asserts only that `extract_solution` returns `Some(_)` or has the right length |
154
-
155
-
If any of (1)-(4) is missing → ISSUE (test does not verify the round-trip).
156
-
157
-
### 4b-4: Spot-check on a fresh instance (when feasible)
158
-
159
-
The rule's own closed-loop test is author-supplied — it may use exactly the input that makes the reduction look right. To catch that class of bug, drive a different instance through the CLI:
160
-
161
-
```bash
162
-
pred path <Source><Target> --json # confirm the new rule is on the chosen path
163
-
pred create --example <rule_example_id> --output /tmp/src.json
164
-
pred solve /tmp/src.json # direct source-side optimum
165
-
pred solve /tmp/src.json --via <Target># through the new reduction
166
-
```
167
-
168
-
The two `pred solve` values must agree. If `--via` isn't wired for this path, skip 4b-4 and note it in the report — do not pad with a re-statement of 4b-2.
169
-
170
121
## Step 5: Issue Compliance Review
171
122
172
123
Only if a linked issue was provided.
@@ -212,12 +163,6 @@ Flag any deviation as ISSUE.
212
163
### Semantic Review
213
164
- [check]: OK / ISSUE — description
214
165
215
-
### Round-trip Execution (rules only)
216
-
- Closed-loop test located: PASS / FAIL — name(s) and file
217
-
- Closed-loop test runs and passes: PASS / FAIL — paste the `test result: ok. N passed` line
218
-
- Test exercises real round-trip (4 criteria): PASS / FAIL — note any missing criterion
219
-
- Spot-check with pred / scratch: PASS / FAIL — record method used and observed values
0 commit comments