
Commit c6f0345

isPANN and claude committed
add review-paper skill for evaluating quality of problem definitions and reduction rules
This new skill allows users to review the Typst paper for quality issues, evaluating 10 entries per session and generating structured reports on mechanical and critical issues. The skill includes detailed checklists for both problem definitions and reduction rules, ensuring thorough evaluations without modifying any files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 512556d commit c6f0345

1 file changed

Lines changed: 142 additions & 0 deletions

@@ -0,0 +1,142 @@
---
name: review-paper
description: Review the Typst paper (docs/paper/reductions.typ) for quality issues — evaluates 10 entries per session, reports mechanical and critical issues without fixing
---

# Review Paper

Evaluate the quality of problem definitions and reduction rules in `docs/paper/reductions.typ`. Each session reviews **10 entries** (problems or rules), producing a structured report. **Read-only — do not modify any files.**

## Usage

```
/review-paper                 # review next 10 unreviewed problem-defs
/review-paper rules           # review next 10 unreviewed reduction-rules
/review-paper ProblemName     # review a specific problem-def
/review-paper Source Target   # review a specific reduction-rule
```

## Step 0: Determine Scope

Parse the argument:
- No argument or `problems` → review problem-defs
- `rules` → review reduction-rules
- A single name (`ProblemName`) → review that problem-def only
- Two names (`Source Target`) → review that reduction-rule only

To pick which 10 to review, scan `docs/paper/reductions.typ` for all `problem-def(...)` or `reduction-rule(...)` entries. Start from the beginning of the file, skipping any that have been reviewed in a previous session (check memory for `paper-review-progress`). If all have been reviewed, report completion.
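
The scan-and-skip selection described above can be sketched in Python. This is a minimal illustration, not part of the skill itself: the regex assumes entries are written as `problem-def("Name")` or `reduction-rule("Source", "Target", ...)`, matching the gold-standard search strings, and `list_entries` and `next_batch` are hypothetical helper names.

```python
import re

# Assumed call syntax, inferred from the search strings used in this skill.
ENTRY_RE = re.compile(
    r'(problem-def|reduction-rule)\(\s*"([^"]+)"(?:\s*,\s*"([^"]+)")?'
)


def list_entries(typ_source, kind):
    """Return entry keys of the given kind, in file order."""
    keys = []
    for match in ENTRY_RE.finditer(typ_source):
        macro, first, second = match.groups()
        if macro != kind:
            continue
        if macro == "reduction-rule" and second is not None:
            keys.append(f"{first} -> {second}")
        else:
            keys.append(first)
    return keys


def next_batch(entries, reviewed, size=10):
    """Pick the first `size` entries that have not been reviewed yet."""
    return [entry for entry in entries if entry not in reviewed][:size]
```

An empty result from `next_batch` corresponds to the "report completion" case.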

## Step 1: Load Gold Standard

Read the reference examples before reviewing:
- **Problem gold standard:** search for `problem-def("MaximumIndependentSet")` in `reductions.typ` — note its structure, depth, and components
- **Rule gold standard:** search for `reduction-rule("MaximumIndependentSet", "MinimumVertexCover"` — note its proof depth and example

## Step 2: Review Each Entry

For each entry in scope (10 by default, or the single named entry), read the full entry text and evaluate it against the checklists below.

### Problem-Def Checklist

**Mechanical checks** (objective, can be verified by reading):

| Check | Criterion |
|-------|-----------|
| M1. Display name | Entry exists in `display-name` dictionary |
| M2. Formal definition | `def` parameter is present and non-empty |
| M3. Self-contained notation | Every symbol in `def` is defined before first use |
| M4. Background text | Body contains at least 2 sentences of background/motivation |
| M5. Example present | Body contains `*Example.*` or `Example.` |
| M6. Example from fixture | Example data matches `src/example_db/fixtures/examples.json` (not invented) — check by loading the JSON and comparing |
| M7. Figure present | Body contains `#figure(` |
| M8. Pred commands | Body contains `pred-commands(` or `pred create` |
| M9. Algorithm citation | Complexity claims have `@citation` or a footnote explaining absence |
| M10. Evaluation shown | Example shows how the objective/verifier computes the value |
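
Several of these mechanical checks reduce to substring tests on the entry body. The following is a hedged sketch covering M5, M7, and M8 only, assuming the entry body is already available as a plain string; `mechanical_checks` is a hypothetical helper, and the remaining checks still need the dictionary, fixture, or citation lookups described in the table.

```python
def mechanical_checks(body):
    """Map a few problem-def mechanical checks to PASS or FAIL."""
    found = {
        # "*Example.*" contains "Example.", so one test covers both spellings.
        "M5. Example present": "Example." in body,
        "M7. Figure present": "#figure(" in body,
        "M8. Pred commands": "pred-commands(" in body or "pred create" in body,
    }
    return {check: "PASS" if ok else "FAIL" for check, ok in found.items()}
```

A substring hit only shows that the element exists; whether the example or figure is any good remains a critical-check question.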

**Critical checks** (require judgment):

| Check | Criterion |
|-------|-----------|
| C1. Definition correctness | Does the formal definition accurately describe the problem? Compare with the Rust implementation (`src/models/`) and literature |
| C2. Background quality | Is the background informative? Does it mention applications, history, special cases, or algorithmic context? |
| C3. Example pedagogy | Is the example small enough to verify by hand? Does it illustrate the key aspects of the problem? |
| C4. Completeness | Are there important aspects of the problem that are missing (e.g., well-known special cases, relationship to other problems)? |

### Reduction-Rule Checklist

**Mechanical checks:**

| Check | Criterion |
|-------|-----------|
| M1. Theorem statement | Rule body describes the construction |
| M2. Proof present | Proof body is non-empty |
| M3. Proof length | Proof is at least 3 sentences (not just "trivial" or a one-liner) |
| M4. Overhead documented | Overhead is auto-generated from JSON (verify edge exists in `reduction_graph.json`) |
| M5. Example present | `example: true` and example renders correctly |
| M6. Example from fixture | Example data matches `src/example_db/fixtures/examples.json` |
| M7. Pred commands | Example section contains `pred-commands(` with create/reduce/evaluate pipeline |
| M8. Both directions | If the reverse rule also exists in the graph, check it has its own entry |

**Critical checks:**

| Check | Criterion |
|-------|-----------|
| C1. Construction correctness | Does the theorem statement accurately describe what `reduce_to()` does? Read `src/rules/<source>_<target>.rs` to verify |
| C2. Proof correctness | Does the proof correctly argue that the reduction preserves solutions? |
| C3. Example clarity | Does the example clearly show source → target → solution extraction? |
| C4. Proof-only flag | If this is a proof-only reduction (not solver-executable), is that stated? |

## Step 3: Generate Report

Present results **one entry at a time** in this format:

```
### [N/10] ProblemName (or Source → Target)

**Mechanical Issues:**
- [PASS] M1. Display name
- [FAIL] M5. Example present — no worked example in body
- [WARN] M9. Algorithm citation — complexity claim "O*(2^n)" has no @citation

**Critical Issues:**
- [PASS] C1. Definition correctness — matches Rust implementation
- [FAIL] C2. Background quality — body is only one sentence ("This is NP-hard.")
  with no applications, history, or algorithmic context

**Verdict:** 1 mechanical fail, 1 mechanical warning, 1 critical fail — needs improvement
```

After each entry, pause and ask: **"Continue to next entry, or discuss this one?"**

Use these severity levels:
- **PASS** — meets criterion
- **WARN** — minor issue, could be improved but acceptable
- **FAIL** — does not meet criterion, should be fixed

## Step 4: Session Summary

After all 10 entries, print a summary table:

```
## Session Summary

| Entry | Mechanical | Critical | Verdict |
|-------|-----------|----------|---------|
| ProblemA | 9/10 pass | 4/4 pass | Good |
| ProblemB | 7/10 pass | 3/4 pass | Needs work |
| ... | ... | ... | ... |

Overall: X/10 entries pass all checks.
Top priorities for improvement: [list the 3 worst entries]
```
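
The pass counts in the summary table can be derived mechanically from the per-check statuses. A small sketch under stated assumptions: results are held as a dictionary of entry name to check statuses, check ids begin with `M` or `C`, and "worst" is taken to mean the most FAIL results (the skill does not define a ranking). `summary_rows` and `top_priorities` are hypothetical names.

```python
def summary_rows(results):
    """Build (entry, mechanical, critical, fail_count) rows.

    `results` maps entry name -> {check id -> "PASS" | "WARN" | "FAIL"}.
    """
    rows = []
    for entry, checks in results.items():
        mech = [s for cid, s in checks.items() if cid.startswith("M")]
        crit = [s for cid, s in checks.items() if cid.startswith("C")]
        fails = list(checks.values()).count("FAIL")
        rows.append((
            entry,
            f"{mech.count('PASS')}/{len(mech)} pass",
            f"{crit.count('PASS')}/{len(crit)} pass",
            fails,
        ))
    return rows


def top_priorities(rows, limit=3):
    """Assumption: rank entries by FAIL count, highest first."""
    ranked = sorted(rows, key=lambda row: row[3], reverse=True)
    return [row[0] for row in ranked[:limit]]
```

The Verdict column is left to the reviewer, since thresholds for "Good" versus "Needs work" are a judgment call.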

## Step 5: Save Progress

Save progress to memory under `paper-review-progress` (the key Step 0 checks) so the next session can continue where this one left off. Record which entries have been reviewed and their verdicts.
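
One possible shape for the saved record, sketched in Python. The skill does not specify a storage format, so the JSON layout (entry key mapped to verdict) and the helper names `merge_progress` and `serialize_progress` are assumptions; only the `paper-review-progress` name comes from Step 0.

```python
import json


def merge_progress(previous, session):
    """Combine earlier verdicts with this session's; newer verdicts win."""
    merged = dict(previous)
    merged.update(session)
    return merged


def serialize_progress(progress):
    """Render the record as JSON text suitable for storing in memory."""
    return json.dumps({"paper-review-progress": progress},
                      indent=2, sort_keys=True)
```

Keeping verdicts alongside entry names lets a later session both skip reviewed entries and recall which ones needed work.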

## Important Rules

1. **Do not modify any files.** This skill is read-only.
2. **Do not invent issues.** Only report problems you can verify by reading the source.
3. **Check the Rust source** for critical checks — don't guess whether the math is right.
4. **Be specific.** "Background is thin" is not useful. "Background is one sentence with no applications or algorithmic context" is useful.
5. **Compare to gold standard.** The `MaximumIndependentSet` (MIS) entry is the reference — entries don't need to be as long, but they should cover the same structural elements.
