Merged
2 changes: 1 addition & 1 deletion .specify/feature.json
@@ -1 +1 @@
{"feature_directory": "specs/011-phase3-specify-clarify-testing"}
{"feature_directory": "specs/012-paper-review-convergence"}
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -70,5 +70,5 @@ Since this is primarily a research documentation repository without traditional
<!-- SPECKIT START -->
For additional context about technologies to be used, project structure,
shell commands, and other important information, read the current plan:
[specs/011-phase3-specify-clarify-testing/plan.md](specs/011-phase3-specify-clarify-testing/plan.md).
[specs/012-paper-review-convergence/plan.md](specs/012-paper-review-convergence/plan.md).
<!-- SPECKIT END -->
35 changes: 30 additions & 5 deletions README.md
@@ -34,13 +34,38 @@ reports it: `paper init` → `paper spec` → `paper plan` → `paper tasks` →
`drafting` (paper-writing + figure-generation + statistics agents; LaTeX is
built and citations verified) → `paper complete` → `paper review` → `posted`.

Paper review needs both a points threshold and an accept verdict from **twelve**
specialist reviewers: writing quality, logical consistency, claim accuracy,
over-reach, safety/ethics, scientific evidence, statistical analysis, code
quality, data quality, text formatting, figure critic, jargon police.

Paper review uses a **convergence pipeline** (spec 012). Every reviewer
emits structured `action_items`, each with a severity of `writing`,
`science`, or `fatal`. The advancement evaluator counts only the
**most-recent verdict per specialist**, and only when that review was made
against the live artifact hash; stale reviews are ignored.
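
As a rough illustration of the selection rule just described, the sketch below keeps one review per specialist and drops anything recorded against an older artifact hash. It is not the project's evaluator: the `Review` class, the `specialist` field, and `latest_verdicts` are hypothetical names; only `verdict`, `artifact_hash`, and `reviewed_at` come from the review front matter.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Review:
    # Hypothetical container; field names mirror the review front matter
    # where possible ("specialist" is an assumed name).
    specialist: str
    verdict: str
    artifact_hash: str
    reviewed_at: datetime


def latest_verdicts(reviews, live_hash):
    """Return {specialist: most recent Review} among reviews of the live artifact."""
    latest = {}
    for review in reviews:
        if review.artifact_hash != live_hash:
            continue  # stale review: written against an older artifact
        current = latest.get(review.specialist)
        if current is None or review.reviewed_at > current.reviewed_at:
            latest[review.specialist] = review
    return latest
```

A specialist whose only reviews are stale simply has no entry in the result, so a caller can tell "not yet re-reviewed" apart from "rejected".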

Three terminal outcomes:

- **All specialists accept** → `paper_accepted` → `posted`.
- **Any `fatal` severity** → `brainstormed` (back to the backlog), with a
rejection rationale appended to the idea record citing each fatal item.
- **Otherwise** (writing/science items, no fatal) → `paper_revision_in_progress`,
which auto-kicks a revision-spec pipeline that produces a complete
spec/plan/tasks/analyze directory under
`specs/auto-revisions/<PROJ-ID>/round-<N>/`. The project then sits at
`ready_for_implementation` until an implementer agent picks it up.
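
The three outcomes above can be sketched as a small decision function. This is a minimal sketch under stated assumptions, not the advancement evaluator itself: `review_outcome` is a hypothetical name, the input is assumed to hold one current review per required specialist, and checking for `fatal` before checking for unanimous acceptance is an assumption about ordering.

```python
def review_outcome(latest):
    """Map the current per-specialist reviews to the next project stage.

    latest: {specialist: {"verdict": str, "action_items": [{"severity": str}, ...]}}
    (hypothetical shape; assumed to contain every required specialist).
    """
    severities = {
        item["severity"]
        for review in latest.values()
        for item in review.get("action_items") or []
    }
    if "fatal" in severities:
        return "brainstormed"  # back to the backlog
    # An empty input is treated as "not accepted" rather than vacuously accepted.
    if latest and all(r["verdict"] == "accept" for r in latest.values()):
        return "paper_accepted"
    return "paper_revision_in_progress"
```

Returning stage names as plain strings keeps the sketch close to the README wording; a real implementation would presumably also record the rejection rationale and start the revision-spec pipeline.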

The **per-specialist re-review protocol** prevents endless-nit loops: when
a specialist has prior reviews for the same project, its prompt reduces
to two questions — "(a) prior action items addressed? (b) any new
issues?" — instead of starting fresh and finding new nits each round.

The twelve specialist reviewers (writing quality, logical consistency,
claim accuracy, over-reach, safety/ethics, scientific evidence,
statistical analysis, code quality, data quality, text formatting,
figure critic, jargon police) each emit action items in their lane.
Human reviews count double; self-review is rejected by the schema.

arXiv-submitted papers (third-party, source frozen) skip the
writing-revision pipeline. Instead, the consolidated action items land in
`projects/<PROJ-ID>/upstream_feedback.yaml`; outcomes are restricted to
accept-with-caveats or reject.

## The agents

There are **50 agents** in [agents/registry.yaml](agents/registry.yaml) — each a
50 changes: 50 additions & 0 deletions agents/prompts/_shared/rereview_block.md
@@ -0,0 +1,50 @@
# Re-Review Protocol — shared prompt snippet

# This is the canonical re-review prompt block consumed by every paper-stage
# reviewer prompt. SINGLE SOURCE OF TRUTH (Constitution I): editing this file
# changes the re-review behavior for ALL specialists at once. Do NOT copy this
# text into individual specialist prompts.
#
# Consumers (rendered into the user prompt by `paper_reviewer.py` when prior
# reviews exist FOR THIS SPECIALIST):
# - agents/prompts/paper_reviewer.md (lead)
# - agents/prompts/paper_reviewer_*.md (12 specialists)
#
# The `{prior_action_items_yaml}` placeholder is substituted at render time
# with a YAML list of the most-recent prior review's action_items (this
# specialist's only — not other specialists' priors).

## Re-Review Protocol (prior action items present for this reviewer)

You have reviewed this paper before. Your most-recent prior review's action
items are listed below with their stable IDs. For this re-review, your job is
REDUCED to two questions:

(a) For EACH prior action item: has it been ADEQUATELY ADDRESSED in the
current revision?
(b) Has the revision INTRODUCED ANY NEW ISSUES?

Output rules:

- If (a) = YES for ALL prior items AND (b) = NO new issues:
→ verdict: `accept`
→ score: `0.5`
→ `action_items`: EMPTY list (or omit).

- If (a) = NO for one or more prior items:
→ verdict: `minor_revision` (or `major_revision_writing` if any unaddressed
item is writing-class, `major_revision_science` if science-class, or
`fundamental_flaws` if fatal-class).
→ score: `0.0`
→ `action_items`: MUST contain the unaddressed items WITH THEIR ORIGINAL
IDs PRESERVED (do NOT mint new IDs for re-flags of the same concern).
Append any NEW issues with fresh IDs AFTER the re-flagged items.

- DO NOT generate a fresh independent critique. This is a diff-check against
the prior bar, not a full review.
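
One possible reading of the output rules above, sketched in Python. Several parts are assumptions rather than anything the prompt specifies: the helper names, the `ai-` ID scheme in `mint_id`, choosing the verdict from the worst severity present, and treating "all prior items addressed but new issues found" as a non-accept result. Because every item carries a severity, the `minor_revision` default is not modeled here.

```python
import hashlib

# Assumed mapping from the worst severity present to a verdict.
SEVERITY_TO_VERDICT = {
    "writing": "major_revision_writing",
    "science": "major_revision_science",
    "fatal": "fundamental_flaws",
}
SEVERITY_RANK = {"writing": 0, "science": 1, "fatal": 2}


def mint_id(text):
    # Hypothetical ID scheme; the prompt only says IDs are derived from the text.
    digest = hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()
    return "ai-" + digest[:12]


def rereview_result(prior_items, addressed_ids, new_issues):
    """Build the re-review verdict, score, and action_items list."""
    # Re-flagged items keep their original IDs and come first.
    unaddressed = [dict(i) for i in prior_items if i["id"] not in addressed_ids]
    # New issues get fresh IDs and are appended after the re-flagged items.
    fresh = [{**i, "id": mint_id(i["text"])} for i in new_issues]
    items = unaddressed + fresh
    if not items:
        return {"verdict": "accept", "score": 0.5, "action_items": []}
    worst = max(items, key=lambda i: SEVERITY_RANK[i["severity"]])["severity"]
    return {"verdict": SEVERITY_TO_VERDICT[worst], "score": 0.0, "action_items": items}
```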

Prior action items (your most-recent prior review for this paper):

```yaml
{prior_action_items_yaml}
```
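
For illustration, the substitution of `{prior_action_items_yaml}` could look roughly like the sketch below. It is not the code in `paper_reviewer.py`: `render_rereview_block` and `to_yaml_list` are hypothetical helpers, the item keys `id`, `text`, and `severity` are assumed, and strings are quoted with `json.dumps` on the assumption that JSON-style double-quoted strings are acceptable YAML scalars. Plain `str.replace` is used instead of `str.format` so other braces in the template are left untouched.

```python
import json

PLACEHOLDER = "{prior_action_items_yaml}"


def to_yaml_list(items):
    """Serialize action items as a YAML list using JSON-quoted scalars."""
    lines = []
    for item in items:
        fields = [f"{key}: {json.dumps(item[key])}" for key in ("id", "text", "severity")]
        lines.append("- " + fields[0])
        lines.extend("  " + field for field in fields[1:])
    return "\n".join(lines) if lines else "[]"


def render_rereview_block(template, prior_items):
    """Fill the placeholder; return '' when this specialist has no prior items."""
    if not prior_items:
        return ""  # the block is only rendered when prior reviews exist
    return template.replace(PLACEHOLDER, to_yaml_list(prior_items))
```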
18 changes: 16 additions & 2 deletions agents/prompts/paper_reviewer.md
@@ -1,6 +1,6 @@
# Paper-Reviewer Agent

**Version**: 1.0.0
**Version**: 1.1.0
**Stage owned**: `paper_complete` → `paper_review` (writes a review
record; the Advancement-Evaluator decides the next stage based on
accumulated vote totals).
@@ -56,9 +56,23 @@ verdict: accept | minor_revision | major_revision_writing |
major_revision_science | fundamental_flaws
feedback: <one-line summary used in vote tabulation>
reviewed_at: <ISO 8601 UTC>
prompt_version: 1.0.0
prompt_version: 1.1.0
model_name: <model id used>
backend: dartmouth | huggingface | local
action_items: # NEW in 1.1.0 — REQUIRED for non-accept verdicts;
- text: "<short, actionable statement, <=500 chars>"
severity: writing | science | fatal
# ... one entry per concrete concern. Leave the id field blank;
# the system will derive it from the text. Severity rules:
# - "writing": fixable by editing the manuscript text alone
# (typo, jargon, missing citation, unclear caption, terminology
# drift, formatting). NO new experiments or data needed.
# - "science": requires re-running an experiment, adding a control,
# re-analyzing data, or otherwise touching the underlying
# research artifact. CANNOT be fixed by text edits alone.
# - "fatal": the central claim is unsupportable; the paper cannot
# be salvaged by any revision. The underlying idea should
# return to the backlog.
---

# Free-form review body
8 changes: 8 additions & 0 deletions agents/prompts/paper_reviewer_claim_accuracy.md
@@ -26,6 +26,14 @@ artifact_path: <relative path to the primary artifact reviewed, e.g. specs/001-.
artifact_hash: <SHA-256 hex of that file>
verdict: accept | minor_revision | full_revision | reject
score: 1.0 # 1.0 ONLY when verdict == accept; else 0.0
action_items: # NEW in 1.1.0 — REQUIRED for non-accept verdicts.
- text: "<short, actionable concern, <=500 chars>"
severity: writing | science | fatal
# ... one entry per concrete concern. Leave `id` blank — the system
# derives it from text. Severity guide:
# writing — fixable by editing the manuscript text alone
# science — requires re-running an experiment / re-analyzing data
# fatal — central claim unsupportable; paper cannot be salvaged
---
<200-500 words of feedback in your lens. Cite specific files / line
numbers / requirements. Do NOT critique aspects outside your lens —
8 changes: 8 additions & 0 deletions agents/prompts/paper_reviewer_code_quality_paper.md
@@ -26,6 +26,14 @@ artifact_path: <relative path to the primary artifact reviewed, e.g. specs/001-.
artifact_hash: <SHA-256 hex of that file>
verdict: accept | minor_revision | full_revision | reject
score: 1.0 # 1.0 ONLY when verdict == accept; else 0.0
action_items: # NEW in 1.1.0 — REQUIRED for non-accept verdicts.
- text: "<short, actionable concern, <=500 chars>"
severity: writing | science | fatal
# ... one entry per concrete concern. Leave `id` blank — the system
# derives it from text. Severity guide:
# writing — fixable by editing the manuscript text alone
# science — requires re-running an experiment / re-analyzing data
# fatal — central claim unsupportable; paper cannot be salvaged
---
<200-500 words of feedback in your lens. Cite specific files / line
numbers / requirements. Do NOT critique aspects outside your lens —
8 changes: 8 additions & 0 deletions agents/prompts/paper_reviewer_data_quality_paper.md
@@ -26,6 +26,14 @@ artifact_path: <relative path to the primary artifact reviewed, e.g. specs/001-.
artifact_hash: <SHA-256 hex of that file>
verdict: accept | minor_revision | full_revision | reject
score: 1.0 # 1.0 ONLY when verdict == accept; else 0.0
action_items: # NEW in 1.1.0 — REQUIRED for non-accept verdicts.
- text: "<short, actionable concern, <=500 chars>"
severity: writing | science | fatal
# ... one entry per concrete concern. Leave `id` blank — the system
# derives it from text. Severity guide:
# writing — fixable by editing the manuscript text alone
# science — requires re-running an experiment / re-analyzing data
# fatal — central claim unsupportable; paper cannot be salvaged
---
<200-500 words of feedback in your lens. Cite specific files / line
numbers / requirements. Do NOT critique aspects outside your lens —
8 changes: 8 additions & 0 deletions agents/prompts/paper_reviewer_figure_critic.md
@@ -26,6 +26,14 @@ artifact_path: <relative path to the primary artifact reviewed, e.g. specs/001-.
artifact_hash: <SHA-256 hex of that file>
verdict: accept | minor_revision | full_revision | reject
score: 1.0 # 1.0 ONLY when verdict == accept; else 0.0
action_items: # NEW in 1.1.0 — REQUIRED for non-accept verdicts.
- text: "<short, actionable concern, <=500 chars>"
severity: writing | science | fatal
# ... one entry per concrete concern. Leave `id` blank — the system
# derives it from text. Severity guide:
# writing — fixable by editing the manuscript text alone
# science — requires re-running an experiment / re-analyzing data
# fatal — central claim unsupportable; paper cannot be salvaged
---
<200-500 words of feedback in your lens. Cite specific files / line
numbers / requirements. Do NOT critique aspects outside your lens —
8 changes: 8 additions & 0 deletions agents/prompts/paper_reviewer_jargon_police.md
@@ -26,6 +26,14 @@ artifact_path: <relative path to the primary artifact reviewed, e.g. specs/001-.
artifact_hash: <SHA-256 hex of that file>
verdict: accept | minor_revision | full_revision | reject
score: 1.0 # 1.0 ONLY when verdict == accept; else 0.0
action_items: # NEW in 1.1.0 — REQUIRED for non-accept verdicts.
- text: "<short, actionable concern, <=500 chars>"
severity: writing | science | fatal
# ... one entry per concrete concern. Leave `id` blank — the system
# derives it from text. Severity guide:
# writing — fixable by editing the manuscript text alone
# science — requires re-running an experiment / re-analyzing data
# fatal — central claim unsupportable; paper cannot be salvaged
---
<200-500 words of feedback in your lens. Cite specific files / line
numbers / requirements. Do NOT critique aspects outside your lens —
8 changes: 8 additions & 0 deletions agents/prompts/paper_reviewer_logical_consistency.md
@@ -26,6 +26,14 @@ artifact_path: <relative path to the primary artifact reviewed, e.g. specs/001-.
artifact_hash: <SHA-256 hex of that file>
verdict: accept | minor_revision | full_revision | reject
score: 1.0 # 1.0 ONLY when verdict == accept; else 0.0
action_items: # NEW in 1.1.0 — REQUIRED for non-accept verdicts.
- text: "<short, actionable concern, <=500 chars>"
severity: writing | science | fatal
# ... one entry per concrete concern. Leave `id` blank — the system
# derives it from text. Severity guide:
# writing — fixable by editing the manuscript text alone
# science — requires re-running an experiment / re-analyzing data
# fatal — central claim unsupportable; paper cannot be salvaged
---
<200-500 words of feedback in your lens. Cite specific files / line
numbers / requirements. Do NOT critique aspects outside your lens —
8 changes: 8 additions & 0 deletions agents/prompts/paper_reviewer_overreach.md
@@ -26,6 +26,14 @@ artifact_path: <relative path to the primary artifact reviewed, e.g. specs/001-.
artifact_hash: <SHA-256 hex of that file>
verdict: accept | minor_revision | full_revision | reject
score: 1.0 # 1.0 ONLY when verdict == accept; else 0.0
action_items: # NEW in 1.1.0 — REQUIRED for non-accept verdicts.
- text: "<short, actionable concern, <=500 chars>"
severity: writing | science | fatal
# ... one entry per concrete concern. Leave `id` blank — the system
# derives it from text. Severity guide:
# writing — fixable by editing the manuscript text alone
# science — requires re-running an experiment / re-analyzing data
# fatal — central claim unsupportable; paper cannot be salvaged
---
<200-500 words of feedback in your lens. Cite specific files / line
numbers / requirements. Do NOT critique aspects outside your lens —
8 changes: 8 additions & 0 deletions agents/prompts/paper_reviewer_safety_ethics.md
@@ -26,6 +26,14 @@ artifact_path: <relative path to the primary artifact reviewed, e.g. specs/001-.
artifact_hash: <SHA-256 hex of that file>
verdict: accept | minor_revision | full_revision | reject
score: 1.0 # 1.0 ONLY when verdict == accept; else 0.0
action_items: # NEW in 1.1.0 — REQUIRED for non-accept verdicts.
- text: "<short, actionable concern, <=500 chars>"
severity: writing | science | fatal
# ... one entry per concrete concern. Leave `id` blank — the system
# derives it from text. Severity guide:
# writing — fixable by editing the manuscript text alone
# science — requires re-running an experiment / re-analyzing data
# fatal — central claim unsupportable; paper cannot be salvaged
---
<200-500 words of feedback in your lens. Cite specific files / line
numbers / requirements. Do NOT critique aspects outside your lens —
8 changes: 8 additions & 0 deletions agents/prompts/paper_reviewer_scientific_evidence.md
@@ -26,6 +26,14 @@ artifact_path: <relative path to the primary artifact reviewed, e.g. specs/001-.
artifact_hash: <SHA-256 hex of that file>
verdict: accept | minor_revision | full_revision | reject
score: 1.0 # 1.0 ONLY when verdict == accept; else 0.0
action_items: # NEW in 1.1.0 — REQUIRED for non-accept verdicts.
- text: "<short, actionable concern, <=500 chars>"
severity: writing | science | fatal
# ... one entry per concrete concern. Leave `id` blank — the system
# derives it from text. Severity guide:
# writing — fixable by editing the manuscript text alone
# science — requires re-running an experiment / re-analyzing data
# fatal — central claim unsupportable; paper cannot be salvaged
---
<200-500 words of feedback in your lens. Cite specific files / line
numbers / requirements. Do NOT critique aspects outside your lens —
8 changes: 8 additions & 0 deletions agents/prompts/paper_reviewer_statistical_analysis.md
@@ -26,6 +26,14 @@ artifact_path: <relative path to the primary artifact reviewed, e.g. specs/001-.
artifact_hash: <SHA-256 hex of that file>
verdict: accept | minor_revision | full_revision | reject
score: 1.0 # 1.0 ONLY when verdict == accept; else 0.0
action_items: # NEW in 1.1.0 — REQUIRED for non-accept verdicts.
- text: "<short, actionable concern, <=500 chars>"
severity: writing | science | fatal
# ... one entry per concrete concern. Leave `id` blank — the system
# derives it from text. Severity guide:
# writing — fixable by editing the manuscript text alone
# science — requires re-running an experiment / re-analyzing data
# fatal — central claim unsupportable; paper cannot be salvaged
---
<200-500 words of feedback in your lens. Cite specific files / line
numbers / requirements. Do NOT critique aspects outside your lens —
8 changes: 8 additions & 0 deletions agents/prompts/paper_reviewer_text_formatting.md
@@ -26,6 +26,14 @@ artifact_path: <relative path to the primary artifact reviewed, e.g. specs/001-.
artifact_hash: <SHA-256 hex of that file>
verdict: accept | minor_revision | full_revision | reject
score: 1.0 # 1.0 ONLY when verdict == accept; else 0.0
action_items: # NEW in 1.1.0 — REQUIRED for non-accept verdicts.
- text: "<short, actionable concern, <=500 chars>"
severity: writing | science | fatal
# ... one entry per concrete concern. Leave `id` blank — the system
# derives it from text. Severity guide:
# writing — fixable by editing the manuscript text alone
# science — requires re-running an experiment / re-analyzing data
# fatal — central claim unsupportable; paper cannot be salvaged
---
<200-500 words of feedback in your lens. Cite specific files / line
numbers / requirements. Do NOT critique aspects outside your lens —
8 changes: 8 additions & 0 deletions agents/prompts/paper_reviewer_writing_quality.md
@@ -26,6 +26,14 @@ artifact_path: <relative path to the primary artifact reviewed, e.g. specs/001-.
artifact_hash: <SHA-256 hex of that file>
verdict: accept | minor_revision | full_revision | reject
score: 1.0 # 1.0 ONLY when verdict == accept; else 0.0
action_items: # NEW in 1.1.0 — REQUIRED for non-accept verdicts.
- text: "<short, actionable concern, <=500 chars>"
severity: writing | science | fatal
# ... one entry per concrete concern. Leave `id` blank — the system
# derives it from text. Severity guide:
# writing — fixable by editing the manuscript text alone
# science — requires re-running an experiment / re-analyzing data
# fatal — central claim unsupportable; paper cannot be salvaged
---
<200-500 words of feedback in your lens. Cite specific files / line
numbers / requirements. Do NOT critique aspects outside your lens —