Reaper (REAd PapER) is an AI-native scientific research pipeline: a composable set of AI agent skills that takes a research goal, optionally with a research paper, and autonomously conducts rigorous, multi-step academic research. It runs on any agent that supports the SKILL.md convention (Cursor, Codex CLI, Cline, Continue, Gemini CLI, Copilot, Windsurf, Claude Code, and 40+ more).
Give Reaper a research question, with or without a PDF. It reads the paper (if provided), searches for related work, formalizes hypotheses, investigates them in parallel, critiques its own findings, and delivers a structured research report, all without manual prompting between steps.
# Without a paper: pure goal-driven research
/reaper "explore the feasibility of post-quantum threshold signatures"
# With a paper
/reaper "determine if the security proof in Section 4 holds under asynchrony" path/to/paper.pdf
How you invoke a skill depends on the host agent. The /<skill> form above is the canonical display convention used throughout these docs. It works directly on slash-command hosts (e.g. Claude Code); on auto-discovery hosts (Cursor, Codex CLI, Cline, Continue, Gemini CLI, Copilot, Windsurf, …) you simply ask the agent to "run the /reaper skill on …" by its bare name.
Key capabilities:
- Autonomous multi-stage pipeline: goal clarification, paper analysis, literature review, hypothesis formalization, parallel investigation, critique, and synthesis all chain automatically
- Parallel investigation with keep-or-discard discipline: multiple hypotheses are investigated concurrently; only genuine progress advances the working state, while dead ends stay logged
- Built-in academic search: paper search, PDF download, citation graph tracing, and venue resolution across arXiv, IACR ePrint, Semantic Scholar, DBLP, and OpenAlex
- Domain-agnostic design: ships with cryptography and distributed systems references, but you can swap the reference files to adapt it to any research domain
- Multi-model AI consultation: optionally consult an external model for a second opinion at every pipeline stage (Codex is supported today; Gemini, DeepSeek, and local models are planned)
- Composable skills: each pipeline stage is an independent skill you can run standalone
- Host-agnostic: distributed as plain SKILL.md folders that work across 45+ AI coding agents
Reaper executes a multi-stage pipeline where investigation runs in parallel batches and critique provides feedback from multiple sources:
                    ┌── /analyze-paper (if paper) ──┐
/clarify-goal ────> │                               ├──> /formalize-problem
                    └── /review-literature ─────────┘             │
                         │  (parallel)                            v
                         │                 ┌───────────────> /brainstorm
                         └── calls         │                      │
                         /analyze-paper    │                      │
                         per downloaded    │  ┌─ /investigate ──────────────────┐
                         paper             │  │  plan batch                     │
                                           │  │   ├──> agent H1 ─┐              │
                                           │  │   ├──> agent H2 ─┼──> merge     │
                                           │  │   └──> agent H3 ─┘      │       │
                                           │  │  next batch or done             │
                                           │  └─────────────────────────────────┘
                                           │                      │
                                           │  ┌─ /critique ─────────────────────┐
                                           │  │  --self   --codex   "feedback"  │
                                           │  └──────┬───────────────────┬──────┘
                                           │         │                   │
                                           │   deepen/explore      rewrite/done ──> /synthesize ──> report.md
                                           └─────────┘
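The batch-then-merge step in the diagram can be illustrated with a minimal Python sketch. It is not Reaper's implementation (Reaper is a set of skill files executed by the host agent); the names `Finding`, `investigate`, `run_batch`, and `merge` are hypothetical, and the rule that decides "progress" is a toy placeholder.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Finding:
    hypothesis: str
    progress: bool  # True if the investigation produced genuine progress
    summary: str


def investigate(hypothesis: str) -> Finding:
    """Hypothetical stand-in for one investigation agent (H1, H2, H3, ...)."""
    # Toy rule for illustration only: hypotheses containing "holds" succeed.
    ok = "holds" in hypothesis
    label = "supported" if ok else "dead end"
    return Finding(hypothesis, ok, f"{label}: {hypothesis}")


def run_batch(hypotheses: list[str], max_workers: int = 3) -> list[Finding]:
    """Investigate one batch concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(investigate, hypotheses))


def merge(understanding: list[str], log: list[str], findings: list[Finding]) -> None:
    """Every finding is logged; only genuine progress advances the working state."""
    for finding in findings:
        log.append(finding.summary)
        if finding.progress:
            understanding.append(finding.summary)


understanding: list[str] = []
log: list[str] = []
batches = [["H1 holds under asynchrony", "H2 fails"], ["H3 holds with n > 3f"]]
for batch in batches:  # "next batch or done"
    merge(understanding, log, run_batch(batch))

print(len(log), len(understanding))
```

Here all three hypotheses end up in the log, while only the two that made progress reach the working state.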
Each skill can be used independently or composed by the orchestrator. Invoke by skill name using your host's native skill-loading mechanism.
| Skill | What it does |
|---|---|
| /reaper | Full pipeline: clarify → analyze → literature → formalize → brainstorm → investigate → critique → synthesize |
| /clarify-goal | Ask targeted clarifying questions to sharpen a vague research goal |
| /analyze-paper | Extract structured information from a research paper |
| /review-literature | Search and summarize related academic work |
| /formalize-problem | Produce precise, testable hypotheses from a research question |
| /brainstorm | Generate, prioritize, and refine research ideas based on current state |
| /investigate | Run investigation cycles with keep-or-discard discipline |
| /critique | Provide critique via human feedback, Codex consultation, or self-review (can trigger more investigation) |
| /synthesize | Generate a structured research report from investigation results |
| /search-paper | Find papers, download PDFs, trace citation graphs, and resolve publication venues across arXiv, IACR ePrint, Semantic Scholar, DBLP, and OpenAlex |
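To give a feel for what a paper search involves, the sketch below builds a query URL for arXiv's public API endpoint (`export.arxiv.org/api/query`) using only the standard library. It is an illustration, not the interface of the bundled /search-paper scripts, which is not shown in this README; the helper name `arxiv_query_url` is hypothetical.

```python
from urllib.parse import urlencode

# Public arXiv API endpoint; responses are Atom XML feeds.
ARXIV_API = "http://export.arxiv.org/api/query"


def arxiv_query_url(terms: str, max_results: int = 5, start: int = 0) -> str:
    """Build (but do not send) an all-fields search query against arXiv."""
    params = {
        "search_query": f"all:{terms}",
        "start": start,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = arxiv_query_url("threshold signatures", max_results=3)
print(url)
```

Fetching the URL and parsing the returned feed is left out so the example runs without network access.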
The /<skill> form is the canonical display convention used throughout these docs. Slash-command hosts (Claude Code) invoke skills directly that way (e.g. /clarify-goal). Auto-discovery hosts (Cursor, Codex CLI, Cline, Continue, Gemini CLI, Copilot, Windsurf, …) invoke them by the bare skill name: drop the leading / when asking the agent to run a skill.
The search skills require Python packages:
pip install arxiv requests beautifulsoup4
Note: npx skills only copies SKILL.md files, Python scripts, and reference files into your agent's skills folder. It does not install Python dependencies, register MCP servers, or create the reaper-workspace/ directory. Install the Python dependencies yourself with the command above; register the Codex MCP server separately if you want --codex (see Optional: Multi-model AI consultation below); the workspace directory is created automatically the first time the pipeline runs.
Reaper is distributed as standard SKILL.md folders. The cross-agent installer vercel-labs/skills shallow-clones this repository and copies all 10 skill directories, including Python scripts and reference files, into your agent's conventional skills folder.
# Latest from the default branch
npx skills add SebastianElvis/reaper
# Pinned to a specific release (recommended for reproducibility)
npx skills add SebastianElvis/reaper#v0.4.0
# Install into a specific agent (defaults to all detected)
npx skills add SebastianElvis/reaper --agent cursor
Supported targets include Cursor, OpenAI Codex CLI, Cline, Continue, Gemini CLI, GitHub Copilot, Windsurf, OpenCode, Warp, Goose, Replit, Claude Code, and a universal target at .agents/skills/. See npx skills list-agents for the full list.
Reminder: npx skills add copies files only. Python dependencies and MCP server registration are separate steps; see Prerequisites above and Optional: Multi-model AI consultation below.
Claude Code can also consume Reaper via its native plugin marketplace mechanism, which bundles the same skills with slash-command routing:
/plugin marketplace add SebastianElvis/reaper
/plugin install reaper@SebastianElvis-reaper
Or clone and add as a local marketplace:
git clone https://github.com/SebastianElvis/reaper.git
/plugin marketplace add ./reaper
/plugin install reaper@reaper
See the Claude Code plugin docs for more details.
- Slash-command hosts (Claude Code): /reaper "<goal>", /analyze-paper <path>, etc. Each skill is available as a top-level slash command.
- Auto-discovery hosts (Cursor, Codex CLI, Cline, Continue, Gemini CLI, Copilot, Windsurf, …): the agent loads SKILL.md files from its skills folder and invokes them by name when the task matches the skill's description. Ask the agent to run the skill, e.g. "use the reaper skill to research X".
- Manual invocation: any host can be pointed at a specific SKILL.md if its native discovery doesn't pick it up.
A few skill features are host-specific:
- The --codex flag enables external-model consultation via MCP. It currently requires a host with MCP support (Claude Code, OpenCode, etc.) and silently falls back to self-review elsewhere.
- Frontmatter keys user-invocable, argument-hint, hooks, and context: fork are Claude-Code-specific. They are preserved in the SKILL.md files but are no-ops on other hosts.
Pass --codex to enable pipeline-wide AI consultation: every skill gains a checkpoint where it can consult an external model for a second opinion. The orchestrator controls when consultations happen and routes to the best-suited model (see skills/reaper/references/codex-consultation.md for the protocol).
Supported backends:
| Model | Setup | Strength |
|---|---|---|
| OpenAI Codex/o3 | Register codex-mcp-server in your host's MCP config | Adversarial review, stress-testing arguments |
| Google Gemini | (coming soon) | Long-context review across full paper corpora |
| DeepSeek R1 | (coming soon) | Proof checking, formal reasoning |
| Local models | (coming soon, via ollama) | Offline/private use, cost control |
Example registration on Claude Code:
claude mcp add codex-cli -- npx -y codex-mcp-server
Other MCP-capable hosts use their own equivalent registration. If no model backends are configured, AI consultation is silently skipped and the pipeline continues with self-review only.
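The silent-fallback behaviour described above can be summarized as a small decision function. This is a sketch of the documented behaviour, not code from the repository; `pick_reviewers` and its parameters are hypothetical names, and treating self-review as always present is an assumption drawn from "continues with self-review only".

```python
def pick_reviewers(codex_flag: bool, host_has_mcp: bool, registered: set[str]) -> list[str]:
    """Return the review sources a critique step would use.

    Self-review is assumed to be the baseline. External consultation is
    added only when --codex was passed, the host supports MCP, and a Codex
    backend is registered. Otherwise it is skipped without raising an error.
    """
    reviewers = ["self"]
    if codex_flag and host_has_mcp and "codex" in registered:
        reviewers.append("codex")
    return reviewers


print(pick_reviewers(True, True, {"codex"}))   # MCP host with Codex registered
print(pick_reviewers(True, False, {"codex"}))  # host without MCP support
print(pick_reviewers(True, True, set()))       # no backend configured
```

Only the first call includes the external model; the other two fall back to self-review alone.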
When Reaper runs, it creates a reaper-workspace/ directory:
reaper-workspace/
├── notes/                          # Evolving: edited inline to reflect latest state
│   ├── clarified-goal.md           # Refined goal, scope, assumptions, Q&A
│   ├── paper-summary.md            # Structured paper extraction
│   ├── literature.md               # Related work survey
│   ├── problem-statement.md        # Formalized problem (model, properties, metrics)
│   ├── ideas.md                    # Research ideas/hypotheses (edited inline on revisit)
│   ├── current-understanding.md    # "Branch tip": advances only on keep
│   └── results.md                  # One row per hypothesis, updated inline on revisit
├── investigations/                 # Evolving: reuse directory on revisit, edit inline
│   └── NNN-<name>/                 # One directory per hypothesis
│       ├── analysis.md             # Reasoning, attempts, dead ends, insights
│       └── proof.md                # Formal proofs (theorems, lemmas, corollaries)
├── feedbacks/                      # Append-only: one file per event, never modified
│   ├── round-N.md                  # Human feedback classified by type
│   └── codex-consultation-N.md     # Codex critique (alternates devil's advocate / inspiration)
├── logs/                           # Append-only: one file per cycle, never modified
│   └── cycle-NNN-<slug>.md         # One log per investigation cycle (snapshot at cycle end)
└── report.md                       # Final synthesized output
The workspace contract is host-agnostic β any agent that can read and write files in the working directory produces the same workspace structure.
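As an illustration of the contract, the following sketch creates the four top-level directories and shows one way to honour the append-only rule for feedbacks/. In practice the host agent creates these files while running the skills; `ensure_workspace` and `append_only_write` are hypothetical helpers, and the example runs in a temporary directory.

```python
import tempfile
from pathlib import Path

SUBDIRS = ("notes", "investigations", "feedbacks", "logs")


def ensure_workspace(root: Path) -> Path:
    """Create reaper-workspace/ and its subdirectories if they are missing."""
    workspace = root / "reaper-workspace"
    for name in SUBDIRS:
        (workspace / name).mkdir(parents=True, exist_ok=True)
    return workspace


def append_only_write(directory: Path, stem: str, text: str) -> Path:
    """Write a new numbered file (stem-N.md); existing files are never touched."""
    n = 1
    while (directory / f"{stem}-{n}.md").exists():
        n += 1
    path = directory / f"{stem}-{n}.md"
    path.write_text(text, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    ws = ensure_workspace(Path(tmp))
    first = append_only_write(ws / "feedbacks", "round", "tighten Lemma 2")
    second = append_only_write(ws / "feedbacks", "round", "check the async case")
    created = sorted(p.name for p in (ws / "feedbacks").iterdir())

print(created)
```

Each feedback event lands in its own file, so earlier rounds stay intact when later ones arrive.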
Skills ship with a layered evaluation system following Anthropic's Demystifying Evals for AI Agents methodology. The judge is the local claude CLI: no API key, just your existing subscription.
| Layer | Grader | Cadence | Scope |
|---|---|---|---|
| L1 Structural | Code (evals/graders/) | Every PR (CI) | Required sections, lengths, broken refs, keep-or-discard cycle invariant |
| L2 Skill rubric | claude -p with structured-output JSON schema (evals/judge/) | Locally / nightly | Per-skill quality dimensions: groundedness, specificity, completeness |
| L3 End-to-end | Both | Pre-release | Full /reaper pipeline against canonical cases |
# L1 only (no LLM): the same thing CI runs
python3 -m evals.run_evals --layer structural
# L1 + L2 (uses your local claude CLI)
python3 -m evals.run_evals --layer all --skill analyze-paper
Each fixture pairs a gold-standard reference with planted negatives, one targeting L1 (drops a required section) and one targeting L2 (fabricated theorem statements, generic content), so a permissive grader fails CI as visibly as a missed regression. See evals/README.md for the full design and how to add a fixture.
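The planted-negative idea can be sketched with a toy L1-style structural check. This is not the grader in evals/graders/; the section names and the functions `missing_sections` and `grader_is_calibrated` are assumptions made for illustration.

```python
import re

# Hypothetical required headings; the real list lives in the eval code.
REQUIRED_SECTIONS = ("Summary", "Contributions", "Limitations")


def missing_sections(markdown: str, required=REQUIRED_SECTIONS) -> list[str]:
    """Return the required headings that do not appear in the document."""
    headings = {
        match.group(1).strip()
        for match in re.finditer(r"^#{1,6}\s+(.+)$", markdown, re.MULTILINE)
    }
    return [name for name in required if name not in headings]


def grader_is_calibrated(gold: str, negative: str) -> bool:
    """A usable grader passes the gold reference and fails the planted negative."""
    return not missing_sections(gold) and bool(missing_sections(negative))


gold = "# Summary\ntext\n## Contributions\ntext\n## Limitations\ntext\n"
planted_negative = "# Summary\ntext\n## Contributions\ntext\n"  # drops a section

print(missing_sections(gold))
print(missing_sections(planted_negative))
print(grader_is_calibrated(gold, planted_negative))
```

A grader that accepted the planted negative would make `grader_is_calibrated` return False, which is the signal that the check has become too permissive.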
Reaper's research loop follows six principles:
- Separation of Concerns: AI writes to workspace, human provides the goal, skill definitions are fixed
- Fixed Evaluation Signal: clarify the goal, establish a baseline via paper analysis and literature review, then formalize into precise hypotheses with trust assumptions, security properties, and impossibility screening
- Structured Results Log: every hypothesis gets one row in notes/results.md with action, outcome, confidence, and keep/discard status; revisits update the row inline rather than appending duplicates
- Keep-or-Discard Loop: current-understanding.md only advances on genuine progress; dead ends stay logged but don't pollute working state
- Never Stop: run all cycles without asking permission to continue; if stuck, re-read the paper, question assumptions, combine discarded results, search for more related work, or try a radically different approach
- Clarity and Simplicity: one "ping" per finding, refutable claims, and the fewer assumptions the better; write early to crystallize understanding, not just to report it
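The "one row per hypothesis, updated inline" rule of the results log amounts to an upsert keyed by hypothesis. The sketch below shows that behaviour in Python; the column layout and the helpers `upsert_row` and `render` are assumptions for illustration, not the actual format of notes/results.md.

```python
def upsert_row(rows: dict[str, dict], hyp_id: str, action: str, outcome: str,
               confidence: str, status: str) -> None:
    """Insert or overwrite the single row for a hypothesis (no duplicate rows)."""
    if status not in ("keep", "discard"):
        raise ValueError(f"status must be 'keep' or 'discard', got {status!r}")
    rows[hyp_id] = {
        "action": action,
        "outcome": outcome,
        "confidence": confidence,
        "status": status,
    }


def render(rows: dict[str, dict]) -> str:
    """Render the log as a markdown table (hypothetical column layout)."""
    lines = [
        "| Hypothesis | Action | Outcome | Confidence | Status |",
        "|---|---|---|---|---|",
    ]
    for hyp_id, row in rows.items():
        lines.append(
            f"| {hyp_id} | {row['action']} | {row['outcome']} "
            f"| {row['confidence']} | {row['status']} |"
        )
    return "\n".join(lines)


rows: dict[str, dict] = {}
upsert_row(rows, "H1", "proof attempt", "gap in Lemma 2", "low", "discard")
upsert_row(rows, "H2", "counterexample search", "none found", "medium", "keep")
# Revisiting H1 replaces its row instead of appending a second one.
upsert_row(rows, "H1", "repaired Lemma 2", "proof goes through", "high", "keep")

table = render(rows)
print(table)
```

After the revisit the table still has exactly one H1 row, now carrying the latest action and status.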
See dev/ROADMAP.md for the full methodology and development roadmap. The horizons at a glance:
- Horizon 1 (The Pipeline): Core skills, orchestrator, and layered eval system (L1 structural graders + L2 Claude-CLI judges with rubrics, calibrated against planted negatives) are complete; LaTeX report output and broader rubric coverage across all skills are planned
- Horizon 2 (The Library): arXiv/ePrint search via Python scripts + citation graph + venue resolution (Semantic Scholar / DBLP / OpenAlex) (complete)
- Horizon 3 (The Committee): Multi-model critique via the /critique skill's --codex mode (Codex complete; Gemini, DeepSeek, and local models planned)
- Horizon 3.5 (The Polyglot): Cross-agent distribution via npx skills and host-agnostic skill prose (complete; per-host orchestration polish ongoing)
- Horizon 4 (The Academy): Broader topic search (Scholar/DBLP), author-centric and venue-centric search (planned)
- Horizon 5 (The Apprentice): Evidence quality taxonomy, evidence-aware critique (planned)
- Horizon 6 (The Examiner): Proactive reformulation trigger, claim provenance, formal verification (planned)
Reaper's methodology draws from the following sources:
- karpathy/autoresearch: Loop discipline. Constrain the loop so the AI can iterate autonomously with a clear signal of progress. The structured results log, keep-or-discard mechanism, and never-stop policy are direct adaptations.
- Richard Hamming, "You and Your Research": The importance filter ("Why are you not working on the important problems?") and the technique of inverting blockages into insights.
- Zhiyun Qian, "How to Look for Ideas in Computer Science Research": Systematic idea generation patterns (fill-in-the-blank, start-small-then-generalize, build-a-hammer) for formulating research ideas.
- Simon Peyton Jones, "How to Write a Great Research Paper": Writing as a primary mechanism for doing research, not just reporting it. Shapes the report structure: one clear "ping," explicit refutable contributions, examples before generality.
- S. Keshav, "How to Read a Paper" (ACM SIGCOMM CCR, 2007): The three-pass method for reading papers: first pass for the big picture, second pass to grasp content without details, third pass to virtually re-derive the work and challenge every assumption. Structures how the literature review skill reads downloaded papers at increasing depth.
- Mathew Stiller-Reeve, "How to Write a Thorough Peer Review" (Nature, 2018): Three-reading review method (aims → scientific substance → presentation) and the mirror technique. Structures the per-paper notes in the literature review skill: mirror the paper's claims, classify issues as major/minor/fatal, evaluate whether conclusions answer the introduction's questions.
- vercel-labs/skills: The cross-agent skills convention and CLI installer that makes Reaper portable across 45+ AI coding agents.
This project is open source. Licensed under the Apache License 2.0.