
feat(paper_reviewer): give arxiv-intake reviewers real visual + provenance context #196

Merged: jeremymanning merged 1 commit into main from improve-paper-reviewer-arxiv-context on May 17, 2026

@jeremymanning

Follows up #195. The hard-fail fix wasn't enough: the reviewer must also produce valid and useful reviews, not just run to completion. Two improvements:

  1. Figure discovery — _collect_figures_from_arxiv_source scans paper/source/** for figure-like files (since arXiv tarballs don't use the home-grown paper/figures/ convention). PROJ-564 went from 0 figures visible → 10.
  2. Provenance block — adds an explicit 'this is a third-party arXiv paper' header to the user prompt so the LLM credits the right authors and doesn't confuse the intake bot with the submitter.
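
For readers who want a concrete picture of the figure-discovery step, here is a minimal sketch of a recursive scan under the rules this PR describes (figure-like extensions, project-relative path plus size, a cap of 200, top-level PDFs skipped). It is not the merged `_collect_figures_from_arxiv_source`; the names `collect_figures_sketch`, `should_scan_source`, and `FIGURE_EXTENSIONS`, the exact extension set, and the summary string format are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed extension set: the PR lists ".pdf/.png/.jpg/.eps/.svg/etc."
# without enumerating the rest, so this set is illustrative only.
FIGURE_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".eps", ".svg"}
MAX_FIGURES = 200  # cap stated in the PR


def collect_figures_sketch(source_dir, project_root, limit=MAX_FIGURES):
    """Return 'relative/path (N bytes)' summaries for figure-like files.

    Hypothetical stand-in for the real helper; the output format is assumed.
    """
    source_dir = Path(source_dir)
    project_root = Path(project_root)
    if not source_dir.is_dir():
        return []
    summaries = []
    for path in sorted(source_dir.rglob("*")):
        suffix = path.suffix.lower()
        if not path.is_file() or suffix not in FIGURE_EXTENSIONS:
            continue
        # PDFs directly inside source_dir are typically compiled output.
        if suffix == ".pdf" and path.parent == source_dir:
            continue
        rel = path.relative_to(project_root).as_posix()
        summaries.append(f"{rel} ({path.stat().st_size} bytes)")
        if len(summaries) >= limit:
            break
    return summaries


def should_scan_source(is_arxiv_intake, figures_dir):
    """Scan paper/source/ for intake reviews or when paper/figures/ is empty."""
    figures_dir = Path(figures_dir)
    has_figures = figures_dir.is_dir() and any(figures_dir.iterdir())
    return is_arxiv_intake or not has_figures
```

Sorting the walk keeps the summary order stable between runs, which makes the resulting prompt easier to diff and test.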

9/9 unit tests pass.

🤖 Generated with Claude Code

…nance context

Per the user follow-up to #195: the reviewer must produce valid + useful
reviews of arXiv-submitted papers, not just succeed. Two changes to
deliver that:

1. New helper _collect_figures_from_arxiv_source(source_dir) scans
   paper/source/** recursively for .pdf/.png/.jpg/.eps/.svg/etc. and
   summarizes each with project-relative path + size. arXiv tarballs
   put figures under conventional subdirs (figures/, figs/, pics/,
   images/, plots/, logo/, etc.) instead of the home-grown pipeline's
   paper/figures/. Capped at 200 entries; skips top-level PDFs
   (typically compiled output, not figures). Used by build_messages
   when is_arxiv_intake is True OR when paper/figures/ is empty.

   Verified on PROJ-564: discovers 10 real figures across logo/ and
   pics/ that the prior helper missed entirely.

2. New 'Paper provenance' block prepended to the user prompt for
   arxiv-intake reviews. Explicitly tells the LLM:
     - the paper is third-party (ingested from arXiv, NOT generated)
     - the author list (so it credits correctly)
     - the arXiv URL (so it can cross-reference)
     - the submitter field is the intake agent, NOT an author
     - to focus on the paper itself, not on missing speckit artifacts
   This addresses past failure modes where reviewers confused the
   intake bot with the paper's authors.
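
A rough sketch of how such a provenance block could be assembled and
prepended follows. The function names, the heading text, and the exact
wording of each line are hypothetical; only the five points listed above
come from this change, and the arXiv URL in the usage note is a placeholder.

```python
def build_provenance_block_sketch(authors, arxiv_url, submitter):
    """Assemble a provenance header for an arxiv-intake review prompt.

    Hypothetical helper: the wording below is illustrative, not the
    text used by the merged code.
    """
    author_list = ", ".join(authors) if authors else "(not listed)"
    lines = [
        "## Paper provenance",
        "This is a third-party paper ingested from arXiv, NOT generated here.",
        f"Authors: {author_list}",
        f"arXiv URL: {arxiv_url}",
        f"Submitter: {submitter} (the intake agent, NOT an author)",
        "Review the paper itself; ignore missing speckit artifacts.",
    ]
    return "\n".join(lines)


def build_user_prompt_sketch(base_prompt, is_arxiv_intake, **provenance):
    """Prepend the provenance block only for arxiv-intake reviews."""
    if not is_arxiv_intake:
        return base_prompt
    return build_provenance_block_sketch(**provenance) + "\n\n" + base_prompt
```

For example, calling build_user_prompt_sketch("Review this paper.", True,
authors=["A. Author"], arxiv_url="https://arxiv.org/abs/0000.00000",
submitter="intake-bot") would return the block followed by a blank line
and the base prompt, while passing False returns the base prompt unchanged.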

+ 6 new unit tests (figure discovery + intake block presence). Existing
3 fallback tests still pass. Full local test: 9/9.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jeremymanning merged commit 27197c3 into main on May 17, 2026
@jeremymanning deleted the improve-paper-reviewer-arxiv-context branch on May 17, 2026 13:44