Commit b57889b
fix(paper_reviewer): include real paper body + bibliography in prompt; normalize score (#197)
Reviewers were issuing "no LaTeX source" / "no bibliography" verdicts on
arXiv-intake papers because they literally never saw the paper content:
* _concat_tex sorted .tex files alphabetically with a 60KB budget. For
a typical arXiv tarball (extra_pkgs.tex ≈ 3KB sorts first; main.tex
≈ 250KB sorts later), the budget got consumed by package
declarations and main.tex was always skipped. The reviewer's prompt
contained 3KB of \usepackage lines and a "(truncated; remaining
files: 2)" footer — no abstract, no methods, no results.
* state/citations/<PROJ>.yaml is never populated for arXiv-intake
papers, so the bibliography section was always "(no citations
recorded)" — even when paper/source/ref.bib was right there with
100+ entries.
* One specialist per project (~1/13) failed pydantic validation
because the LLM picked "accept" verdict but wrote score=0.0 (or
"minor_revision" with score=0.5). The score is purely derived from
the verdict — normalize on parse instead of losing a substantive
review to a numeric formatting slip.
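
The verdict-to-score normalization described in the last bullet can be sketched roughly as below. This is a minimal illustration, not the code in this commit: the function name `normalize_score`, the dict-shaped payload, and the numeric values in `VERDICT_SCORES` are all assumptions. The values were chosen only to be consistent with the two failing cases quoted above ("accept" must not be 0.0, "minor_revision" must not be 0.5).

```python
from __future__ import annotations

from typing import Any

# HYPOTHETICAL mapping: the real values live in the project's schema and are
# not shown in this commit message. Only the verdict names come from the text.
VERDICT_SCORES: dict[str, float] = {
    "accept": 0.5,
    "minor_revision": 0.0,
    "major_revision_sci": 0.0,
}


def normalize_score(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the parsed LLM response with score derived from verdict.

    Unknown verdicts are passed through untouched so that the later
    validation step can still reject them.
    """
    verdict = payload.get("verdict")
    if verdict not in VERDICT_SCORES:
        return payload
    fixed = dict(payload)  # do not mutate the caller's dict
    fixed["score"] = VERDICT_SCORES[verdict]
    return fixed
```

Running the normalization before validation means a review whose only defect is a mismatched number is kept rather than discarded.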
Fixes:
1. _concat_tex now promotes the entry-point file (containing
\documentclass) to the front of the ordering, truncates IT to fit
budget if necessary (vs. silently skipping it), and the default
budget grew from 60KB → 180KB (~45K tokens, leaves room for the
response in a 128K context).
2. _summarize_bibfile fallback: when state/citations is empty, inline
paper/source/*.bib (capped at 30KB) so the reviewer can see what's
cited and judge the reference set.
3. handle_response normalizes score from verdict before validation.
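
Fixes 1 and 2 can be sketched as follows, under stated assumptions. The names `concat_tex` and `bibliography_section`, their signatures, the in-memory `dict` inputs (standing in for files on disk), and the per-file header format are hypothetical. Only the ordering rule, the truncate-instead-of-skip behavior, the 180KB and 30KB caps, and the two placeholder strings come from the description above. The budget is counted in characters here for simplicity.

```python
from __future__ import annotations


def concat_tex(files: dict[str, str], budget: int = 180_000) -> str:
    """Concatenate .tex sources, entry point first, within a size budget."""
    names = sorted(files)
    # Promote the first file that declares \documentclass to the front.
    entry = next((n for n in names if "\\documentclass" in files[n]), None)
    if entry is not None:
        names.remove(entry)
        names.insert(0, entry)

    parts: list[str] = []
    remaining = budget  # headers are not counted against the budget here
    skipped = 0
    for name in names:
        text = files[name]
        if len(text) <= remaining:
            parts.append(f"% --- {name} ---\n{text}")
            remaining -= len(text)
        elif name == entry and remaining > 0:
            # Truncate the entry point to fit instead of skipping it.
            parts.append(f"% --- {name} (truncated) ---\n{text[:remaining]}")
            remaining = 0
        else:
            skipped += 1
    if skipped:
        parts.append(f"(truncated; remaining files: {skipped})")
    return "\n".join(parts)


def bibliography_section(
    recorded: list[str], bib_texts: dict[str, str], cap: int = 30_000
) -> str:
    """Prefer recorded citations; otherwise inline raw .bib text up to cap."""
    if recorded:
        return "\n".join(recorded)
    combined = "\n".join(bib_texts[n] for n in sorted(bib_texts))
    if not combined.strip():
        return "(no citations recorded)"
    return combined[:cap]
```

With the tarball shape described earlier (a small `extra_pkgs.tex` and a large `main.tex`), this ordering puts the start of `main.tex` in the prompt even when the whole file exceeds the budget.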
Verified against 8 previously-failing projects (PROJ-564, 565, 566,
568, 570, 571, 576, 578). All 8 now produce substantive 13-specialist
reviews instead of crashing or emitting boilerplate "no source provided"
verdicts. Aggregate verdicts:
* accept : PROJ-564, 565, 566, 576
* minor_revision : PROJ-568, 570, 571
* major_revision_sci: PROJ-578 (correctly flagged "GPT-5.4 /
Claude Sonnet 4.5 / Gemini-3.1-Pro" as unverifiable model names)
Reviews now reference specific Algorithms, Tables, Figures, and
hyperparameters by name — the LLM is reading and reasoning about the
actual paper, not the package preamble.
Adds 9 new unit tests (17 total in test_paper_reviewer_arxiv_intake).
Full unit suite (395 tests) passes.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent aa223ec commit b57889b
130 files changed
Lines changed: 3576 additions & 28 deletions
File tree
- projects
- PROJ-564-qwen-image-vae-2-0-technical-report/paper/reviews
- PROJ-565-edit-compass-editreward-compass-a-unifie/paper/reviews
- PROJ-566-mint-managed-infrastructure-for-training/paper/reviews
- PROJ-568-identifying-stimulus-driven-neural-activ/paper/reviews
- PROJ-570-leveraging-verifier-based-reinforcement/paper/reviews
- PROJ-571-co-evolving-policy-distillation/paper/reviews
- PROJ-576-sana-wm-efficient-minute-scale-world-mod/paper/reviews
- PROJ-578-https-arxiv-org-abs-2605-14906/paper/reviews
- src/llmxive/agents
- state
- projects
- run-log/2026-05
- tests/unit