
Commit 6d5731b

fix(writeup): sanitize abstract before render_tex
The abstract string on the Paper dataclass comes straight from idea.get("Abstract") (produced by the ideation step), not from the per-section LLM pass, so it bypassed the SANITIZE_PIPELINE entirely.

A real-world demo run hit "Missing $ inserted" on the very first pdflatex pass because ideas.json contained "perplexity_gap": a bare underscore, which LaTeX accepts only as a subscript operator in math mode.

Running _sanitize_latex on the abstract covers the same failure modes as every other text block (underscore escape, orphan $, prose specials, reasoning-tag leakage, etc.) at near-zero cost.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
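To make the failure mode concrete, here is a minimal sketch of the underscore-escape step described above. It is not the project's `_sanitize_latex`; the name `escape_prose_underscores` and the simple `$...$` splitting are assumptions made for illustration, and display math (`$$...$$`, `\[...\]`) is deliberately ignored.

```python
import re

# Hypothetical helper, NOT the real _sanitize_latex: it only illustrates
# escaping bare underscores that appear outside inline math.
_INLINE_MATH = re.compile(r"(\$[^$]*\$)")  # capturing group keeps math spans in split()


def escape_prose_underscores(text: str) -> str:
    """Escape bare `_` outside $...$ spans; leave existing `\\_` untouched."""
    parts = _INLINE_MATH.split(text)
    # With one capturing group, even indices are prose and odd indices are math.
    for i in range(0, len(parts), 2):
        parts[i] = re.sub(r"(?<!\\)_", r"\\_", parts[i])
    return "".join(parts)


if __name__ == "__main__":
    print(escape_prose_underscores("pairs of (input, perplexity_gap) with $x_i$"))
```

The lookbehind keeps the function idempotent, so running it twice on the same abstract does not double-escape.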
1 parent 3902ca3 commit 6d5731b

2 files changed

Lines changed: 24 additions & 1 deletion


skills/hermes-sci/package/hermes_sci/writeup.py

Lines changed: 8 additions & 1 deletion
@@ -413,9 +413,16 @@ def context_fn(section_key: str) -> str:
             log.info("inserted table labels in %s: %s", k, sorted(after - before))
         cleaned[k] = v2
 
+    # Abstract comes from idea metadata (ideation step), not from the
+    # per-section LLM pass, so it also bypasses the per-section sanitize
+    # pipeline. Run it through explicitly so prose specials like `_` or a
+    # truncated inline equation don't crash pdflatex before \section{} even
+    # starts.
+    abstract = _sanitize_latex(str(idea.get("Abstract") or ""))
+
     return Paper(
         title=str(idea.get("Title") or "Untitled Research"),
-        abstract=str(idea.get("Abstract") or ""),
+        abstract=abstract,
         sections=cleaned,
     )
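The change above can be sketched in isolation as follows. Everything here is a stand-in: the `Paper` dataclass is reduced to the three fields visible in the diff, `build_paper` is a hypothetical name, and the sanitizer is passed in as a parameter rather than imported from the project. The sketch also shows why the `or ""` guard matters: without it, a missing or `None` abstract would be rendered as the literal string "None".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Paper:
    # Hypothetical, simplified stand-in for the project's Paper dataclass.
    title: str
    abstract: str
    sections: Dict[str, str] = field(default_factory=dict)


def build_paper(
    idea: dict,
    cleaned: Dict[str, str],
    sanitize: Callable[[str], str],
) -> Paper:
    """Sanitize the idea-level abstract before it reaches the renderer."""
    # `or ""` maps both a missing key and an explicit None to the empty string.
    abstract = sanitize(str(idea.get("Abstract") or ""))
    return Paper(
        title=str(idea.get("Title") or "Untitled Research"),
        abstract=abstract,
        sections=cleaned,
    )
```

Passing the sanitizer as an argument is only a convenience for the sketch; the commit itself calls `_sanitize_latex` directly.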

skills/hermes-sci/package/tests/test_math_balance.py

Lines changed: 16 additions & 0 deletions
@@ -61,6 +61,22 @@ def test_triple_dollar_in_prose_drops_last():
     assert out.count("$") == 2  # one removed → even count restored
 
 
+def test_abstract_with_underscore_sanitized_by_writeup():
+    """Regression: ideas.json produces abstract text that bypasses the
+    per-section sanitize pipeline (it comes from ideation, not writeup).
+    write_paper must sanitize it explicitly before render_tex, otherwise
+    `perplexity_gap` etc. crashes pdflatex on page 1."""
+    from hermes_sci.sanitize import sanitize_latex
+    abstract = (
+        "We train on (input, perplexity_gap) pairs where perplexity_gap "
+        "measures draft-target discrepancy."
+    )
+    out = sanitize_latex(abstract)
+    # Both occurrences escaped.
+    assert out.count(r"perplexity\_gap") == 2
+    assert "perplexity_gap" not in out.replace(r"\_", "@")
+
+
 def test_dollar_inside_display_math_does_not_mask_prose_orphan():
     """A bare `$` in prose + a self-contained display-math block
     should still detect the prose `$` as orphan."""
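The neighbouring tests in this file exercise the "orphan $" repair that the commit message lists among the covered failure modes. As a rough illustration of the behaviour the test name `test_triple_dollar_in_prose_drops_last` suggests, the sketch below drops the last unescaped `$` when their count is odd. The function name `drop_orphan_dollar` is hypothetical, and unlike the real pipeline this version makes no attempt to treat display-math blocks separately.

```python
import re

# Hypothetical illustration only; the project's sanitizer is not shown in this
# commit and may work differently (for example, it handles display math).
_UNESCAPED_DOLLAR = re.compile(r"(?<!\\)\$")


def drop_orphan_dollar(text: str) -> str:
    """If the number of unescaped `$` is odd, remove the last one."""
    hits = [m.start() for m in _UNESCAPED_DOLLAR.finditer(text)]
    if len(hits) % 2 == 0:
        return text  # delimiters already balanced
    last = hits[-1]
    return text[:last] + text[last + 1:]
```

Escaped dollars (`\$`) are skipped by the lookbehind, so literal currency amounts written that way do not affect the parity check.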
