From 734c2ec7c4d674821ad551eef76406bd2dd52535 Mon Sep 17 00:00:00 2001
From: GeneAI
Date: Wed, 27 May 2026 07:30:21 -0400
Subject: [PATCH 1/2] fix: prevent LLM polish from laundering source_hash frontmatter
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Root cause: `apply_polish_results` wrote the LLM's polished output
verbatim, including any frontmatter the LLM emitted. The LLM is given
the rendered template (with frontmatter) as input context; it polishes
the body but ALSO echoes the frontmatter — sometimes with
single-character transcription errors in deterministic fields like
`source_hash`.

That broke staleness detection: the frontmatter `source_hash` written
into the polished file didn't match what `compute_source_hash`
recomputed on the same source, leaving the feature permanently "stale"
after a successful regen.

Concrete evidence (attune-ai spec-engine, 2026-05-27):

  frontmatter: f8ced22b02899aa25ff409636e659830c6ba856d70de6ddd1a9bf1cbe37a1337
  computed:    f8ced22b02899aa25ff709636e659830c6ba856d70de6ddd1a9bf1cbe37a1337
                                  ^ position 19: LLM wrote f4 instead of f7

Single-character difference at one byte of a 64-char SHA-256 hex
digest. The same `compute_source_hash` function called twice in the
same Python process returns identical values — pure LLM hallucination
of the value it was supposed to echo verbatim.

Fix: strip whatever frontmatter the LLM emitted from the polished
content and re-inject the canonical frontmatter from
`entry.rendered_content`. The LLM polishes the BODY; the frontmatter
(especially `source_hash`, `generated_at`, `feature`, `depth`, `name`,
`status`, `type`) is non-negotiable deterministic metadata that must
survive the polish step exactly.

Implementation:
- New `_replace_polished_frontmatter(polished, canonical_source)` helper
  in `generator.py`. Uses a frontmatter regex to extract the canonical
  block, strip whatever the LLM emitted, and re-assemble.
- `apply_polish_results` now calls the helper for every depth with a
  polished result. Lenient-mode failures (depth missing from
  `polished_by_depth`) fall through to the raw rendered template, which
  already has correct frontmatter.

Tests: 7 new regression tests covering the corrupted-hash case,
LLM-stripped frontmatter, LLM-correct frontmatter,
no-canonical-frontmatter defensive path, and body-whitespace
preservation — plus two behavioral tests asserting
`apply_polish_results` writes the canonical hash to disk regardless of
LLM perturbation.

Spec doc (`docs/specs/regen-staleness-hash-mismatch/decisions.md`)
updated with the verified root cause replacing the original
budget-truncation hypothesis (which was wrong — `compute_source_hash`
runs only once and is fully deterministic; the divergence lives
entirely in the polish step's output mutation).

Unblocks attune-gui Phase 2 (`living-docs-regen-automation`) which
needed `attune-author status --dry-run` to reach a fixed point after
regen.

Local: 173/173 unit tests pass.

Co-Authored-By: Claude Opus 4.7
---
 .../decisions.md                              |  79 ++++++-
 src/attune_author/generator.py                |  68 +++++-
 .../test_polished_frontmatter_reinjection.py  | 214 ++++++++++++++++++
 3 files changed, 359 insertions(+), 2 deletions(-)
 create mode 100644 tests/unit/test_polished_frontmatter_reinjection.py

diff --git a/docs/specs/regen-staleness-hash-mismatch/decisions.md b/docs/specs/regen-staleness-hash-mismatch/decisions.md
index 3514dd4..9cbde17 100644
--- a/docs/specs/regen-staleness-hash-mismatch/decisions.md
+++ b/docs/specs/regen-staleness-hash-mismatch/decisions.md
@@ -1,9 +1,86 @@
 # Decisions — Regen / staleness hash mismatch

-**Status:** draft — bug confirmed externally, fix not yet scoped.
+**Status:** root cause confirmed 2026-05-27 — original hypothesis (budget truncation of hash inputs) was wrong; actual cause is LLM-polished frontmatter laundering. Fix direction concrete. Implementation TBD.
 **Owner:** Patrick
 **Filed:** 2026-05-25 (handoff from attune-gui Phase 2 blockers; see [attune-gui docs/specs/living-docs-regen-automation/decisions.md](https://github.com/Smart-AI-Memory/attune-gui/blob/main/docs/specs/living-docs-regen-automation/decisions.md#phase-2-blockers-discovered-2026-05-23))

+## Root cause (verified 2026-05-27)
+
+The original hypothesis (budget-truncated hash input) was wrong.
+`compute_source_hash` is called exactly ONCE at
+`generator.py:355` (inside `prepare_polish_phase`) and produces
+a deterministic value off the FULL source set. Verified by
+calling it twice in a row — idempotent. The `source_hash`
+variable flows through to `_render_template` at line 1452
+which writes it into the rendered template's frontmatter
+correctly.
+
+**The actual bug is in `apply_polish_results` at
+`generator.py:468`:**
+
+```python
+final_content = polished_by_depth.get(entry.depth, entry.rendered_content)
+...
+entry.out_path.write_text(final_content, encoding="utf-8")
+```
+
+When LLM polish ran, the polished content REPLACES the rendered
+template **including the frontmatter the LLM regenerated as part
+of its output**. The LLM is given the rendered template (with
+correct frontmatter) as input context, polishes the body, and
+returns the whole document — but its emitted frontmatter has a
+single-character transcription error in the `source_hash` field.
+
+**Reproducible evidence (attune-ai spec-engine, 2026-05-27):**
+
+```
+frontmatter source_hash: f8ced22b02899aa25ff409636e659830c6ba856d70de6ddd1a9bf1cbe37a1337
+computed source_hash:    f8ced22b02899aa25ff709636e659830c6ba856d70de6ddd1a9bf1cbe37a1337
+                                            ^
+                                            position 19: f4 vs f7
+```
+
+Single-char difference at character position 19 of a 64-char SHA-256 hex
+digest. Pure LLM hallucination of the value it was supposed to
+echo verbatim. Same `compute_source_hash` function called twice
+in the same Python process returns identical values; the
+divergence is solely between "what was hashed and written into
+the prompt" and "what the LLM emitted as its frontmatter copy."
+
+## Confirmed fix direction
+
+Strip frontmatter from `final_content` after polish and
+re-inject the canonical frontmatter from
+`entry.rendered_content`. The LLM polishes the BODY; the
+frontmatter (especially `source_hash`, `generated_at`,
+`feature`, `depth`, `name`) is non-negotiable deterministic
+metadata that must survive the polish step exactly.
+
+**Sketch (in `apply_polish_results` around line 468):**
+
+```python
+final_content = polished_by_depth.get(entry.depth, entry.rendered_content)
+if entry.depth in polished_by_depth:
+    # The LLM may have perturbed the frontmatter — re-inject
+    # the canonical one from the rendered template.
+    final_content = _replace_frontmatter(
+        polished_body=final_content,
+        canonical_frontmatter=_extract_frontmatter(entry.rendered_content),
+    )
+```
+
+Where `_extract_frontmatter` returns the `---\n...\n---\n`
+prefix from `entry.rendered_content`, and `_replace_frontmatter`
+strips whatever frontmatter the LLM produced and prepends the
+canonical one. Both can use `_FRONTMATTER_RE` from
+`staleness.py` (or a local equivalent).
+
+Even better long-term: send the LLM the body only (strip
+frontmatter from its input context), have it return the body
+only, and assemble the final document deterministically. Bigger
+refactor but eliminates the "did the LLM accidentally edit
+metadata" failure mode entirely.
+
 ## Problem

 Running `attune-author regenerate` writes a new `source_hash` value to a
diff --git a/src/attune_author/generator.py b/src/attune_author/generator.py
index 280ecfd..6701242 100644
--- a/src/attune_author/generator.py
+++ b/src/attune_author/generator.py
@@ -15,6 +15,7 @@
 import ast
 import logging
 import os
+import re
 from concurrent.futures import ThreadPoolExecutor, as_completed
 from dataclasses import dataclass, field
 from datetime import datetime, timezone
@@ -431,6 +432,57 @@ def prepare_polish_phase(
     )


+_FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
+"""Matches a YAML frontmatter block at the start of a markdown
+document, capturing the body between the ``---`` delimiters.
+Includes the closing ``---\\n`` in the match so the body starts
+at the next character after the match end."""
+
+
+def _replace_polished_frontmatter(polished: str, canonical_source: str) -> str:
+    """Strip LLM-emitted frontmatter and prepend the canonical one.
+
+    The polish LLM is given the rendered template (with frontmatter)
+    as input context and asked to improve the body. Empirically, the
+    LLM also echoes the frontmatter in its output — sometimes with
+    single-character transcription errors in deterministic fields
+    like ``source_hash``. That broke staleness detection: the
+    frontmatter ``source_hash`` written into the polished file
+    didn't match what ``compute_source_hash`` recomputed on the same
+    source, leaving the feature permanently "stale" after a
+    successful regen.
+
+    This helper enforces that the LLM polishes the BODY only.
+    Deterministic frontmatter (source_hash, generated_at, feature,
+    depth, name, status, type) is non-negotiable metadata — we
+    re-inject the canonical block from the rendered template
+    regardless of what the LLM emitted.
+
+    If the polished content has no frontmatter (LLM stripped it),
+    we still prepend the canonical block. If the canonical source
+    has no frontmatter (unexpected, but handled), return the
+    polished content untouched.
+
+    See ``docs/specs/regen-staleness-hash-mismatch/decisions.md``
+    for the full diagnosis.
+    """
+    canonical_match = _FRONTMATTER_RE.match(canonical_source)
+    if canonical_match is None:
+        # No canonical frontmatter to inject — return polished as-is.
+        # Shouldn't happen in practice; rendered templates always
+        # have frontmatter.
+        return polished
+
+    canonical_block = canonical_match.group(0)
+    polished_match = _FRONTMATTER_RE.match(polished)
+    if polished_match is not None:
+        polished_body = polished[polished_match.end() :]
+    else:
+        polished_body = polished
+
+    return canonical_block + polished_body
+
+
 def apply_polish_results(
     prep: PolishPreparation,
     polished_by_depth: dict[str, str],
@@ -465,7 +517,21 @@ def apply_polish_results(
     project_root = Path.cwd()
     absolute_sources = [project_root / rel_path for rel_path in prep.matched_files]
     for entry in prep.pending:
-        final_content = polished_by_depth.get(entry.depth, entry.rendered_content)
+        if entry.depth in polished_by_depth:
+            # Polish ran: take the polished body but re-inject the
+            # canonical frontmatter. The LLM occasionally transcribes
+            # deterministic fields (notably source_hash) with single-
+            # character errors, which permanently breaks staleness
+            # detection. See _replace_polished_frontmatter docstring.
+            final_content = _replace_polished_frontmatter(
+                polished=polished_by_depth[entry.depth],
+                canonical_source=entry.rendered_content,
+            )
+        else:
+            # Polish skipped (e.g. lenient-mode failure) — use the
+            # raw rendered template, which already has correct
+            # frontmatter.
+            final_content = entry.rendered_content
         # Phase 4: strip `# attune-author: skip-mypy` directives from
         # tutorial code fences so they don't ship to readers. Other
         # template kinds are untouched.
diff --git a/tests/unit/test_polished_frontmatter_reinjection.py b/tests/unit/test_polished_frontmatter_reinjection.py
new file mode 100644
index 0000000..0bd332e
--- /dev/null
+++ b/tests/unit/test_polished_frontmatter_reinjection.py
@@ -0,0 +1,214 @@
+"""Regression tests for source_hash LLM-laundering fix.
+
+The polish LLM is given the rendered template (with frontmatter)
+as input context and asked to polish the body. Empirically, the
+LLM also echoes the frontmatter in its output — sometimes with
+single-character transcription errors in deterministic fields
+like ``source_hash``. This broke staleness detection: the
+frontmatter ``source_hash`` written into the polished file
+didn't match what ``compute_source_hash`` recomputed on the
+same source, leaving the feature permanently "stale" after a
+successful regen.
+
+The fix in ``generator.apply_polish_results`` strips whatever
+frontmatter the LLM emitted and re-injects the canonical
+frontmatter from ``entry.rendered_content``.
+
+See ``docs/specs/regen-staleness-hash-mismatch/decisions.md``
+for the full diagnosis.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+from attune_author.generator import (
+    GenerationResult,
+    PolishPreparation,
+    _PendingPolish,
+    _replace_polished_frontmatter,
+    apply_polish_results,
+)
+
+_CANONICAL_FRONTMATTER = (
+    "---\n"
+    "type: concept\n"
+    "name: spec-engine-concept\n"
+    "feature: spec-engine\n"
+    "depth: concept\n"
+    "generated_at: 2026-05-27T02:19:54.313049+00:00\n"
+    "source_hash: f8ced22b02899aa25ff709636e659830c6ba856d70de6ddd1a9bf1cbe37a1337\n"
+    "status: generated\n"
+    "---\n"
+)
+
+_RENDERED_BODY = "\n# Spec Engine\n\nThe spec engine is the runtime layer.\n"
+_POLISHED_BODY = "\n# Spec Engine\n\nThe spec engine reads a plan and runs tasks.\n"
+
+
+def _hash_field(text: str) -> str:
+    """Pull the ``source_hash`` value out of a frontmatter block."""
+    for line in text.splitlines():
+        if line.startswith("source_hash:"):
+            return line.split(":", 1)[1].strip()
+    raise AssertionError("no source_hash in text")
+
+
+class TestReplacePolishedFrontmatter:
+    """Direct unit tests of ``_replace_polished_frontmatter``."""
+
+    def test_polished_with_perturbed_source_hash_is_corrected(self) -> None:
+        """The bug: LLM transcribed ``f7`` → ``f4`` at one position.
+        Fix must restore the canonical hash regardless of LLM output."""
+        canonical = _CANONICAL_FRONTMATTER + _RENDERED_BODY
+        # LLM emitted frontmatter with a single-character corruption
+        # at position 19 of the hash (the actual bug we observed).
+        llm_frontmatter = _CANONICAL_FRONTMATTER.replace(
+            "f8ced22b02899aa25ff709",
+            "f8ced22b02899aa25ff409",  # f7 → f4
+        )
+        polished = llm_frontmatter + _POLISHED_BODY
+
+        result = _replace_polished_frontmatter(polished, canonical)
+
+        # Canonical hash restored.
+        assert _hash_field(result) == _hash_field(_CANONICAL_FRONTMATTER)
+        # Polished body preserved.
+        assert _POLISHED_BODY.strip() in result
+        # Original body not retained.
+        assert "runtime layer" not in result
+
+    def test_polished_with_missing_frontmatter_gets_canonical_prepended(self) -> None:
+        """LLM might drop the frontmatter entirely. We still prepend
+        the canonical block so the file isn't malformed."""
+        canonical = _CANONICAL_FRONTMATTER + _RENDERED_BODY
+        polished_body_only = _POLISHED_BODY.lstrip("\n")
+
+        result = _replace_polished_frontmatter(polished_body_only, canonical)
+
+        assert result.startswith("---\n")
+        assert _hash_field(result) == _hash_field(_CANONICAL_FRONTMATTER)
+        assert "reads a plan" in result
+
+    def test_polished_with_correct_frontmatter_unchanged_semantically(self) -> None:
+        """When the LLM echoes the frontmatter correctly, output is
+        functionally identical to the input (canonical block wins,
+        but the canonical block IS what the LLM emitted)."""
+        canonical = _CANONICAL_FRONTMATTER + _RENDERED_BODY
+        polished = _CANONICAL_FRONTMATTER + _POLISHED_BODY
+
+        result = _replace_polished_frontmatter(polished, canonical)
+
+        assert _hash_field(result) == _hash_field(_CANONICAL_FRONTMATTER)
+        assert "reads a plan" in result
+
+    def test_canonical_without_frontmatter_returns_polished_unchanged(self) -> None:
+        """Defensive: if rendered template has no frontmatter
+        (shouldn't happen in practice), return polished untouched
+        rather than crash."""
+        canonical = "# No frontmatter\n\nJust a body.\n"
+        polished = "# Polished\n\nDifferent body.\n"
+
+        result = _replace_polished_frontmatter(polished, canonical)
+
+        assert result == polished
+
+    def test_extra_blank_lines_in_polished_body_preserved(self) -> None:
+        """Polish layer often returns extra newlines for readability.
+        Body whitespace should pass through untouched."""
+        canonical = _CANONICAL_FRONTMATTER + _RENDERED_BODY
+        polished = _CANONICAL_FRONTMATTER + "\n\n# Heading\n\nParagraph.\n\n\n"
+
+        result = _replace_polished_frontmatter(polished, canonical)
+
+        assert result.endswith("Paragraph.\n\n\n")
+
+
+class TestApplyPolishResultsReinjectsFrontmatter:
+    """Behavioral tests for the wiring in ``apply_polish_results``."""
+
+    def test_polished_template_keeps_canonical_source_hash(
+        self, tmp_path: Path, monkeypatch
+    ) -> None:
+        """End-to-end: a polished template with corrupted source_hash
+        in the LLM output gets the canonical hash on disk."""
+        # Disable fact-check + faithfulness gates to keep this unit
+        # test self-contained (no network, no schema validation).
+        monkeypatch.setenv("ATTUNE_AUTHOR_FACT_CHECK", "off")
+        monkeypatch.setenv("ATTUNE_AUTHOR_FAITHFULNESS_GATE", "off")
+        monkeypatch.chdir(tmp_path)
+
+        out_path = tmp_path / "concept.md"
+        rendered_content = _CANONICAL_FRONTMATTER + _RENDERED_BODY
+        # Polish "output" perturbs the hash by one character.
+        polished_content = (
+            _CANONICAL_FRONTMATTER.replace(
+                "f8ced22b02899aa25ff709",
+                "f8ced22b02899aa25ff409",
+            )
+            + _POLISHED_BODY
+        )
+
+        prep = PolishPreparation(
+            feature=type("F", (), {"name": "spec-engine"})(),
+            source_hash="f8ced22b02899aa25ff709636e659830c6ba856d70de6ddd1a9bf1cbe37a1337",
+            matched_files=[],
+            source_info=None,
+            pending=(
+                _PendingPolish(
+                    depth="concept",
+                    rendered_content=rendered_content,
+                    out_path=out_path,
+                ),
+            ),
+            use_rag=False,
+        )
+
+        result = apply_polish_results(prep, {"concept": polished_content})
+
+        assert isinstance(result, GenerationResult)
+        written = out_path.read_text(encoding="utf-8")
+        # The corrupted ``f4`` hash from the LLM output must NOT
+        # have survived to disk.
+        assert "ff409" not in written
+        # The canonical ``f7`` hash from the rendered template is
+        # what got written.
+        assert _hash_field(written) == _hash_field(_CANONICAL_FRONTMATTER)
+        # The polished body did make it through.
+        assert "reads a plan" in written
+
+    def test_no_polish_result_uses_rendered_content_directly(
+        self, tmp_path: Path, monkeypatch
+    ) -> None:
+        """If polish was skipped (lenient-mode failure), apply still
+        writes the rendered template with correct frontmatter."""
+        monkeypatch.setenv("ATTUNE_AUTHOR_FACT_CHECK", "off")
+        monkeypatch.setenv("ATTUNE_AUTHOR_FAITHFULNESS_GATE", "off")
+        monkeypatch.chdir(tmp_path)
+
+        out_path = tmp_path / "concept.md"
+        rendered_content = _CANONICAL_FRONTMATTER + _RENDERED_BODY
+
+        prep = PolishPreparation(
+            feature=type("F", (), {"name": "spec-engine"})(),
+            source_hash="f8ced22b02899aa25ff709636e659830c6ba856d70de6ddd1a9bf1cbe37a1337",
+            matched_files=[],
+            source_info=None,
+            pending=(
+                _PendingPolish(
+                    depth="concept",
+                    rendered_content=rendered_content,
+                    out_path=out_path,
+                ),
+            ),
+            use_rag=False,
+        )
+
+        # No entry for "concept" in the polished map.
+        result = apply_polish_results(prep, {})
+
+        assert isinstance(result, GenerationResult)
+        written = out_path.read_text(encoding="utf-8")
+        assert _hash_field(written) == _hash_field(_CANONICAL_FRONTMATTER)
+        # Original rendered body, not a polished one.
+        assert "runtime layer" in written

From c1b6c38de0ba186106565983452f0eba50ed1b62 Mon Sep 17 00:00:00 2001
From: GeneAI
Date: Wed, 27 May 2026 09:26:55 -0400
Subject: [PATCH 2/2] fixup: field-level frontmatter merge to preserve polish: skipped
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The previous whole-block replacement broke 3 golden snapshot tests that
asserted on the `polish: skipped` frontmatter marker which the
lenient-mode polish failure path adds via `_mark_polish_skipped`. My
whole-block replace discarded that marker.

Switching to field-level merge:
- DETERMINISTIC fields (type, name, feature, depth, generated_at,
  source_hash, status): canonical from rendered template wins.
- All OTHER fields (polish: skipped, future markers): polished output
  preserved as-emitted.

Implementation: parse both frontmatter blocks line-by-line, walk
polished's lines, swap deterministic-keyed lines with canonical's
version, keep everything else. Append any deterministic canonical
fields the polished output dropped entirely.

Tests added:
- test_polish_skipped_marker_preserved — regression on the lenient
  failure path. Exercises both the deterministic-field override AND the
  marker preservation in one assertion.
- test_unknown_non_deterministic_field_preserved — forward-compat for
  future polish-layer fields.

Local: 979/979 unit tests pass; 15/15 in this file's slice + 3/3 golden
snapshots restored. End-to-end re-verified on attune-ai spec-engine:
11/11 regenerated templates have canonical source_hash matching
compute_source_hash.

Co-Authored-By: Claude Opus 4.7
---
 src/attune_author/generator.py                | 118 ++++++++++++++----
 .../test_polished_frontmatter_reinjection.py  |  58 +++++++++
 2 files changed, 154 insertions(+), 22 deletions(-)

diff --git a/src/attune_author/generator.py b/src/attune_author/generator.py
index 6701242..20263b8 100644
--- a/src/attune_author/generator.py
+++ b/src/attune_author/generator.py
@@ -439,8 +439,49 @@ def prepare_polish_phase(
 at the next character after the match end."""


+#: Frontmatter fields that are DETERMINISTIC — computed from
+#: source and not for the LLM (or polish-layer) to mutate. These
+#: come from the rendered template and override whatever the
+#: polish output contains.
+_DETERMINISTIC_FRONTMATTER_FIELDS = frozenset(
+    {
+        "type",
+        "name",
+        "feature",
+        "depth",
+        "generated_at",
+        "source_hash",
+        "status",
+    }
+)
+
+
+def _parse_frontmatter_lines(block: str) -> list[tuple[str, str]]:
+    """Parse a YAML frontmatter block into (key, line) pairs in order.
+
+    The block is the captured group from ``_FRONTMATTER_RE``,
+    i.e. the YAML body without the ``---`` delimiters. Each line
+    is returned as the (key, whole-line) tuple. Lines that don't
+    match the ``key: ...`` shape (e.g. multi-line YAML values, or
+    structural lines) are returned with key ``""`` so the caller
+    can decide whether to include them.
+    """
+    out: list[tuple[str, str]] = []
+    for line in block.splitlines():
+        stripped = line.lstrip()
+        if not stripped or stripped.startswith("#"):
+            out.append(("", line))
+            continue
+        key, sep, _ = stripped.partition(":")
+        if sep and " " not in key and "\t" not in key:
+            out.append((key.strip(), line))
+        else:
+            out.append(("", line))
+    return out
+
+
 def _replace_polished_frontmatter(polished: str, canonical_source: str) -> str:
-    """Strip LLM-emitted frontmatter and prepend the canonical one.
+    """Re-inject deterministic frontmatter fields from canonical source.

     The polish LLM is given the rendered template (with frontmatter)
     as input context and asked to improve the body. Empirically, the
@@ -448,39 +489,72 @@ def _replace_polished_frontmatter(polished: str, canonical_source: str) -> str:
     single-character transcription errors in deterministic fields
     like ``source_hash``. That broke staleness detection: the
     frontmatter ``source_hash`` written into the polished file
-    didn't match what ``compute_source_hash`` recomputed on the same
-    source, leaving the feature permanently "stale" after a
+    didn't match what ``compute_source_hash`` recomputed on the
+    same source, leaving the feature permanently "stale" after a
     successful regen.

-    This helper enforces that the LLM polishes the BODY only.
-    Deterministic frontmatter (source_hash, generated_at, feature,
-    depth, name, status, type) is non-negotiable metadata — we
-    re-inject the canonical block from the rendered template
-    regardless of what the LLM emitted.
+    Approach: field-level merge. For deterministic fields
+    (:data:`_DETERMINISTIC_FRONTMATTER_FIELDS`), the canonical
+    value from the rendered template wins. For any other field
+    (e.g. ``polish: skipped`` added by the lenient-mode polish
+    failure path in :func:`attune_author.polish._mark_polish_skipped`),
+    the polish output's value is preserved.

-    If the polished content has no frontmatter (LLM stripped it),
-    we still prepend the canonical block. If the canonical source
-    has no frontmatter (unexpected, but handled), return the
-    polished content untouched.
+    Edge cases:
+
+    - Polished has no frontmatter (LLM stripped it): prepend the
+      canonical block as-is.
+    - Canonical has no frontmatter (shouldn't happen in practice
+      since rendered templates always have one): return polished
+      untouched.

     See ``docs/specs/regen-staleness-hash-mismatch/decisions.md``
     for the full diagnosis.
     """
     canonical_match = _FRONTMATTER_RE.match(canonical_source)
     if canonical_match is None:
-        # No canonical frontmatter to inject — return polished as-is.
-        # Shouldn't happen in practice; rendered templates always
-        # have frontmatter.
+        # Defensive: rendered templates always have frontmatter.
         return polished

-    canonical_block = canonical_match.group(0)
     polished_match = _FRONTMATTER_RE.match(polished)
-    if polished_match is not None:
-        polished_body = polished[polished_match.end() :]
-    else:
-        polished_body = polished
-
-    return canonical_block + polished_body
+    if polished_match is None:
+        # LLM stripped the frontmatter entirely. Prepend canonical
+        # block and return.
+        return canonical_match.group(0) + polished
+
+    canonical_lines = _parse_frontmatter_lines(canonical_match.group(1))
+    polished_lines = _parse_frontmatter_lines(polished_match.group(1))
+
+    canonical_by_key: dict[str, str] = {k: line for k, line in canonical_lines if k}
+
+    merged: list[str] = []
+    seen_deterministic: set[str] = set()
+    for key, line in polished_lines:
+        if key in _DETERMINISTIC_FRONTMATTER_FIELDS:
+            # Override with canonical's line for this deterministic
+            # field. If canonical lacks the key (very unusual),
+            # drop the polished version too — better silence than
+            # propagating a possibly-perturbed value.
+            canonical_line = canonical_by_key.get(key)
+            if canonical_line is not None:
+                merged.append(canonical_line)
+            seen_deterministic.add(key)
+        else:
+            # Non-deterministic field (e.g. polish: skipped marker)
+            # OR a structural / comment line. Preserve as the polish
+            # layer emitted it.
+            merged.append(line)
+
+    # Append any deterministic canonical fields the polish output
+    # was missing (e.g. LLM dropped a line entirely). Preserves the
+    # invariant that the canonical's deterministic fields are
+    # always present in the result.
+    for key, line in canonical_lines:
+        if key and key in _DETERMINISTIC_FRONTMATTER_FIELDS and key not in seen_deterministic:
+            merged.append(line)
+
+    body = polished[polished_match.end() :]
+    return "---\n" + "\n".join(merged) + "\n---\n" + body


 def apply_polish_results(
diff --git a/tests/unit/test_polished_frontmatter_reinjection.py b/tests/unit/test_polished_frontmatter_reinjection.py
index 0bd332e..496968c 100644
--- a/tests/unit/test_polished_frontmatter_reinjection.py
+++ b/tests/unit/test_polished_frontmatter_reinjection.py
@@ -123,6 +123,64 @@ def test_extra_blank_lines_in_polished_body_preserved(self) -> None:

         assert result.endswith("Paragraph.\n\n\n")

+    def test_polish_skipped_marker_preserved(self) -> None:
+        """Lenient-mode polish failure adds a ``polish: skipped``
+        marker to the frontmatter via
+        :func:`attune_author.polish._mark_polish_skipped`. The
+        re-inject must preserve that marker — it's the regression
+        signal downstream consumers (and snapshot tests) rely on
+        to detect raw-template output."""
+        canonical = _CANONICAL_FRONTMATTER + _RENDERED_BODY
+        # Polish-skipped path adds `polish: skipped` as a non-
+        # deterministic frontmatter field. The deterministic
+        # fields can ALSO be perturbed (LLM transcription) but
+        # in the lenient-fallback path the polish layer just
+        # passes the rendered frontmatter through with the
+        # marker appended — so the only mutation is the new
+        # field. We still exercise both.
+        polished_frontmatter = (
+            _CANONICAL_FRONTMATTER.replace(
+                "f8ced22b02899aa25ff709",
+                "f8ced22b02899aa25ff409",  # f7 → f4 perturbation
+            )
+            .rstrip("\n")
+            .rstrip("-")
+            .rstrip("\n")
+            .rstrip("-")
+            .rstrip("\n")
+            .rstrip("-")
+        )
+        # Reconstruct with the marker before the closing ---
+        polished_frontmatter = _CANONICAL_FRONTMATTER.replace(
+            "f8ced22b02899aa25ff709",
+            "f8ced22b02899aa25ff409",
+        ).replace("status: generated\n---\n", "status: generated\npolish: skipped\n---\n")
+        polished = polished_frontmatter + _POLISHED_BODY
+
+        result = _replace_polished_frontmatter(polished, canonical)
+
+        # Deterministic field corrected from canonical.
+        assert _hash_field(result) == _hash_field(_CANONICAL_FRONTMATTER)
+        # Non-deterministic polish marker preserved.
+        assert "polish: skipped" in result
+
+    def test_unknown_non_deterministic_field_preserved(self) -> None:
+        """Forward-compatibility: if the polish layer adds a NEW
+        non-deterministic field in the future (e.g.
+        ``polish_attempts: 3``), the re-inject must preserve it.
+        Only the closed set of DETERMINISTIC fields is overridden."""
+        canonical = _CANONICAL_FRONTMATTER + _RENDERED_BODY
+        polished_frontmatter = _CANONICAL_FRONTMATTER.replace(
+            "status: generated\n---\n",
+            "status: generated\npolish_attempts: 3\nmodel: claude-sonnet-4-6\n---\n",
+        )
+        polished = polished_frontmatter + _POLISHED_BODY
+
+        result = _replace_polished_frontmatter(polished, canonical)
+
+        assert "polish_attempts: 3" in result
+        assert "model: claude-sonnet-4-6" in result
+

 class TestApplyPolishResultsReinjectsFrontmatter:
     """Behavioral tests for the wiring in ``apply_polish_results``."""
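
The spec doc in this series names a longer-term alternative: send the LLM only the body and assemble the final document deterministically. The following is a minimal sketch of that idea, not code from this patch. `split_frontmatter`, `polish_body_only`, and the `polish_fn` callable are hypothetical names chosen for illustration; the regex has the same shape as `_FRONTMATTER_RE` above.

```python
import re
from typing import Callable

# Same shape as _FRONTMATTER_RE in the patch: a frontmatter block
# anchored at the very start of the document.
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def split_frontmatter(document: str) -> tuple[str, str]:
    """Return (frontmatter_block, body); the block is "" when absent."""
    match = FRONTMATTER_RE.match(document)
    if match is None:
        return "", document
    return match.group(0), document[match.end():]


def polish_body_only(rendered: str, polish_fn: Callable[[str], str]) -> str:
    """Polish only the body; the frontmatter never reaches polish_fn.

    polish_fn is a hypothetical stand-in for the LLM polish call.
    """
    frontmatter, body = split_frontmatter(rendered)
    polished = polish_fn(body)
    # If the polisher invents a frontmatter block anyway, discard it.
    _, polished_body = split_frontmatter(polished)
    return frontmatter + polished_body


rendered = "---\nsource_hash: abc123\n---\n\n# Title\n\nOld text.\n"
result = polish_body_only(rendered, lambda body: body.replace("Old", "New"))
```

One consequence of this design, under the same assumptions: a marker such as `polish: skipped` could no longer arrive through the polished text, so it would have to be written by deterministic code after assembly rather than preserved by a field-level merge.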