From 734c2ec7c4d674821ad551eef76406bd2dd52535 Mon Sep 17 00:00:00 2001
From: GeneAI <patrick.roebuck@smartAImemory.com>
Date: Wed, 27 May 2026 07:30:21 -0400
Subject: [PATCH 1/2] fix: prevent LLM polish from laundering source_hash
 frontmatter
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Root cause: `apply_polish_results` wrote the LLM's polished output
verbatim, including any frontmatter the LLM emitted. The LLM is
given the rendered template (with frontmatter) as input context;
it polishes the body but ALSO echoes the frontmatter — sometimes
with single-character transcription errors in deterministic fields
like `source_hash`. That broke staleness detection: the
frontmatter `source_hash` written into the polished file didn't
match what `compute_source_hash` recomputed on the same source,
leaving the feature permanently "stale" after a successful regen.

Concrete evidence (attune-ai spec-engine, 2026-05-27):

  frontmatter: f8ced22b02899aa25ff409636e659830c6ba856d70de6ddd1a9bf1cbe37a1337
  computed:    f8ced22b02899aa25ff709636e659830c6ba856d70de6ddd1a9bf1cbe37a1337
                                  ^
                                  position 19: LLM wrote f4 instead of f7

The two values differ in exactly one hex character (0-based
index 19) of a 64-char SHA-256 hex digest. `compute_source_hash`
called twice in the same Python process returns identical
values, so the mismatch is an LLM transcription error in a value
it was supposed to echo verbatim.
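
A minimal sketch for reproducing the comparison above, assuming
only the two digests quoted in the evidence block. `diff_positions`
is a hypothetical helper written for illustration; it is not part
of `attune_author`.

```python
# The two digests quoted in the evidence block.
FRONTMATTER_HASH = "f8ced22b02899aa25ff409636e659830c6ba856d70de6ddd1a9bf1cbe37a1337"
COMPUTED_HASH = "f8ced22b02899aa25ff709636e659830c6ba856d70de6ddd1a9bf1cbe37a1337"


def diff_positions(a: str, b: str) -> list[int]:
    """Return the 0-based indices at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("digests must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


positions = diff_positions(FRONTMATTER_HASH, COMPUTED_HASH)
for i in positions:
    print(f"index {i}: frontmatter={FRONTMATTER_HASH[i]!r} computed={COMPUTED_HASH[i]!r}")
```

Run against these two values, it reports a single mismatch at
index 19, which is where the caret above points.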

Fix: strip whatever frontmatter the LLM emitted from the polished
content and re-inject the canonical frontmatter from
`entry.rendered_content`. The LLM polishes the BODY; the
frontmatter (especially `source_hash`, `generated_at`, `feature`,
`depth`, `name`, `status`, `type`) is non-negotiable deterministic
metadata that must survive the polish step exactly.

Implementation:

- New `_replace_polished_frontmatter(polished, canonical_source)`
  helper in `generator.py`. Uses a frontmatter regex to extract
  the canonical block, strip whatever the LLM emitted, and
  re-assemble.
- `apply_polish_results` now calls the helper for every depth
  with a polished result. Lenient-mode failures (depth missing
  from `polished_by_depth`) fall through to the raw rendered
  template, which already has correct frontmatter.
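
As a rough illustration of the helper described above, this
standalone sketch applies the same strip-and-re-inject idea to toy
documents. The regex is the one added in `generator.py`; the name
`reinject_frontmatter` and the shortened hash values are
assumptions made for the example.

```python
import re

# Same pattern as the patch: a YAML frontmatter block anchored at
# the very start of the document.
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def reinject_frontmatter(polished: str, canonical_source: str) -> str:
    """Drop any frontmatter in `polished` and prepend the canonical block."""
    canonical = FRONTMATTER_RE.match(canonical_source)
    if canonical is None:
        return polished  # nothing trustworthy to inject
    emitted = FRONTMATTER_RE.match(polished)
    body = polished[emitted.end():] if emitted else polished
    return canonical.group(0) + body


# Toy documents (hypothetical values, shortened for readability).
rendered = "---\nsource_hash: abc7\n---\n\n# Title\n\nDraft body.\n"
polished = "---\nsource_hash: abc4\n---\n\n# Title\n\nPolished body.\n"

result = reinject_frontmatter(polished, rendered)
print(result)
```

The result keeps the polished body but carries `source_hash: abc7`
from the rendered template, whatever the polished input claimed.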

Tests: 7 new regression tests covering the corrupted-hash case,
LLM-stripped frontmatter, LLM-correct frontmatter, no-canonical-
frontmatter defensive path, and body-whitespace preservation —
plus two behavioral tests asserting `apply_polish_results` writes
the canonical hash to disk regardless of LLM perturbation.

Spec doc (`docs/specs/regen-staleness-hash-mismatch/decisions.md`)
updated with the verified root cause replacing the original
budget-truncation hypothesis (which was wrong — `compute_source_hash`
runs only once and is fully deterministic; the divergence lives
entirely in the polish step's output mutation).

Unblocks attune-gui Phase 2 (`living-docs-regen-automation`) which
needed `attune-author status --dry-run` to reach a fixed point
after regen.

Local: 173/173 unit tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .../decisions.md                              |  79 ++++++-
 src/attune_author/generator.py                |  68 +++++-
 .../test_polished_frontmatter_reinjection.py  | 214 ++++++++++++++++++
 3 files changed, 359 insertions(+), 2 deletions(-)
 create mode 100644 tests/unit/test_polished_frontmatter_reinjection.py

diff --git a/docs/specs/regen-staleness-hash-mismatch/decisions.md b/docs/specs/regen-staleness-hash-mismatch/decisions.md
index 3514dd4..9cbde17 100644
--- a/docs/specs/regen-staleness-hash-mismatch/decisions.md
+++ b/docs/specs/regen-staleness-hash-mismatch/decisions.md
@@ -1,9 +1,86 @@
 # Decisions — Regen / staleness hash mismatch
 
-**Status:** draft — bug confirmed externally, fix not yet scoped.
+**Status:** root cause confirmed 2026-05-27. The original hypothesis (budget truncation of hash inputs) was wrong; the actual cause is LLM-polished frontmatter laundering. Fix implemented in `_replace_polished_frontmatter` (`generator.py`).
 **Owner:** Patrick
 **Filed:** 2026-05-25 (handoff from attune-gui Phase 2 blockers; see [attune-gui docs/specs/living-docs-regen-automation/decisions.md](https://github.com/Smart-AI-Memory/attune-gui/blob/main/docs/specs/living-docs-regen-automation/decisions.md#phase-2-blockers-discovered-2026-05-23))
 
+## Root cause (verified 2026-05-27)
+
+The original hypothesis (budget-truncated hash input) was wrong.
+`compute_source_hash` is called exactly ONCE at
+`generator.py:355` (inside `prepare_polish_phase`) and produces
+a deterministic value off the FULL source set. Verified by
+calling it twice in a row — idempotent. The `source_hash`
+variable flows through to `_render_template` at line 1452
+which writes it into the rendered template's frontmatter
+correctly.
+
+**The actual bug is in `apply_polish_results` at
+`generator.py:468`:**
+
+```python
+final_content = polished_by_depth.get(entry.depth, entry.rendered_content)
+...
+entry.out_path.write_text(final_content, encoding="utf-8")
+```
+
+When LLM polish ran, the polished content REPLACES the rendered
+template **including the frontmatter the LLM regenerated as part
+of its output**. The LLM is given the rendered template (with
+correct frontmatter) as input context, polishes the body, and
+returns the whole document — but its emitted frontmatter has a
+single-character transcription error in the `source_hash` field.
+
+**Reproducible evidence (attune-ai spec-engine, 2026-05-27):**
+
+```
+frontmatter source_hash: f8ced22b02899aa25ff409636e659830c6ba856d70de6ddd1a9bf1cbe37a1337
+computed source_hash:    f8ced22b02899aa25ff709636e659830c6ba856d70de6ddd1a9bf1cbe37a1337
+                                            ^
+                                            position 19: f4 vs f7
+```
+
+Single-character difference at 0-based index 19 of a 64-char
+SHA-256 hex digest: the LLM mis-transcribed a value it was
+supposed to echo verbatim. `compute_source_hash` called twice
+in the same Python process returns identical values; the
+divergence is solely between "what was hashed and written into
+the prompt" and "what the LLM emitted as its frontmatter copy."
+
+## Confirmed fix direction
+
+Strip frontmatter from `final_content` after polish and
+re-inject the canonical frontmatter from
+`entry.rendered_content`. The LLM polishes the BODY; the
+frontmatter (especially `source_hash`, `generated_at`,
+`feature`, `depth`, `name`) is non-negotiable deterministic
+metadata that must survive the polish step exactly.
+
+**Sketch (in `apply_polish_results` around line 468):**
+
+```python
+final_content = polished_by_depth.get(entry.depth, entry.rendered_content)
+if entry.depth in polished_by_depth:
+    # The LLM may have perturbed the frontmatter — re-inject
+    # the canonical one from the rendered template.
+    final_content = _replace_frontmatter(
+        polished_body=final_content,
+        canonical_frontmatter=_extract_frontmatter(entry.rendered_content),
+    )
+```
+
+Where `_extract_frontmatter` returns the `---\n...\n---\n`
+prefix from `entry.rendered_content`, and `_replace_frontmatter`
+strips whatever frontmatter the LLM produced and prepends the
+canonical one. Both can use `_FRONTMATTER_RE` from
+`staleness.py` (or a local equivalent).
+
+Even better long-term: send the LLM the body only (strip
+frontmatter from its input context), have it return the body
+only, and assemble the final document deterministically. Bigger
+refactor but eliminates the "did the LLM accidentally edit
+metadata" failure mode entirely.
+
 ## Problem
 
 Running `attune-author regenerate` writes a new `source_hash` value to a
diff --git a/src/attune_author/generator.py b/src/attune_author/generator.py
index 280ecfd..6701242 100644
--- a/src/attune_author/generator.py
+++ b/src/attune_author/generator.py
@@ -15,6 +15,7 @@
 import ast
 import logging
 import os
+import re
 from concurrent.futures import ThreadPoolExecutor, as_completed
 from dataclasses import dataclass, field
 from datetime import datetime, timezone
@@ -431,6 +432,57 @@ def prepare_polish_phase(
     )
 
 
+_FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
+"""Matches a YAML frontmatter block at the start of a markdown
+document, capturing the YAML between the ``---`` delimiters.
+Includes the closing ``---\\n`` in the match so the body starts
+at the next character after the match end."""
+
+
+def _replace_polished_frontmatter(polished: str, canonical_source: str) -> str:
+    """Strip LLM-emitted frontmatter and prepend the canonical one.
+
+    The polish LLM is given the rendered template (with frontmatter)
+    as input context and asked to improve the body. Empirically, the
+    LLM also echoes the frontmatter in its output — sometimes with
+    single-character transcription errors in deterministic fields
+    like ``source_hash``. That broke staleness detection: the
+    frontmatter ``source_hash`` written into the polished file
+    didn't match what ``compute_source_hash`` recomputed on the same
+    source, leaving the feature permanently "stale" after a
+    successful regen.
+
+    This helper enforces that the LLM polishes the BODY only.
+    Deterministic frontmatter (source_hash, generated_at, feature,
+    depth, name, status, type) is non-negotiable metadata — we
+    re-inject the canonical block from the rendered template
+    regardless of what the LLM emitted.
+
+    If the polished content has no frontmatter (LLM stripped it),
+    we still prepend the canonical block. If the canonical source
+    has no frontmatter (unexpected, but handled), return the
+    polished content untouched.
+
+    See ``docs/specs/regen-staleness-hash-mismatch/decisions.md``
+    for the full diagnosis.
+    """
+    canonical_match = _FRONTMATTER_RE.match(canonical_source)
+    if canonical_match is None:
+        # No canonical frontmatter to inject — return polished as-is.
+        # Shouldn't happen in practice; rendered templates always
+        # have frontmatter.
+        return polished
+
+    canonical_block = canonical_match.group(0)
+    polished_match = _FRONTMATTER_RE.match(polished)
+    if polished_match is not None:
+        polished_body = polished[polished_match.end() :]
+    else:
+        polished_body = polished
+
+    return canonical_block + polished_body
+
+
 def apply_polish_results(
     prep: PolishPreparation,
     polished_by_depth: dict[str, str],
@@ -465,7 +517,21 @@ def apply_polish_results(
     project_root = Path.cwd()
     absolute_sources = [project_root / rel_path for rel_path in prep.matched_files]
     for entry in prep.pending:
-        final_content = polished_by_depth.get(entry.depth, entry.rendered_content)
+        if entry.depth in polished_by_depth:
+            # Polish ran: take the polished body but re-inject the
+            # canonical frontmatter. The LLM occasionally transcribes
+            # deterministic fields (notably source_hash) with single-
+            # character errors, which permanently breaks staleness
+            # detection. See _replace_polished_frontmatter docstring.
+            final_content = _replace_polished_frontmatter(
+                polished=polished_by_depth[entry.depth],
+                canonical_source=entry.rendered_content,
+            )
+        else:
+            # Polish skipped (e.g. lenient-mode failure) — use the
+            # raw rendered template, which already has correct
+            # frontmatter.
+            final_content = entry.rendered_content
         # Phase 4: strip `# attune-author: skip-mypy` directives from
         # tutorial code fences so they don't ship to readers. Other
         # template kinds are untouched.
diff --git a/tests/unit/test_polished_frontmatter_reinjection.py b/tests/unit/test_polished_frontmatter_reinjection.py
new file mode 100644
index 0000000..0bd332e
--- /dev/null
+++ b/tests/unit/test_polished_frontmatter_reinjection.py
@@ -0,0 +1,214 @@
+"""Regression tests for source_hash LLM-laundering fix.
+
+The polish LLM is given the rendered template (with frontmatter)
+as input context and asked to polish the body. Empirically, the
+LLM also echoes the frontmatter in its output — sometimes with
+single-character transcription errors in deterministic fields
+like ``source_hash``. This broke staleness detection: the
+frontmatter ``source_hash`` written into the polished file
+didn't match what ``compute_source_hash`` recomputed on the
+same source, leaving the feature permanently "stale" after a
+successful regen.
+
+The fix in ``generator.apply_polish_results`` strips whatever
+frontmatter the LLM emitted and re-injects the canonical
+frontmatter from ``entry.rendered_content``.
+
+See ``docs/specs/regen-staleness-hash-mismatch/decisions.md``
+for the full diagnosis.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+from attune_author.generator import (
+    GenerationResult,
+    PolishPreparation,
+    _PendingPolish,
+    _replace_polished_frontmatter,
+    apply_polish_results,
+)
+
+_CANONICAL_FRONTMATTER = (
+    "---\n"
+    "type: concept\n"
+    "name: spec-engine-concept\n"
+    "feature: spec-engine\n"
+    "depth: concept\n"
+    "generated_at: 2026-05-27T02:19:54.313049+00:00\n"
+    "source_hash: f8ced22b02899aa25ff709636e659830c6ba856d70de6ddd1a9bf1cbe37a1337\n"
+    "status: generated\n"
+    "---\n"
+)
+
+_RENDERED_BODY = "\n# Spec Engine\n\nThe spec engine is the runtime layer.\n"
+_POLISHED_BODY = "\n# Spec Engine\n\nThe spec engine reads a plan and runs tasks.\n"
+
+
+def _hash_field(text: str) -> str:
+    """Pull the ``source_hash`` value out of a frontmatter block."""
+    for line in text.splitlines():
+        if line.startswith("source_hash:"):
+            return line.split(":", 1)[1].strip()
+    raise AssertionError("no source_hash in text")
+
+
+class TestReplacePolishedFrontmatter:
+    """Direct unit tests of ``_replace_polished_frontmatter``."""
+
+    def test_polished_with_perturbed_source_hash_is_corrected(self) -> None:
+        """The bug: LLM transcribed ``f7`` → ``f4`` at one position.
+        Fix must restore the canonical hash regardless of LLM output."""
+        canonical = _CANONICAL_FRONTMATTER + _RENDERED_BODY
+        # LLM emitted frontmatter with a single-character corruption
+        # at position 19 of the hash (the actual bug we observed).
+        llm_frontmatter = _CANONICAL_FRONTMATTER.replace(
+            "f8ced22b02899aa25ff709",
+            "f8ced22b02899aa25ff409",  # f7 → f4
+        )
+        polished = llm_frontmatter + _POLISHED_BODY
+
+        result = _replace_polished_frontmatter(polished, canonical)
+
+        # Canonical hash restored.
+        assert _hash_field(result) == _hash_field(_CANONICAL_FRONTMATTER)
+        # Polished body preserved.
+        assert _POLISHED_BODY.strip() in result
+        # Original body not retained.
+        assert "runtime layer" not in result
+
+    def test_polished_with_missing_frontmatter_gets_canonical_prepended(self) -> None:
+        """LLM might drop the frontmatter entirely. We still prepend
+        the canonical block so the file isn't malformed."""
+        canonical = _CANONICAL_FRONTMATTER + _RENDERED_BODY
+        polished_body_only = _POLISHED_BODY.lstrip("\n")
+
+        result = _replace_polished_frontmatter(polished_body_only, canonical)
+
+        assert result.startswith("---\n")
+        assert _hash_field(result) == _hash_field(_CANONICAL_FRONTMATTER)
+        assert "reads a plan" in result
+
+    def test_polished_with_correct_frontmatter_unchanged_semantically(self) -> None:
+        """When the LLM echoes the frontmatter correctly, output is
+        functionally identical to the input (canonical block wins,
+        but the canonical block IS what the LLM emitted)."""
+        canonical = _CANONICAL_FRONTMATTER + _RENDERED_BODY
+        polished = _CANONICAL_FRONTMATTER + _POLISHED_BODY
+
+        result = _replace_polished_frontmatter(polished, canonical)
+
+        assert _hash_field(result) == _hash_field(_CANONICAL_FRONTMATTER)
+        assert "reads a plan" in result
+
+    def test_canonical_without_frontmatter_returns_polished_unchanged(self) -> None:
+        """Defensive: if rendered template has no frontmatter
+        (shouldn't happen in practice), return polished untouched
+        rather than crash."""
+        canonical = "# No frontmatter\n\nJust a body.\n"
+        polished = "# Polished\n\nDifferent body.\n"
+
+        result = _replace_polished_frontmatter(polished, canonical)
+
+        assert result == polished
+
+    def test_extra_blank_lines_in_polished_body_preserved(self) -> None:
+        """Polish layer often returns extra newlines for readability.
+        Body whitespace should pass through untouched."""
+        canonical = _CANONICAL_FRONTMATTER + _RENDERED_BODY
+        polished = _CANONICAL_FRONTMATTER + "\n\n# Heading\n\nParagraph.\n\n\n"
+
+        result = _replace_polished_frontmatter(polished, canonical)
+
+        assert result.endswith("Paragraph.\n\n\n")
+
+
+class TestApplyPolishResultsReinjectsFrontmatter:
+    """Behavioral tests for the wiring in ``apply_polish_results``."""
+
+    def test_polished_template_keeps_canonical_source_hash(
+        self, tmp_path: Path, monkeypatch
+    ) -> None:
+        """End-to-end: a polished template with corrupted source_hash
+        in the LLM output gets the canonical hash on disk."""
+        # Disable fact-check + faithfulness gates to keep this unit
+        # test self-contained (no network, no schema validation).
+        monkeypatch.setenv("ATTUNE_AUTHOR_FACT_CHECK", "off")
+        monkeypatch.setenv("ATTUNE_AUTHOR_FAITHFULNESS_GATE", "off")
+        monkeypatch.chdir(tmp_path)
+
+        out_path = tmp_path / "concept.md"
+        rendered_content = _CANONICAL_FRONTMATTER + _RENDERED_BODY
+        # Polish "output" perturbs the hash by one character.
+        polished_content = (
+            _CANONICAL_FRONTMATTER.replace(
+                "f8ced22b02899aa25ff709",
+                "f8ced22b02899aa25ff409",
+            )
+            + _POLISHED_BODY
+        )
+
+        prep = PolishPreparation(
+            feature=type("F", (), {"name": "spec-engine"})(),
+            source_hash="f8ced22b02899aa25ff709636e659830c6ba856d70de6ddd1a9bf1cbe37a1337",
+            matched_files=[],
+            source_info=None,
+            pending=(
+                _PendingPolish(
+                    depth="concept",
+                    rendered_content=rendered_content,
+                    out_path=out_path,
+                ),
+            ),
+            use_rag=False,
+        )
+
+        result = apply_polish_results(prep, {"concept": polished_content})
+
+        assert isinstance(result, GenerationResult)
+        written = out_path.read_text(encoding="utf-8")
+        # The corrupted ``f4`` hash from the LLM output must NOT
+        # have survived to disk.
+        assert "ff409" not in written
+        # The canonical ``f7`` hash from the rendered template is
+        # what got written.
+        assert _hash_field(written) == _hash_field(_CANONICAL_FRONTMATTER)
+        # The polished body did make it through.
+        assert "reads a plan" in written
+
+    def test_no_polish_result_uses_rendered_content_directly(
+        self, tmp_path: Path, monkeypatch
+    ) -> None:
+        """If polish was skipped (lenient-mode failure), apply still
+        writes the rendered template with correct frontmatter."""
+        monkeypatch.setenv("ATTUNE_AUTHOR_FACT_CHECK", "off")
+        monkeypatch.setenv("ATTUNE_AUTHOR_FAITHFULNESS_GATE", "off")
+        monkeypatch.chdir(tmp_path)
+
+        out_path = tmp_path / "concept.md"
+        rendered_content = _CANONICAL_FRONTMATTER + _RENDERED_BODY
+
+        prep = PolishPreparation(
+            feature=type("F", (), {"name": "spec-engine"})(),
+            source_hash="f8ced22b02899aa25ff709636e659830c6ba856d70de6ddd1a9bf1cbe37a1337",
+            matched_files=[],
+            source_info=None,
+            pending=(
+                _PendingPolish(
+                    depth="concept",
+                    rendered_content=rendered_content,
+                    out_path=out_path,
+                ),
+            ),
+            use_rag=False,
+        )
+
+        # No entry for "concept" in the polished map.
+        result = apply_polish_results(prep, {})
+
+        assert isinstance(result, GenerationResult)
+        written = out_path.read_text(encoding="utf-8")
+        assert _hash_field(written) == _hash_field(_CANONICAL_FRONTMATTER)
+        # Original rendered body, not a polished one.
+        assert "runtime layer" in written

From c1b6c38de0ba186106565983452f0eba50ed1b62 Mon Sep 17 00:00:00 2001
From: GeneAI <patrick.roebuck@smartAImemory.com>
Date: Wed, 27 May 2026 09:26:55 -0400
Subject: [PATCH 2/2] fixup: field-level frontmatter merge to preserve polish:
 skipped
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The previous whole-block replacement broke 3 golden snapshot tests
that asserted on the `polish: skipped` frontmatter marker which
the lenient-mode polish failure path adds via
`_mark_polish_skipped`. My whole-block replace discarded that
marker.

Switching to field-level merge:
- DETERMINISTIC fields (type, name, feature, depth, generated_at,
  source_hash, status): canonical from rendered template wins.
- All OTHER fields (polish: skipped, future markers): polished
  output preserved as-emitted.

Implementation: parse both frontmatter blocks line-by-line, walk
polished's lines, swap deterministic-keyed lines with canonical's
version, keep everything else. Append any deterministic canonical
fields the polished output dropped entirely.
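
The merge described above can be sketched on flat `key: value`
lines as follows. This is a simplified illustration, not the code
in `generator.py`: `merge_frontmatter_lines` and the shortened hash
values are hypothetical, and comment, blank, and multi-line YAML
lines are not handled.

```python
DETERMINISTIC = frozenset(
    {"type", "name", "feature", "depth", "generated_at", "source_hash", "status"}
)


def merge_frontmatter_lines(polished: list[str], canonical: list[str]) -> list[str]:
    """Merge flat `key: value` lines; canonical wins for deterministic keys."""

    def key_of(line: str) -> str:
        return line.partition(":")[0].strip()

    canonical_by_key = {key_of(line): line for line in canonical}
    merged: list[str] = []
    seen: set[str] = set()
    for line in polished:
        key = key_of(line)
        if key in DETERMINISTIC:
            # Take the canonical line; drop the field if canonical lacks it.
            if key in canonical_by_key:
                merged.append(canonical_by_key[key])
                seen.add(key)
        else:
            merged.append(line)  # e.g. `polish: skipped` passes through
    # Re-add deterministic fields that the polished output dropped.
    merged += [
        line
        for line in canonical
        if key_of(line) in DETERMINISTIC and key_of(line) not in seen
    ]
    return merged


canonical = ["type: concept", "source_hash: abc7", "status: generated"]
polished = ["type: concept", "source_hash: abc4", "polish: skipped"]
merged = merge_frontmatter_lines(polished, canonical)
print(merged)
```

Here the perturbed `source_hash` is replaced, `polish: skipped`
survives, and the dropped `status` line is appended at the end.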

Tests added:
- test_polish_skipped_marker_preserved: regression test for the
  lenient failure path. Checks both the deterministic-field
  override and the marker preservation in a single test.
- test_unknown_non_deterministic_field_preserved: forward
  compatibility for future polish-layer fields.

Local: 979/979 unit tests pass; 15/15 in this file's slice +
3/3 golden snapshots restored. End-to-end re-verified on attune-ai
spec-engine: 11/11 regenerated templates have canonical
source_hash matching compute_source_hash.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 src/attune_author/generator.py                | 118 ++++++++++++++----
 .../test_polished_frontmatter_reinjection.py  |  58 +++++++++
 2 files changed, 154 insertions(+), 22 deletions(-)

diff --git a/src/attune_author/generator.py b/src/attune_author/generator.py
index 6701242..20263b8 100644
--- a/src/attune_author/generator.py
+++ b/src/attune_author/generator.py
@@ -439,8 +439,49 @@ def prepare_polish_phase(
 at the next character after the match end."""
 
 
+#: Frontmatter fields that are DETERMINISTIC — computed from
+#: source and not for the LLM (or polish-layer) to mutate. These
+#: come from the rendered template and override whatever the
+#: polish output contains.
+_DETERMINISTIC_FRONTMATTER_FIELDS = frozenset(
+    {
+        "type",
+        "name",
+        "feature",
+        "depth",
+        "generated_at",
+        "source_hash",
+        "status",
+    }
+)
+
+
+def _parse_frontmatter_lines(block: str) -> list[tuple[str, str]]:
+    """Parse a YAML frontmatter block into (key, line) pairs in order.
+
+    The block is the captured group from ``_FRONTMATTER_RE``,
+    i.e. the YAML body without the ``---`` delimiters. Each line
+    is returned as the (key, whole-line) tuple. Lines that don't
+    match the ``key: ...`` shape (e.g. multi-line YAML values, or
+    structural lines) are returned with key ``""`` so the caller
+    can decide whether to include them.
+    """
+    out: list[tuple[str, str]] = []
+    for line in block.splitlines():
+        stripped = line.lstrip()
+        if not stripped or stripped.startswith("#"):
+            out.append(("", line))
+            continue
+        key, sep, _ = stripped.partition(":")
+        if sep and " " not in key and "\t" not in key:
+            out.append((key.strip(), line))
+        else:
+            out.append(("", line))
+    return out
+
+
 def _replace_polished_frontmatter(polished: str, canonical_source: str) -> str:
-    """Strip LLM-emitted frontmatter and prepend the canonical one.
+    """Re-inject deterministic frontmatter fields from canonical source.
 
     The polish LLM is given the rendered template (with frontmatter)
     as input context and asked to improve the body. Empirically, the
@@ -448,39 +489,72 @@ def _replace_polished_frontmatter(polished: str, canonical_source: str) -> str:
     single-character transcription errors in deterministic fields
     like ``source_hash``. That broke staleness detection: the
     frontmatter ``source_hash`` written into the polished file
-    didn't match what ``compute_source_hash`` recomputed on the same
-    source, leaving the feature permanently "stale" after a
+    didn't match what ``compute_source_hash`` recomputed on the
+    same source, leaving the feature permanently "stale" after a
     successful regen.
 
-    This helper enforces that the LLM polishes the BODY only.
-    Deterministic frontmatter (source_hash, generated_at, feature,
-    depth, name, status, type) is non-negotiable metadata — we
-    re-inject the canonical block from the rendered template
-    regardless of what the LLM emitted.
+    Approach: field-level merge. For deterministic fields
+    (:data:`_DETERMINISTIC_FRONTMATTER_FIELDS`), the canonical
+    value from the rendered template wins. For any other field
+    (e.g. ``polish: skipped`` added by the lenient-mode polish
+    failure path in :func:`attune_author.polish._mark_polish_skipped`),
+    the polish output's value is preserved.
 
-    If the polished content has no frontmatter (LLM stripped it),
-    we still prepend the canonical block. If the canonical source
-    has no frontmatter (unexpected, but handled), return the
-    polished content untouched.
+    Edge cases:
+
+    - Polished has no frontmatter (LLM stripped it): prepend the
+      canonical block as-is.
+    - Canonical has no frontmatter (shouldn't happen in practice
+      since rendered templates always have one): return polished
+      untouched.
 
     See ``docs/specs/regen-staleness-hash-mismatch/decisions.md``
     for the full diagnosis.
     """
     canonical_match = _FRONTMATTER_RE.match(canonical_source)
     if canonical_match is None:
-        # No canonical frontmatter to inject — return polished as-is.
-        # Shouldn't happen in practice; rendered templates always
-        # have frontmatter.
+        # Defensive: rendered templates always have frontmatter.
         return polished
 
-    canonical_block = canonical_match.group(0)
     polished_match = _FRONTMATTER_RE.match(polished)
-    if polished_match is not None:
-        polished_body = polished[polished_match.end() :]
-    else:
-        polished_body = polished
-
-    return canonical_block + polished_body
+    if polished_match is None:
+        # LLM stripped the frontmatter entirely. Prepend canonical
+        # block and return.
+        return canonical_match.group(0) + polished
+
+    canonical_lines = _parse_frontmatter_lines(canonical_match.group(1))
+    polished_lines = _parse_frontmatter_lines(polished_match.group(1))
+
+    canonical_by_key: dict[str, str] = {k: line for k, line in canonical_lines if k}
+
+    merged: list[str] = []
+    seen_deterministic: set[str] = set()
+    for key, line in polished_lines:
+        if key in _DETERMINISTIC_FRONTMATTER_FIELDS:
+            # Override with canonical's line for this deterministic
+            # field. If canonical lacks the key (very unusual),
+            # drop the polished version too — better silence than
+            # propagating a possibly-perturbed value.
+            canonical_line = canonical_by_key.get(key)
+            if canonical_line is not None:
+                merged.append(canonical_line)
+                seen_deterministic.add(key)
+        else:
+            # Non-deterministic field (e.g. polish: skipped marker)
+            # OR a structural / comment line. Preserve as the polish
+            # layer emitted it.
+            merged.append(line)
+
+    # Append any deterministic canonical fields the polish output
+    # was missing (e.g. LLM dropped a line entirely). Preserves the
+    # invariant that the canonical's deterministic fields are
+    # always present in the result.
+    for key, line in canonical_lines:
+        if key and key in _DETERMINISTIC_FRONTMATTER_FIELDS and key not in seen_deterministic:
+            merged.append(line)
+
+    body = polished[polished_match.end() :]
+    return "---\n" + "\n".join(merged) + "\n---\n" + body
 
 
 def apply_polish_results(
diff --git a/tests/unit/test_polished_frontmatter_reinjection.py b/tests/unit/test_polished_frontmatter_reinjection.py
index 0bd332e..496968c 100644
--- a/tests/unit/test_polished_frontmatter_reinjection.py
+++ b/tests/unit/test_polished_frontmatter_reinjection.py
@@ -123,6 +123,64 @@ def test_extra_blank_lines_in_polished_body_preserved(self) -> None:
 
         assert result.endswith("Paragraph.\n\n\n")
 
+    def test_polish_skipped_marker_preserved(self) -> None:
+        """Lenient-mode polish failure adds a ``polish: skipped``
+        marker to the frontmatter via
+        :func:`attune_author.polish._mark_polish_skipped`. The
+        re-inject must preserve that marker — it's the regression
+        signal downstream consumers (and snapshot tests) rely on
+        to detect raw-template output."""
+        canonical = _CANONICAL_FRONTMATTER + _RENDERED_BODY
+        # Polish-skipped path adds `polish: skipped` as a non-
+        # deterministic frontmatter field. The deterministic
+        # fields can ALSO be perturbed (LLM transcription) but
+        # in the lenient-fallback path the polish layer just
+        # passes the rendered frontmatter through with the
+        # marker appended — so the only mutation is the new
+        # field. We still exercise both.
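+        # Start from the canonical block with the observed f7 → f4
+        # corruption applied to ``source_hash``.
+        perturbed_frontmatter = _CANONICAL_FRONTMATTER.replace(
+            "f8ced22b02899aa25ff709",
+            "f8ced22b02899aa25ff409",  # f7 → f4 perturbation
+        )
+        # Guard the fixture: a no-op replace would let the hash
+        # assertion below pass without exercising the merge.
+        assert perturbed_frontmatter != _CANONICAL_FRONTMATTER
+        # Insert the marker just before the closing --- delimiter.
+        polished_frontmatter = perturbed_frontmatter.replace(
+            "status: generated\n---\n",
+            "status: generated\npolish: skipped\n---\n",
+        )
+        # Guard the fixture: the marker must be present in the
+        # input, or the preservation assertion proves nothing.
+        assert "polish: skipped" in polished_frontmatter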
+        polished = polished_frontmatter + _POLISHED_BODY
+
+        result = _replace_polished_frontmatter(polished, canonical)
+
+        # Deterministic field corrected from canonical.
+        assert _hash_field(result) == _hash_field(_CANONICAL_FRONTMATTER)
+        # Non-deterministic polish marker preserved.
+        assert "polish: skipped" in result
+
+    def test_unknown_non_deterministic_field_preserved(self) -> None:
+        """Forward-compatibility: if the polish layer adds a NEW
+        non-deterministic field in the future (e.g.
+        ``polish_attempts: 3``), the re-inject must preserve it.
+        Only the closed set of DETERMINISTIC fields is overridden."""
+        canonical = _CANONICAL_FRONTMATTER + _RENDERED_BODY
+        polished_frontmatter = _CANONICAL_FRONTMATTER.replace(
+            "status: generated\n---\n",
+            "status: generated\npolish_attempts: 3\nmodel: claude-sonnet-4-6\n---\n",
+        )
+        polished = polished_frontmatter + _POLISHED_BODY
+
+        result = _replace_polished_frontmatter(polished, canonical)
+
+        assert "polish_attempts: 3" in result
+        assert "model: claude-sonnet-4-6" in result
+
 
 class TestApplyPolishResultsReinjectsFrontmatter:
     """Behavioral tests for the wiring in ``apply_polish_results``."""