
Commit 1b1c7c5

fix: prevent LLM polish from laundering source_hash frontmatter (#48)
* fix: prevent LLM polish from laundering source_hash frontmatter

Root cause: `apply_polish_results` wrote the LLM's polished output verbatim, including any frontmatter the LLM emitted. The LLM is given the rendered template (with frontmatter) as input context; it polishes the body but ALSO echoes the frontmatter, sometimes with single-character transcription errors in deterministic fields like `source_hash`. That broke staleness detection: the frontmatter `source_hash` written into the polished file didn't match what `compute_source_hash` recomputed on the same source, leaving the feature permanently "stale" after a successful regen.

Concrete evidence (attune-ai spec-engine, 2026-05-27):

    frontmatter: f8ced22b02899aa25ff409636e659830c6ba856d70de6ddd1a9bf1cbe37a1337
    computed:    f8ced22b02899aa25ff709636e659830c6ba856d70de6ddd1a9bf1cbe37a1337
                                    ^ position 19: LLM wrote f4 instead of f7

Single-character difference at one hex character of a 64-char SHA-256 hex digest. The same `compute_source_hash` function called twice in the same Python process returns identical values, so this is pure LLM hallucination of the value it was supposed to echo verbatim.

Fix: strip whatever frontmatter the LLM emitted from the polished content and re-inject the canonical frontmatter from `entry.rendered_content`. The LLM polishes the BODY; the frontmatter (especially `source_hash`, `generated_at`, `feature`, `depth`, `name`, `status`, `type`) is non-negotiable deterministic metadata that must survive the polish step exactly.

Implementation:
- New `_replace_polished_frontmatter(polished, canonical_source)` helper in `generator.py`. Uses a frontmatter regex to extract the canonical block, strip whatever the LLM emitted, and re-assemble.
- `apply_polish_results` now calls the helper for every depth with a polished result. Lenient-mode failures (depth missing from `polished_by_depth`) fall through to the raw rendered template, which already has correct frontmatter.

Tests: 7 new regression tests covering the corrupted-hash case, LLM-stripped frontmatter, LLM-correct frontmatter, no-canonical-frontmatter defensive path, and body-whitespace preservation, plus two behavioral tests asserting `apply_polish_results` writes the canonical hash to disk regardless of LLM perturbation.

Spec doc (`docs/specs/regen-staleness-hash-mismatch/decisions.md`) updated with the verified root cause replacing the original budget-truncation hypothesis (which was wrong: `compute_source_hash` runs only once and is fully deterministic; the divergence lives entirely in the polish step's output mutation).

Unblocks attune-gui Phase 2 (`living-docs-regen-automation`) which needed `attune-author status --dry-run` to reach a fixed point after regen.

Local: 173/173 unit tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fixup: field-level frontmatter merge to preserve polish: skipped

The previous whole-block replacement broke 3 golden snapshot tests that asserted on the `polish: skipped` frontmatter marker which the lenient-mode polish failure path adds via `_mark_polish_skipped`. My whole-block replace discarded that marker.

Switching to field-level merge:
- DETERMINISTIC fields (type, name, feature, depth, generated_at, source_hash, status): canonical from rendered template wins.
- All OTHER fields (polish: skipped, future markers): polished output preserved as-emitted.

Implementation: parse both frontmatter blocks line-by-line, walk polished's lines, swap deterministic-keyed lines with canonical's version, keep everything else. Append any deterministic canonical fields the polished output dropped entirely.

Tests added:
- test_polish_skipped_marker_preserved: regression on the lenient failure path. Exercises both the deterministic-field override AND the marker preservation in one assertion.
- test_unknown_non_deterministic_field_preserved: forward-compat for future polish-layer fields.

Local: 979/979 unit tests pass; 15/15 in this file's slice + 3/3 golden snapshots restored.

End-to-end re-verified on attune-ai spec-engine: 11/11 regenerated templates have canonical source_hash matching compute_source_hash.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
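The evidence quoted in the commit message can be checked mechanically. The snippet below is a minimal sketch, not part of the commit: `mismatch_positions` is a hypothetical helper name, and the two digests are copied from the message above. It lists every index at which the frontmatter hash and the recomputed hash differ.

```python
from __future__ import annotations


def mismatch_positions(a: str, b: str) -> list[int]:
    """Return every 0-based index at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("digests must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Values copied from the commit message; the helper name is an assumption.
FRONTMATTER_HASH = "f8ced22b02899aa25ff409636e659830c6ba856d70de6ddd1a9bf1cbe37a1337"
COMPUTED_HASH = "f8ced22b02899aa25ff709636e659830c6ba856d70de6ddd1a9bf1cbe37a1337"

positions = mismatch_positions(FRONTMATTER_HASH, COMPUTED_HASH)
for i in positions:
    print(
        f"index {i}: frontmatter has {FRONTMATTER_HASH[i]!r}, "
        f"computed has {COMPUTED_HASH[i]!r}"
    )
# prints: index 19: frontmatter has '4', computed has '7'
```

A single differing index is consistent with a transcription slip rather than a different hash input, since changing the hashed content would be expected to alter most of the digest.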
1 parent 1cc27f8 commit 1b1c7c5

3 files changed

Lines changed: 491 additions & 2 deletions


docs/specs/regen-staleness-hash-mismatch/decisions.md

Lines changed: 78 additions & 1 deletion
@@ -1,9 +1,86 @@
 # Decisions — Regen / staleness hash mismatch
 
-**Status:** draft — bug confirmed externally, fix not yet scoped.
+**Status:** root cause confirmed 2026-05-27 — original hypothesis (budget truncation of hash inputs) was wrong; actual cause is LLM-polished frontmatter laundering. Fix direction concrete. Implementation TBD.
 **Owner:** Patrick
 **Filed:** 2026-05-25 (handoff from attune-gui Phase 2 blockers; see [attune-gui docs/specs/living-docs-regen-automation/decisions.md](https://github.com/Smart-AI-Memory/attune-gui/blob/main/docs/specs/living-docs-regen-automation/decisions.md#phase-2-blockers-discovered-2026-05-23))
 
+## Root cause (verified 2026-05-27)
+
+The original hypothesis (budget-truncated hash input) was wrong.
+`compute_source_hash` is called exactly ONCE at
+`generator.py:355` (inside `prepare_polish_phase`) and produces
+a deterministic value off the FULL source set. Verified by
+calling it twice in a row — idempotent. The `source_hash`
+variable flows through to `_render_template` at line 1452
+which writes it into the rendered template's frontmatter
+correctly.
+
+**The actual bug is in `apply_polish_results` at
+`generator.py:468`:**
+
+```python
+final_content = polished_by_depth.get(entry.depth, entry.rendered_content)
+...
+entry.out_path.write_text(final_content, encoding="utf-8")
+```
+
+When LLM polish ran, the polished content REPLACES the rendered
+template **including the frontmatter the LLM regenerated as part
+of its output**. The LLM is given the rendered template (with
+correct frontmatter) as input context, polishes the body, and
+returns the whole document — but its emitted frontmatter has a
+single-character transcription error in the `source_hash` field.
+
+**Reproducible evidence (attune-ai spec-engine, 2026-05-27):**
+
+```
+frontmatter source_hash: f8ced22b02899aa25ff409636e659830c6ba856d70de6ddd1a9bf1cbe37a1337
+computed    source_hash: f8ced22b02899aa25ff709636e659830c6ba856d70de6ddd1a9bf1cbe37a1337
+                                            ^
+                                            position 19: f4 vs f7
+```
+
+Single-char difference at byte 19 of a 64-char SHA-256 hex
+digest. Pure LLM hallucination of the value it was supposed to
+echo verbatim. Same `compute_source_hash` function called twice
+in the same Python process returns identical values; the
+divergence is solely between "what was hashed and written into
+the prompt" and "what the LLM emitted as its frontmatter copy."
+
+## Confirmed fix direction
+
+Strip frontmatter from `final_content` after polish and
+re-inject the canonical frontmatter from
+`entry.rendered_content`. The LLM polishes the BODY; the
+frontmatter (especially `source_hash`, `generated_at`,
+`feature`, `depth`, `name`) is non-negotiable deterministic
+metadata that must survive the polish step exactly.
+
+**Sketch (in `apply_polish_results` around line 468):**
+
+```python
+final_content = polished_by_depth.get(entry.depth, entry.rendered_content)
+if entry.depth in polished_by_depth:
+    # The LLM may have perturbed the frontmatter — re-inject
+    # the canonical one from the rendered template.
+    final_content = _replace_frontmatter(
+        polished_body=final_content,
+        canonical_frontmatter=_extract_frontmatter(entry.rendered_content),
+    )
+```
+
+Where `_extract_frontmatter` returns the `---\n...\n---\n`
+prefix from `entry.rendered_content`, and `_replace_frontmatter`
+strips whatever frontmatter the LLM produced and prepends the
+canonical one. Both can use `_FRONTMATTER_RE` from
+`staleness.py` (or a local equivalent).
+
+Even better long-term: send the LLM the body only (strip
+frontmatter from its input context), have it return the body
+only, and assemble the final document deterministically. Bigger
+refactor but eliminates the "did the LLM accidentally edit
+metadata" failure mode entirely.
+
 ## Problem
 
 Running `attune-author regenerate` writes a new `source_hash` value to a
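
The "even better long-term" option described in the spec diff above (send the LLM the body only, then reassemble deterministically) could look roughly like the sketch below. This is an illustration under stated assumptions, not code from the commit: `split_frontmatter`, `polish_body_only`, and `fake_polish` are hypothetical names, the `polish` callable stands in for the real LLM call, and the regex mirrors the `_FRONTMATTER_RE` pattern added to `generator.py`.

```python
import re
from typing import Callable

# Same pattern as the _FRONTMATTER_RE added in generator.py.
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def split_frontmatter(document: str) -> tuple[str, str]:
    """Split a document into (frontmatter block incl. delimiters, body)."""
    match = FRONTMATTER_RE.match(document)
    if match is None:
        return "", document
    return match.group(0), document[match.end():]


def polish_body_only(rendered: str, polish: Callable[[str], str]) -> str:
    """Polish the body while the frontmatter never reaches the polisher."""
    frontmatter, body = split_frontmatter(rendered)
    # Hypothetical: `polish` is whatever function wraps the LLM call.
    return frontmatter + polish(body)


def fake_polish(body: str) -> str:
    """Stand-in for the LLM: fixes one typo in the body."""
    return body.replace("Teh", "The")


rendered = (
    "---\n"
    "feature: spec-engine\n"
    "source_hash: f8ced22b02899aa25ff7\n"  # truncated placeholder value
    "---\n"
    "Teh body text.\n"
)
final = polish_body_only(rendered, fake_polish)
print(final)
```

Because the frontmatter bytes are copied straight from the rendered template, the polisher has no opportunity to alter `source_hash`. The trade-off, as the spec notes, is a larger refactor of how the prompt is assembled.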

src/attune_author/generator.py

Lines changed: 141 additions & 1 deletion
@@ -15,6 +15,7 @@
 import ast
 import logging
 import os
+import re
 from concurrent.futures import ThreadPoolExecutor, as_completed
 from dataclasses import dataclass, field
 from datetime import datetime, timezone
@@ -431,6 +432,131 @@ def prepare_polish_phase(
     )
 
 
+_FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
+"""Matches a YAML frontmatter block at the start of a markdown
+document, capturing the body between the ``---`` delimiters.
+Includes the closing ``---\\n`` in the match so the body starts
+at the next character after the match end."""
+
+
+#: Frontmatter fields that are DETERMINISTIC — computed from
+#: source and not for the LLM (or polish-layer) to mutate. These
+#: come from the rendered template and override whatever the
+#: polish output contains.
+_DETERMINISTIC_FRONTMATTER_FIELDS = frozenset(
+    {
+        "type",
+        "name",
+        "feature",
+        "depth",
+        "generated_at",
+        "source_hash",
+        "status",
+    }
+)
+
+
+def _parse_frontmatter_lines(block: str) -> list[tuple[str, str]]:
+    """Parse a YAML frontmatter block into (key, line) pairs in order.
+
+    The block is the captured group from ``_FRONTMATTER_RE``,
+    i.e. the YAML body without the ``---`` delimiters. Each line
+    is returned as the (key, whole-line) tuple. Lines that don't
+    match the ``key: ...`` shape (e.g. multi-line YAML values, or
+    structural lines) are returned with key ``""`` so the caller
+    can decide whether to include them.
+    """
+    out: list[tuple[str, str]] = []
+    for line in block.splitlines():
+        stripped = line.lstrip()
+        if not stripped or stripped.startswith("#"):
+            out.append(("", line))
+            continue
+        key, sep, _ = stripped.partition(":")
+        if sep and " " not in key and "\t" not in key:
+            out.append((key.strip(), line))
+        else:
+            out.append(("", line))
+    return out
+
+
+def _replace_polished_frontmatter(polished: str, canonical_source: str) -> str:
+    """Re-inject deterministic frontmatter fields from canonical source.
+
+    The polish LLM is given the rendered template (with frontmatter)
+    as input context and asked to improve the body. Empirically, the
+    LLM also echoes the frontmatter in its output — sometimes with
+    single-character transcription errors in deterministic fields
+    like ``source_hash``. That broke staleness detection: the
+    frontmatter ``source_hash`` written into the polished file
+    didn't match what ``compute_source_hash`` recomputed on the
+    same source, leaving the feature permanently "stale" after a
+    successful regen.
+
+    Approach: field-level merge. For deterministic fields
+    (:data:`_DETERMINISTIC_FRONTMATTER_FIELDS`), the canonical
+    value from the rendered template wins. For any other field
+    (e.g. ``polish: skipped`` added by the lenient-mode polish
+    failure path in :func:`attune_author.polish._mark_polish_skipped`),
+    the polish output's value is preserved.
+
+    Edge cases:
+
+    - Polished has no frontmatter (LLM stripped it): prepend the
+      canonical block as-is.
+    - Canonical has no frontmatter (shouldn't happen in practice
+      since rendered templates always have one): return polished
+      untouched.
+
+    See ``docs/specs/regen-staleness-hash-mismatch/decisions.md``
+    for the full diagnosis.
+    """
+    canonical_match = _FRONTMATTER_RE.match(canonical_source)
+    if canonical_match is None:
+        # Defensive: rendered templates always have frontmatter.
+        return polished
+
+    polished_match = _FRONTMATTER_RE.match(polished)
+    if polished_match is None:
+        # LLM stripped the frontmatter entirely. Prepend canonical
+        # block and return.
+        return canonical_match.group(0) + polished
+
+    canonical_lines = _parse_frontmatter_lines(canonical_match.group(1))
+    polished_lines = _parse_frontmatter_lines(polished_match.group(1))
+
+    canonical_by_key: dict[str, str] = {k: line for k, line in canonical_lines if k}
+
+    merged: list[str] = []
+    seen_deterministic: set[str] = set()
+    for key, line in polished_lines:
+        if key in _DETERMINISTIC_FRONTMATTER_FIELDS:
+            # Override with canonical's line for this deterministic
+            # field. If canonical lacks the key (very unusual),
+            # drop the polished version too — better silence than
+            # propagating a possibly-perturbed value.
+            canonical_line = canonical_by_key.get(key)
+            if canonical_line is not None:
+                merged.append(canonical_line)
+                seen_deterministic.add(key)
+        else:
+            # Non-deterministic field (e.g. polish: skipped marker)
+            # OR a structural / comment line. Preserve as the polish
+            # layer emitted it.
+            merged.append(line)
+
+    # Append any deterministic canonical fields the polish output
+    # was missing (e.g. LLM dropped a line entirely). Preserves the
+    # invariant that the canonical's deterministic fields are
+    # always present in the result.
+    for key, line in canonical_lines:
+        if key and key in _DETERMINISTIC_FRONTMATTER_FIELDS and key not in seen_deterministic:
+            merged.append(line)
+
+    body = polished[polished_match.end() :]
+    return "---\n" + "\n".join(merged) + "\n---\n" + body
+
+
 def apply_polish_results(
     prep: PolishPreparation,
     polished_by_depth: dict[str, str],
@@ -465,7 +591,21 @@ def apply_polish_results(
     project_root = Path.cwd()
     absolute_sources = [project_root / rel_path for rel_path in prep.matched_files]
     for entry in prep.pending:
-        final_content = polished_by_depth.get(entry.depth, entry.rendered_content)
+        if entry.depth in polished_by_depth:
+            # Polish ran: take the polished body but re-inject the
+            # canonical frontmatter. The LLM occasionally transcribes
+            # deterministic fields (notably source_hash) with single-
+            # character errors, which permanently breaks staleness
+            # detection. See _replace_polished_frontmatter docstring.
+            final_content = _replace_polished_frontmatter(
+                polished=polished_by_depth[entry.depth],
+                canonical_source=entry.rendered_content,
+            )
+        else:
+            # Polish skipped (e.g. lenient-mode failure) — use the
+            # raw rendered template, which already has correct
+            # frontmatter.
+            final_content = entry.rendered_content
         # Phase 4: strip `# attune-author: skip-mypy` directives from
         # tutorial code fences so they don't ship to readers. Other
         # template kinds are untouched.
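
To make the field-level merge in the `generator.py` diff above concrete, here is a condensed, standalone restatement with a worked input. It is a sketch for illustration only: `key_of` and `merge_frontmatter` are hypothetical names that compress `_parse_frontmatter_lines` and `_replace_polished_frontmatter`, and the sample documents are invented. The example feeds in the corrupted hash from the commit message together with a `polish: skipped` marker.

```python
import re

FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
DETERMINISTIC = frozenset(
    {"type", "name", "feature", "depth", "generated_at", "source_hash", "status"}
)


def key_of(line: str) -> str:
    """Return the `key` of a `key: value` line, or "" for anything else."""
    stripped = line.lstrip()
    if not stripped or stripped.startswith("#"):
        return ""
    key, sep, _ = stripped.partition(":")
    return key.strip() if sep and " " not in key and "\t" not in key else ""


def merge_frontmatter(polished: str, canonical: str) -> str:
    """Canonical wins for deterministic keys; other polished lines are kept."""
    c = FRONTMATTER_RE.match(canonical)
    if c is None:
        return polished
    p = FRONTMATTER_RE.match(polished)
    if p is None:
        return c.group(0) + polished
    canonical_lines = c.group(1).splitlines()
    by_key = {key_of(ln): ln for ln in canonical_lines if key_of(ln)}
    merged, seen = [], set()
    for line in p.group(1).splitlines():
        key = key_of(line)
        if key in DETERMINISTIC:
            if key in by_key:  # drop the polished line if canonical lacks it
                merged.append(by_key[key])
                seen.add(key)
        else:
            merged.append(line)
    # Restore deterministic fields the polished output dropped entirely.
    for line in canonical_lines:
        key = key_of(line)
        if key in DETERMINISTIC and key not in seen:
            merged.append(line)
    return "---\n" + "\n".join(merged) + "\n---\n" + polished[p.end():]


GOOD = "f8ced22b02899aa25ff709636e659830c6ba856d70de6ddd1a9bf1cbe37a1337"
BAD = "f8ced22b02899aa25ff409636e659830c6ba856d70de6ddd1a9bf1cbe37a1337"
canonical = f"---\nfeature: spec-engine\nsource_hash: {GOOD}\n---\nRaw body.\n"
polished = f"---\nsource_hash: {BAD}\npolish: skipped\n---\nPolished body.\n"

result = merge_frontmatter(polished, canonical)
print(result)
```

In this run the corrupted `source_hash` line is replaced by the canonical one, `polish: skipped` survives untouched, and the `feature` line that the polished output dropped is appended after the lines that were kept, so field order follows the polished output rather than the template.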
