docs/specs/regen-staleness-hash-mismatch/decisions.md (78 additions, 1 deletion)
@@ -1,9 +1,86 @@
# Decisions — Regen / staleness hash mismatch

**Status:** root cause confirmed 2026-05-27. The original hypothesis (budget truncation of hash inputs) was wrong; the actual cause is frontmatter laundering through the LLM polish step. Fix direction is concrete; implementation TBD.
**Owner:** Patrick
**Filed:** 2026-05-25 (handoff from attune-gui Phase 2 blockers; see [attune-gui docs/specs/living-docs-regen-automation/decisions.md](https://github.com/Smart-AI-Memory/attune-gui/blob/main/docs/specs/living-docs-regen-automation/decisions.md#phase-2-blockers-discovered-2026-05-23))

## Root cause (verified 2026-05-27)

The original hypothesis (budget-truncated hash input) was wrong.
`compute_source_hash` is called exactly ONCE at
`generator.py:355` (inside `prepare_polish_phase`) and produces
a deterministic value from the FULL source set. Calling it
twice in a row returns the same digest both times. The
`source_hash` variable flows through to `_render_template` at
line 1452, which writes it correctly into the rendered
template's frontmatter.

**The actual bug is in `apply_polish_results` at
`generator.py:468`:**

```python
final_content = polished_by_depth.get(entry.depth, entry.rendered_content)
...
entry.out_path.write_text(final_content, encoding="utf-8")
```

When LLM polish ran, the polished content REPLACES the rendered
template **including the frontmatter the LLM regenerated as part
of its output**. The LLM is given the rendered template (with
correct frontmatter) as input context, polishes the body, and
returns the whole document — but its emitted frontmatter has a
single-character transcription error in the `source_hash` field.
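To make the failure mode concrete, here is a minimal, self-contained reproduction. It assumes a staleness check that re-reads `source_hash` from the written file's frontmatter and compares it with a freshly computed value. `read_frontmatter_hash`, `fake_llm_polish`, and the depth key `"concept"` are hypothetical stand-ins, not attune-author code; the canonical digest is the computed value from the 2026-05-27 evidence.

```python
import re
from typing import Optional

# Hypothetical staleness helper: read source_hash back out of a leading
# ---\n ... \n---\n YAML frontmatter block (same shape as _FRONTMATTER_RE).
_FM_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def read_frontmatter_hash(doc: str) -> Optional[str]:
    match = _FM_RE.match(doc)
    if match is None:
        return None
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "source_hash":
            return value.strip()
    return None


def fake_llm_polish(rendered: str) -> str:
    # Stand-in for the polish LLM: rewrites the body, then echoes the
    # frontmatter with one hex digit mis-copied (index 19: "7" -> "4").
    polished = rendered.replace("Rough body.", "Polished body.")
    return polished.replace("25ff7", "25ff4", 1)


canonical_hash = "f8ced22b02899aa25ff709636e659830c6ba856d70de6ddd1a9bf1cbe37a1337"
rendered = f"---\nname: demo\nsource_hash: {canonical_hash}\n---\nRough body.\n"

# Pre-fix behavior: the polished output replaces the rendered template
# wholesale, frontmatter included.
polished_by_depth = {"concept": fake_llm_polish(rendered)}
final_content = polished_by_depth.get("concept", rendered)

print(read_frontmatter_hash(rendered) == canonical_hash)       # True
print(read_frontmatter_hash(final_content) == canonical_hash)  # False
```

The body improvement and the metadata corruption arrive in the same string, so nothing downstream of the `dict.get` can tell them apart.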

**Reproducible evidence (attune-ai spec-engine, 2026-05-27):**

```
frontmatter source_hash: f8ced22b02899aa25ff409636e659830c6ba856d70de6ddd1a9bf1cbe37a1337
computed source_hash:    f8ced22b02899aa25ff709636e659830c6ba856d70de6ddd1a9bf1cbe37a1337
                                            ^
                                            position 19: f4 vs f7
```

A single-character difference at index 19 (0-based) of a
64-character SHA-256 hex digest: the LLM mis-copied a value it
was supposed to echo verbatim. The same `compute_source_hash`
function, called twice in the same Python process, returns
identical values, so the divergence lies solely between "what
was hashed and written into the prompt" and "what the LLM
emitted as its frontmatter copy."
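A small generic helper makes this kind of transcription error obvious when diagnosing a stale feature. It is sketched here with an illustrative name (`first_mismatch` is not part of attune-author) and reports the first index at which two digests differ, applied to the two values from the evidence above.

```python
from typing import Optional


def first_mismatch(a: str, b: str) -> Optional[int]:
    """Return the first index where a and b differ, or None if equal.

    A length difference counts as a mismatch at the shorter length.
    """
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


frontmatter = "f8ced22b02899aa25ff409636e659830c6ba856d70de6ddd1a9bf1cbe37a1337"
computed = "f8ced22b02899aa25ff709636e659830c6ba856d70de6ddd1a9bf1cbe37a1337"

idx = first_mismatch(frontmatter, computed)
print(idx, frontmatter[idx], computed[idx])  # 19 4 7
```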

## Confirmed fix direction

Strip frontmatter from `final_content` after polish and
re-inject the canonical frontmatter from
`entry.rendered_content`. The LLM polishes the BODY; the
frontmatter (especially `source_hash`, `generated_at`,
`feature`, `depth`, `name`) is non-negotiable deterministic
metadata that must survive the polish step exactly.

**Sketch (in `apply_polish_results` around line 468):**

```python
final_content = polished_by_depth.get(entry.depth, entry.rendered_content)
if entry.depth in polished_by_depth:
    # The LLM may have perturbed the frontmatter; re-inject
    # the canonical one from the rendered template.
    final_content = _replace_frontmatter(
        polished_body=final_content,
        canonical_frontmatter=_extract_frontmatter(entry.rendered_content),
    )
```

Where `_extract_frontmatter` returns the `---\n...\n---\n`
prefix from `entry.rendered_content`, and `_replace_frontmatter`
strips whatever frontmatter the LLM produced and prepends the
canonical one. Both can use `_FRONTMATTER_RE` from
`staleness.py` (or a local equivalent).

Even better long-term: send the LLM the body only (strip
frontmatter from its input context), have it return the body
only, and assemble the final document deterministically. Bigger
refactor but eliminates the "did the LLM accidentally edit
metadata" failure mode entirely.
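A sketch of that body-only flow, assuming the polish step can be modeled as a plain `str -> str` callable over the body. `split_frontmatter`, `polish_document`, and `fake_polish` are hypothetical names; the real polish step is an LLM call. The frontmatter never leaves Python, so model output cannot perturb it; as an extra guard, any frontmatter the model invents anyway is stripped before reassembly.

```python
import re
from typing import Callable

_FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def split_frontmatter(doc: str) -> tuple[str, str]:
    """Split a document into (frontmatter block, body); the block is empty if absent."""
    match = _FRONTMATTER_RE.match(doc)
    if match is None:
        return "", doc
    return match.group(0), doc[match.end():]


def polish_document(rendered: str, polish_body: Callable[[str], str]) -> str:
    frontmatter, body = split_frontmatter(rendered)
    # Only the body goes to the polisher.
    polished = polish_body(body)
    # Defensive: drop any frontmatter the model emitted anyway.
    _, polished = split_frontmatter(polished)
    # Re-attach the canonical frontmatter verbatim.
    return frontmatter + polished


def fake_polish(body: str) -> str:
    # Stand-in for the LLM polish call: rewrites body text only.
    return body.replace("rough", "polished")


rendered = "---\nsource_hash: abc7\n---\nA rough body.\n"
print(polish_document(rendered, fake_polish), end="")
```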

## Problem

Running `attune-author regenerate` writes a new `source_hash` value to a
src/attune_author/generator.py (141 additions, 1 deletion)
@@ -15,6 +15,7 @@
import ast
import logging
import os
import re
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass, field
from datetime import datetime, timezone
@@ -431,6 +432,131 @@ def prepare_polish_phase(
)


_FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
"""Matches a YAML frontmatter block at the start of a markdown
document, capturing the body between the ``---`` delimiters.
Includes the closing ``---\\n`` in the match so the body starts
at the next character after the match end."""


#: Frontmatter fields that are DETERMINISTIC — computed from
#: source and not for the LLM (or polish-layer) to mutate. These
#: come from the rendered template and override whatever the
#: polish output contains.
_DETERMINISTIC_FRONTMATTER_FIELDS = frozenset(
    {
        "type",
        "name",
        "feature",
        "depth",
        "generated_at",
        "source_hash",
        "status",
    }
)


def _parse_frontmatter_lines(block: str) -> list[tuple[str, str]]:
    """Parse a YAML frontmatter block into (key, line) pairs in order.

    The block is the captured group from ``_FRONTMATTER_RE``,
    i.e. the YAML body without the ``---`` delimiters. Each line
    is returned as the (key, whole-line) tuple. Lines that don't
    match the ``key: ...`` shape (e.g. multi-line YAML values, or
    structural lines) are returned with key ``""`` so the caller
    can decide whether to include them.
    """
    out: list[tuple[str, str]] = []
    for line in block.splitlines():
        stripped = line.lstrip()
        if not stripped or stripped.startswith("#"):
            out.append(("", line))
            continue
        key, sep, _ = stripped.partition(":")
        if sep and " " not in key and "\t" not in key:
            out.append((key.strip(), line))
        else:
            out.append(("", line))
    return out


def _replace_polished_frontmatter(polished: str, canonical_source: str) -> str:
    """Re-inject deterministic frontmatter fields from canonical source.

    The polish LLM is given the rendered template (with frontmatter)
    as input context and asked to improve the body. Empirically, the
    LLM also echoes the frontmatter in its output — sometimes with
    single-character transcription errors in deterministic fields
    like ``source_hash``. That broke staleness detection: the
    frontmatter ``source_hash`` written into the polished file
    didn't match what ``compute_source_hash`` recomputed on the
    same source, leaving the feature permanently "stale" after a
    successful regen.

    Approach: field-level merge. For deterministic fields
    (:data:`_DETERMINISTIC_FRONTMATTER_FIELDS`), the canonical
    value from the rendered template wins. For any other field
    (e.g. ``polish: skipped`` added by the lenient-mode polish
    failure path in :func:`attune_author.polish._mark_polish_skipped`),
    the polish output's value is preserved.

    Edge cases:

    - Polished has no frontmatter (LLM stripped it): prepend the
      canonical block as-is.
    - Canonical has no frontmatter (shouldn't happen in practice
      since rendered templates always have one): return polished
      untouched.

    See ``docs/specs/regen-staleness-hash-mismatch/decisions.md``
    for the full diagnosis.
    """
    canonical_match = _FRONTMATTER_RE.match(canonical_source)
    if canonical_match is None:
        # Defensive: rendered templates always have frontmatter.
        return polished

    polished_match = _FRONTMATTER_RE.match(polished)
    if polished_match is None:
        # LLM stripped the frontmatter entirely. Prepend canonical
        # block and return.
        return canonical_match.group(0) + polished

    canonical_lines = _parse_frontmatter_lines(canonical_match.group(1))
    polished_lines = _parse_frontmatter_lines(polished_match.group(1))

    canonical_by_key: dict[str, str] = {k: line for k, line in canonical_lines if k}

    merged: list[str] = []
    seen_deterministic: set[str] = set()
    for key, line in polished_lines:
        if key in _DETERMINISTIC_FRONTMATTER_FIELDS:
            # Override with canonical's line for this deterministic
            # field. If canonical lacks the key (very unusual),
            # drop the polished version too — better silence than
            # propagating a possibly-perturbed value.
            canonical_line = canonical_by_key.get(key)
            if canonical_line is not None:
                merged.append(canonical_line)
                seen_deterministic.add(key)
        else:
            # Non-deterministic field (e.g. polish: skipped marker)
            # OR a structural / comment line. Preserve as the polish
            # layer emitted it.
            merged.append(line)

    # Append any deterministic canonical fields the polish output
    # was missing (e.g. LLM dropped a line entirely). Preserves the
    # invariant that the canonical's deterministic fields are
    # always present in the result.
    for key, line in canonical_lines:
        if key and key in _DETERMINISTIC_FRONTMATTER_FIELDS and key not in seen_deterministic:
            merged.append(line)

    body = polished[polished_match.end() :]
    return "---\n" + "\n".join(merged) + "\n---\n" + body


def apply_polish_results(
    prep: PolishPreparation,
    polished_by_depth: dict[str, str],
@@ -465,7 +591,21 @@ def apply_polish_results(
    project_root = Path.cwd()
    absolute_sources = [project_root / rel_path for rel_path in prep.matched_files]
    for entry in prep.pending:
        if entry.depth in polished_by_depth:
            # Polish ran: take the polished body but re-inject the
            # canonical frontmatter. The LLM occasionally transcribes
            # deterministic fields (notably source_hash) with single-
            # character errors, which permanently breaks staleness
            # detection. See _replace_polished_frontmatter docstring.
            final_content = _replace_polished_frontmatter(
                polished=polished_by_depth[entry.depth],
                canonical_source=entry.rendered_content,
            )
        else:
            # Polish skipped (e.g. lenient-mode failure) — use the
            # raw rendered template, which already has correct
            # frontmatter.
            final_content = entry.rendered_content
        # Phase 4: strip `# attune-author: skip-mypy` directives from
        # tutorial code fences so they don't ship to readers. Other
        # template kinds are untouched.